Overview
NumPy explicitly limits arrays to 64 dimensions via `NPY_MAXDIMS = 64` (bumped from 32 in NumPy 2.0). NumSharp has no explicit limit, but investigation reveals implicit limits from `stackalloc` usage that cap practical ndim at ~385,000 dimensions.
This issue tracks whether NumSharp should:
- Keep the implicit ~385K limit (6,000x more than NumPy)
- Add an explicit `MAXDIMS` constant matching NumPy's 64
- Remove `stackalloc` bottlenecks to support higher ndim
Why NumPy Limits to 64 Dimensions
Short answer: Historical/practical reasons, not fundamental necessity. NumPy wants to eventually remove the limit entirely.
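The cap is easy to observe from Python. The probe below (a hypothetical helper, assuming NumPy is installed) finds the largest ndim that `np.zeros` accepts; the result depends on the installed NumPy major version:

```python
import numpy as np

def max_ndim_accepted(cap=128):
    """Probe how many dimensions np.zeros accepts before raising ValueError."""
    accepted = 0
    for ndim in range(1, cap + 1):
        try:
            np.zeros((1,) * ndim)   # all-1 dims keep the array tiny
            accepted = ndim
        except ValueError:          # NumPy rejects shapes beyond NPY_MAXDIMS
            break
    return accepted

print(max_ndim_accepted())  # 64 on NumPy 2.x, 32 on 1.x
```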
The Real Reasons
- Stack allocation for scratch space - NumPy's C code uses fixed-size arrays on the stack:

  ```c
  npy_intp shape[NPY_MAXDIMS]; // Static allocation
  ```

  This is fast but requires knowing max size at compile time.
- ABI stability - Changing `NPY_MAXDIMS` breaks binary compatibility with compiled extensions. The iterator macros are particularly problematic.
- Historical accident - Original limit was 32 (speculation: 2*32 for 32-bit systems). Bumped to 64 in NumPy 2.0 for the 64-bit era.
- Sentinel value abuse - `axis=MAXDIMS` was used internally to mean `axis=None`. NumPy 2.0 introduced `NPY_AXIS_RAVEL` to fix this.
NumPy's Long-Term Goal
From the PR #25149 discussion:
"The long-term goal is to remove the limit completely (i.e. remove NPY_MAXDIMS as a public constant)"
They're being pragmatic - some iterator paths are still limited to 32 dims due to legacy code.
Implication for NumSharp
NumSharp's ~385K limit from `stackalloc` is actually the same pattern as NumPy's `NPY_MAXDIMS` - both are stack allocation limits. The difference:
- NumPy: Explicit compile-time constant (64)
- NumSharp: Implicit runtime stack size (~385K)
If NumPy's goal is to remove limits entirely, NumSharp could do the same by replacing `stackalloc` with heap/pooled allocations.
Current Implicit Limits in NumSharp
Stackalloc Bottlenecks (cause StackOverflowException)
| Location | Allocation | Tested Limit |
|---|---|---|
| `Default.Broadcasting.cs:411,463` | `stackalloc int[nd]` | ~385,600 |
| `ILKernelGenerator.Scan.cs:1499-1500` | `stackalloc int[ndim]` x2 | ~900,000 |
| `ILKernelGenerator.Reduction.Axis*.cs` | `stackalloc int[ndim-1]` | ~385,000 |
| `Default.NonZero.cs:225` | `stackalloc int[ndim-1]` | ~380,000 |
| `NDArray.Indexing.Selection.Getter.cs:582` | `stackalloc int[srcDims]` | ~385,000 |
Bottleneck: AreBroadcastable() at ~385,600 dimensions. Any broadcasting operation hits this limit.
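For reference, the compatibility rule itself is cheap. Below is a minimal Python sketch of NumPy-style broadcast checking (the function name is illustrative, not NumSharp's API); NumSharp's implementation additionally allocates `stackalloc` scratch proportional to ndim (per the table above), which is what overflows the stack at large ndim:

```python
def are_broadcastable(shape_a, shape_b):
    # Two shapes broadcast if, comparing trailing dimensions,
    # each pair of dims is equal or one of them is 1.
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True

print(are_broadcastable((8, 1, 6), (7, 6)))  # True
print(are_broadcastable((4, 3), (4,)))       # False
```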
Other Limits (not blocking int.MaxValue for ndim specifically)
| Limit | Type | Notes |
|---|---|---|
| `Array.MaxLength` | 2,147,483,591 | .NET limit for `int[]` dimensions/strides arrays |
| `int size` field | Silent overflow | Uses `unchecked`, wraps at 2^31 total elements |
| `int offset` field | 2^31 addressable | Limits element addressing, not ndim |
| Stride overflow | Silent corruption | `strides[0]=0` when dims overflow (tested: 35 dims of size 2) |
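The stride-overflow row can be reproduced arithmetically. A sketch simulating unchecked 32-bit stride math in Python (element-unit C-order strides assumed):

```python
# For 35 dimensions of size 2, the C-order strides[0] (in elements) is the
# product of all remaining dims = 2**34, which wraps to 0 when stored in a
# signed 32-bit int field with unchecked arithmetic.
dims = [2] * 35

stride0 = 1
for d in dims[1:]:
    stride0 *= d                      # exact value: 2**34 = 17179869184

wrapped = stride0 & 0xFFFFFFFF        # keep only the low 32 bits
if wrapped >= 2**31:                  # reinterpret as signed int32
    wrapped -= 2**32

print(stride0, wrapped)               # 17179869184 0
```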
Evidence
Tested empirically:
- ndim=385,600: AreBroadcastable OK
- ndim=386,000: StackOverflowException at DefaultEngine.AreBroadcastable
Stack usage: 385,600 × 4 bytes = 1.54 MB (main thread stack ~1.5-2 MB on Windows)
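A quick check of the arithmetic behind that threshold:

```python
# Scratch space consumed by stackalloc int[nd] at the observed limit.
ndim_at_limit = 385_600
scratch_bytes = ndim_at_limit * 4          # 4 bytes per int
print(scratch_bytes)                        # 1542400
print(f"{scratch_bytes / 1e6:.2f} MB")      # 1.54 MB -> fills a ~1.5-2 MB thread stack
```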
Comparison
| Library | Max ndim | Type |
|---|---|---|
| NumPy | 64 | Explicit (`NPY_MAXDIMS`) |
| NumSharp | ~385,000 | Implicit (`stackalloc`) |
NumSharp supports 6,000x more dimensions than NumPy.
Options
Option A: Keep Current Behavior
- Pro: Already supports vastly more dims than NumPy (385K vs 64)
- Con: Implicit limit, StackOverflowException is unfriendly error
Option B: Add Explicit MAXDIMS (like NumPy)
- Add `public const int MAXDIMS = 64;` or higher
- Validate in `Shape` constructor, throw `ArgumentException`
- Pro: Clear error, matches NumPy semantics
- Con: Breaking change if anyone uses >64 dims
Option C: Remove Stackalloc Bottlenecks
- Replace `stackalloc int[nd]` with heap allocation or pooled arrays
- Pro: Supports Array.MaxLength (~2.1 billion) dimensions theoretically
- Con: Performance regression for common cases, still limited by memory
Option D: Hybrid (Recommended)
- Add `MAXDIMS` constant but set it high (e.g., 1024 or 65536)
- Replace `stackalloc` with conditional: `stackalloc` for small ndim, heap for large
- Pro: Best of both worlds - fast for common cases, no arbitrary limit
- Con: More complex
- Aligns with NumPy's long-term goal of removing limits
Recommendation
Option D (Hybrid) - Aligns with NumPy's stated goal to eventually remove NPY_MAXDIMS entirely:
- Replace `stackalloc int[nd]` with:

  ```csharp
  Span<int> buffer = nd <= 64 ? stackalloc int[64] : new int[nd];
  ```
- This gives stack performance for typical cases (≤64 dims) while supporting arbitrary ndim
- No artificial limit, no StackOverflowException
References