Summary
NumSharp contains approximately 2,700 NPTypeCode switch/case occurrences across 66 files, resulting in ~5,700 lines of repetitive type-dispatched code. This issue tracks the migration of these patterns to IL-generated kernels, reducing code size, improving maintainability, and enabling SIMD optimization.
Problem Statement
The current codebase uses extensive switch (typecode) { case NPTypeCode.X: ... } patterns to handle NumSharp's 12 supported types:
Boolean, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal
This results in:
- Code bloat: 12 nearly-identical branches per operation
- Maintenance burden: Changes must be replicated across all type branches
- Regen dependency: Many files use #if _REGEN template generation
- Missed SIMD opportunities: Scalar loops where vectorization is possible
High-Impact Files
| File | NPTypeCode Cases | Category |
| --- | --- | --- |
| Utilities/Converts.cs | 516 | Type Conversion |
| UnmanagedMemoryBlock.Casting.cs | 342 | Type Casting |
| Utilities/ArrayConvert.cs | 221 | Array Conversion |
| Backends/NPTypeCode.cs | 161 | Extension Methods |
| Unmanaged/ArraySlice.cs | 130 | Slice Operations |
| DefaultEngine.ReductionOp.cs | 69 | Reductions |
| Default.ClipNDArray.cs | 66 | Clip with NDArray |
| UnmanagedStorage.cs | 52 | Storage Operations |
Migration Priority
P0: Type Casting (Est. 4000 LOC reduction)
- UnmanagedMemoryBlock.Casting.cs - 12×12 nested switch, 291 for-loops
- ArrayConvert.cs - 12×12 nested switch, 172 for-loops
- Target: Single IL kernel per type-pair, SIMD widening/narrowing
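To make the "SIMD widening" target concrete, here is a minimal sketch of a Single to Double cast using `System.Numerics.Vector<T>`. The class and method names are hypothetical and this is not existing NumSharp code; it also assumes a target framework that has the span-based `Vector<T>` constructor and `CopyTo` overloads (.NET Core 2.1+ / .NET Standard 2.1).

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of a widening cast (float -> double); not NumSharp code.
public static class WideningSketch
{
    public static void SingleToDouble(ReadOnlySpan<float> src, Span<double> dst)
    {
        if (dst.Length < src.Length)
            throw new ArgumentException("Destination is shorter than source.");

        int i = 0;
        int step = Vector<float>.Count; // floats per vector; doubles per vector is half of this

        // Vectorized body: each float vector widens into two double vectors.
        for (; i <= src.Length - step; i += step)
        {
            var v = new Vector<float>(src.Slice(i, step));
            Vector.Widen(v, out Vector<double> lo, out Vector<double> hi);
            lo.CopyTo(dst.Slice(i));
            hi.CopyTo(dst.Slice(i + Vector<double>.Count));
        }

        // Scalar tail for the remaining elements.
        for (; i < src.Length; i++)
            dst[i] = src[i];
    }
}
```

An IL-generated kernel could emit the equivalent of this loop per type-pair; narrowing casts would use `Vector.Narrow` in the opposite direction.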
P1: Indexing Operations (Est. 600 LOC reduction)
- NDArray.Indexing.Selection.Getter.cs - 12-type dispatch
- NDArray.Indexing.Selection.Setter.cs - 12-type dispatch
- Target: IL gather/scatter kernels
P2: Math Operations (Est. 400 LOC reduction)
- np.linspace.cs - 12 per-type loops → IL sequence generation with SIMD
- np.repeat.cs - 12 per-type loops → IL fill kernel with SIMD
- np.all.cs / np.any.cs axis path → IL axis reduction with early-exit
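For reference, the early-exit behaviour targeted for the np.all axis path can be sketched in plain C# as below. This is a scalar illustration only, assuming a contiguous row-major [rows, cols] Int32 buffer reduced along the last axis; the names are hypothetical and it does not reflect the current np.all.cs implementation.

```csharp
using System;

// Hypothetical sketch of "all" along the last axis with early exit; not NumSharp code.
public static class AxisAllSketch
{
    public static bool[] AllAlongLastAxis(ReadOnlySpan<int> data, int rows, int cols)
    {
        if (data.Length != rows * cols)
            throw new ArgumentException("Shape does not match buffer length.");

        var result = new bool[rows];
        for (int r = 0; r < rows; r++)
        {
            ReadOnlySpan<int> row = data.Slice(r * cols, cols);
            bool all = true; // an empty row reduces to true
            for (int c = 0; c < cols; c++)
            {
                if (row[c] == 0)
                {
                    all = false;
                    break; // early exit: the rest of the row cannot change the result
                }
            }
            result[r] = all;
        }
        return result;
    }
}
```

The np.any variant is the mirror image: start from false and break on the first non-zero element.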
P3: Reduction Fallbacks (Est. 200 LOC reduction)
- Default.Reduction.CumAdd.cs - 10-type fallback switch
- Default.Reduction.CumMul.cs - 10-type fallback switch
P4: Dispatch Cleanup (Est. 500 LOC reduction)
Files that already have IL kernels but retain verbose type dispatch:
- Default.Clip.cs - 3 × 11-type switches
- Default.ClipNDArray.cs - 6 × 11-type switches
- DefaultEngine.BinaryOp.cs / UnaryOp.cs / CompareOp.cs - Scalar dispatch chains
Success Metrics
| Metric | Before | Target |
| --- | --- | --- |
| NPTypeCode switch cases | ~2,700 | <500 |
| Lines of type-dispatch code | ~5,700 | ~1,000 |
| Regen template files | ~20 | ~5 |
| SIMD coverage for casting | 0% | 80%+ |
Implementation Approach
```csharp
// Before: 144 separate loop implementations
case NPTypeCode.Int32:
    var src = (int*)source.Address;
    switch (outType) {
        case NPTypeCode.Double:
            for (int i = 0; i < len; i++) dst[i] = (double)src[i];
            break;
        // ... 11 more
    }
    break;
// ... 11 more input types
```

```csharp
// After: Single IL-generated kernel
var kernel = ILKernelGenerator.GetCastKernel(srcType, dstType);
kernel(srcPtr, dstPtr, count);
```
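As a rough illustration of what a per-type-pair cast kernel could look like, the sketch below builds a scalar cast loop with `System.Reflection.Emit.DynamicMethod` and caches one delegate per (source, destination) pair. Everything here is an assumption for illustration: `CastKernelSketch`, the `CastKernel` delegate, and the `IntPtr`-based signature are hypothetical and are not the actual `ILKernelGenerator.GetCastKernel` API. Only four signed/floating types are handled; unsigned sources would need `conv.r.un`, and Boolean, Char and Decimal need separate handling.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection.Emit;

// Hypothetical sketch; not NumSharp's ILKernelGenerator.
// Pointers are passed as IntPtr so the generator itself needs no unsafe code.
public delegate void CastKernel(IntPtr src, IntPtr dst, long count);

public static class CastKernelSketch
{
    private static readonly ConcurrentDictionary<(Type, Type), CastKernel> Cache =
        new ConcurrentDictionary<(Type, Type), CastKernel>();

    // One kernel per (source, destination) pair, built on first use and then reused.
    public static CastKernel Get(Type src, Type dst) =>
        Cache.GetOrAdd((src, dst), key => Build(key.Item1, key.Item2));

    // Maps a destination type to its IL conversion opcode (signed/float types only).
    private static OpCode ConvOp(Type t)
    {
        switch (Type.GetTypeCode(t))
        {
            case TypeCode.Int32: return OpCodes.Conv_I4;
            case TypeCode.Int64: return OpCodes.Conv_I8;
            case TypeCode.Single: return OpCodes.Conv_R4;
            case TypeCode.Double: return OpCodes.Conv_R8;
            default: throw new NotSupportedException(t.Name + " is not handled by this sketch.");
        }
    }

    // Pushes basePointer + i * sizeof(elem) onto the evaluation stack.
    private static void EmitElementAddress(ILGenerator il, OpCode loadBase, LocalBuilder i, Type elem)
    {
        il.Emit(loadBase);
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_I);
        il.Emit(OpCodes.Sizeof, elem);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Add);
    }

    private static CastKernel Build(Type src, Type dst)
    {
        ConvOp(src); // validates the source type; the returned opcode is unused
        var dm = new DynamicMethod("Cast_" + src.Name + "_" + dst.Name, typeof(void),
            new[] { typeof(IntPtr), typeof(IntPtr), typeof(long) },
            typeof(CastKernelSketch).Module, true);
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder i = il.DeclareLocal(typeof(long));
        Label check = il.DefineLabel();
        Label body = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I8, 0L);                      // i = 0
        il.Emit(OpCodes.Stloc, i);
        il.Emit(OpCodes.Br, check);

        il.MarkLabel(body);
        EmitElementAddress(il, OpCodes.Ldarg_1, i, dst);  // &dst[i]
        EmitElementAddress(il, OpCodes.Ldarg_0, i, src);  // &src[i]
        il.Emit(OpCodes.Ldobj, src);                      // load src[i]
        il.Emit(ConvOp(dst));                             // convert to destination type
        il.Emit(OpCodes.Stobj, dst);                      // dst[i] = converted value

        il.Emit(OpCodes.Ldloc, i);                        // i++
        il.Emit(OpCodes.Ldc_I8, 1L);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(check);                              // while (i < count)
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Ldarg_2);
        il.Emit(OpCodes.Blt, body);
        il.Emit(OpCodes.Ret);

        return (CastKernel)dm.CreateDelegate(typeof(CastKernel));
    }
}
```

The type switch still exists in this sketch, but it runs once per type-pair at kernel build time rather than being written out as 144 hand-maintained loops. A caller holding raw pointers would invoke it as `CastKernelSketch.Get(typeof(int), typeof(double))((IntPtr)srcPtr, (IntPtr)dstPtr, count)`.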
Files to Skip
| File | Reason |
| --- | --- |
| np.random.shuffle.cs | Random access patterns defeat SIMD |
| np.random.randint.cs | RNG is bottleneck, not type dispatch |
| MultiIterator.cs | Iterator infrastructure, type dispatch acceptable |
| NPTypeCode.cs | Extension methods, not compute loops |
| Converts.cs | Low-level converters called from IL |
Related
- Generic Math Migration (docs/GENERIC_MATH_DESIGN.md)
- Full analysis: docs/ISSUE_IL_MIGRATION.md