Overview
Replace legacy `Marshal.AllocHGlobal/FreeHGlobal` with the modern `NativeMemory` API (.NET 6+) across all unmanaged allocation sites, enabling aligned allocation for future SIMD vectorization and zero-initialized allocation for `np.zeros`.
Problem
All unmanaged memory allocation in NumSharp goes through `Marshal.AllocHGlobal/FreeHGlobal` at 5 call sites in 2 files:
| File | Call | Purpose |
|---|---|---|
| ``UnmanagedMemoryBlock`1.cs:31`` | `Marshal.AllocHGlobal(new IntPtr(bytes))` | Primary array allocation |
| ``UnmanagedMemoryBlock`1.cs:995`` | `Marshal.FreeHGlobal(Address)` | Deallocation in `Disposer` |
| `StackedMemoryPool.cs:90` | `Marshal.AllocHGlobal(SingleSize)` | Pool overflow allocation |
| `StackedMemoryPool.cs:169` | `Marshal.FreeHGlobal(addr)` | Pool cleanup |
| `StackedMemoryPool.cs:238` | `individualyAllocated.ForEach(Marshal.FreeHGlobal)` | Pool disposal |
`Marshal.AllocHGlobal` wraps `LocalAlloc` on Windows (Win32 legacy) and `malloc` on Unix. It offers no alignment guarantee beyond the platform default (8 or 16 bytes), has no zero-init option, and returns an `IntPtr` that must be cast before use.
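The alignment point can be checked empirically. The following is a minimal sketch, not part of NumSharp: `HGlobalAlignmentProbe` and its method names are hypothetical, and the result is only an observation on the machine it runs on, not a guarantee.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical diagnostic helper, not part of NumSharp.
public static class HGlobalAlignmentProbe
{
    // Largest power of two (capped at 4096) that divides the address.
    public static int ObservedAlignment(IntPtr address)
    {
        long value = address.ToInt64();
        int alignment = 1;
        while (alignment < 4096 && value % (alignment * 2) == 0)
            alignment *= 2;
        return alignment;
    }

    // Smallest alignment observed across several live AllocHGlobal blocks.
    public static int MinObservedAlignment(int bytes, int samples)
    {
        int min = 4096;
        var blocks = new IntPtr[samples];
        try
        {
            for (int i = 0; i < samples; i++)
            {
                blocks[i] = Marshal.AllocHGlobal(bytes);
                min = Math.Min(min, ObservedAlignment(blocks[i]));
            }
        }
        finally
        {
            // Slots that were never filled are IntPtr.Zero and are skipped.
            foreach (IntPtr block in blocks)
                if (block != IntPtr.Zero) Marshal.FreeHGlobal(block);
        }
        return min;
    }
}
```

If `MinObservedAlignment(100, 64)` reports 8 or 16, the buffers are not reliably 32-byte aligned, which is the gap `NativeMemory.AlignedAlloc` is meant to close.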
Proposal
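Replace with equivalent `NativeMemory` calls:

```csharp
// Requires an unsafe context (AllowUnsafeBlocks).

// Drop-in replacement:
IntPtr ptr = (IntPtr)NativeMemory.Alloc((nuint)bytes);
NativeMemory.Free((void*)ptr);

// Aligned (enables future SIMD); must be released with AlignedFree, not Free:
IntPtr aligned = (IntPtr)NativeMemory.AlignedAlloc((nuint)bytes, alignment: 32);
NativeMemory.AlignedFree((void*)aligned);

// Zero-initialized (optimized np.zeros); released with the regular Free:
IntPtr zeroed = (IntPtr)NativeMemory.AllocZeroed((nuint)bytes);
NativeMemory.Free((void*)zeroed);
```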

- `Marshal.AllocHGlobal` → `NativeMemory.Alloc` in ``UnmanagedMemoryBlock`1.cs``
- `Marshal.FreeHGlobal` → `NativeMemory.Free` in `Disposer`
- Update `StackedMemoryPool.cs` (3 sites)
- Update the `AllocationType` enum if needed (new variant or replace `AllocHGlobal` wholesale)
- Add an `AllocZeroed` fast path for `np.zeros` / `np.zeros_like`
- Add benchmarks to `NumSharp.Benchmark`
Evidence
- `NativeMemory.AlignedAlloc` allows 32-byte (AVX2) or 64-byte (AVX-512) alignment, a prerequisite for SIMD vectorization of arithmetic loops
- `NativeMemory.AllocZeroed` delegates to `calloc` / OS zero-page mapping, potentially faster than `Alloc` + manual `Unsafe.InitBlock`
- `NativeMemory.Alloc` returns `void*` directly, avoiding an `IntPtr` round-trip in a codebase that immediately casts to `T*`
- The `Disposer` class already dispatches on the `AllocationType` enum, a clean extension point
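
One way the dispatch-on-allocation-kind idea could look is sketched below. All names here (`NativeAllocKind`, `NativeBuffer`) are hypothetical and do not reflect NumSharp's actual `Disposer` or `AllocationType` definitions. The point is that each block remembers how it was allocated so it is released with the matching function. The code assumes .NET 6+ and `AllowUnsafeBlocks`.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for an allocation-kind enum; not NumSharp's AllocationType.
public enum NativeAllocKind { Plain, Aligned }

// Hypothetical owner of one native block that frees it with the matching API.
public sealed unsafe class NativeBuffer : IDisposable
{
    private void* _ptr;
    private readonly NativeAllocKind _kind;

    public nuint ByteLength { get; }
    public void* Pointer => _ptr;

    private NativeBuffer(void* ptr, nuint bytes, NativeAllocKind kind)
    {
        _ptr = ptr;
        ByteLength = bytes;
        _kind = kind;
    }

    // Plain allocation, optionally zero-initialized via AllocZeroed.
    public static NativeBuffer Allocate(nuint bytes, bool zeroed = false)
    {
        void* p = zeroed ? NativeMemory.AllocZeroed(bytes) : NativeMemory.Alloc(bytes);
        return new NativeBuffer(p, bytes, NativeAllocKind.Plain);
    }

    // Aligned allocation; alignment must be a power of two.
    public static NativeBuffer AllocateAligned(nuint bytes, nuint alignment)
    {
        void* p = NativeMemory.AlignedAlloc(bytes, alignment);
        return new NativeBuffer(p, bytes, NativeAllocKind.Aligned);
    }

    public void Dispose()
    {
        if (_ptr == null) return;
        // Aligned blocks must go through AlignedFree; plain ones through Free.
        switch (_kind)
        {
            case NativeAllocKind.Aligned: NativeMemory.AlignedFree(_ptr); break;
            default: NativeMemory.Free(_ptr); break;
        }
        _ptr = null;
        GC.SuppressFinalize(this);
    }

    ~NativeBuffer() => Dispose();
}
```

A caller would write `using var buf = NativeBuffer.AllocateAligned(1024, 32);` and cast `buf.Pointer` to `T*` directly. Keeping the kind next to the pointer is what makes a wholesale switch to aligned allocation safe, since mixing `AlignedAlloc` with `Free` is not supported.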
Scope / Non-goals
- In scope: Replace 5 allocation sites, add benchmarks, optional `AllocZeroed` fast path
- Not in scope: SIMD vectorization of arithmetic loops (separate effort), changing `StackedMemoryPool` pooling strategy, `NativeMemory.AlignedRealloc`
Benchmark / Performance
Must benchmark before merging. The allocation hot path affects every `NDArray` creation.
| Benchmark | What to measure |
|---|---|
| Allocation throughput | `NativeMemory.Alloc` vs `Marshal.AllocHGlobal` at small (<1KB), medium (1KB-1MB), large (>1MB) sizes |
| Aligned overhead | `AlignedAlloc(32)` vs `Alloc` for the same sizes |
| Zero-init | `AllocZeroed` vs `Alloc` + `Unsafe.InitBlock` / `Span.Clear` |
| Pool interaction | `StackedMemoryPool.Take/Return` with both APIs |
| End-to-end | `np.arange(N)`, `np.zeros(N)`, `a + b` for representative array sizes |
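
As a rough starting point for the allocation-throughput row, the sketch below times alloc/free pairs with `Stopwatch`. It is only illustrative: `AllocBenchSketch` and its methods are hypothetical names, the sizes are arbitrary picks from the three buckets above, and a proper harness with warm-up and statistical analysis would be needed before drawing conclusions. The code assumes .NET 6+ and `AllowUnsafeBlocks`.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical micro-benchmark sketch; not a substitute for a real benchmark harness.
public static unsafe class AllocBenchSketch
{
    // Average nanoseconds per Marshal.AllocHGlobal/FreeHGlobal pair.
    public static double MarshalNsPerOp(int bytes, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            IntPtr p = Marshal.AllocHGlobal(bytes);
            Marshal.FreeHGlobal(p);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    // Average nanoseconds per NativeMemory.Alloc/Free pair.
    public static double NativeNsPerOp(int bytes, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            void* p = NativeMemory.Alloc((nuint)bytes);
            NativeMemory.Free(p);
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    // Prints one line per size bucket: small, medium, large.
    public static void Run()
    {
        const int iterations = 100_000;
        foreach (int size in new[] { 256, 64 * 1024, 4 * 1024 * 1024 })
        {
            // Short warm-up so JIT compilation is not included in the timing.
            MarshalNsPerOp(size, 1_000);
            NativeNsPerOp(size, 1_000);
            Console.WriteLine(
                $"{size,10} B  Marshal: {MarshalNsPerOp(size, iterations):F1} ns  " +
                $"NativeMemory: {NativeNsPerOp(size, iterations):F1} ns");
        }
    }
}
```

Because the loop never touches the allocated memory, it measures allocator overhead only. The zero-init and end-to-end rows need the buffers to be written or read for the numbers to be meaningful.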
Breaking changes
None — internal implementation detail, no public API changes.
Related issues
- (`NativeMemory` requires net6.0+)