Modernize unmanaged allocation: Marshal.AllocHGlobal → NativeMemory #528

## Overview

Replace legacy `Marshal.AllocHGlobal`/`FreeHGlobal` with the modern `NativeMemory` API (.NET 6+) across all unmanaged allocation sites, enabling aligned allocation for future SIMD vectorization and zero-initialized allocation for `np.zeros`.

## Problem

All unmanaged memory allocation in NumSharp goes through `Marshal.AllocHGlobal`/`FreeHGlobal` at **5 call sites in 2 files**:

| File | Call | Purpose |
|------|------|---------|
| ``UnmanagedMemoryBlock`1.cs:31`` | `Marshal.AllocHGlobal(new IntPtr(bytes))` | Primary array allocation |
| ``UnmanagedMemoryBlock`1.cs:995`` | `Marshal.FreeHGlobal(Address)` | Deallocation in `Disposer` |
| `StackedMemoryPool.cs:90` | `Marshal.AllocHGlobal(SingleSize)` | Pool overflow allocation |
| `StackedMemoryPool.cs:169` | `Marshal.FreeHGlobal(addr)` | Pool cleanup |
| `StackedMemoryPool.cs:238` | `individualyAllocated.ForEach(Marshal.FreeHGlobal)` | Pool disposal |

`Marshal.AllocHGlobal` wraps `LocalAlloc` on Windows (a legacy Win32 API) and `malloc` on Unix. It offers no alignment guarantee beyond the platform default (8 or 16 bytes) and no zero-initialization option, and its `IntPtr` return value must be cast before it can be used as a typed pointer.

## Proposal

Replace with equivalent `NativeMemory` calls:

```csharp
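// Requires an unsafe context (<AllowUnsafeBlocks>true</AllowUnsafeBlocks>).

// Drop-in replacement:
var ptr = (IntPtr)NativeMemory.Alloc((nuint)bytes);
NativeMemory.Free((void*)ptr);

// Aligned (enables future SIMD); alignment must be a power of two,
// and the block must be released with AlignedFree, not Free:
var alignedPtr = (IntPtr)NativeMemory.AlignedAlloc((nuint)bytes, alignment: 32);
NativeMemory.AlignedFree((void*)alignedPtr);

// Zero-initialized (optimized np.zeros); released with the regular Free:
var zeroedPtr = (IntPtr)NativeMemory.AllocZeroed((nuint)bytes);
NativeMemory.Free((void*)zeroedPtr);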
```

- [ ] Replace `Marshal.AllocHGlobal` → `NativeMemory.Alloc` in ``UnmanagedMemoryBlock`1.cs``
- [ ] Replace `Marshal.FreeHGlobal` → `NativeMemory.Free` in `Disposer`
- [ ] Replace alloc/free in `StackedMemoryPool.cs` (3 sites)
- [ ] Update `AllocationType` enum if needed (new variant or replace `AllocHGlobal` wholesale)
- [ ] Add `AllocZeroed` fast path for `np.zeros` / `np.zeros_like`
- [ ] Add allocation benchmarks to `NumSharp.Benchmark`
- [ ] Verify all tests pass
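
One detail the checklist implies but does not spell out: memory from `NativeMemory.AlignedAlloc` must be released with `NativeMemory.AlignedFree`, so the free path needs to know how each block was allocated. Below is a minimal sketch of that pairing. The `AllocKind` enum and `NativeBlock` class are hypothetical names for illustration only; NumSharp's actual `AllocationType` members and `Disposer` layout may differ.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for NumSharp's AllocationType enum.
public enum AllocKind { Native, NativeAligned }

// Hypothetical helper; compile with <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
public static unsafe class NativeBlock
{
    // Allocates byteCount bytes. alignment == 0 means "platform default";
    // any other value must be a power of two.
    public static (IntPtr Address, AllocKind Kind) Allocate(nuint byteCount, nuint alignment = 0)
    {
        if (alignment == 0)
            return ((IntPtr)NativeMemory.Alloc(byteCount), AllocKind.Native);

        return ((IntPtr)NativeMemory.AlignedAlloc(byteCount, alignment), AllocKind.NativeAligned);
    }

    // Releases a block with the free function that matches its allocator.
    public static void Free(IntPtr address, AllocKind kind)
    {
        switch (kind)
        {
            case AllocKind.Native:
                NativeMemory.Free((void*)address);
                break;
            case AllocKind.NativeAligned:
                NativeMemory.AlignedFree((void*)address);
                break;
            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```

Recording the kind next to the address keeps the dispatch in one place, which is roughly the role the existing `AllocationType` check in `Disposer` appears to play.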

## Evidence

- `NativeMemory.AlignedAlloc` allows 32-byte (AVX2) or 64-byte (AVX-512) alignment — prerequisite for SIMD vectorization of arithmetic loops
- `NativeMemory.AllocZeroed` delegates to `calloc` / OS zero-page mapping — potentially faster than `Alloc` + manual `Unsafe.InitBlock`
- `NativeMemory.Alloc` returns `void*` directly, avoiding `IntPtr` round-trip in a codebase that immediately casts to `T*`
- The `Disposer` class already dispatches on `AllocationType` enum — clean extension point
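
The alignment and zero-initialization claims above are easy to sanity-check in a unit test. The sketch below is one way to do that; `AllocChecks` and its method names are made up for this example and are not part of NumSharp.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical test helper; compile with <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
public static unsafe class AllocChecks
{
    // True when AlignedAlloc returns an address that is a multiple of alignment.
    // alignment must be a power of two (for example 32 for AVX2, 64 for AVX-512).
    public static bool IsAligned(nuint byteCount, nuint alignment)
    {
        void* p = NativeMemory.AlignedAlloc(byteCount, alignment);
        try
        {
            return (nuint)p % alignment == 0;
        }
        finally
        {
            NativeMemory.AlignedFree(p);
        }
    }

    // True when every byte of an AllocZeroed buffer reads back as 0.
    public static bool IsZeroed(int byteCount)
    {
        byte* p = (byte*)NativeMemory.AllocZeroed((nuint)byteCount);
        try
        {
            for (int i = 0; i < byteCount; i++)
            {
                if (p[i] != 0) return false;
            }
            return true;
        }
        finally
        {
            NativeMemory.Free(p);
        }
    }
}
```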

## Scope / Non-goals

- **In scope**: Replace 5 allocation sites, add benchmarks, optional `AllocZeroed` fast path
- **Not in scope**: SIMD vectorization of arithmetic loops (separate effort), changing `StackedMemoryPool` pooling strategy, `NativeMemory.AlignedRealloc`

## Benchmark / Performance

Must benchmark before merging. The allocation hot path affects every `NDArray` creation.

| Benchmark | What to measure |
|-----------|----------------|
| Allocation throughput | `NativeMemory.Alloc` vs `Marshal.AllocHGlobal` at small (<1KB), medium (1KB-1MB), large (>1MB) sizes |
| Aligned overhead | `AlignedAlloc(32)` vs `Alloc` for the same sizes |
| Zero-init | `AllocZeroed` vs `Alloc` + `Unsafe.InitBlock` / `Span.Clear` |
| Pool interaction | `StackedMemoryPool.Take`/`Return` with both APIs |
| End-to-end | `np.arange(N)`, `np.zeros(N)`, `a + b` for representative array sizes |
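
As a rough starting point for the allocation-throughput row, the comparison could look like the `Stopwatch` loop below. This is only a sketch: `AllocTiming` is a hypothetical name, the sizes and iteration count are arbitrary, and real measurements should come from whatever harness `NumSharp.Benchmark` uses, since a bare loop like this ignores warm-up and allocator caching effects.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

// Hypothetical micro-benchmark; compile with <AllowUnsafeBlocks>true</AllowUnsafeBlocks>.
public static unsafe class AllocTiming
{
    // Times `iterations` alloc/free pairs through the legacy Marshal API.
    public static TimeSpan TimeMarshal(int bytes, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            IntPtr p = Marshal.AllocHGlobal(bytes);
            Marshal.FreeHGlobal(p);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    // Times `iterations` alloc/free pairs through NativeMemory.
    public static TimeSpan TimeNative(int bytes, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            void* p = NativeMemory.Alloc((nuint)bytes);
            NativeMemory.Free(p);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // One size from each bucket in the table: small, medium, large.
        int[] sizes = { 256, 64 * 1024, 4 * 1024 * 1024 };
        const int iterations = 100_000;

        foreach (int size in sizes)
        {
            TimeSpan marshal = TimeMarshal(size, iterations);
            TimeSpan native = TimeNative(size, iterations);
            Console.WriteLine($"{size,10} B  Marshal: {marshal.TotalMilliseconds:F2} ms  NativeMemory: {native.TotalMilliseconds:F2} ms");
        }
    }
}
```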

## Breaking changes

None — internal implementation detail, no public API changes.

## Related issues

- #531 — .NET 8/10 TFM upgrade (prerequisite — `NativeMemory` requires net6.0+)
