Fixes remote and pagemap #7
Merged
The encoding of the size class in the top bits did not respect kernel pointers. Using an intptr_t means we can use a signed shift to preserve kernel pointers.
Guarantee the page map is page aligned. Fix public API.
David, could you check if this works for the kernel?
jayakasadev referenced this pull request in jayakasadev/snmalloc on Jun 10, 2026
Introduces the StackWalker abstraction described in .claude/research/heap-profiling/stack-walker.md as a new PAL header (pal_stack_walker.h, included from pal/pal.h). This is the first concrete piece of Phase 2.1 of the heap-profiling milestone (ClickUp 86ahzwhq5).

Walker capabilities:
- FramePointerWalker: pure dependent-load loop with per-frame validation (alignment, strict-monotonic FP, stack-range, sentinel null-FP). Reads fp[0] (saved FP) and fp[1] (saved LR) from canonical aarch64/x86_64 frame headers. On aarch64, unconditionally strips Pointer-Authentication Code bits from the saved LR via ptrauth_strip on Apple and xpaclri (HINT #7) elsewhere -- both decode to a NOP on cores without FEAT_PAuth, so cost is zero on non-PAC hardware.
- POD thread_local stack-bounds cache populated lazily via pthread_get_stackaddr_np on macOS and pthread_getattr_np on Linux. Zero-initialised; no constructor, no __cxa_thread_atexit, no malloc on first access -- the only construction pattern provably reentrancy-safe from inside an allocator's sample path.
- NullStackWalker fallback for unsupported targets (Windows, FreeBSD, OpenEnclave, CHERI/Morello, non-x86_64/aarch64). Returns 0 frames.
- Async-signal-safe: no malloc, no locks, no syscalls, no TLS construction. Graceful degradation on broken FP chains.
- Selection at compile time via preprocessor macros. No CMake option in this commit (deferred -- see "what's NOT done" below).
- A free function snmalloc::profile::stack_walk() wraps the default walker for callers that don't need to pick one explicitly.

Supported arches: x86_64 + aarch64 on Linux + macOS.

Microbenchmark (src/test/perf/stack_walker_bench/):
- Recursive call-chain builder with NOINLINE + tail-call-prevention asm-barriers. Sweeps depths 2/4/8/16/32, takes min of 5 repeats per depth, reports total ns / ns-per-iter / ns-per-frame and a two-point slope estimate.
- Auto-discovered by the existing perf harness; added to TESTLIB_ONLY_TESTS so it shares an object library across fast/check flavours.
- Asserts ns/frame < 50 (5x headroom over the ~10 ns/frame design target). Skipped under --smoke and Debug builds.
- Measured on Apple Silicon M-series: ~0.5-1.0 ns/frame steady state (deepest depth 35 captured frames, total ~21 us / 1M iterations = 20.6 ns/iter, slope 0.53 ns/frame). Well under the design target.

What is NOT done in this commit:
- The walker is NOT wired into any allocator path. No SNMALLOC_PROFILE gating exists yet; that lives in Phase 3.
- The matching CMake plumbing -- a SNMALLOC_PROFILE_STACK_WALKER option (fp / null / auto) and -fno-omit-frame-pointer injection for snmalloc TUs -- is left for a follow-up. The header today is controlled by SNMALLOC_PROFILE_STACK_WALKER_FP / SNMALLOC_PROFILE_STACK_WALKER_NULL preprocessor overrides plus an arch/OS auto-detection default.
- Stack-capture-at-sample-hit (ClickUp 86ahzwhq5's sibling 86ahzwhmh) is NOT included; it requires the Sampler from Phase 2.2.

Files:
- src/snmalloc/pal/pal_stack_walker.h (new, header-only)
- src/snmalloc/pal/pal.h (one #include line)
- src/test/perf/stack_walker_bench/stack_walker_bench.cc (new)
- CMakeLists.txt (one-word addition to TESTLIB_ONLY_TESTS)

ClickUp: 86ahzwhq5
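For readers unfamiliar with frame-pointer walking, the validation loop described in the commit message might look roughly like the sketch below. This is not the code in pal_stack_walker.h: the names walk_frame_pointers and demo_synthetic_walk are hypothetical, the alignment check is simplified to word alignment, PAC stripping is omitted, and the demo walks a synthetic array of frame headers rather than a live stack so that it behaves the same on any target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; not the interface of
// pal_stack_walker.h. Follows saved frame pointers starting at `fp` and
// stores each saved return address in `out`. Returns the frame count.
inline std::size_t walk_frame_pointers(
  std::uintptr_t fp,
  std::uintptr_t stack_lo,
  std::uintptr_t stack_hi,
  std::uintptr_t* out,
  std::size_t max_frames)
{
  constexpr std::size_t header = 2 * sizeof(std::uintptr_t);
  std::size_t n = 0;
  while (n < max_frames)
  {
    if (fp == 0)
      break; // sentinel null FP: end of chain
    if (fp % sizeof(std::uintptr_t) != 0)
      break; // misaligned (simplified: word alignment only)
    if (fp < stack_lo || fp >= stack_hi || stack_hi - fp < header)
      break; // frame header not fully inside the stack range

    const auto* frame = reinterpret_cast<const std::uintptr_t*>(fp);
    std::uintptr_t next = frame[0]; // saved FP of the caller
    out[n++] = frame[1]; // saved return address

    if (next != 0 && next <= fp)
      break; // callers live at higher addresses; reject cycles
    fp = next;
  }
  return n;
}

// Builds three synthetic frame headers in an array and walks them.
// With `corrupt` set, the second frame points back at the first.
inline std::size_t demo_synthetic_walk(bool corrupt)
{
  alignas(16) std::uintptr_t stack[6] = {};
  auto at = [&](std::size_t i) {
    return reinterpret_cast<std::uintptr_t>(&stack[i]);
  };
  stack[0] = at(2);
  stack[1] = 0x1111;
  stack[2] = corrupt ? at(0) : at(4);
  stack[3] = 0x2222;
  stack[4] = 0; // outermost frame: null saved FP
  stack[5] = 0x3333;

  std::uintptr_t out[8] = {};
  std::size_t n =
    walk_frame_pointers(at(0), at(0), at(0) + sizeof(stack), out, 8);
  assert(n == 0 || out[0] == 0x1111);
  return n;
}
```

In this sketch demo_synthetic_walk(false) captures all three frames, while demo_synthetic_walk(true) stops after two because the monotonicity check rejects the backwards link. That is one way to read the "graceful degradation on broken FP chains" behaviour the commit describes.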
This PR makes the pagemap have OS_PAGE_SIZE alignment, and fixes the surrounding API.
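As a rough illustration of what OS_PAGE_SIZE alignment means for a statically allocated table, here is a minimal sketch. It is not taken from the PR: ToyPagemap, toy_pagemap, is_page_aligned, and check_toy_pagemap are hypothetical names, and the 4 KiB value for OS_PAGE_SIZE is an assumption (the real constant comes from the platform abstraction).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages. The real value is platform dependent.
constexpr std::size_t OS_PAGE_SIZE = 4096;

// Hypothetical toy pagemap: one byte of metadata per tracked chunk.
// alignas makes every instance start on a page boundary.
struct alignas(OS_PAGE_SIZE) ToyPagemap
{
  std::uint8_t entries[4 * OS_PAGE_SIZE];
};

static_assert(
  alignof(ToyPagemap) == OS_PAGE_SIZE, "pagemap must be page aligned");
static_assert(
  sizeof(ToyPagemap) % OS_PAGE_SIZE == 0,
  "pagemap must cover whole pages");

static ToyPagemap toy_pagemap;

// True if `p` sits on an OS_PAGE_SIZE boundary.
inline bool is_page_aligned(const void* p)
{
  return (reinterpret_cast<std::uintptr_t>(p) & (OS_PAGE_SIZE - 1)) == 0;
}

// Runtime counterpart of the static_asserts above.
inline void check_toy_pagemap()
{
  assert(is_page_aligned(&toy_pagemap));
}
```

With the table starting on a page boundary and spanning whole pages, no other data shares a page with it.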
Additionally, it fixes the encoding of sizeclasses in remote allocations: instead of storing the sizeclass in the top bits, the pointer is shifted up and the sizeclass is stored in the bottom bits. A signed right shift then recovers the pointer while preserving the sign-extended address space.
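The scheme can be sketched as follows. This is an illustration rather than the code in the PR: the helper names and the 8-bit SIZECLASS_BITS width are hypothetical, and it assumes a 64-bit target whose top SIZECLASS_BITS + 1 address bits are all equal (true for 48-bit canonical addresses). Signed right shift is arithmetic by definition since C++20 and on mainstream compilers before that.

```cpp
#include <cassert>
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit target");

// Hypothetical width; the real number of sizeclass bits may differ.
constexpr unsigned SIZECLASS_BITS = 8;
constexpr std::uintptr_t SIZECLASS_MASK =
  (std::uintptr_t{1} << SIZECLASS_BITS) - 1;

// Recover the address: an arithmetic right shift replicates the sign bit,
// so kernel pointers (top bits all ones) come back sign-extended.
inline std::uintptr_t decode_address(std::intptr_t v)
{
  return static_cast<std::uintptr_t>(v >> SIZECLASS_BITS);
}

// Recover the sizeclass from the bottom bits.
inline std::uint8_t decode_sizeclass(std::intptr_t v)
{
  return static_cast<std::uint8_t>(
    static_cast<std::uintptr_t>(v) & SIZECLASS_MASK);
}

// Shift the pointer up and store the sizeclass in the freed bottom bits.
// The shift is done on the unsigned type to avoid signed overflow.
inline std::intptr_t encode(std::uintptr_t addr, std::uint8_t sizeclass)
{
  auto v =
    static_cast<std::intptr_t>((addr << SIZECLASS_BITS) | sizeclass);
  // Debug check: the dropped top bits must be pure sign extension.
  assert(decode_address(v) == addr);
  return v;
}
```

For example, encoding the kernel-style address 0xffff800012345600 with sizeclass 5 and decoding it again returns the same address, which a plain unsigned shift or a top-bit mask would not.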