
Fixes remote and pagemap #7

Merged: davidchisnall merged 3 commits into master from mjp41/refactor on Jan 18, 2019

mjp41 (Member) commented Jan 18, 2019

This PR aligns the pagemap to OS_PAGE_SIZE and fixes the surrounding API.
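As an illustration only, here is a minimal sketch of what page alignment of a statically allocated table can look like. `ToyPagemap`, `pagemap()`, `is_page_aligned()`, the table size, and the 4 KiB value of `OS_PAGE_SIZE` are all assumptions made for this example; they are not the code changed by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: 4 KiB pages. The real value comes from the platform.
constexpr std::size_t OS_PAGE_SIZE = 4096;

// Hypothetical flat table. alignas forces every instance to start on a page boundary.
struct alignas(OS_PAGE_SIZE) ToyPagemap
{
  std::uint8_t entries[4 * OS_PAGE_SIZE] = {};
};

static_assert(alignof(ToyPagemap) == OS_PAGE_SIZE, "table must be page aligned");
static_assert(sizeof(ToyPagemap) % OS_PAGE_SIZE == 0, "table must cover whole pages");

// True when p sits exactly on a page boundary.
inline bool is_page_aligned(const void* p)
{
  return (reinterpret_cast<std::uintptr_t>(p) % OS_PAGE_SIZE) == 0;
}

// Accessor for a single static instance, with a debug check of the alignment.
inline ToyPagemap& pagemap()
{
  static ToyPagemap pm;
  assert(is_page_aligned(&pm));
  return pm;
}
```

With both the start address and the size being multiples of the page size, the table never shares a page with unrelated data.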

Additionally, it changes how sizeclasses are encoded in remote allocations: instead of occupying the top bits of the pointer, the sizeclass now goes in the bottom bits and the pointer is shifted up. A signed (arithmetic) right shift then recovers the pointer with its sign-extended upper bits intact.
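The encoding can be sketched roughly as follows. This is not the code from the PR: the 8-bit field width and the names `encode`, `decode_address`, and `decode_sizeclass` are assumptions for illustration. It also assumes 64-bit canonical addresses (upper bits are copies of the sign bit) and relies on the arithmetic behaviour of `>>` on negative values, which mainstream compilers provide and C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 low bits are reserved for the sizeclass.
constexpr unsigned SIZECLASS_BITS = 8;
constexpr std::uintptr_t SIZECLASS_MASK =
  (std::uintptr_t(1) << SIZECLASS_BITS) - 1;

// Signed right shift replicates the sign bit, restoring the bits that the
// left shift in encode() pushed out of the word.
inline std::uintptr_t decode_address(std::intptr_t v)
{
  return static_cast<std::uintptr_t>(v >> SIZECLASS_BITS);
}

// The sizeclass lives in the low bits and is recovered with a mask.
inline std::uint8_t decode_sizeclass(std::intptr_t v)
{
  return static_cast<std::uint8_t>(
    static_cast<std::uintptr_t>(v) & SIZECLASS_MASK);
}

// Shift the address up (as unsigned, to avoid signed overflow) and place the
// sizeclass in the freed low bits.
inline std::intptr_t encode(std::uintptr_t addr, std::uint8_t sizeclass)
{
  std::intptr_t v =
    static_cast<std::intptr_t>((addr << SIZECLASS_BITS) | sizeclass);
  // Debug check: only sign-extended (canonical) addresses survive the round trip.
  assert(decode_address(v) == addr);
  return v;
}
```

A kernel-half address such as `0xffff800012345000` has all of its upper bits set, so the sign-extending shift restores them; an unsigned shift would fill them with zeros instead.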

mjp41 added 2 commits January 18, 2019 14:30
The encoding in the top bits for the size class did not respect kernel
pointers.  Using an intptr_t means we can use a signed shift to
preserve the kernel pointers.
Guarantee the page map is page aligned.  Fix public API.
mjp41 requested a review from davidchisnall January 18, 2019 15:29

mjp41 (Member, Author) commented Jan 18, 2019

David, could you check if this works for the kernel?

Comment thread src/mem/largealloc.h Outdated

davidchisnall (Collaborator) left a comment

LGTM

davidchisnall merged commit 20b84b9 into master Jan 18, 2019
davidchisnall deleted the mjp41/refactor branch January 18, 2019 17:11
jayakasadev referenced this pull request in jayakasadev/snmalloc Jun 10, 2026
Introduces the StackWalker abstraction described in
.claude/research/heap-profiling/stack-walker.md as a new PAL header
(pal_stack_walker.h, included from pal/pal.h). This is the first concrete
piece of Phase 2.1 of the heap-profiling milestone (ClickUp 86ahzwhq5).

Walker capabilities:
- FramePointerWalker: pure dependent-load loop with per-frame validation
  (alignment, strict-monotonic FP, stack-range, sentinel null-FP). Reads
  fp[0] (saved FP) and fp[1] (saved LR) from canonical aarch64/x86_64
  frame headers. On aarch64, unconditionally strips Pointer-Authentication
  Code bits from the saved LR via ptrauth_strip on Apple and xpaclri
  (HINT #7) elsewhere -- both decode to a NOP on cores without
  FEAT_PAuth, so cost is zero on non-PAC hardware.
- POD thread_local stack-bounds cache populated lazily via
  pthread_get_stackaddr_np on macOS and pthread_getattr_np on Linux.
  Zero-initialised; no constructor, no __cxa_thread_atexit, no malloc on
  first access -- the only construction pattern provably reentrancy-safe
  from inside an allocator's sample path.
- NullStackWalker fallback for unsupported targets (Windows, FreeBSD,
  OpenEnclave, CHERI/Morello, non-x86_64/aarch64). Returns 0 frames.
- Async-signal-safe: no malloc, no locks, no syscalls, no TLS
  construction. Graceful degradation on broken FP chains.
- Selection at compile time via preprocessor macros. No CMake option in
  this commit (deferred -- see "what's NOT done" below).
- A free function snmalloc::profile::stack_walk() wraps the default
  walker for callers that don't need to pick one explicitly.
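
The per-frame validation described for FramePointerWalker can be sketched as the loop below. This is a simplified illustration, not the contents of pal_stack_walker.h: the function name `walk_frames`, its signature, and the 16-byte alignment requirement are assumptions, and Pointer-Authentication stripping is left out. The stack bounds are passed in explicitly so the loop can also be exercised on a synthetic chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A frame record is two words: fp[0] = saved FP, fp[1] = saved return address.
constexpr std::size_t kRecordBytes = 2 * sizeof(std::uintptr_t);
// Assumption: frame pointers are 16-byte aligned on the supported targets.
constexpr std::size_t kFpAlign = 16;

// Follows the frame-pointer chain starting at fp, writing return addresses to
// out. Stops at the null sentinel, at max_frames, or at the first frame that
// fails validation. Returns the number of return addresses captured.
inline std::size_t walk_frames(
  std::uintptr_t fp,
  std::uintptr_t stack_lo,
  std::uintptr_t stack_hi,
  std::uintptr_t* out,
  std::size_t max_frames)
{
  assert(out != nullptr || max_frames == 0);
  std::size_t n = 0;
  while (n < max_frames)
  {
    if (fp == 0)
      break; // sentinel: end of chain
    if (fp % kFpAlign != 0)
      break; // misaligned frame pointer
    if (fp < stack_lo || fp >= stack_hi || stack_hi - fp < kRecordBytes)
      break; // record would lie outside the stack
    const std::uintptr_t* rec = reinterpret_cast<const std::uintptr_t*>(fp);
    const std::uintptr_t next = rec[0];
    out[n++] = rec[1];
    if (next != 0 && next <= fp)
      break; // callers must sit at strictly higher addresses
    fp = next;
  }
  return n;
}
```

In a real walker the starting value would come from the current frame (for example via the GCC/Clang builtin `__builtin_frame_address(0)`) and the bounds from the thread's stack-bounds cache. Every exit path simply returns what has been captured so far, which is one way to get the graceful degradation on broken chains mentioned above.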

Supported arches: x86_64 + aarch64 on Linux + macOS.

Microbenchmark (src/test/perf/stack_walker_bench/):
- Recursive call-chain builder with NOINLINE + tail-call-prevention
  asm-barriers. Sweeps depths 2/4/8/16/32, takes min of 5 repeats per
  depth, reports total ns / ns-per-iter / ns-per-frame and a two-point
  slope estimate.
- Auto-discovered by the existing perf harness; added to
  TESTLIB_ONLY_TESTS so it shares an object library across fast/check
  flavours.
- Asserts ns/frame < 50 (5x headroom over the ~10 ns/frame design
  target). Skipped under --smoke and Debug builds.
- Measured on Apple Silicon M-series: ~0.5-1.0 ns/frame steady state
  (deepest depth 35 captured frames, total ~21 us / 1M iterations =
  20.6 ns/iter, slope 0.53 ns/frame). Well under the design target.
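
The two statistics mentioned above (minimum over repeats, and a two-point slope) can be sketched as follows. The names `min_of_repeats`, `DepthTiming`, and `two_point_slope` are hypothetical, and the choice of which two depths feed the slope is an assumption, since the commit message does not say.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Taking the minimum over repeats filters out noise from scheduling and
// cache warm-up, which can only make a run slower.
inline double min_of_repeats(const std::vector<double>& samples)
{
  assert(!samples.empty());
  return *std::min_element(samples.begin(), samples.end());
}

// One measurement: call-chain depth and the best time per iteration at it.
struct DepthTiming
{
  unsigned depth;
  double ns_per_iter;
};

// Slope between two depths: the marginal cost of one extra frame, with the
// fixed per-call overhead cancelled out by the subtraction.
inline double two_point_slope(const DepthTiming& shallow, const DepthTiming& deep)
{
  assert(deep.depth > shallow.depth);
  return (deep.ns_per_iter - shallow.ns_per_iter) /
    static_cast<double>(deep.depth - shallow.depth);
}
```

Dividing total time by frame count folds the fixed setup cost into every frame, so the slope is usually the better estimate of steady-state per-frame cost.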

What is NOT done in this commit:
- The walker is NOT wired into any allocator path. No SNMALLOC_PROFILE
  gating exists yet; that lives in Phase 3.
- The matching CMake plumbing -- a SNMALLOC_PROFILE_STACK_WALKER
  option (fp / null / auto) and -fno-omit-frame-pointer injection for
  snmalloc TUs -- is left for a follow-up. The header today is
  controlled by SNMALLOC_PROFILE_STACK_WALKER_FP /
  SNMALLOC_PROFILE_STACK_WALKER_NULL preprocessor overrides plus an
  arch/OS auto-detection default.
- Stack-capture-at-sample-hit (ClickUp 86ahzwhq5's sibling 86ahzwhmh)
  is NOT included; it requires the Sampler from Phase 2.2.
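
A rough sketch of how preprocessor-driven selection of this kind could look is shown below. Only the macro names SNMALLOC_PROFILE_STACK_WALKER_FP and SNMALLOC_PROFILE_STACK_WALKER_NULL and the walker names come from the commit message; the precedence between the overrides, the auto-detection condition, the `DefaultStackWalker` alias, and all signatures are assumptions, and both walker bodies are stubs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fallback walker: always reports zero captured frames.
struct NullStackWalker
{
  static const char* name() { return "null"; }
  static std::size_t walk(std::uintptr_t*, std::size_t) { return 0; }
};

// Stand-in for the frame-pointer walker; the real walk loop is omitted here.
struct FramePointerWalker
{
  static const char* name() { return "fp"; }
  static std::size_t walk(std::uintptr_t*, std::size_t) { return 0; }
};

// Explicit overrides win; otherwise fall back to arch/OS detection.
// Assumption: the NULL override takes precedence if both are defined.
#if defined(SNMALLOC_PROFILE_STACK_WALKER_NULL)
using DefaultStackWalker = NullStackWalker;
#elif defined(SNMALLOC_PROFILE_STACK_WALKER_FP)
using DefaultStackWalker = FramePointerWalker;
#elif (defined(__linux__) || defined(__APPLE__)) && \
  (defined(__x86_64__) || defined(__aarch64__))
using DefaultStackWalker = FramePointerWalker;
#else
using DefaultStackWalker = NullStackWalker;
#endif

// Convenience wrapper for callers that do not pick a walker explicitly.
inline std::size_t stack_walk(std::uintptr_t* out, std::size_t max_frames)
{
  assert(out != nullptr || max_frames == 0);
  return DefaultStackWalker::walk(out, max_frames);
}
```

Because the choice is a type alias resolved at compile time, callers pay no runtime dispatch cost, and unsupported targets compile to a function that returns 0.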

Files:
- src/snmalloc/pal/pal_stack_walker.h (new, header-only)
- src/snmalloc/pal/pal.h (one #include line)
- src/test/perf/stack_walker_bench/stack_walker_bench.cc (new)
- CMakeLists.txt (one-word addition to TESTLIB_ONLY_TESTS)

ClickUp: 86ahzwhq5
2 participants