Skip to content

WIP: BatchIt#637

Closed
nwf-msr wants to merge 17 commits into
microsoft:mainfrom
nwf-msr:202309-msgpass
Closed

WIP: BatchIt#637
nwf-msr wants to merge 17 commits into
microsoft:mainfrom
nwf-msr:202309-msgpass

Conversation

@nwf-msr

@nwf-msr nwf-msr commented Sep 21, 2023

Copy link
Copy Markdown
Contributor

As part of the assessment of #634, but also perhaps more generally useful. Opinions welcome.

@nwf-msr nwf-msr requested a review from mjp41 September 21, 2023 03:14
@mjp41

mjp41 commented Sep 24, 2023

Copy link
Copy Markdown
Member

@nwf-msr

nwf-msr commented Sep 25, 2023

Copy link
Copy Markdown
Contributor Author

Looks like it is leaking in some case https://github.com/microsoft/snmalloc/actions/runs/6279469686/job/17055222812?pr=637#step:7:149

Whoops; I had the loop termination conditions wrong. They're fixed now, I think. Let's see if CI agrees.

@nwf-msr nwf-msr changed the title WIP: msgpass test WIP: msgpass test and related changes Nov 16, 2023
@nwf-msr

nwf-msr commented Nov 16, 2023

Copy link
Copy Markdown
Contributor Author

After discussions with @mjp41 yesterday, I've introduced a notion of "tweakable obfuscation" and have made all the intra-slab free lists' backwards signatures use the address of the slab metadata as the "tweak". The next step would be to remove the per-thread keys and have everyone use a common global key (probably not RemoteAllocator::key_global!) and apply the same tweaking. This opens the door to sending threads being able to build up segments of slab free lists that can be spliced in by the recipient in O(1) rather than O(n).

@nwf-msr

nwf-msr commented Dec 14, 2023

Copy link
Copy Markdown
Contributor Author

I've (at long last) got things flying end to end with a very simple "cache" on the sending side -- a single open ring -- but I think some review and investigation is a good idea. Here's what mimalloc-bench makes of the current state of things in terms of time
image
and memory
image

@nwf-msr nwf-msr changed the title WIP: msgpass test and related changes WIP: BatchIt May 23, 2024
@nwf-msr

nwf-msr commented May 23, 2024

Copy link
Copy Markdown
Contributor Author

We should:

  1. figure out how to make the caching layer optional, which probably just means some more
    std::conditional_t use.
  2. get someone with chops to assess the "tweaked obfuscation" changes.
  3. offer to randomize deallocator cache construction order (Matt writes: "as we are building a ring, we can add to the start or the end, so perhaps we could at least build an unpredictable order in the ring")
  4. offer probabilistic premature eviction from the deallocator caches to further thwart attempts to control free-list order

@mjp41

mjp41 commented Jun 13, 2024

Copy link
Copy Markdown
Member

Two things to address:

  • Randomisation - this might break some of the randomisation, can we use the ways to build multiple queues for the same slab.
  • Can we disable this feature as feature flag and constexpr/conditional_t, so we can analyse performance more in the future.

@nwf-msr

nwf-msr commented Jun 13, 2024

Copy link
Copy Markdown
Contributor Author

Just rebasing after #659 landed. Todo-s remain to be addressed.

@nwf-msr

nwf-msr commented Jun 13, 2024

Copy link
Copy Markdown
Contributor Author

And the novelty of [[no_unique_address]] continues to sting. Hm.

@nwf-msr

nwf-msr commented Jun 19, 2024

Copy link
Copy Markdown
Contributor Author

Start addressing to-dos, specifically being able to turn BatchIt off: rewrite history to have always had a RemoteDeallocCacheBatching structure that encapsulates the client-side logic. We can pair this with the current RemoteMessage structure, then add parallel non-batching implementations of the client-side logic and the RemoteMessage internals.

@nwf-msr

nwf-msr commented Jun 19, 2024

Copy link
Copy Markdown
Contributor Author

OK, well, apparently MAX_CAPACITY_BITS needs to be at least 17 - 4 = 13, were it to be universal: 64-bit cross-builds under qemu use a MIN_CHUNK_SIZE of 17 and a smallest sizeclass of 16 bytes.

But that breaks all the 32-bit builds, because MAX_SMALL_SIZECLASS_BITS is 16 and MIN_OBJECT_COUNT is sometimes 13, and 16 + ceil(log_2(13)) + 13 is 33.

I'm not sure why some other Windows builds are still broken.

@nwf-msr

nwf-msr commented Sep 4, 2024

Copy link
Copy Markdown
Contributor Author

OK, here's a version with, I think, all todo-s at least prototyped. BatchIt can be turned off (see the "Make BatchIt optional" commit and/or CMakeLists.txt) and it now attempts to play nicely with mitigations(random_preserve) by stochastically evicting batches during construction.

Plumb its use around remoteallocator and remotecache
This prepares the recipient to process a batched message.
Exercise recipient machinery by having the senders collect adjacent frees to
the same slab into a batch.
This might involve multiple (I think at most two, at the moment) transitions in
the slab lifecycle state machine.  Towards that end, return indicators to the
caller that the slow path must be taken and how many objects of the original
set have not yet been counted as returned.
We can, as Matt so kindly reminds me, go get them from the pagemap.  Since we
need this value only when closing a ring, the read from over there is probably
not very onerous.  (We could also get the slab pointer from an object in the
ring, but we need that whenever inserting into the cache, so it's probably more
sensible to store that locally?)
Move ring set bits and associativity knobs to allocconfig and expose them via
CMake.  If associtivity is zero, use non-batched implementations of the
RemoteMessage and RemoteDeallocCacheBatching classes.

By default, kick BatchIt on when we have enough room in the minimum allocation
size to do it.  Exactly how much space is enough is a function of which
mitigations we have enabled and whether or not we are compiling with C++20.
There's no need for a full pointer here, it'd just make the structure larger on
CHERI.
In order not to thwart `mitigations(random_preserve)` too much, if it's on in
combination with BatchIt, roll the dice every time we append to a batch to
decide if we should stochastically evict this batch.  By increasing the number
of batches, we allow the recipient allocator increased opportunity to randomly
stripe batches across the two `freelist::Builder` segments associated with each
slab.
@nwf-msr

nwf-msr commented Sep 12, 2024

Copy link
Copy Markdown
Contributor Author

Rebased after merging #675. Trees are identical, just commit objects differ.

@nwf nwf mentioned this pull request Sep 18, 2024
@nwf-msr nwf-msr closed this Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants