speed optimizations #2266
|
@mpuccio Could you review the code changes? :-) |
|
To give some more background: on my local machine, with the data provided by @davidrohr, I achieve a throughput of 18 MB/s. While this may sound low, we do memcpy operations of float size (or smaller). Assuming all floats, this means 2 x 4.5 million 4-byte memcpy calls per second, because the in-memory representation of course differs from the on-disk one. So we have widely spread memory regions, all of them large (baskets), from each of which we take 4 bytes and copy them to some other structure. I'd assume this produces only cache misses... As the aim of the merger is to provide larger baskets to speed up subsequent reading, not much can be done about this strategy. The optimizations here skip this part if we already have large enough trees (> 10 MB) and no manipulation of the data needs to be done... I verified that the data in the output trees of one file is identical. |
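For readers following the numbers, here is a minimal sketch of the arithmetic and of the skip condition described in the comment above. It is not the code in this PR: the function names, the constants, and the reading of "MB" as 10^6 bytes are assumptions made for illustration, and the factor 2 is simply taken over from the comment.

```python
# Illustrative sketch only; names and constants are hypothetical, not from the PR.

BYTES_PER_VALUE = 4  # assumption: every column holds 4-byte floats
FAST_PATH_THRESHOLD_BYTES = 10 * 1_000_000  # "> 10 MB", with MB assumed to be 10^6 bytes


def copies_per_second(throughput_mb_per_s: float,
                      bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Estimate per-value copy operations per second for a given throughput.

    The factor 2 mirrors the "2 x 4.5 million" figure quoted in the comment.
    """
    values_per_second = throughput_mb_per_s * 1_000_000 / bytes_per_value
    return 2 * values_per_second


def can_skip_rebasketing(tree_size_bytes: int, needs_manipulation: bool) -> bool:
    """Return True when a tree is already large enough and needs no changes.

    In that case the value-by-value copy into larger baskets could be skipped.
    """
    return tree_size_bytes > FAST_PATH_THRESHOLD_BYTES and not needs_manipulation


if __name__ == "__main__":
    print(copies_per_second(18))                     # 9000000.0
    print(can_skip_rebasketing(20_000_000, False))   # True
    print(can_skip_rebasketing(20_000_000, True))    # False
```

At 18 MB/s and 4 bytes per value this gives 4.5 million values per second, or 9 million copy operations with the factor 2, which matches the estimate in the comment.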
|
Hm, if you have such erratic patterns, perhaps you could use the |
|
Thanks for the suggestion. I will try this as well. For now, I made this ready for review. |
|
Hi @jgrosseo, no, the PR looks good to me. I still have to solve the issue with the hypertriton AO2Ds merging, but since this PR does not change things in this respect, it can go on. |
Factor 4 increase of merging speed on pass 3
However, this needs to be carefully checked first.