perf: fast FixedSizeListArray canonicalization for chunked arrays #8161
Canonicalizing a `ChunkedArray` of `FixedSizeList` chunks previously fell through to the generic builder path, copying every element into a fresh contiguous buffer. Because each chunk's `elements` child is already exactly `list_size * chunk.len()` long and starts at the first list, the children can be reused directly as the chunks of a combined `ChunkedArray` of `elements`, mirroring the existing `swizzle_list_chunks` approach for `List`.

Route `DType::FixedSizeList` through `_canonicalize` and add `swizzle_fixed_size_list_chunks`, which builds the combined elements and wraps them in a single `FixedSizeListArray` in O(nchunks) without copying element data.

Adds a `chunked_fsl_canonicalize` divan benchmark and a regression test. The benchmark shows the swizzle is effectively O(1) in list size (e.g. 1024 elements/list across 32 chunks drops from ~175 ms to ~3 µs).

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
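The idea can be sketched with a toy model that does not use any Vortex types. `FslChunk`, `CombinedFsl`, and `swizzle` below are hypothetical names made up for illustration, and `Arc<Vec<i32>>` stands in for an element child; this is a minimal sketch of the buffer-sharing technique under those assumptions, not the code in this PR.

```rust
use std::sync::Arc;

/// Toy stand-in for one fixed-size-list chunk: `len` lists of `list_size` values.
/// (Hypothetical type for illustration; not Vortex's `FixedSizeListArray`.)
struct FslChunk {
    elements: Arc<Vec<i32>>,
    list_size: usize,
    len: usize,
}

/// Toy stand-in for the combined result: a single list array whose elements
/// are kept as a sequence of the original, shared buffers.
struct CombinedFsl {
    element_chunks: Vec<Arc<Vec<i32>>>,
    list_size: usize,
    len: usize,
}

/// Combine chunks in O(nchunks): only reference counts change, no element is copied.
fn swizzle(chunks: &[FslChunk], list_size: usize) -> CombinedFsl {
    assert!(list_size > 0, "this sketch ignores zero-width lists");
    let mut element_chunks = Vec::with_capacity(chunks.len());
    let mut len = 0;
    for chunk in chunks {
        assert_eq!(chunk.list_size, list_size);
        // The invariant the reuse depends on: exactly `list_size * len` elements.
        assert_eq!(chunk.elements.len(), list_size * chunk.len);
        element_chunks.push(Arc::clone(&chunk.elements));
        len += chunk.len;
    }
    CombinedFsl { element_chunks, list_size, len }
}

impl CombinedFsl {
    /// Read list `i`. Every element chunk ends on a list boundary,
    /// so a list never straddles two chunks.
    fn list(&self, mut i: usize) -> &[i32] {
        assert!(i < self.len);
        for chunk in &self.element_chunks {
            let lists_here = chunk.len() / self.list_size;
            if i < lists_here {
                let start = i * self.list_size;
                return &chunk[start..start + self.list_size];
            }
            i -= lists_here;
        }
        unreachable!("index was checked against len above")
    }
}
```

In this model, `Arc::ptr_eq` between an input chunk's `elements` and the matching entry of `element_chunks` holds after `swizzle`, which is the toy equivalent of "without copying element data".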
Merging this PR will not alter performance

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 225.4 µs | 188.1 µs | +19.85% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.1 µs | 307.8 µs | -11.28% |
| 🆕 | Simulation | canonicalize[1024, 2] | N/A | 18.7 µs | N/A |
| 🆕 | Simulation | canonicalize[1024, 8] | N/A | 20.4 µs | N/A |
| 🆕 | Simulation | canonicalize[16, 2] | N/A | 23.9 µs | N/A |
| 🆕 | Simulation | canonicalize[256, 2] | N/A | 18.9 µs | N/A |
| 🆕 | Simulation | canonicalize[256, 32] | N/A | 24.7 µs | N/A |
| 🆕 | Simulation | canonicalize[1024, 32] | N/A | 24.6 µs | N/A |
| 🆕 | Simulation | canonicalize[16, 32] | N/A | 25.6 µs | N/A |
| 🆕 | Simulation | canonicalize[16, 8] | N/A | 20.6 µs | N/A |
| 🆕 | Simulation | canonicalize[256, 8] | N/A | 20.4 µs | N/A |
Comparing claude/admiring-keller-4Sz1N (e0e8c96) with develop (30103b8)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 0.994x ➖
datafusion / vortex-file-compressed (0.994x ➖, 0↑ 1↓)
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.025x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.995x ➖, 0↑ 0↓)
datafusion / parquet (0.989x ➖, 1↑ 0↓)
duckdb / vortex-file-compressed (0.999x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.019x ➖, 0↑ 0↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.194x ❌, 0↑ 15↓)
datafusion / vortex-compact (1.240x ❌, 0↑ 22↓)
datafusion / parquet (1.181x ❌, 0↑ 16↓)
datafusion / arrow (1.461x ❌, 0↑ 22↓)
duckdb / vortex-file-compressed (1.168x ❌, 0↑ 19↓)
duckdb / vortex-compact (1.134x ❌, 0↑ 19↓)
duckdb / parquet (1.111x ❌, 0↑ 15↓)
duckdb / duckdb (1.113x ❌, 0↑ 13↓)
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.013x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.009x ➖, 0↑ 5↓)
datafusion / parquet (1.004x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.011x ➖, 0↑ 4↓)
duckdb / vortex-compact (1.000x ➖, 0↑ 1↓)
duckdb / parquet (1.013x ➖, 0↑ 5↓)
duckdb / duckdb (1.005x ➖, 2↑ 1↓)
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.999x ➖, 1↑ 0↓)
datafusion / vortex-compact (1.006x ➖, 0↑ 0↓)
datafusion / parquet (0.969x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.963x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.971x ➖, 0↑ 0↓)
duckdb / parquet (1.017x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.030x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.025x ➖, 0↑ 0↓)
duckdb / parquet (1.045x ➖, 0↑ 1↓)
No file size changes detected.
Benchmarks: Random Access
Vortex (geomean): 1.002x ➖
unknown / unknown (1.037x ➖, 2↑ 2↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.009x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.005x ➖, 0↑ 0↓)
datafusion / parquet (1.001x ➖, 0↑ 0↓)
datafusion / arrow (0.985x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (0.992x ➖, 0↑ 0↓)
duckdb / duckdb (1.004x ➖, 0↑ 0↓)
No file size changes detected.
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.945x ➖, 8↑ 2↓)
datafusion / parquet (0.960x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.967x ➖, 5↑ 1↓)
duckdb / parquet (0.961x ➖, 3↑ 0↓)
duckdb / duckdb (0.979x ➖, 0↑ 0↓)
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.028x ➖, 0↑ 0↓)
duckdb / parquet (1.007x ➖, 0↑ 0↓)
duckdb / duckdb (1.030x ➖, 0↑ 0↓)
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.980x ➖, 4↑ 4↓)
datafusion / vortex-compact (1.030x ➖, 0↑ 3↓)
datafusion / parquet (0.830x ➖, 7↑ 1↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.019x ➖, 0↑ 1↓)
duckdb / parquet (0.927x ➖, 0↑ 0↓)
Benchmarks: Compression
Vortex (geomean): 0.990x ➖
unknown / unknown (1.001x ➖, 5↑ 6↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.066x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.137x ➖, 0↑ 4↓)
datafusion / parquet (1.195x ➖, 0↑ 7↓)
duckdb / vortex-file-compressed (1.015x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.955x ➖, 0↑ 0↓)
duckdb / parquet (1.149x ➖, 0↑ 3↓)
Summary
This PR adds support for canonicalizing a `ChunkedArray` of `FixedSizeListArray` chunks, similar to existing support for `List` and `Struct` arrays. The canonicalization process combines multiple `FixedSizeListArray` chunks into a single array with a `ChunkedArray` of elements, enabling efficient processing of chunked fixed-size list data.

Changes
Core canonicalization logic (`canonical.rs`):
- Added `swizzle_fixed_size_list_chunks()` function that combines multiple `FixedSizeListArray` chunks by reusing their element children directly as chunks of a combined elements array
- Updated `_canonicalize()` to handle the `DType::FixedSizeList` case
- Added imports for `FixedSizeListArray` and `FixedSizeListArrayExt`

VTable routing (`mod.rs`):
- Updated `execute()` method to route `FixedSizeListArray` chunks through the canonicalization path (alongside `Struct`, `List`, and `Variant`)

Testing:
- Added `pack_fixed_size_lists()` unit test verifying that multiple `FixedSizeListArray` chunks are correctly combined into a canonical form

Benchmarking:
- Added `chunked_fsl_canonicalize` benchmark to measure canonicalization performance across different chunk counts (2, 8, 32) and list sizes (16, 256, 1024 elements)

Testing
- Added test `pack_fixed_size_lists()` that verifies canonicalization of chunked fixed-size lists produces correct results

https://claude.ai/code/session_01TY5GScKyQ135hphErzLhpV
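The property a test like `pack_fixed_size_lists()` checks can be illustrated without Vortex: reading lists chunk by chunk should give the same logical result as first copying everything into one buffer. The two helper functions below are hypothetical and use plain `Vec<i32>` chunks; they are a sketch of the equivalence under the assumption that every chunk's length is a multiple of `list_size`, not the test added in this PR.

```rust
/// Copying path: concatenate every chunk into one fresh buffer, then split it
/// into lists. Models the generic fallback that copies each element.
fn lists_by_copy(chunks: &[Vec<i32>], list_size: usize) -> Vec<Vec<i32>> {
    let flat: Vec<i32> = chunks.iter().flatten().copied().collect();
    flat.chunks_exact(list_size).map(|l| l.to_vec()).collect()
}

/// Chunk-preserving path: split each chunk into lists where it sits and chain
/// the results. Valid only because each chunk ends on a list boundary.
fn lists_by_chunk(chunks: &[Vec<i32>], list_size: usize) -> Vec<Vec<i32>> {
    chunks
        .iter()
        .flat_map(|c| c.chunks_exact(list_size))
        .map(|l| l.to_vec())
        .collect()
}
```

For chunks `[1, 2, 3, 4]` and `[5, 6]` with `list_size = 2`, both functions yield the lists `[1, 2]`, `[3, 4]`, `[5, 6]`. If a chunk's length were not a multiple of `list_size`, the two paths would diverge, which is why the alignment invariant matters.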