feat(parquet): intra-file early stopping via statistics + dynamic filters by zhuqi-lucas · Pull Request #22450 · apache/datafusion

zhuqi-lucas · 2026-05-22T03:22:37Z

Which issue does this PR close?

Rationale for this change

DataFusion already prunes parquet at three granularities — file
(EarlyStoppingStream + FilePruner), row group at scan-startup
(PruningPredicate → RowGroupAccessPlanFilter), and row inside an
open RG (RowFilter).

There's a gap in the middle: once row-group pruning runs at file open, that
decision is frozen because any dynamic filter is still lit(true) at
that point. As TopK tightens its threshold at runtime, subsequent RGs in
the already-opened file keep getting decoded even when their stats already
prove they cannot beat the threshold. This is the dominant cost for
ORDER BY ... LIMIT queries on multi-RG files where file-level pruning
can't help (single large file, or scrambled-RG multi-file).

See the issue for a full architectural diagram and a concrete trace
showing where the wasted I/O / decompression / decode lives.
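
To make the gap concrete, here is a minimal sketch of the statistics check that could be applied to a row group once TopK has published a threshold. The `RowGroupStats` and `Direction` types and the `can_skip` function are hypothetical names for illustration, not DataFusion APIs. The sketch assumes a single integer sort column and a strict comparison, and it ignores nulls and multi-column ties.

```rust
/// Min/max statistics of the sort column for one row group (hypothetical type).
#[derive(Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Sort direction of the `ORDER BY ... LIMIT` query.
#[derive(Clone, Copy)]
enum Direction {
    Asc,
    Desc,
}

/// Returns true when statistics prove that no row in the group can beat the
/// current TopK threshold. `None` models a dynamic filter that is still
/// `lit(true)`, as it is at file open, so nothing can be skipped.
fn can_skip(stats: RowGroupStats, dir: Direction, threshold: Option<i64>) -> bool {
    match (dir, threshold) {
        (_, None) => false,
        // DESC: a row must be strictly greater than the threshold to enter the top k.
        (Direction::Desc, Some(t)) => stats.max <= t,
        // ASC: a row must be strictly smaller than the threshold.
        (Direction::Asc, Some(t)) => stats.min >= t,
    }
}
```

With a threshold of `None` the function always returns false, which is the situation at file open; the same row group can become skippable later, once the threshold has tightened past its max (or min).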

What changes are included in this PR?

The scan now uses a single decoder that pauses at row-group boundaries.
A pruner is consulted between row groups, and the decoder is rebuilt via
into_builder() to skip the row groups the pruner just rejected. Three
coordinated pieces:

- RowGroupPruner (datafusion/datasource-parquet/src/push_decoder.rs)
  mirrors FilePruner at row-group granularity. It uses the
  DynamicFilterTracker API from #22460 (feat(physical-expr):
  DynamicFilterTracker for cheap dynamic-filter change detection) to
  subscribe once to every not-yet-complete dynamic filter in the
  predicate; tracker.changed() is a single atomic load, with no tree
  traversal per check. The cached PruningPredicate is rebuilt only when a
  watched filter has actually moved, then evaluated against the next
  pending row group's statistics via the existing
  RowGroupPruningStatistics adapter. Predicate construction errors and
  predicate evaluation errors are counted into two separate metrics so a
  flaky predicate path can never silently drop data.
- Single-decoder iteration model
  (PushDecoderStreamState::transition). The opener builds one
  ParquetPushDecoder from the prepared access plan, and the stream
  uses arrow-rs 59's ParquetRecordBatchReader iterator to pause at
  row-group boundaries. At each boundary the pruner is consulted
  against the head of rg_plan (the remaining row-group indices). If
  the pruner proves the head RG unwinnable, that index is dropped from
  the plan and the decoder is rebuilt via
  decoder.into_builder().with_row_groups(remaining).build() so the
  skipped RGs are bypassed entirely: no decode, no row-filter eval.
  Already-fetched buffered bytes for downstream RGs carry across the
  rebuild.
- Gate: build the pruner only when the predicate actually moves.
  The opener creates a RowGroupPruner only when
  DynamicFilterTracking::classify(&predicate) reports Watching (at
  least one not-yet-complete dynamic filter) and more than one row
  group remains in the access plan. Static or already-complete
  predicates were fully consumed by prune_by_statistics at file open,
  so re-evaluating them per RG boundary would be wasted work.
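
The change-detection and gating ideas above can be sketched with standard-library atomics. This is an illustration under assumptions, not the DynamicFilterTracker API from #22460, whose signatures are not shown here. `FilterGeneration`, `Watcher`, `Tracking` (including its `Static` and `Complete` variants) and `should_build_pruner` are all hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Generation counter shared between the filter's producer (for example
/// TopK) and any scans watching it. Hypothetical type.
#[derive(Default)]
struct FilterGeneration(AtomicU64);

impl FilterGeneration {
    /// Called by the producer whenever it tightens the filter.
    fn bump(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }
}

/// A subscriber that remembers the last generation it observed.
struct Watcher {
    shared: Arc<FilterGeneration>,
    last_seen: u64,
}

impl Watcher {
    fn new(shared: Arc<FilterGeneration>) -> Self {
        let last_seen = shared.0.load(Ordering::Acquire);
        Self { shared, last_seen }
    }

    /// One atomic load per call; true only if the filter moved since the
    /// previous call, which is when a cached predicate would be rebuilt.
    fn changed(&mut self) -> bool {
        let now = self.shared.0.load(Ordering::Acquire);
        let moved = now != self.last_seen;
        self.last_seen = now;
        moved
    }
}

/// Hypothetical stand-in for the result of classifying a predicate.
enum Tracking {
    Static,
    Complete,
    Watching,
}

/// Gate from the description: only a still-watching predicate with more
/// than one remaining row group justifies building a pruner.
fn should_build_pruner(tracking: &Tracking, remaining_row_groups: usize) -> bool {
    matches!(tracking, Tracking::Watching) && remaining_row_groups > 1
}
```

A caller would check `changed()` at each row-group boundary and rebuild its cached predicate only when it returns true, so an unchanged filter costs one load per boundary.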

The earlier multi-decoder design (PendingDecoderRun,
ParquetAccessPlan::split_runs, force_per_row_group) is removed —
arrow-rs 59's into_builder + with_row_groups makes a single decoder
strictly more capable.
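
The control flow of the single-decoder model can be simulated without any Parquet machinery. The sketch below is a toy model under stated assumptions: `RowGroup` and `scan_topk_desc` are hypothetical, row groups are plain vectors of the sort column, and popping the head of `rg_plan` stands in for rebuilding the decoder with the remaining row groups. It does not call arrow-rs.

```rust
use std::collections::VecDeque;

/// One row group, reduced to the values of its sort column (toy model).
struct RowGroup {
    values: Vec<i64>,
}

impl RowGroup {
    /// Stand-in for the max statistic stored in Parquet metadata.
    fn max(&self) -> i64 {
        *self.values.iter().max().expect("non-empty row group")
    }
}

/// Simulates `ORDER BY v DESC LIMIT k` over the row groups of one file.
/// Returns the indices that were decoded and the number pruned at runtime.
fn scan_topk_desc(row_groups: &[RowGroup], k: usize) -> (Vec<usize>, usize) {
    let mut rg_plan: VecDeque<usize> = (0..row_groups.len()).collect();
    let mut top_k: Vec<i64> = Vec::new(); // current best k values, descending
    let mut decoded = Vec::new();
    let mut pruned = 0;

    while let Some(&head) = rg_plan.front() {
        // A threshold exists only once k rows have been collected.
        let threshold = if top_k.len() == k { top_k.last().copied() } else { None };
        rg_plan.pop_front();

        // Row-group boundary: consult the statistics before decoding.
        if let Some(t) = threshold {
            if row_groups[head].max() <= t {
                pruned += 1; // skipped entirely, nothing is decoded
                continue;
            }
        }

        // "Decode" the row group and let TopK tighten its threshold.
        decoded.push(head);
        top_k.extend_from_slice(&row_groups[head].values);
        top_k.sort_unstable_by(|a, b| b.cmp(a));
        top_k.truncate(k);
    }
    (decoded, pruned)
}
```

In this model the first row group is always decoded, because no threshold exists until `k` rows have been seen; every later row group is decided by the threshold in force when the scan reaches its boundary.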

Observability

- New Count metric row_groups_pruned_dynamic_filter on
  ParquetFileMetrics surfaces the runtime saving.
- New dynamic_rg_pruning=eligible marker on ParquetSource's
  EXPLAIN (fmt_extra Default + Verbose) signals plan-time
  eligibility, emitted whenever the predicate has a still-watching
  dynamic portion. The value is eligible rather than true because the
  static plan can't predict the runtime outcome.

Benchmarks (`benchmarks/sort_pushdown_inexact`, 5 iterations)

| Query | main | this PR | Δ |
|---|---|---|---|
| Q1 `ORDER BY l_orderkey DESC LIMIT 100` | 6.99 ms | 3.80 ms | −46% |
| Q2 `ORDER BY l_orderkey DESC LIMIT 1000` | 3.29 ms | 1.33 ms | −60% |
| Q3 `SELECT * ... DESC LIMIT 100` | 11.17 ms | 9.91 ms | −11% |
| Q4 `SELECT * ... DESC LIMIT 1000` | 9.28 ms | 7.95 ms | −14% |

Narrow-projection queries gain the most — their per-RG cost is dominated
by metadata + sort-column read, which this PR eliminates for unwinnable
RGs. Wide-projection queries gain less because the kept RG's
all-column decode dominates total time, but still see meaningful
savings.
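
For reference, the Δ column is consistent with the relative change of the branch time against main, rounded to the nearest percent. The helper below is a hypothetical illustration of that arithmetic, not part of the benchmark harness.

```rust
/// Relative change of `new_ms` against `base_ms`, in percent.
/// Negative values mean the new measurement is faster.
fn relative_change_pct(base_ms: f64, new_ms: f64) -> f64 {
    (new_ms - base_ms) / base_ms * 100.0
}
```

For Q1, `relative_change_pct(6.99, 3.80)` is roughly -45.6, which rounds to the −46% shown in the table.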

Are these changes tested?

Three layers:

- 6 unit tests:
  - 3 in push_decoder.rs::tests: RowGroupPruner basic pruning,
    tracker-driven dynamic-filter updates, fallback when the predicate
    has no analyzable bounds.
  - 3 in source.rs::tests: dynamic_rg_pruning=eligible marker
    present on dynamic predicate, absent on static predicate, absent
    when there is no predicate at all.
- 3 integration tests in
  datafusion/core/tests/parquet/dynamic_row_group_pruning.rs:
  one asserts row_groups_pruned_dynamic_filter >= 1 end-to-end on a 5-RG
  ORDER BY DESC LIMIT 5 scan; one is a regression test for the
  prepare_access_plan reorder bug that uses ORDER BY ASC against a
  file written in descending value order so the sort-pushdown reorder
  is exercised; and one is a quiet-without-TopK test that asserts the
  metric stays at 0 (no spurious firing).
- New SLT
  datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt
  asserts both EXPLAIN surfaces: plain EXPLAIN shows
  dynamic_rg_pruning=eligible, and EXPLAIN ANALYZE pins
  row_groups_pruned_dynamic_filter=4 (five RGs, four pruned at
  runtime).

cargo clippy --all-targets --all-features -- -D warnings clean.

Are there any user-facing changes?

Two visible additions, both opt-in via existing dynamic-filter
infrastructure:

- New row_groups_pruned_dynamic_filter counter visible in
  EXPLAIN ANALYZE for queries whose plan carries a
  DynamicFilterPhysicalExpr (today: only TopK with
  enable_topk_dynamic_filter_pushdown=true, which is the default).
- New dynamic_rg_pruning=eligible marker visible in EXPLAIN
  output for the same queries.

No config changes, no API breakage, no behavior change for queries
without a dynamic predicate.

Two CI failures on PR apache#22450:

1. **cargo doc**: broken intra-doc link in `ParquetFileMetrics::row_groups_pruned_dynamic_filter`. Switch from ``[`row_groups_pruned_statistics`]`` to ``[`Self::row_groups_pruned_statistics`]`` so rustdoc can resolve it.
2. **sqllogictest substrait round-trip**: adding `dynamic_rg_pruning=eligible` to ParquetSource's `fmt_extra` output shifted every `EXPLAIN` line that already showed a `DynamicFilter` predicate. Add the marker to 13 SLT expectations:
   - clickbench, explain_analyze, limit, limit_pruning, dynamic_filter_pushdown_config, preserve_file_partitioning, projection_pushdown, push_down_filter_parquet, push_down_filter_regression, repartition_subset_satisfaction, sort_pushdown, statistics_registry, topk
   - 134 marker insertions total, all on `DataSourceExec:` lines whose predicate contains `DynamicFilter [`.

Two summary-level analyze tests also need the new `row_groups_pruned_dynamic_filter=0` counter in their metrics block (`limit_pruning.slt`, `dynamic_filter_pushdown_config.slt`). Dev-level analyze output elides zero-valued counters so the other files don't need it.

No behavior change beyond what was already in the previous commit.

Dandandan · 2026-05-22T05:25:39Z

run benchmarks

adriangbot · 2026-05-22T05:28:44Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-274-nhjxx 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-05-22T05:28:44Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-275-nknmm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-05-22T05:28:51Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515274607-276-gkpr8 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-05-22T05:45:06Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 37.91 / 39.47 ±2.07 / 43.42 ms │     37.82 / 38.74 ±0.94 / 40.41 ms │ no change │
│ QQuery 2  │ 19.53 / 19.69 ±0.21 / 20.10 ms │     19.91 / 20.59 ±0.57 / 21.21 ms │ no change │
│ QQuery 3  │ 34.33 / 35.15 ±0.46 / 35.69 ms │     32.22 / 34.22 ±1.50 / 35.76 ms │ no change │
│ QQuery 4  │ 16.91 / 17.09 ±0.17 / 17.40 ms │     17.03 / 17.66 ±0.67 / 18.79 ms │ no change │
│ QQuery 5  │ 41.39 / 41.63 ±0.22 / 42.01 ms │     39.52 / 40.94 ±0.75 / 41.54 ms │ no change │
│ QQuery 6  │ 15.93 / 15.99 ±0.06 / 16.09 ms │     15.90 / 16.11 ±0.16 / 16.33 ms │ no change │
│ QQuery 7  │ 46.71 / 49.37 ±3.17 / 55.06 ms │     45.70 / 47.23 ±1.55 / 49.93 ms │ no change │
│ QQuery 8  │ 43.93 / 44.87 ±0.82 / 46.09 ms │     44.03 / 44.42 ±0.37 / 45.10 ms │ no change │
│ QQuery 9  │ 48.69 / 50.08 ±1.04 / 51.92 ms │     48.60 / 49.70 ±0.91 / 51.25 ms │ no change │
│ QQuery 10 │ 63.20 / 63.42 ±0.21 / 63.74 ms │     62.93 / 63.36 ±0.39 / 64.02 ms │ no change │
│ QQuery 11 │ 13.16 / 13.34 ±0.16 / 13.64 ms │     13.04 / 13.27 ±0.26 / 13.77 ms │ no change │
│ QQuery 12 │ 23.70 / 24.54 ±0.93 / 26.30 ms │     23.40 / 24.02 ±0.42 / 24.44 ms │ no change │
│ QQuery 13 │ 33.54 / 35.52 ±1.26 / 37.06 ms │     33.30 / 35.22 ±1.09 / 36.63 ms │ no change │
│ QQuery 14 │ 24.96 / 25.10 ±0.09 / 25.20 ms │     24.90 / 25.37 ±0.64 / 26.62 ms │ no change │
│ QQuery 15 │ 30.72 / 30.88 ±0.08 / 30.95 ms │     30.34 / 30.93 ±0.49 / 31.80 ms │ no change │
│ QQuery 16 │ 14.44 / 14.65 ±0.16 / 14.84 ms │     14.67 / 14.84 ±0.24 / 15.30 ms │ no change │
│ QQuery 17 │ 72.04 / 73.15 ±1.03 / 74.89 ms │     74.86 / 75.90 ±0.62 / 76.76 ms │ no change │
│ QQuery 18 │ 61.21 / 62.59 ±1.05 / 63.75 ms │     62.12 / 63.07 ±0.64 / 64.05 ms │ no change │
│ QQuery 19 │ 33.14 / 33.64 ±0.83 / 35.29 ms │     33.47 / 33.73 ±0.34 / 34.40 ms │ no change │
│ QQuery 20 │ 36.90 / 37.57 ±0.77 / 38.86 ms │     37.22 / 37.47 ±0.24 / 37.89 ms │ no change │
│ QQuery 21 │ 56.14 / 57.72 ±1.20 / 59.48 ms │     53.82 / 55.70 ±1.59 / 58.18 ms │ no change │
│ QQuery 22 │ 23.10 / 23.81 ±0.50 / 24.61 ms │     23.31 / 23.99 ±0.93 / 25.83 ms │ no change │
└───────────┴────────────────────────────────┴────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 809.27ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 806.50ms │
│ Average Time (HEAD)                               │  36.78ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  36.66ms │
│ Queries Faster                                    │        0 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │       22 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 5.5 GiB |
| Avg memory | 5.1 GiB |
| CPU user | 29.4s |
| CPU sys | 1.9s |
| Peak spill | 0 B |

tpch — branch

| Metric | Value |
|---|---|
| Wall time | 5.0s |
| Peak memory | 5.5 GiB |
| Avg memory | 5.0 GiB |
| CPU user | 29.6s |
| CPU sys | 1.8s |
| Peak spill | 0 B |

zhuqi-lucas · 2026-05-22T05:46:29Z

run benchmark sort_pushdown_inexact

adriangbot · 2026-05-22T05:46:53Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │           5.69 / 6.30 ±0.88 / 8.05 ms │           5.78 / 6.31 ±0.95 / 8.21 ms │ no change │
│ QQuery 2  │        80.75 / 81.16 ±0.31 / 81.58 ms │        80.16 / 80.81 ±0.40 / 81.25 ms │ no change │
│ QQuery 3  │        29.38 / 29.57 ±0.13 / 29.71 ms │        28.91 / 29.15 ±0.16 / 29.38 ms │ no change │
│ QQuery 4  │     512.73 / 515.37 ±2.82 / 520.75 ms │     509.87 / 514.41 ±2.49 / 516.87 ms │ no change │
│ QQuery 5  │        50.96 / 51.70 ±0.61 / 52.66 ms │        50.75 / 51.13 ±0.44 / 51.94 ms │ no change │
│ QQuery 6  │        35.28 / 35.77 ±0.38 / 36.31 ms │        35.30 / 35.86 ±0.37 / 36.34 ms │ no change │
│ QQuery 7  │     109.30 / 110.73 ±1.72 / 112.86 ms │     108.59 / 109.56 ±0.94 / 110.99 ms │ no change │
│ QQuery 8  │        36.33 / 36.65 ±0.31 / 37.03 ms │        36.32 / 37.02 ±0.53 / 37.95 ms │ no change │
│ QQuery 9  │        53.25 / 55.51 ±2.05 / 58.81 ms │        53.15 / 54.23 ±0.72 / 54.96 ms │ no change │
│ QQuery 10 │        80.67 / 82.08 ±1.67 / 85.32 ms │        81.02 / 81.34 ±0.21 / 81.55 ms │ no change │
│ QQuery 11 │     314.18 / 317.53 ±4.18 / 325.76 ms │     310.62 / 316.44 ±4.99 / 322.48 ms │ no change │
│ QQuery 12 │        28.43 / 28.63 ±0.20 / 28.94 ms │        28.49 / 28.69 ±0.17 / 28.94 ms │ no change │
│ QQuery 13 │     125.87 / 126.80 ±1.06 / 128.82 ms │     126.23 / 126.71 ±0.54 / 127.68 ms │ no change │
│ QQuery 14 │     502.60 / 505.33 ±1.50 / 506.71 ms │     502.81 / 505.10 ±1.82 / 507.78 ms │ no change │
│ QQuery 15 │        60.38 / 61.69 ±1.71 / 65.03 ms │        60.49 / 61.18 ±0.66 / 62.17 ms │ no change │
│ QQuery 16 │           6.30 / 6.52 ±0.26 / 7.03 ms │           6.34 / 6.51 ±0.16 / 6.81 ms │ no change │
│ QQuery 17 │        80.24 / 81.16 ±0.67 / 82.22 ms │        80.30 / 80.98 ±0.58 / 81.73 ms │ no change │
│ QQuery 18 │     152.37 / 153.04 ±0.40 / 153.48 ms │     151.31 / 152.10 ±0.68 / 153.13 ms │ no change │
│ QQuery 19 │        40.79 / 41.20 ±0.25 / 41.56 ms │        41.17 / 41.46 ±0.21 / 41.79 ms │ no change │
│ QQuery 20 │        34.92 / 35.59 ±0.65 / 36.69 ms │        35.35 / 35.91 ±0.30 / 36.17 ms │ no change │
│ QQuery 21 │        16.69 / 16.98 ±0.24 / 17.33 ms │        16.78 / 17.01 ±0.23 / 17.45 ms │ no change │
│ QQuery 22 │        61.65 / 62.39 ±0.46 / 63.01 ms │        61.77 / 62.59 ±1.17 / 64.87 ms │ no change │
│ QQuery 23 │     480.28 / 482.80 ±1.89 / 485.52 ms │     479.56 / 482.95 ±3.05 / 487.66 ms │ no change │
│ QQuery 24 │     236.03 / 239.91 ±6.29 / 252.43 ms │     233.29 / 235.80 ±2.41 / 239.92 ms │ no change │
│ QQuery 25 │     114.42 / 114.91 ±0.67 / 116.14 ms │     112.30 / 114.90 ±1.52 / 116.61 ms │ no change │
│ QQuery 26 │        70.92 / 71.14 ±0.34 / 71.82 ms │        69.91 / 70.46 ±0.30 / 70.78 ms │ no change │
│ QQuery 27 │           6.42 / 6.56 ±0.16 / 6.87 ms │           6.47 / 6.64 ±0.22 / 7.08 ms │ no change │
│ QQuery 28 │        57.25 / 60.78 ±1.85 / 62.74 ms │        57.93 / 61.09 ±1.61 / 62.34 ms │ no change │
│ QQuery 29 │      98.46 / 100.38 ±2.61 / 105.53 ms │      98.98 / 101.66 ±3.64 / 108.85 ms │ no change │
│ QQuery 30 │        30.12 / 30.48 ±0.31 / 31.00 ms │        29.94 / 30.30 ±0.30 / 30.83 ms │ no change │
│ QQuery 31 │     111.59 / 113.79 ±2.44 / 118.38 ms │     111.55 / 112.71 ±1.71 / 116.05 ms │ no change │
│ QQuery 32 │        20.35 / 20.93 ±0.34 / 21.38 ms │        20.17 / 20.46 ±0.27 / 20.79 ms │ no change │
│ QQuery 33 │        38.68 / 39.14 ±0.35 / 39.58 ms │        38.31 / 38.57 ±0.20 / 38.80 ms │ no change │
│ QQuery 34 │           9.29 / 9.58 ±0.29 / 9.98 ms │          9.20 / 9.57 ±0.36 / 10.20 ms │ no change │
│ QQuery 35 │        80.49 / 81.05 ±0.48 / 81.93 ms │        80.76 / 81.51 ±0.49 / 82.23 ms │ no change │
│ QQuery 36 │           5.75 / 5.91 ±0.17 / 6.25 ms │           5.89 / 6.01 ±0.15 / 6.30 ms │ no change │
│ QQuery 37 │           6.78 / 6.95 ±0.11 / 7.06 ms │           6.72 / 6.94 ±0.27 / 7.42 ms │ no change │
│ QQuery 38 │        68.29 / 69.85 ±1.23 / 71.85 ms │        68.57 / 69.09 ±0.38 / 69.67 ms │ no change │
│ QQuery 39 │        98.14 / 98.47 ±0.32 / 99.05 ms │        97.99 / 98.65 ±0.51 / 99.53 ms │ no change │
│ QQuery 40 │        22.81 / 23.33 ±0.79 / 24.91 ms │        23.09 / 23.27 ±0.15 / 23.51 ms │ no change │
│ QQuery 41 │        11.03 / 11.79 ±1.11 / 13.97 ms │        11.24 / 11.71 ±0.44 / 12.29 ms │ no change │
│ QQuery 42 │        23.77 / 24.22 ±0.29 / 24.69 ms │        24.09 / 24.42 ±0.37 / 25.05 ms │ no change │
│ QQuery 43 │           4.64 / 4.76 ±0.17 / 5.09 ms │           4.80 / 4.91 ±0.17 / 5.24 ms │ no change │
│ QQuery 44 │        10.51 / 10.58 ±0.07 / 10.71 ms │        10.60 / 10.85 ±0.17 / 11.09 ms │ no change │
│ QQuery 45 │        39.70 / 40.69 ±0.70 / 41.49 ms │        40.41 / 40.86 ±0.33 / 41.33 ms │ no change │
│ QQuery 46 │        12.83 / 13.15 ±0.27 / 13.55 ms │        12.69 / 12.87 ±0.16 / 13.12 ms │ no change │
│ QQuery 47 │     229.18 / 232.49 ±2.73 / 236.09 ms │     228.68 / 231.85 ±1.91 / 233.47 ms │ no change │
│ QQuery 48 │     102.74 / 103.73 ±0.80 / 104.88 ms │     103.30 / 103.95 ±0.91 / 105.69 ms │ no change │
│ QQuery 49 │        78.52 / 79.33 ±0.62 / 80.41 ms │        78.96 / 79.36 ±0.45 / 80.18 ms │ no change │
│ QQuery 50 │        59.63 / 60.33 ±0.37 / 60.68 ms │        59.40 / 59.73 ±0.22 / 60.07 ms │ no change │
│ QQuery 51 │        92.53 / 95.36 ±1.81 / 97.37 ms │       92.09 / 95.60 ±4.02 / 102.85 ms │ no change │
│ QQuery 52 │        23.93 / 24.34 ±0.37 / 25.02 ms │        23.81 / 24.28 ±0.38 / 24.74 ms │ no change │
│ QQuery 53 │        29.46 / 29.71 ±0.16 / 29.96 ms │        29.04 / 29.30 ±0.20 / 29.57 ms │ no change │
│ QQuery 54 │        53.72 / 54.35 ±0.36 / 54.67 ms │        54.44 / 55.05 ±0.44 / 55.75 ms │ no change │
│ QQuery 55 │        23.52 / 24.34 ±1.07 / 26.44 ms │        23.21 / 23.56 ±0.24 / 23.91 ms │ no change │
│ QQuery 56 │        39.01 / 39.29 ±0.26 / 39.78 ms │        38.43 / 38.84 ±0.28 / 39.20 ms │ no change │
│ QQuery 57 │     178.37 / 180.09 ±1.83 / 183.56 ms │     175.76 / 176.93 ±1.12 / 178.61 ms │ no change │
│ QQuery 58 │     118.44 / 118.92 ±0.37 / 119.44 ms │     117.20 / 117.99 ±0.54 / 118.58 ms │ no change │
│ QQuery 59 │     117.61 / 119.84 ±2.42 / 123.96 ms │     117.87 / 118.49 ±0.72 / 119.91 ms │ no change │
│ QQuery 60 │        39.05 / 39.84 ±0.52 / 40.49 ms │        38.74 / 39.22 ±0.42 / 39.82 ms │ no change │
│ QQuery 61 │        12.52 / 12.63 ±0.10 / 12.81 ms │        12.61 / 12.79 ±0.24 / 13.25 ms │ no change │
│ QQuery 62 │        46.89 / 47.53 ±0.38 / 47.98 ms │        46.32 / 46.78 ±0.48 / 47.68 ms │ no change │
│ QQuery 63 │        29.70 / 31.05 ±2.04 / 35.10 ms │        29.42 / 29.66 ±0.25 / 30.09 ms │ no change │
│ QQuery 64 │     462.94 / 468.99 ±7.22 / 482.49 ms │     458.19 / 462.97 ±5.14 / 471.10 ms │ no change │
│ QQuery 65 │     146.94 / 150.20 ±2.58 / 152.60 ms │     149.10 / 151.28 ±1.81 / 153.79 ms │ no change │
│ QQuery 66 │        78.92 / 81.62 ±3.98 / 89.53 ms │        78.42 / 80.72 ±2.20 / 84.79 ms │ no change │
│ QQuery 67 │     245.43 / 251.17 ±5.20 / 259.23 ms │     248.87 / 251.06 ±2.35 / 255.17 ms │ no change │
│ QQuery 68 │        12.91 / 13.11 ±0.22 / 13.53 ms │        13.06 / 13.23 ±0.15 / 13.51 ms │ no change │
│ QQuery 69 │        76.13 / 79.82 ±4.99 / 89.27 ms │        76.72 / 77.40 ±0.76 / 78.84 ms │ no change │
│ QQuery 70 │     106.87 / 110.44 ±3.00 / 115.47 ms │     105.01 / 109.92 ±6.93 / 123.68 ms │ no change │
│ QQuery 71 │        35.58 / 35.96 ±0.27 / 36.41 ms │        35.37 / 36.07 ±0.57 / 36.94 ms │ no change │
│ QQuery 72 │ 2132.94 / 2183.71 ±42.17 / 2236.73 ms │ 2111.52 / 2158.63 ±35.37 / 2214.83 ms │ no change │
│ QQuery 73 │           9.03 / 9.24 ±0.22 / 9.65 ms │           9.11 / 9.30 ±0.17 / 9.63 ms │ no change │
│ QQuery 74 │     177.63 / 180.03 ±2.51 / 183.70 ms │     175.98 / 181.10 ±5.65 / 191.23 ms │ no change │
│ QQuery 75 │     145.51 / 146.86 ±1.25 / 149.13 ms │     146.38 / 147.93 ±1.47 / 150.33 ms │ no change │
│ QQuery 76 │        35.58 / 35.88 ±0.29 / 36.38 ms │        35.34 / 35.87 ±0.46 / 36.52 ms │ no change │
│ QQuery 77 │        59.95 / 60.42 ±0.49 / 61.35 ms │        60.26 / 60.70 ±0.42 / 61.43 ms │ no change │
│ QQuery 78 │     188.05 / 191.81 ±4.01 / 199.22 ms │     187.70 / 191.78 ±3.19 / 195.27 ms │ no change │
│ QQuery 79 │        67.60 / 68.26 ±0.66 / 69.06 ms │        66.72 / 67.34 ±0.46 / 68.11 ms │ no change │
│ QQuery 80 │     100.87 / 101.21 ±0.28 / 101.63 ms │     100.03 / 101.28 ±1.64 / 104.53 ms │ no change │
│ QQuery 81 │        24.09 / 24.30 ±0.12 / 24.43 ms │        24.27 / 24.51 ±0.17 / 24.74 ms │ no change │
│ QQuery 82 │        16.37 / 16.55 ±0.19 / 16.91 ms │        16.57 / 16.72 ±0.17 / 17.03 ms │ no change │
│ QQuery 83 │        36.90 / 38.82 ±2.29 / 43.14 ms │        37.15 / 37.48 ±0.40 / 38.16 ms │ no change │
│ QQuery 84 │        43.61 / 43.96 ±0.34 / 44.57 ms │        44.04 / 45.58 ±1.86 / 49.12 ms │ no change │
│ QQuery 85 │     135.59 / 136.80 ±1.39 / 139.37 ms │     136.81 / 137.88 ±0.89 / 139.18 ms │ no change │
│ QQuery 86 │        24.79 / 25.64 ±0.98 / 27.53 ms │        25.15 / 25.53 ±0.22 / 25.76 ms │ no change │
│ QQuery 87 │        68.82 / 70.07 ±0.82 / 71.02 ms │        69.29 / 70.06 ±0.61 / 71.17 ms │ no change │
│ QQuery 88 │        61.33 / 61.95 ±0.45 / 62.63 ms │        62.63 / 63.33 ±0.57 / 64.00 ms │ no change │
│ QQuery 89 │        35.25 / 35.73 ±0.26 / 36.03 ms │        35.55 / 36.03 ±0.26 / 36.27 ms │ no change │
│ QQuery 90 │        16.65 / 16.84 ±0.17 / 17.15 ms │        16.98 / 17.23 ±0.20 / 17.48 ms │ no change │
│ QQuery 91 │        52.24 / 54.16 ±2.38 / 58.84 ms │        52.30 / 53.65 ±1.75 / 57.02 ms │ no change │
│ QQuery 92 │        29.88 / 30.31 ±0.50 / 31.18 ms │        29.84 / 30.43 ±0.40 / 30.97 ms │ no change │
│ QQuery 93 │        50.10 / 50.89 ±0.46 / 51.43 ms │        50.33 / 51.35 ±0.74 / 52.58 ms │ no change │
│ QQuery 94 │        37.79 / 38.44 ±0.65 / 39.69 ms │        37.56 / 38.59 ±0.57 / 39.14 ms │ no change │
│ QQuery 95 │        85.10 / 85.88 ±0.49 / 86.60 ms │        84.23 / 85.24 ±0.88 / 86.74 ms │ no change │
│ QQuery 96 │        24.11 / 24.27 ±0.22 / 24.69 ms │        24.29 / 24.50 ±0.19 / 24.83 ms │ no change │
│ QQuery 97 │        45.50 / 46.04 ±0.35 / 46.55 ms │        46.05 / 46.31 ±0.25 / 46.74 ms │ no change │
│ QQuery 98 │        41.76 / 42.66 ±0.54 / 43.28 ms │        42.44 / 43.15 ±0.62 / 43.97 ms │ no change │
│ QQuery 99 │        70.01 / 70.78 ±0.61 / 71.75 ms │        69.97 / 70.68 ±0.58 / 71.49 ms │ no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 10498.86ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 10448.87ms │
│ Average Time (HEAD)                               │   106.05ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   105.54ms │
│ Queries Faster                                    │          0 │
│ Queries Slower                                    │          0 │
│ Queries with No Change                            │         99 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	6.9 GiB
Avg memory	6.1 GiB
CPU user	234.8s
CPU sys	6.0s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	7.0 GiB
Avg memory	6.3 GiB
CPU user	232.0s
CPU sys	5.8s
Peak spill	0 B

adriangbot · 2026-05-22T05:47:32Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515459477-277-x4d6g 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (691926f) to 077f08a (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete

adriangbot · 2026-05-22T05:50:29Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.13 / 4.63 ±6.95 / 18.54 ms │          1.11 / 4.59 ±6.90 / 18.39 ms │     no change │
│ QQuery 1  │        12.84 / 12.94 ±0.08 / 13.04 ms │        12.42 / 12.73 ±0.18 / 12.95 ms │     no change │
│ QQuery 2  │        35.38 / 35.81 ±0.30 / 36.23 ms │        35.63 / 35.87 ±0.28 / 36.35 ms │     no change │
│ QQuery 3  │        30.33 / 30.94 ±0.92 / 32.76 ms │        30.38 / 30.89 ±0.60 / 32.06 ms │     no change │
│ QQuery 4  │     221.08 / 222.52 ±1.32 / 224.77 ms │     217.25 / 223.77 ±4.47 / 230.65 ms │     no change │
│ QQuery 5  │     271.34 / 273.02 ±1.66 / 276.10 ms │     267.27 / 271.69 ±2.28 / 273.41 ms │     no change │
│ QQuery 6  │           1.16 / 1.31 ±0.22 / 1.75 ms │           1.18 / 1.32 ±0.21 / 1.73 ms │     no change │
│ QQuery 7  │        13.82 / 14.01 ±0.16 / 14.24 ms │        14.29 / 14.44 ±0.14 / 14.67 ms │     no change │
│ QQuery 8  │     319.01 / 323.19 ±3.92 / 329.01 ms │     342.38 / 356.25 ±7.86 / 364.96 ms │  1.10x slower │
│ QQuery 9  │     446.98 / 451.24 ±3.38 / 455.61 ms │    463.18 / 475.35 ±10.89 / 489.79 ms │  1.05x slower │
│ QQuery 10 │        68.91 / 69.79 ±0.81 / 71.12 ms │        70.01 / 71.13 ±0.70 / 71.92 ms │     no change │
│ QQuery 11 │        80.17 / 81.70 ±1.04 / 83.41 ms │        80.46 / 81.98 ±1.59 / 84.88 ms │     no change │
│ QQuery 12 │     262.57 / 271.05 ±6.00 / 278.82 ms │     263.13 / 272.47 ±6.80 / 283.05 ms │     no change │
│ QQuery 13 │    360.50 / 369.16 ±10.13 / 388.45 ms │     357.78 / 364.59 ±3.90 / 369.19 ms │     no change │
│ QQuery 14 │     276.41 / 281.73 ±4.61 / 289.96 ms │     280.05 / 286.88 ±6.92 / 298.91 ms │     no change │
│ QQuery 15 │    264.25 / 276.30 ±11.55 / 292.24 ms │    294.81 / 307.60 ±14.15 / 333.58 ms │  1.11x slower │
│ QQuery 16 │    627.27 / 659.89 ±17.60 / 680.12 ms │    616.34 / 643.05 ±19.90 / 673.10 ms │     no change │
│ QQuery 17 │    624.54 / 639.67 ±10.36 / 653.36 ms │     602.95 / 613.57 ±9.74 / 631.66 ms │     no change │
│ QQuery 18 │ 1231.71 / 1264.30 ±32.09 / 1324.58 ms │ 1230.87 / 1249.03 ±13.77 / 1269.77 ms │     no change │
│ QQuery 19 │        29.95 / 33.49 ±3.93 / 38.81 ms │        27.88 / 34.23 ±7.63 / 44.83 ms │     no change │
│ QQuery 20 │     531.85 / 542.39 ±8.77 / 552.84 ms │     518.59 / 522.88 ±3.22 / 527.36 ms │     no change │
│ QQuery 21 │     590.04 / 600.69 ±8.53 / 615.62 ms │     589.86 / 595.16 ±4.54 / 603.56 ms │     no change │
│ QQuery 22 │  1047.46 / 1051.15 ±3.07 / 1054.60 ms │ 1085.28 / 1112.76 ±21.95 / 1146.31 ms │  1.06x slower │
│ QQuery 23 │ 3143.81 / 3215.59 ±61.28 / 3327.12 ms │ 3009.38 / 3115.16 ±74.14 / 3210.79 ms │     no change │
│ QQuery 24 │        43.61 / 45.10 ±1.49 / 47.69 ms │        41.73 / 49.54 ±6.86 / 58.93 ms │  1.10x slower │
│ QQuery 25 │     116.41 / 117.72 ±0.81 / 118.72 ms │     112.35 / 113.26 ±0.66 / 114.42 ms │     no change │
│ QQuery 26 │        44.00 / 45.32 ±1.39 / 47.91 ms │        41.64 / 42.41 ±0.68 / 43.37 ms │ +1.07x faster │
│ QQuery 27 │     668.27 / 677.75 ±9.71 / 696.11 ms │     666.49 / 674.74 ±7.19 / 686.27 ms │     no change │
│ QQuery 28 │ 3004.40 / 3024.76 ±13.42 / 3039.76 ms │ 3014.79 / 3056.61 ±36.42 / 3111.94 ms │     no change │
│ QQuery 29 │       40.04 / 51.77 ±14.67 / 76.55 ms │       41.49 / 49.56 ±15.74 / 81.03 ms │     no change │
│ QQuery 30 │     325.79 / 329.77 ±3.21 / 335.46 ms │    311.00 / 329.53 ±13.27 / 351.40 ms │     no change │
│ QQuery 31 │     296.89 / 313.26 ±9.84 / 323.06 ms │    278.41 / 291.30 ±11.42 / 310.81 ms │ +1.08x faster │
│ QQuery 32 │     934.13 / 949.51 ±9.91 / 963.93 ms │    914.90 / 929.16 ±15.07 / 958.29 ms │     no change │
│ QQuery 33 │ 1410.05 / 1512.63 ±79.05 / 1623.38 ms │ 1406.57 / 1430.30 ±12.88 / 1443.94 ms │ +1.06x faster │
│ QQuery 34 │ 1422.56 / 1450.90 ±17.81 / 1474.31 ms │ 1424.50 / 1486.88 ±73.50 / 1624.17 ms │     no change │
│ QQuery 35 │    272.88 / 291.31 ±26.72 / 344.13 ms │    276.79 / 319.68 ±35.69 / 379.77 ms │  1.10x slower │
│ QQuery 36 │        61.63 / 71.05 ±7.09 / 80.23 ms │      65.89 / 82.70 ±12.28 / 102.66 ms │  1.16x slower │
│ QQuery 37 │        34.65 / 35.40 ±0.76 / 36.80 ms │        36.38 / 40.19 ±5.70 / 51.48 ms │  1.14x slower │
│ QQuery 38 │        42.14 / 46.54 ±3.98 / 52.83 ms │        41.07 / 43.14 ±1.72 / 45.06 ms │ +1.08x faster │
│ QQuery 39 │     144.74 / 151.58 ±5.96 / 160.59 ms │     151.07 / 153.66 ±2.72 / 158.21 ms │     no change │
│ QQuery 40 │        13.68 / 16.08 ±3.85 / 23.74 ms │        15.04 / 15.47 ±0.36 / 15.96 ms │     no change │
│ QQuery 41 │        13.32 / 13.46 ±0.13 / 13.69 ms │        14.42 / 16.08 ±3.04 / 22.15 ms │  1.19x slower │
│ QQuery 42 │        12.88 / 14.68 ±3.42 / 21.51 ms │        14.07 / 14.26 ±0.12 / 14.40 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 19885.11ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 19835.88ms │
│ Average Time (HEAD)                               │   462.44ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   461.30ms │
│ Queries Faster                                    │          4 │
│ Queries Slower                                    │          9 │
│ Queries with No Change                            │         30 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘
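The "Change" column above appears to be the ratio of the two mean times, with small differences reported as "no change". A minimal sketch of that classification follows. The 5% noise threshold is an assumption inferred from the rows in this table, not something the benchmark output states, and `change_label` is a hypothetical helper name.

```rust
/// Classifies a branch-vs-base comparison from the two mean times (in ms).
/// `noise` is the relative band treated as "no change"; 0.05 is an assumed
/// value inferred from the table, not taken from the benchmark runner.
fn change_label(base_ms: f64, branch_ms: f64, noise: f64) -> String {
    let ratio = branch_ms / base_ms;
    if ratio > 1.0 + noise {
        // Branch takes longer than base.
        format!("{:.2}x slower", ratio)
    } else if ratio < 1.0 - noise {
        // Branch is quicker; report the inverse ratio as a speed-up.
        format!("+{:.2}x faster", 1.0 / ratio)
    } else {
        "no change".to_string()
    }
}
```

Under that assumption, the Query 8 means (323.19 ms vs 356.25 ms) classify as slower, the Query 26 means (45.32 ms vs 42.41 ms) as faster, and the Query 20 means (542.39 ms vs 522.88 ms) fall inside the noise band.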

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	29.8 GiB
Avg memory	22.9 GiB
CPU user	1032.4s
CPU sys	67.0s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	30.6 GiB
Avg memory	23.2 GiB
CPU user	1033.3s
CPU sys	67.3s
Peak spill	0 B

Copilot

Pull request overview

This PR adds runtime row-group pruning for Parquet scans driven by TopK’s dynamic filter, closing the gap where row groups selected at file open couldn’t be re-pruned after the TopK threshold tightens during execution.

Changes:

Introduces a runtime RowGroupPruner that re-evaluates a dynamic predicate at decoder-run boundaries and skips row groups proven unreachable.
Forces per-row-group decoder splitting when the predicate is dynamic so the runtime pruner has a boundary at every RG.
Adds observability: dynamic_rg_pruning=eligible in EXPLAIN and a new metric row_groups_pruned_dynamic_filter in EXPLAIN ANALYZE, plus tests/SLTs updated accordingly.
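The idea in the list above can be sketched in isolation. This is a simplified illustration, not the PR's `RowGroupPruner`: it assumes an `ORDER BY col ASC LIMIT k` query whose dynamic filter reduces to `col < threshold`, and the names `RowGroupStats`, `can_skip`, and `scan` are hypothetical.

```rust
/// Min/max statistics of the sort column for one row group (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// With a filter of the form `col < threshold`, a row group whose minimum is
/// already >= threshold cannot contain a qualifying row.
fn can_skip(stats: RowGroupStats, threshold: Option<i64>) -> bool {
    match threshold {
        Some(t) => stats.min >= t,
        // No threshold yet (the filter is still `true`): nothing is prunable.
        None => false,
    }
}

/// Visits row groups in order and re-checks the threshold at every boundary.
/// `observe` stands in for decoding a row group and feeding TopK; it returns
/// the threshold to use for the following row groups.
fn scan(
    row_groups: &[RowGroupStats],
    mut observe: impl FnMut(RowGroupStats, Option<i64>) -> Option<i64>,
) -> (Vec<usize>, usize) {
    let mut threshold = None;
    let mut decoded = Vec::new();
    let mut pruned = 0;
    for (idx, stats) in row_groups.iter().enumerate() {
        if can_skip(*stats, threshold) {
            pruned += 1; // would be counted in a pruning metric
            continue;
        }
        decoded.push(idx);
        threshold = observe(*stats, threshold);
    }
    (decoded, pruned)
}
```

Because the check runs once per row group, this only helps when each row group is its own decoder run, which is why the PR forces per-row-group splitting for dynamic predicates.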

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

File	Description
datafusion/datasource-parquet/src/push_decoder.rs	Adds `RowGroupPruner`, tracks row-group indices per decoder run, and skips prunable runs at runtime.
datafusion/datasource-parquet/src/opener/mod.rs	Forces per-RG runs for dynamic predicates; wires pending runs + runtime pruner into `PushDecoderStreamState`.
datafusion/datasource-parquet/src/access_plan.rs	Extends `split_runs` with `force_per_row_group` to avoid coalescing runs for dynamic predicates.
datafusion/datasource-parquet/src/source.rs	Adds `dynamic_rg_pruning=eligible` marker in `fmt_extra` and unit tests for the marker.
datafusion/datasource-parquet/src/row_group_filter.rs	Exposes `RowGroupPruningStatistics` to reuse stats adapter for runtime pruning.
datafusion/datasource-parquet/src/metrics.rs	Adds `row_groups_pruned_dynamic_filter` metric to `ParquetFileMetrics`.
datafusion/core/tests/parquet/mod.rs	Adds helper to read `row_groups_pruned_dynamic_filter` from metrics.
datafusion/core/tests/parquet/dynamic_row_group_pruning.rs	New integration tests validating metric fires for TopK and stays quiet otherwise.
datafusion/sqllogictest/test_files/dynamic_row_group_pruning.slt	New SLT covering both `EXPLAIN` marker and `EXPLAIN ANALYZE` metric value.
datafusion/sqllogictest/test_files/topk.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/statistics_registry.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/sort_pushdown.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/repartition_subset_satisfaction.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/push_down_filter_regression.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/push_down_filter_parquet.slt	Updates expected plans/metrics to include `dynamic_rg_pruning=eligible` and (where relevant) the new counter.
datafusion/sqllogictest/test_files/projection_pushdown.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/preserve_file_partitioning.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/limit.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/limit_pruning.slt	Updates expected metrics to include `row_groups_pruned_dynamic_filter=0` plus eligibility marker.
datafusion/sqllogictest/test_files/explain_analyze.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.
datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt	Updates expected plans/metrics to include eligibility marker and `row_groups_pruned_dynamic_filter=0` where applicable.
datafusion/sqllogictest/test_files/clickbench.slt	Updates expected plans to include `dynamic_rg_pruning=eligible`.

Comments suppressed due to low confidence (1)

datafusion/datasource-parquet/src/access_plan.rs:458

split_runs computes row_group_needs_filter as !fully_matched without considering the needs_filter argument. When force_per_row_group=true and the scan has no row filter (needs_filter=false), this will still mark all runs as needs_filter=true, causing the opener to treat them as filtered runs (e.g. attempting to fetch row filters / applying predicate-cache settings) even though no row-level filter exists. row_group_needs_filter should be derived as needs_filter && !fully_matched so the run metadata stays consistent with the caller’s capabilities.

        for (idx, (access, fully_matched)) in
            row_groups.into_iter().zip(fully_matched).enumerate()
        {
            if !access.should_scan() {
                continue;
            }

            let row_group_needs_filter = !fully_matched;
            // Coalesce consecutive RGs into a run only when (a) they share
            // the same filter requirement and (b) we're not forcing per-RG
            // splitting for runtime pruning.
            let can_coalesce = !force_per_row_group;
            if can_coalesce
                && let Some(run) = runs
                    .last_mut()
                    .filter(|run| run.needs_filter == row_group_needs_filter)
            {
                run.access_plan.set(idx, access);
                if fully_matched {
                    run.access_plan.mark_fully_matched(idx);
                }
            } else {
                let mut run_plan = ParquetAccessPlan::new_none(num_row_groups);
                run_plan.set(idx, access);
                if fully_matched {
                    run_plan.mark_fully_matched(idx);
                }
                runs.push(RowGroupRun::new(row_group_needs_filter, run_plan));
            }
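The change suggested in the review comment above can be reduced to a single boolean expression. The helper below is a hypothetical extraction for illustration only; in the excerpt the value is computed inline inside `split_runs`.

```rust
/// A run needs the row filter only if the scan has one (`needs_filter`)
/// and the row group is not already fully matched by statistics.
fn row_group_needs_filter(needs_filter: bool, fully_matched: bool) -> bool {
    needs_filter && !fully_matched
}
```

With this form, a scan without a row filter never produces runs marked as needing one, regardless of `force_per_row_group`.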

adriangbot · 2026-05-22T05:57:44Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	3.7 GiB
Avg memory	3.7 GiB
CPU user	0.2s
CPU sys	0.1s
Peak spill	0 B

sort_pushdown_inexact — branch

Metric	Value
Wall time	5.0s
Peak memory	3.7 GiB
Avg memory	3.7 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

Per Copilot review on apache#22450: `RowGroupPruner` was using a single `predicate_creation_errors` counter for both predicate construction (`build_pruning_predicate`) AND predicate evaluation (`PruningPredicate::prune`) failures. The log message also said "Ignoring error building..." when the failure was during evaluation. This misattributed evaluation failures and made the metric semantics inconsistent with the static row-group pruning path in `RowGroupAccessPlanFilter::prune_by_statistics`, which already separates the two.

`RowGroupPruner::new` now takes both counters:

- `predicate_creation_errors`: bumped on `build_pruning_predicate` failures. Wired to `prepared.predicate_creation_errors` from the opener (the same field the static path uses).
- `predicate_evaluation_errors`: bumped on `PruningPredicate::prune` failures. Wired to `prepared.file_metrics.predicate_evaluation_errors` (the same field the static `prune_by_statistics` path uses), so the two paths accumulate into a shared counter.

The error log message is updated to say "evaluating" so the metric and the log agree.
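The split between the two counters can be sketched as follows. This is a simplified stand-in, not the PR's code: plain atomics replace DataFusion's metric types, `PrunerCounters` and `should_prune` are hypothetical names, and treating any error as "do not prune" is an assumption based on the "Ignoring error" log wording.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-ins for the two metrics named in the commit message.
#[derive(Default)]
struct PrunerCounters {
    predicate_creation_errors: AtomicUsize,
    predicate_evaluation_errors: AtomicUsize,
}

/// Returns true when the row group can be skipped. A failure in either stage
/// bumps that stage's own counter and falls back to keeping the row group
/// (assumed behavior).
fn should_prune<P>(
    counters: &PrunerCounters,
    build: impl FnOnce() -> Result<P, String>,
    evaluate: impl FnOnce(&P) -> Result<bool, String>,
) -> bool {
    let predicate = match build() {
        Ok(p) => p,
        Err(_) => {
            // Construction failed: attribute it to the creation counter only.
            counters.predicate_creation_errors.fetch_add(1, Ordering::Relaxed);
            return false;
        }
    };
    match evaluate(&predicate) {
        Ok(prune) => prune,
        Err(_) => {
            // Evaluation failed: attribute it to the evaluation counter only.
            counters.predicate_evaluation_errors.fetch_add(1, Ordering::Relaxed);
            false
        }
    }
}
```

Keeping the stages separate means a rising evaluation-error count points at statistics or predicate evaluation rather than at predicate construction.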

zhuqi-lucas · 2026-05-22T06:05:02Z

run benchmark sort_pushdown_inexact

adriangbot · 2026-05-22T06:08:17Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515622680-278-bcnbp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: sort_pushdown_inexact
Results will be posted here when complete

adriangbot · 2026-05-22T06:23:45Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark sort_pushdown_inexact.json
--------------------
┏━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query ┃ HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ Q1    │ FAIL │                               FAIL │ incomparable │
│ Q2    │ FAIL │                               FAIL │ incomparable │
│ Q3    │ FAIL │                               FAIL │ incomparable │
│ Q4    │ FAIL │                               FAIL │ incomparable │
└───────┴──────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                 ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                 │ 0.00ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 0.00ms │
│ Average Time (HEAD)                               │ 0.00ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │ 0.00ms │
│ Queries Faster                                    │      0 │
│ Queries Slower                                    │      0 │
│ Queries with No Change                            │      0 │
│ Queries with Failure                              │      4 │
└───────────────────────────────────────────────────┴────────┘

Resource Usage

sort_pushdown_inexact — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	4.1 GiB
Avg memory	4.1 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

sort_pushdown_inexact — branch

Metric	Value
Wall time	5.0s
Peak memory	4.1 GiB
Avg memory	4.1 GiB
CPU user	0.1s
CPU sys	0.1s
Peak spill	0 B

File an issue against this benchmark runner

zhuqi-lucas · 2026-05-22T06:38:23Z

run benchmark topk_tpch

adriangbot · 2026-05-22T06:41:37Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4515872406-280-dfq5d 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (0828f1b) to a8f03fd (merge-base) diff using: topk_tpch
Results will be posted here when complete

adriangbot · 2026-05-22T06:51:42Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 248.79ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 139.08ms │
│ Average Time (HEAD)                               │  22.62ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  12.64ms │
│ Queries Faster                                    │        5 │
│ Queries Slower                                    │        0 │
│ Queries with No Change                            │        6 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

topk_tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	4.9 GiB
Avg memory	4.5 GiB
CPU user	11.4s
CPU sys	1.1s
Peak spill	0 B

topk_tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	4.4 GiB
Avg memory	4.4 GiB
CPU user	6.5s
CPU sys	0.6s
Peak spill	0 B

zhuqi-lucas · 2026-05-22T07:11:11Z

#22450 (comment)

┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.14 / 2.74 ±0.76 / 4.10 ms │        2.12 / 2.79 ±0.68 / 4.02 ms │     no change │
│ Q2    │ 10.66 / 11.36 ±0.68 / 12.23 ms │        2.81 / 3.61 ±0.87 / 4.72 ms │ +3.15x faster │
│ Q3    │ 31.77 / 32.15 ±0.43 / 32.83 ms │     31.71 / 31.92 ±0.16 / 32.18 ms │     no change │
│ Q4    │ 11.83 / 12.29 ±0.77 / 13.82 ms │        3.13 / 3.25 ±0.13 / 3.48 ms │ +3.78x faster │
│ Q5    │  9.94 / 10.14 ±0.18 / 10.46 ms │      9.95 / 10.02 ±0.05 / 10.09 ms │     no change │
│ Q6    │ 17.19 / 17.39 ±0.15 / 17.56 ms │     17.11 / 17.36 ±0.37 / 18.09 ms │     no change │
│ Q7    │ 37.07 / 38.08 ±1.17 / 40.08 ms │     37.00 / 37.41 ±0.37 / 38.07 ms │     no change │
│ Q8    │ 28.13 / 28.59 ±0.60 / 29.71 ms │        6.86 / 7.16 ±0.42 / 7.98 ms │ +3.99x faster │
│ Q9    │ 35.34 / 36.86 ±1.54 / 38.77 ms │        8.36 / 8.50 ±0.08 / 8.60 ms │ +4.34x faster │
│ Q10   │ 54.13 / 55.29 ±1.83 / 58.93 ms │     12.77 / 13.00 ±0.45 / 13.89 ms │ +4.25x faster │
│ Q11   │    3.75 / 3.91 ±0.11 / 4.05 ms │        3.82 / 4.08 ±0.31 / 4.68 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘

cc @alamb @adriangb @Dandandan
This matches my local testing, and sort_pushdown_inexact should improve a lot as well!

Dandandan · 2026-05-22T07:11:22Z

Nice, impressive 🚀🚀🚀

…mic RG pruning

Adds a second integration test in `dynamic_row_group_pruning.rs` covering the **page-level `RowSelection`** path adriangb asked about: WHERE `v >= 500` engages the page index, masking out the first 5 of RG 0's 10 pages; `ORDER BY v DESC LIMIT 5` then drives runtime RG pruning, which drops RGs 0..3 in a single `into_builder` rebuild.

If the rebuild lost or shifted the page-index-derived row selection, either the result rows would drift or the `page_index_pages_pruned` count would collapse. The test pins both values (pages_pruned >= 5 AND the top-5 descending values, 4995..=4999), so the rebuild's selection-preservation contract is nailed down end-to-end.

The previous PR commit (`f7cb5e7bd`) covered the per-row `RowFilter` case; this one covers the page-level `RowSelection` case. Together they exercise both filter-state preservation paths arrow-rs's `into_builder` is responsible for.

Tooling: extends `Unit` with a `RowGroupAndPage(rows_per_group, rows_per_page)` variant, and `make_test_file_rg` with an optional `row_per_page` argument that sets `data_page_row_count_limit` + `write_batch_size` so the writer produces multi-page RGs.
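The runtime decision this test exercises can be pictured with a hand-rolled max-statistic check. The sketch below is only an illustration: `RowGroupStats`, `can_skip_desc`, `surviving_row_groups`, and `five_row_groups` are hypothetical names, the layout of five row groups holding 1000 consecutive values each is an assumption inferred from the numbers above, and the real code evaluates a `PruningPredicate` against row-group statistics rather than comparing integers directly.

```rust
/// Max statistic of the sort column for one row group
/// (toy stand-in for Parquet row-group metadata).
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    max: i64,
}

/// For `ORDER BY v DESC LIMIT k`, assume the dynamic filter is `v > threshold`
/// once TopK holds k rows. A row group whose max is <= threshold cannot
/// contribute any row, so it can be skipped without decoding.
fn can_skip_desc(stats: RowGroupStats, threshold: Option<i64>) -> bool {
    match threshold {
        Some(t) => stats.max <= t,
        // No threshold yet (filter is still `true`): nothing can be pruned.
        None => false,
    }
}

/// Returns the indices of pending row groups that must still be decoded.
fn surviving_row_groups(
    pending: &[(usize, RowGroupStats)],
    threshold: Option<i64>,
) -> Vec<usize> {
    pending
        .iter()
        .filter(|(_, stats)| !can_skip_desc(*stats, threshold))
        .map(|(index, _)| *index)
        .collect()
}

/// Assumed layout: row group i holds the values i*1000 ..= i*1000 + 999.
fn five_row_groups() -> Vec<(usize, RowGroupStats)> {
    (0..5usize)
        .map(|i| (i, RowGroupStats { max: i as i64 * 1000 + 999 }))
        .collect()
}
```

Under these assumptions, `surviving_row_groups(&five_row_groups(), Some(4995))` leaves only row group 4, while a `None` threshold (the filter has not tightened yet) keeps all five.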

zhuqi-lucas · 2026-06-16T13:27:45Z

Addressed all comments besides:

split_runs used to build a different decoder for each row group when there were row groups that fully matched filters, allowing those to skip applying the row filter. I think we're losing that now?

I tried this before, but it failed in testing; I'm still working on this part.

zhuqi-lucas · 2026-06-16T14:08:46Z

Update on this — I prototyped the per-RG toggle: push fully_matched through PreparedAccessPlan into the per-RG rg_plan entry, then at each row-group boundary call decoder.into_builder().with_row_filter(empty_or_real).build() to flip the filter. Limit budget carries across the rebuild correctly. But it exposed a latent invariant in push_decoder.rs worth flagging.

The current rg_plan assumes a 1:1 correspondence between try_next_reader() results and the head of the plan — we pop_front() once per reader. That holds when every RG in the plan actually yields a reader. It does not hold when page-index pruning eliminates every page of an RG: arrow-rs's try_next_reader silently advances past the empty RG and returns the reader for the next one. Concretely, on limit_pruning.slt's species > 'M' AND s >= 50 ORDER BY species LIMIT 3:

1. Initial plan [RG1 (not fm), RG2 (fm), RG3 (not fm)], filter installed.
2. arrow-rs page-index-prunes RG 1 entirely (page_index_pages_pruned=2).
3. try_next_reader returns the reader for RG 2; my code pops RG 1, leaving plan [RG2, RG3].
4. Next boundary sees front = RG2 (fm), triggers a toggle to empty filter, and rebuilds with with_row_groups([2, 3]), so the new decoder reads RG 2 again.
5. Scan output: 6 rows instead of 3 (final query result still correct because TopK ranks them, but the metric is off).

Without the toggle, this off-by-one was harmless — we never used rg_plan.front() to redirect the decoder, so the wrong-head condition was invisible. The toggle made it observable.
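The desync in that trace can be reproduced with a toy model. This is a minimal sketch, not the code in push_decoder.rs: `ToyDecoder`, its `fully_page_pruned` field, and `trace` are hypothetical, and the toy `try_next_reader` only imitates the skipping behaviour described above by returning a bare row-group index.

```rust
use std::collections::VecDeque;

/// Toy decoder: yields row-group indices in order, silently skipping any
/// row group whose pages were all eliminated by the page index.
struct ToyDecoder {
    remaining: VecDeque<usize>,
    fully_page_pruned: Vec<usize>,
}

impl ToyDecoder {
    fn try_next_reader(&mut self) -> Option<usize> {
        while let Some(rg) = self.remaining.pop_front() {
            if !self.fully_page_pruned.contains(&rg) {
                return Some(rg);
            }
        }
        None
    }
}

/// Pops exactly one plan entry per reader (the 1:1 assumption) and records,
/// for each reader, (row group actually being read, plan head after the pop).
fn trace(plan: &[usize], fully_page_pruned: &[usize]) -> Vec<(usize, Option<usize>)> {
    let mut decoder = ToyDecoder {
        remaining: plan.iter().copied().collect(),
        fully_page_pruned: fully_page_pruned.to_vec(),
    };
    let mut rg_plan: VecDeque<usize> = plan.iter().copied().collect();
    let mut observed = Vec::new();
    while let Some(reading) = decoder.try_next_reader() {
        rg_plan.pop_front();
        observed.push((reading, rg_plan.front().copied()));
    }
    observed
}
```

With no page-pruned row group, `trace(&[1, 2, 3], &[])` gives `[(1, Some(2)), (2, Some(3)), (3, None)]`: the head always names the next row group. With RG 1 fully pruned, `trace(&[1, 2, 3], &[1])` gives `[(2, Some(2)), (3, Some(3))]`: the head names the row group that is already being read, which is why a rebuild driven by the head would read RG 2 a second time.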

Fixing it cleanly needs one of:

1. An arrow-rs hook to query the decoder's actual next-RG index after try_next_reader (filed as apache/arrow-rs#10148: peek_next_row_group() on ParquetPushDecoder).
2. Pre-computing which RGs page-index will fully eliminate at file open and dropping them from rg_plan (replicates arrow-rs internals, so fragile).
3. Tracking position by RG metadata identity rather than queue order.
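Whichever route supplies the identity of the row group being read, the plan-side bookkeeping could look roughly like the sketch below. It assumes the index of the current row group is somehow known (that is exactly what the options above would provide), and `advance_to` is a hypothetical helper, not an existing DataFusion or arrow-rs function.

```rust
use std::collections::VecDeque;

/// Advances the plan until the entry for `reading` has been consumed,
/// instead of blindly popping one entry per reader. Returns the entries
/// that were skipped because the decoder never produced a reader for them.
fn advance_to(rg_plan: &mut VecDeque<usize>, reading: usize) -> Vec<usize> {
    let mut skipped = Vec::new();
    while let Some(head) = rg_plan.pop_front() {
        if head == reading {
            break;
        }
        skipped.push(head);
    }
    skipped
}
```

Starting from a plan of `[1, 2, 3]`, calling `advance_to(&mut plan, 2)` returns `[1]` and leaves `[3]`, so the head once again names the next row group rather than the one already being read.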

I'm leaving the per-RG toggle out of this PR and will pick it up as a follow-up once the arrow-rs accessor lands (or by going with option 3 if that takes a while). The practical perf gap is narrow: arrow-rs's page-level page_index_pages_skipped_by_fully_matched already skips per-row filter eval on fully-matched RGs whenever the page index is populated — i.e. for any RG large enough to span multiple data pages, which is the typical production layout. The case this PR doesn't (yet) recover is small RGs where no page index helps, plus predicate columns that the page index doesn't cover.

adriangb · 2026-06-16T15:23:29Z

Addressed all comments besides:

split_runs used to build a different decoder for each row group when there were row groups that fully matched filters, allowing those to skip applying the row filter. I think we're losing that now?

I tried this before, but it failed in testing; I'm still working on this part.

Makes sense, but doesn't that mean this PR is possibly a regression for some queries?

adriangb · 2026-06-16T15:23:38Z

run benchmarks

adriangb · 2026-06-16T15:23:44Z

run benchmark topk_tpch

adriangbot · 2026-06-16T15:26:57Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4720419417-573-n6wh9 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (6c554cc) to 6176a6d (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-06-16T15:27:09Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4720419417-574-pb9fs 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (6c554cc) to 6176a6d (merge-base) diff using: tpcds
Results will be posted here when complete

adriangbot · 2026-06-16T15:27:13Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4720419417-575-nclfk 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (6c554cc) to 6176a6d (merge-base) diff using: tpch
Results will be posted here when complete

adriangbot · 2026-06-16T15:27:23Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4720420349-576-ttwzc 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

Comparing feat/topk-rg-level-dynamic-pruning (6c554cc) to 6176a6d (merge-base) diff using: topk_tpch
Results will be posted here when complete

adriangbot · 2026-06-16T15:42:03Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ 38.42 / 39.13 ±0.90 / 40.89 ms │     38.44 / 39.37 ±1.08 / 41.46 ms │    no change │
│ QQuery 2  │ 18.84 / 19.28 ±0.27 / 19.67 ms │     19.07 / 19.45 ±0.26 / 19.83 ms │    no change │
│ QQuery 3  │ 30.82 / 32.90 ±1.49 / 34.37 ms │     30.68 / 33.39 ±1.42 / 34.76 ms │    no change │
│ QQuery 4  │ 17.50 / 17.95 ±0.57 / 19.01 ms │     17.49 / 17.70 ±0.14 / 17.86 ms │    no change │
│ QQuery 5  │ 39.15 / 41.18 ±1.16 / 42.75 ms │     37.54 / 39.56 ±1.50 / 41.02 ms │    no change │
│ QQuery 6  │ 16.12 / 16.30 ±0.13 / 16.45 ms │     16.38 / 16.46 ±0.11 / 16.66 ms │    no change │
│ QQuery 7  │ 45.53 / 47.60 ±1.41 / 49.70 ms │     44.30 / 46.14 ±1.17 / 47.63 ms │    no change │
│ QQuery 8  │ 42.84 / 43.13 ±0.25 / 43.47 ms │     42.96 / 43.20 ±0.17 / 43.39 ms │    no change │
│ QQuery 9  │ 49.38 / 50.71 ±0.95 / 52.27 ms │     50.71 / 50.92 ±0.14 / 51.13 ms │    no change │
│ QQuery 10 │ 42.45 / 43.35 ±1.02 / 45.23 ms │     42.39 / 42.62 ±0.18 / 42.94 ms │    no change │
│ QQuery 11 │ 13.17 / 13.39 ±0.20 / 13.73 ms │     13.76 / 13.96 ±0.19 / 14.26 ms │    no change │
│ QQuery 12 │ 23.76 / 24.07 ±0.23 / 24.42 ms │     24.47 / 24.76 ±0.22 / 25.09 ms │    no change │
│ QQuery 13 │ 33.41 / 35.29 ±1.64 / 37.97 ms │     32.79 / 34.69 ±1.93 / 37.80 ms │    no change │
│ QQuery 14 │ 23.79 / 24.03 ±0.14 / 24.16 ms │     23.87 / 24.31 ±0.56 / 25.41 ms │    no change │
│ QQuery 15 │ 31.07 / 32.68 ±1.79 / 35.51 ms │     31.50 / 32.40 ±0.58 / 33.16 ms │    no change │
│ QQuery 16 │ 14.19 / 14.38 ±0.15 / 14.64 ms │     14.12 / 14.36 ±0.22 / 14.63 ms │    no change │
│ QQuery 17 │ 74.50 / 75.89 ±2.03 / 79.85 ms │     73.90 / 75.00 ±0.70 / 75.90 ms │    no change │
│ QQuery 18 │ 59.18 / 60.45 ±0.93 / 61.68 ms │     59.64 / 60.31 ±0.47 / 60.96 ms │    no change │
│ QQuery 19 │ 33.18 / 34.49 ±1.37 / 36.49 ms │     33.33 / 34.35 ±0.83 / 35.69 ms │    no change │
│ QQuery 20 │ 31.98 / 32.21 ±0.24 / 32.51 ms │     31.99 / 32.64 ±0.64 / 33.58 ms │    no change │
│ QQuery 21 │ 56.72 / 57.56 ±0.87 / 58.82 ms │     57.08 / 57.89 ±0.71 / 58.95 ms │    no change │
│ QQuery 22 │ 14.12 / 14.33 ±0.17 / 14.51 ms │     14.22 / 15.16 ±1.09 / 17.23 ms │ 1.06x slower │
└───────────┴────────────────────────────────┴────────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 770.31ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 768.66ms │
│ Average Time (HEAD)                               │  35.01ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  34.94ms │
│ Queries Faster                                    │        0 │
│ Queries Slower                                    │        1 │
│ Queries with No Change                            │       21 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	1.2 GiB
Avg memory	519.2 MiB
CPU user	22.4s
CPU sys	1.6s
Peak spill	0 B

tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	1.2 GiB
Avg memory	534.6 MiB
CPU user	22.3s
CPU sys	1.7s
Peak spill	0 B

File an issue against this benchmark runner

adriangbot · 2026-06-16T15:42:47Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark run_topk_tpch.json
--------------------
┏━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query ┃                           HEAD ┃ feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ Q1    │    2.41 / 3.36 ±0.83 / 4.66 ms │        2.29 / 2.84 ±0.89 / 4.61 ms │ +1.19x faster │
│ Q2    │ 10.85 / 11.44 ±0.58 / 12.55 ms │        3.01 / 3.13 ±0.15 / 3.41 ms │ +3.66x faster │
│ Q3    │ 33.31 / 34.39 ±1.23 / 36.59 ms │     33.42 / 34.18 ±0.86 / 35.80 ms │     no change │
│ Q4    │ 12.30 / 12.73 ±0.64 / 14.01 ms │        3.44 / 3.52 ±0.07 / 3.65 ms │ +3.62x faster │
│ Q5    │  9.86 / 10.13 ±0.18 / 10.41 ms │     10.44 / 10.83 ±0.41 / 11.49 ms │  1.07x slower │
│ Q6    │ 17.28 / 17.62 ±0.31 / 18.05 ms │     17.88 / 18.48 ±0.95 / 20.37 ms │     no change │
│ Q7    │ 38.59 / 39.33 ±0.50 / 40.01 ms │     38.89 / 39.62 ±0.65 / 40.75 ms │     no change │
│ Q8    │ 28.88 / 30.41 ±1.41 / 32.76 ms │        7.18 / 7.27 ±0.11 / 7.48 ms │ +4.18x faster │
│ Q9    │ 36.03 / 36.53 ±0.46 / 37.18 ms │        8.88 / 8.95 ±0.06 / 9.06 ms │ +4.08x faster │
│ Q10   │ 56.65 / 57.04 ±0.34 / 57.65 ms │     13.56 / 14.30 ±1.11 / 16.50 ms │ +3.99x faster │
│ Q11   │    4.03 / 4.45 ±0.72 / 5.89 ms │        4.31 / 4.47 ±0.27 / 5.00 ms │     no change │
└───────┴────────────────────────────────┴────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 257.43ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 147.59ms │
│ Average Time (HEAD)                               │  23.40ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │  13.42ms │
│ Queries Faster                                    │        6 │
│ Queries Slower                                    │        1 │
│ Queries with No Change                            │        4 │
│ Queries with Failure                              │        0 │
└───────────────────────────────────────────────────┴──────────┘

Resource Usage

topk_tpch — base (merge-base)

Metric	Value
Wall time	5.0s
Peak memory	587.4 MiB
Avg memory	97.9 MiB
CPU user	0.0s
CPU sys	0.0s
Peak spill	0 B

topk_tpch — branch

Metric	Value
Wall time	5.0s
Peak memory	236.0 KiB
Avg memory	39.3 KiB
CPU user	0.0s
CPU sys	0.0s
Peak spill	0 B


adriangbot · 2026-06-16T15:43:32Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           5.20 / 5.75 ±0.90 / 7.54 ms │           5.29 / 5.79 ±0.92 / 7.63 ms │     no change │
│ QQuery 2  │        80.50 / 80.64 ±0.12 / 80.78 ms │        81.17 / 81.49 ±0.22 / 81.76 ms │     no change │
│ QQuery 3  │        28.65 / 29.04 ±0.23 / 29.33 ms │        29.41 / 29.52 ±0.10 / 29.66 ms │     no change │
│ QQuery 4  │     510.46 / 518.33 ±6.04 / 527.26 ms │     491.25 / 499.11 ±5.52 / 508.09 ms │     no change │
│ QQuery 5  │        51.08 / 51.49 ±0.55 / 52.54 ms │        51.02 / 51.40 ±0.47 / 52.26 ms │     no change │
│ QQuery 6  │        36.28 / 36.70 ±0.31 / 37.21 ms │        36.11 / 36.77 ±0.47 / 37.38 ms │     no change │
│ QQuery 7  │        94.54 / 96.00 ±1.73 / 99.30 ms │        94.69 / 95.27 ±0.55 / 96.16 ms │     no change │
│ QQuery 8  │        37.07 / 37.58 ±0.41 / 38.20 ms │        36.66 / 38.80 ±3.00 / 44.71 ms │     no change │
│ QQuery 9  │        53.25 / 54.35 ±0.86 / 55.47 ms │        51.43 / 54.57 ±1.71 / 56.42 ms │     no change │
│ QQuery 10 │        62.30 / 62.62 ±0.30 / 63.10 ms │        62.82 / 63.23 ±0.25 / 63.56 ms │     no change │
│ QQuery 11 │     321.25 / 322.41 ±1.01 / 324.14 ms │     303.67 / 308.99 ±6.68 / 321.49 ms │     no change │
│ QQuery 12 │        28.47 / 28.88 ±0.28 / 29.20 ms │        28.14 / 28.65 ±0.41 / 29.32 ms │     no change │
│ QQuery 13 │     119.32 / 120.38 ±1.66 / 123.67 ms │     118.50 / 120.00 ±1.32 / 122.43 ms │     no change │
│ QQuery 14 │     414.43 / 416.17 ±1.87 / 419.68 ms │     408.04 / 412.86 ±4.28 / 419.65 ms │     no change │
│ QQuery 15 │        60.13 / 60.67 ±0.57 / 61.70 ms │        59.34 / 59.96 ±0.36 / 60.42 ms │     no change │
│ QQuery 16 │           6.26 / 6.44 ±0.20 / 6.83 ms │           6.37 / 6.53 ±0.24 / 7.00 ms │     no change │
│ QQuery 17 │        80.66 / 82.60 ±1.85 / 85.10 ms │        79.50 / 80.19 ±0.70 / 81.53 ms │     no change │
│ QQuery 18 │     123.01 / 125.07 ±1.51 / 127.53 ms │     123.58 / 125.14 ±1.46 / 127.15 ms │     no change │
│ QQuery 19 │        41.84 / 42.17 ±0.31 / 42.75 ms │        41.37 / 41.54 ±0.13 / 41.68 ms │     no change │
│ QQuery 20 │        35.64 / 37.34 ±1.36 / 39.41 ms │        35.49 / 36.58 ±0.81 / 38.01 ms │     no change │
│ QQuery 21 │        17.64 / 17.83 ±0.24 / 18.30 ms │        17.32 / 17.52 ±0.17 / 17.76 ms │     no change │
│ QQuery 22 │        63.63 / 64.23 ±0.46 / 65.05 ms │        61.57 / 63.18 ±1.02 / 64.72 ms │     no change │
│ QQuery 23 │     349.68 / 355.04 ±3.52 / 360.70 ms │     342.40 / 344.53 ±1.52 / 347.07 ms │     no change │
│ QQuery 24 │     224.22 / 228.17 ±5.94 / 239.96 ms │     222.95 / 225.64 ±1.84 / 227.53 ms │     no change │
│ QQuery 25 │     110.46 / 111.53 ±0.75 / 112.48 ms │     109.32 / 110.54 ±0.66 / 111.21 ms │     no change │
│ QQuery 26 │        57.89 / 59.31 ±1.42 / 61.95 ms │        58.20 / 58.97 ±0.45 / 59.52 ms │     no change │
│ QQuery 27 │           6.17 / 6.34 ±0.14 / 6.59 ms │           5.95 / 6.11 ±0.19 / 6.48 ms │     no change │
│ QQuery 28 │        56.65 / 60.54 ±1.98 / 62.09 ms │        60.74 / 61.00 ±0.30 / 61.52 ms │     no change │
│ QQuery 29 │       97.88 / 99.57 ±1.45 / 101.59 ms │       96.88 / 98.33 ±1.53 / 101.24 ms │     no change │
│ QQuery 30 │        32.29 / 32.62 ±0.27 / 33.10 ms │        31.85 / 33.05 ±1.06 / 34.99 ms │     no change │
│ QQuery 31 │     110.58 / 111.39 ±0.49 / 112.03 ms │     111.86 / 112.19 ±0.39 / 112.93 ms │     no change │
│ QQuery 32 │        20.34 / 20.58 ±0.23 / 20.98 ms │        20.66 / 20.93 ±0.19 / 21.20 ms │     no change │
│ QQuery 33 │        37.53 / 39.51 ±2.24 / 43.75 ms │        38.25 / 38.34 ±0.10 / 38.51 ms │     no change │
│ QQuery 34 │          9.67 / 9.99 ±0.33 / 10.57 ms │          9.70 / 9.91 ±0.18 / 10.21 ms │     no change │
│ QQuery 35 │        71.94 / 72.76 ±0.52 / 73.42 ms │        72.23 / 74.45 ±2.44 / 79.12 ms │     no change │
│ QQuery 36 │           5.55 / 5.69 ±0.19 / 6.07 ms │           5.54 / 5.66 ±0.20 / 6.06 ms │     no change │
│ QQuery 37 │           6.86 / 6.89 ±0.04 / 6.96 ms │           6.83 / 6.93 ±0.09 / 7.10 ms │     no change │
│ QQuery 38 │        64.04 / 64.29 ±0.19 / 64.56 ms │        63.05 / 63.68 ±0.51 / 64.57 ms │     no change │
│ QQuery 39 │     464.70 / 471.86 ±6.59 / 483.19 ms │     459.21 / 466.78 ±5.03 / 472.18 ms │     no change │
│ QQuery 40 │        23.50 / 23.81 ±0.28 / 24.28 ms │        23.19 / 23.35 ±0.12 / 23.56 ms │     no change │
│ QQuery 41 │        11.64 / 11.86 ±0.22 / 12.23 ms │        11.29 / 12.57 ±2.26 / 17.09 ms │  1.06x slower │
│ QQuery 42 │        23.98 / 24.15 ±0.10 / 24.30 ms │        24.14 / 24.24 ±0.07 / 24.36 ms │     no change │
│ QQuery 43 │           4.70 / 4.84 ±0.22 / 5.29 ms │           4.55 / 4.67 ±0.14 / 4.93 ms │     no change │
│ QQuery 44 │          9.11 / 9.55 ±0.59 / 10.72 ms │           8.87 / 9.00 ±0.09 / 9.12 ms │ +1.06x faster │
│ QQuery 45 │        39.68 / 40.40 ±0.42 / 40.90 ms │        38.72 / 39.45 ±0.57 / 40.24 ms │     no change │
│ QQuery 46 │        11.53 / 11.64 ±0.08 / 11.78 ms │        11.26 / 11.40 ±0.15 / 11.65 ms │     no change │
│ QQuery 47 │     235.71 / 238.58 ±1.93 / 240.61 ms │    226.55 / 236.79 ±10.58 / 256.19 ms │     no change │
│ QQuery 48 │        96.37 / 97.21 ±0.57 / 97.84 ms │       96.65 / 99.54 ±2.87 / 103.15 ms │     no change │
│ QQuery 49 │        75.82 / 76.33 ±0.42 / 76.89 ms │        75.44 / 76.15 ±0.61 / 77.27 ms │     no change │
│ QQuery 50 │        59.53 / 62.59 ±4.00 / 70.12 ms │        58.91 / 60.56 ±2.22 / 64.92 ms │     no change │
│ QQuery 51 │        91.42 / 92.98 ±2.03 / 96.93 ms │        92.51 / 93.59 ±0.82 / 94.92 ms │     no change │
│ QQuery 52 │        24.32 / 24.46 ±0.15 / 24.71 ms │        23.80 / 24.14 ±0.23 / 24.43 ms │     no change │
│ QQuery 53 │        30.01 / 31.82 ±3.20 / 38.21 ms │        29.15 / 29.54 ±0.22 / 29.78 ms │ +1.08x faster │
│ QQuery 54 │        55.34 / 56.04 ±0.38 / 56.52 ms │        54.43 / 55.79 ±1.74 / 59.17 ms │     no change │
│ QQuery 55 │        23.95 / 24.24 ±0.23 / 24.63 ms │        23.27 / 23.95 ±0.45 / 24.34 ms │     no change │
│ QQuery 56 │        38.97 / 39.37 ±0.43 / 40.12 ms │        38.92 / 40.07 ±1.63 / 43.20 ms │     no change │
│ QQuery 57 │     178.34 / 181.33 ±4.09 / 189.42 ms │     176.09 / 177.94 ±1.96 / 181.45 ms │     no change │
│ QQuery 58 │     115.33 / 117.99 ±1.95 / 120.37 ms │     115.39 / 116.79 ±0.86 / 117.88 ms │     no change │
│ QQuery 59 │     116.64 / 118.12 ±0.85 / 119.10 ms │     118.62 / 119.89 ±1.39 / 122.12 ms │     no change │
│ QQuery 60 │        39.41 / 39.97 ±0.40 / 40.64 ms │        38.85 / 39.41 ±0.36 / 39.78 ms │     no change │
│ QQuery 61 │        11.93 / 12.01 ±0.10 / 12.20 ms │        11.60 / 11.68 ±0.10 / 11.88 ms │     no change │
│ QQuery 62 │        47.45 / 48.96 ±1.63 / 51.53 ms │        46.89 / 47.91 ±1.73 / 51.36 ms │     no change │
│ QQuery 63 │        29.93 / 30.45 ±0.32 / 30.92 ms │        29.39 / 29.94 ±0.43 / 30.65 ms │     no change │
│ QQuery 64 │     413.11 / 417.57 ±3.30 / 423.14 ms │     408.86 / 413.58 ±3.72 / 419.15 ms │     no change │
│ QQuery 65 │     141.49 / 147.62 ±4.72 / 153.76 ms │     139.52 / 142.60 ±1.93 / 145.01 ms │     no change │
│ QQuery 66 │        79.48 / 80.56 ±0.73 / 81.36 ms │        79.52 / 82.86 ±3.40 / 88.62 ms │     no change │
│ QQuery 67 │     252.54 / 257.91 ±3.55 / 262.95 ms │     244.19 / 250.83 ±4.20 / 255.50 ms │     no change │
│ QQuery 68 │        11.77 / 11.82 ±0.04 / 11.88 ms │        11.40 / 11.61 ±0.15 / 11.85 ms │     no change │
│ QQuery 69 │        56.93 / 58.94 ±1.86 / 61.54 ms │        57.55 / 57.70 ±0.14 / 57.92 ms │     no change │
│ QQuery 70 │     105.08 / 109.89 ±4.49 / 118.09 ms │     105.83 / 107.53 ±1.31 / 109.88 ms │     no change │
│ QQuery 71 │        35.07 / 35.54 ±0.30 / 35.93 ms │        34.97 / 35.24 ±0.22 / 35.49 ms │     no change │
│ QQuery 72 │ 2053.44 / 2199.73 ±76.94 / 2264.84 ms │ 2112.29 / 2188.38 ±54.66 / 2269.41 ms │     no change │
│ QQuery 73 │          9.45 / 9.70 ±0.22 / 10.10 ms │         9.37 / 13.55 ±7.94 / 29.42 ms │  1.40x slower │
│ QQuery 74 │     179.14 / 181.60 ±1.71 / 183.97 ms │     172.77 / 175.37 ±1.46 / 176.75 ms │     no change │
│ QQuery 75 │     148.57 / 152.25 ±4.35 / 159.84 ms │     149.12 / 152.68 ±2.77 / 157.29 ms │     no change │
│ QQuery 76 │        34.95 / 35.71 ±0.55 / 36.52 ms │        35.94 / 36.54 ±0.49 / 37.14 ms │     no change │
│ QQuery 77 │        60.14 / 63.69 ±5.45 / 74.44 ms │        61.91 / 63.22 ±1.59 / 66.26 ms │     no change │
│ QQuery 78 │     186.97 / 189.36 ±2.65 / 194.33 ms │    185.83 / 195.81 ±10.71 / 216.59 ms │     no change │
│ QQuery 79 │        66.66 / 67.53 ±1.09 / 69.67 ms │        66.40 / 66.85 ±0.30 / 67.33 ms │     no change │
│ QQuery 80 │       99.30 / 99.99 ±0.70 / 100.95 ms │      98.23 / 100.00 ±2.03 / 103.68 ms │     no change │
│ QQuery 81 │        25.59 / 27.32 ±2.41 / 32.04 ms │        25.36 / 25.52 ±0.12 / 25.64 ms │ +1.07x faster │
│ QQuery 82 │        16.90 / 18.02 ±1.69 / 21.34 ms │        16.14 / 18.34 ±3.75 / 25.83 ms │     no change │
│ QQuery 83 │        40.11 / 41.85 ±2.71 / 47.21 ms │        39.91 / 40.59 ±0.49 / 41.40 ms │     no change │
│ QQuery 84 │        30.71 / 30.97 ±0.22 / 31.28 ms │        31.12 / 31.93 ±1.46 / 34.85 ms │     no change │
│ QQuery 85 │     106.86 / 108.55 ±1.98 / 112.29 ms │     107.55 / 109.43 ±2.23 / 113.79 ms │     no change │
│ QQuery 86 │        25.65 / 27.01 ±1.33 / 29.53 ms │        25.60 / 27.19 ±2.93 / 33.06 ms │     no change │
│ QQuery 87 │        64.38 / 65.29 ±0.81 / 66.59 ms │        64.42 / 65.12 ±0.52 / 65.86 ms │     no change │
│ QQuery 88 │        61.01 / 61.95 ±0.54 / 62.59 ms │        62.13 / 63.32 ±1.58 / 66.30 ms │     no change │
│ QQuery 89 │        35.37 / 35.78 ±0.32 / 36.29 ms │        35.71 / 36.32 ±0.48 / 37.10 ms │     no change │
│ QQuery 90 │        16.68 / 18.78 ±2.88 / 24.23 ms │        16.57 / 16.76 ±0.14 / 16.95 ms │ +1.12x faster │
│ QQuery 91 │        46.48 / 47.19 ±0.72 / 48.11 ms │        46.23 / 46.38 ±0.16 / 46.63 ms │     no change │
│ QQuery 92 │        29.81 / 30.63 ±0.59 / 31.58 ms │        29.73 / 31.21 ±1.67 / 34.10 ms │     no change │
│ QQuery 93 │        50.40 / 51.14 ±0.74 / 52.46 ms │        50.72 / 51.55 ±0.81 / 53.09 ms │     no change │
│ QQuery 94 │        38.08 / 38.61 ±0.44 / 39.33 ms │        38.54 / 38.90 ±0.24 / 39.26 ms │     no change │
│ QQuery 95 │        81.75 / 84.69 ±2.81 / 90.06 ms │        81.05 / 81.92 ±0.57 / 82.78 ms │     no change │
│ QQuery 96 │        24.08 / 24.23 ±0.12 / 24.41 ms │        24.36 / 25.22 ±1.27 / 27.73 ms │     no change │
│ QQuery 97 │        46.30 / 46.69 ±0.29 / 47.05 ms │        46.75 / 47.56 ±0.57 / 48.42 ms │     no change │
│ QQuery 98 │        42.55 / 43.10 ±0.47 / 43.98 ms │        42.84 / 43.52 ±0.45 / 44.15 ms │     no change │
│ QQuery 99 │        71.84 / 73.27 ±2.43 / 78.11 ms │        71.63 / 71.89 ±0.29 / 72.41 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 10465.97ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 10369.53ms │
│ Average Time (HEAD)                               │   105.72ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   104.74ms │
│ Queries Faster                                    │          4 │
│ Queries Slower                                    │          2 │
│ Queries with No Change                            │         93 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric	Value
Wall time	55.0s
Peak memory	2.2 GiB
Avg memory	1.6 GiB
CPU user	231.8s
CPU sys	5.9s
Peak spill	0 B

tpcds — branch

Metric	Value
Wall time	55.0s
Peak memory	2.4 GiB
Avg memory	1.7 GiB
CPU user	237.4s
CPU sys	5.9s
Peak spill	0 B


adriangbot · 2026-06-16T15:47:26Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.33 / 3.70 ±4.54 / 12.78 ms │          1.28 / 3.56 ±4.40 / 12.36 ms │     no change │
│ QQuery 1  │        13.75 / 13.91 ±0.10 / 14.02 ms │        12.78 / 13.40 ±0.38 / 13.72 ms │     no change │
│ QQuery 2  │        37.51 / 38.04 ±0.42 / 38.56 ms │        35.68 / 36.55 ±0.57 / 37.46 ms │     no change │
│ QQuery 3  │        31.72 / 32.78 ±1.76 / 36.27 ms │        30.00 / 30.94 ±0.80 / 32.13 ms │ +1.06x faster │
│ QQuery 4  │     241.06 / 244.12 ±1.74 / 246.50 ms │     225.84 / 236.90 ±8.58 / 246.61 ms │     no change │
│ QQuery 5  │     273.65 / 279.66 ±3.39 / 283.13 ms │    267.85 / 286.82 ±18.06 / 318.94 ms │     no change │
│ QQuery 6  │           1.23 / 1.38 ±0.21 / 1.80 ms │           1.25 / 1.39 ±0.21 / 1.81 ms │     no change │
│ QQuery 7  │        14.32 / 14.70 ±0.53 / 15.75 ms │        13.83 / 13.89 ±0.07 / 14.01 ms │ +1.06x faster │
│ QQuery 8  │    324.30 / 332.09 ±10.62 / 352.92 ms │    325.75 / 352.61 ±13.59 / 363.18 ms │  1.06x slower │
│ QQuery 9  │    465.92 / 487.53 ±20.05 / 519.15 ms │    454.61 / 483.60 ±29.96 / 524.30 ms │     no change │
│ QQuery 10 │        70.44 / 72.47 ±1.28 / 74.25 ms │        72.90 / 74.84 ±1.49 / 76.68 ms │     no change │
│ QQuery 11 │        81.58 / 83.53 ±2.22 / 87.66 ms │        84.45 / 85.37 ±0.98 / 86.98 ms │     no change │
│ QQuery 12 │     268.23 / 272.30 ±3.69 / 276.60 ms │    265.51 / 288.01 ±14.04 / 309.67 ms │  1.06x slower │
│ QQuery 13 │    365.42 / 376.31 ±11.17 / 397.11 ms │    401.82 / 418.72 ±13.96 / 439.14 ms │  1.11x slower │
│ QQuery 14 │     279.33 / 286.47 ±5.06 / 294.69 ms │     300.76 / 308.04 ±3.99 / 312.63 ms │  1.08x slower │
│ QQuery 15 │     267.01 / 276.92 ±5.14 / 280.54 ms │    269.98 / 301.21 ±26.06 / 346.03 ms │  1.09x slower │
│ QQuery 16 │    603.47 / 624.95 ±18.27 / 647.49 ms │    612.58 / 626.17 ±10.11 / 638.37 ms │     no change │
│ QQuery 17 │    606.67 / 621.75 ±11.32 / 636.40 ms │    613.64 / 655.11 ±23.80 / 682.19 ms │  1.05x slower │
│ QQuery 18 │ 1247.45 / 1284.49 ±29.95 / 1323.16 ms │  1256.16 / 1270.53 ±9.45 / 1283.13 ms │     no change │
│ QQuery 19 │        29.27 / 31.94 ±4.96 / 41.86 ms │       27.68 / 35.43 ±14.77 / 64.96 ms │  1.11x slower │
│ QQuery 20 │    513.36 / 524.36 ±11.38 / 546.24 ms │    514.05 / 535.86 ±20.31 / 572.13 ms │     no change │
│ QQuery 21 │     517.66 / 526.40 ±7.28 / 536.32 ms │     521.77 / 526.62 ±3.85 / 532.36 ms │     no change │
│ QQuery 22 │  989.14 / 1007.10 ±11.37 / 1018.44 ms │     984.43 / 988.81 ±3.85 / 994.52 ms │     no change │
│ QQuery 23 │ 3213.26 / 3289.95 ±40.02 / 3331.09 ms │ 3098.84 / 3131.07 ±20.45 / 3152.50 ms │     no change │
│ QQuery 24 │        41.97 / 47.71 ±5.80 / 56.58 ms │                                  FAIL │  incomparable │
│ QQuery 25 │     115.07 / 116.35 ±1.16 / 118.51 ms │     112.55 / 114.07 ±1.40 / 116.54 ms │     no change │
│ QQuery 26 │        42.01 / 42.91 ±1.24 / 45.33 ms │                                  FAIL │  incomparable │
│ QQuery 27 │     684.49 / 693.97 ±5.03 / 698.53 ms │     669.41 / 671.56 ±1.91 / 673.95 ms │     no change │
│ QQuery 28 │ 3036.81 / 3068.89 ±27.87 / 3113.73 ms │ 3075.02 / 3108.78 ±27.19 / 3138.43 ms │     no change │
│ QQuery 29 │        40.66 / 46.38 ±7.02 / 57.30 ms │        41.87 / 46.60 ±8.34 / 63.25 ms │     no change │
│ QQuery 30 │    296.32 / 312.81 ±12.86 / 334.11 ms │     316.66 / 319.79 ±2.74 / 324.33 ms │     no change │
│ QQuery 31 │    293.57 / 303.07 ±11.67 / 325.90 ms │    289.03 / 309.80 ±18.32 / 342.78 ms │     no change │
│ QQuery 32 │   942.96 / 973.13 ±22.34 / 1005.06 ms │  988.74 / 1012.95 ±30.19 / 1070.29 ms │     no change │
│ QQuery 33 │ 1447.84 / 1517.78 ±53.37 / 1607.93 ms │ 1447.38 / 1482.14 ±23.25 / 1519.67 ms │     no change │
│ QQuery 34 │ 1449.56 / 1505.81 ±44.82 / 1563.49 ms │ 1484.86 / 1515.54 ±25.77 / 1558.20 ms │     no change │
│ QQuery 35 │    284.40 / 308.70 ±29.39 / 366.58 ms │    272.74 / 319.50 ±67.78 / 454.36 ms │     no change │
│ QQuery 36 │        63.67 / 69.67 ±4.44 / 76.37 ms │        67.37 / 68.81 ±1.36 / 71.10 ms │     no change │
│ QQuery 37 │        35.60 / 38.95 ±4.00 / 44.38 ms │        35.31 / 38.11 ±2.61 / 41.39 ms │     no change │
│ QQuery 38 │        39.04 / 42.76 ±2.17 / 45.04 ms │        42.20 / 48.14 ±5.80 / 57.95 ms │  1.13x slower │
│ QQuery 39 │     135.77 / 147.49 ±8.64 / 160.59 ms │     140.86 / 154.58 ±8.16 / 161.78 ms │     no change │
│ QQuery 40 │        14.14 / 14.28 ±0.11 / 14.44 ms │        14.80 / 16.70 ±2.75 / 22.15 ms │  1.17x slower │
│ QQuery 41 │        13.55 / 15.21 ±2.85 / 20.90 ms │        14.68 / 15.76 ±1.91 / 19.58 ms │     no change │
│ QQuery 42 │        13.47 / 18.19 ±4.91 / 24.85 ms │        14.00 / 14.21 ±0.20 / 14.57 ms │ +1.28x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 19920.30ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 19962.48ms │
│ Average Time (HEAD)                               │   485.86ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   486.89ms │
│ Queries Faster                                    │          3 │
│ Queries Slower                                    │          9 │
│ Queries with No Change                            │         29 │
│ Queries with Failure                              │          2 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	105.0s
Peak memory	11.8 GiB
Avg memory	4.3 GiB
CPU user	1024.4s
CPU sys	70.4s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	105.0s
Peak memory	9.9 GiB
Avg memory	4.2 GiB
CPU user	1027.1s
CPU sys	71.2s
Peak spill	0 B


adriangb · 2026-06-16T15:48:56Z

@zhuqi-lucas these failures in clickbench are worth looking at:

│ QQuery 24 │ 41.97 / 47.71 ±5.80 / 56.58 ms │ FAIL │ incomparable │

ClickBench Q24 / Q26 (`SELECT … WHERE x <> '' ORDER BY ts LIMIT 10`) were failing with:

Parquet error: into_builder called mid-row-group; check is_at_row_group_boundary() first

`transition()` in `push_decoder.rs` re-enters Step 2 on every loop iteration, including the iterations where Step 3 returned `NeedsData` and we pushed byte ranges but the decoder has not yet handed back a reader. At those moments the decoder sits in `ReadingRowGroup` state but `is_at_row_group_boundary()` is `false`, and arrow-rs's `into_builder` errors out (it can only rebuild at a clean RG boundary).

Step 2's prune-and-rebuild block now skips when the decoder is mid-RG. Step 3 still drives the decoder forward in that iteration, and the next boundary re-enters Step 2 with the pruner in the same state (the pruner is stateful but idempotent: re-evaluating costs one cached `pp.prune` per RG, no rebuild). No semantic change for queries that were already passing.

Add `dynamic_rg_pruner_does_not_call_into_builder_mid_row_group`: a 20-RG × 50-row file plus `ORDER BY v ASC LIMIT 10` gives the pruner multiple boundaries to attempt rebuilds and reliably trips the failure mode on the pre-fix code.
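
To make the failure mode and the guard concrete, here is a minimal, self-contained sketch of the loop shape described in this fix. It is a toy model, not the code in `push_decoder.rs` and not the arrow-rs API: `ToyDecoder`, `run`, the `thresholds` slice (standing in for a TopK threshold tightened concurrently by other partitions) and the `guard` flag are all hypothetical names introduced for illustration. Only the method names `into_builder` and `is_at_row_group_boundary` are borrowed from the error message above.

```rust
/// Toy stand-in for a push decoder (hypothetical; NOT the arrow-rs type).
#[derive(Debug)]
struct ToyDecoder {
    /// Row groups still queued, in read order (index 0 is the next/in-flight one).
    pending: Vec<usize>,
    /// True after bytes were requested but before a reader is handed back.
    mid_row_group: bool,
}

impl ToyDecoder {
    fn is_at_row_group_boundary(&self) -> bool {
        !self.mid_row_group
    }

    /// Rebuild with a new row-group list; only legal at a clean boundary.
    fn into_builder(self, keep: Vec<usize>) -> Result<ToyDecoder, String> {
        if self.mid_row_group {
            return Err("into_builder called mid-row-group".to_string());
        }
        Ok(ToyDecoder { pending: keep, mid_row_group: false })
    }
}

/// Drives the toy decoder and returns the row groups that were decoded.
/// `rg_min[i]` is the min statistic of row group `i`; `thresholds[t]` is the
/// threshold visible at loop iteration `t` (the last value repeats).
/// `guard` enables the boundary check before prune-and-rebuild.
fn run(rg_min: &[i64], thresholds: &[i64], guard: bool) -> Result<Vec<usize>, String> {
    let mut decoder = ToyDecoder {
        pending: (0..rg_min.len()).collect(),
        mid_row_group: false,
    };
    let mut decoded = Vec::new();
    let mut tick = 0usize;

    while !decoder.pending.is_empty() {
        let threshold = thresholds
            .get(tick)
            .or(thresholds.last())
            .copied()
            .unwrap_or(i64::MAX);

        // "Step 2": prune-and-rebuild. With the guard, skip while mid-row-group.
        if !guard || decoder.is_at_row_group_boundary() {
            let keep: Vec<usize> = decoder
                .pending
                .iter()
                .copied()
                .filter(|&rg| rg_min[rg] <= threshold)
                .collect();
            if keep.len() != decoder.pending.len() {
                decoder = decoder.into_builder(keep)?;
            }
        }
        if decoder.pending.is_empty() {
            break;
        }

        // "Step 3": drive the decoder forward.
        if decoder.mid_row_group {
            // Data arrived: finish the in-flight row group.
            decoded.push(decoder.pending.remove(0));
            decoder.mid_row_group = false;
        } else {
            // "NeedsData": bytes requested, no reader handed back yet.
            decoder.mid_row_group = true;
        }
        tick += 1;
    }
    Ok(decoded)
}
```

With `rg_min = [0, 10, 20, 30]` and a threshold that tightens from `i64::MAX` to `5` while the first row group is in flight, the unguarded variant attempts a rebuild mid-row-group and returns the error, while the guarded variant defers pruning to the next boundary, decodes row group 0, and then drops the remaining three.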

zhuqi-lucas · 2026-06-17T03:07:36Z

> @zhuqi-lucas these failures in clickbench are worth looking at:
>
> │ QQuery 24 │ 41.97 / 47.71 ±5.80 / 56.58 ms │ FAIL │ incomparable │

Thanks @adriangb, good catch. This is fixed in the latest update to this PR; let me rerun the benchmark to confirm.

zhuqi-lucas · 2026-06-17T03:08:07Z

run benchmark clickbench_partitioned

adriangbot · 2026-06-17T03:11:20Z

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4725518286-578-2sdzh 6.12.68+ #1 SMP Sat May 2 07:49:07 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)

Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing feat/topk-rg-level-dynamic-pruning (608591f) to 96a6096 (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete

adriangbot · 2026-06-17T03:31:18Z

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and feat_topk-rg-level-dynamic-pruning
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃    feat_topk-rg-level-dynamic-pruning ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.18 / 3.33 ±4.22 / 11.77 ms │          1.21 / 3.40 ±4.30 / 11.99 ms │     no change │
│ QQuery 1  │        12.67 / 13.12 ±0.24 / 13.37 ms │        12.52 / 12.85 ±0.18 / 13.03 ms │     no change │
│ QQuery 2  │        35.85 / 36.12 ±0.21 / 36.40 ms │        35.92 / 36.17 ±0.30 / 36.75 ms │     no change │
│ QQuery 3  │        30.08 / 30.90 ±0.88 / 32.23 ms │        30.30 / 30.88 ±0.82 / 32.47 ms │     no change │
│ QQuery 4  │     218.97 / 225.05 ±3.60 / 229.28 ms │     221.60 / 225.46 ±2.31 / 228.78 ms │     no change │
│ QQuery 5  │     268.66 / 271.68 ±1.65 / 273.05 ms │     267.63 / 272.06 ±3.29 / 277.71 ms │     no change │
│ QQuery 6  │           1.23 / 1.38 ±0.23 / 1.82 ms │           1.25 / 1.39 ±0.22 / 1.82 ms │     no change │
│ QQuery 7  │        14.37 / 14.50 ±0.10 / 14.67 ms │        13.62 / 13.70 ±0.09 / 13.86 ms │ +1.06x faster │
│ QQuery 8  │     321.81 / 324.56 ±1.98 / 327.43 ms │     321.72 / 324.38 ±1.47 / 326.02 ms │     no change │
│ QQuery 9  │     453.20 / 461.77 ±7.89 / 472.07 ms │     448.10 / 455.97 ±7.21 / 468.60 ms │     no change │
│ QQuery 10 │        70.01 / 70.90 ±0.93 / 72.61 ms │        68.93 / 70.08 ±1.11 / 71.54 ms │     no change │
│ QQuery 11 │        80.57 / 82.01 ±0.92 / 83.35 ms │        80.14 / 81.12 ±0.80 / 82.00 ms │     no change │
│ QQuery 12 │     266.64 / 270.40 ±3.37 / 275.31 ms │     264.59 / 269.35 ±3.59 / 273.81 ms │     no change │
│ QQuery 13 │     359.72 / 369.04 ±5.60 / 375.09 ms │    358.32 / 374.82 ±17.38 / 407.83 ms │     no change │
│ QQuery 14 │     283.76 / 288.39 ±4.08 / 295.51 ms │     281.21 / 286.81 ±5.03 / 296.20 ms │     no change │
│ QQuery 15 │     272.89 / 282.02 ±6.45 / 291.68 ms │     266.06 / 271.06 ±5.72 / 281.87 ms │     no change │
│ QQuery 16 │    612.87 / 622.91 ±14.15 / 650.98 ms │     605.35 / 618.32 ±9.43 / 632.67 ms │     no change │
│ QQuery 17 │     621.61 / 629.35 ±4.35 / 633.73 ms │    618.49 / 628.15 ±15.98 / 660.04 ms │     no change │
│ QQuery 18 │ 1255.37 / 1275.37 ±15.36 / 1301.12 ms │ 1240.33 / 1276.90 ±29.39 / 1329.33 ms │     no change │
│ QQuery 19 │        28.11 / 28.33 ±0.23 / 28.73 ms │        27.62 / 28.95 ±2.09 / 33.13 ms │     no change │
│ QQuery 20 │    515.88 / 528.41 ±12.31 / 545.66 ms │     518.99 / 528.90 ±6.37 / 537.95 ms │     no change │
│ QQuery 21 │     516.73 / 521.59 ±6.33 / 533.89 ms │     524.11 / 535.73 ±7.26 / 544.04 ms │     no change │
│ QQuery 22 │    982.07 / 991.12 ±7.19 / 1003.13 ms │   994.90 / 1008.56 ±8.86 / 1022.40 ms │     no change │
│ QQuery 23 │ 3176.23 / 3202.05 ±19.86 / 3227.01 ms │ 3092.23 / 3112.29 ±18.68 / 3136.49 ms │     no change │
│ QQuery 24 │        40.89 / 44.26 ±6.13 / 56.50 ms │        40.77 / 42.13 ±1.73 / 45.49 ms │     no change │
│ QQuery 25 │     111.18 / 117.20 ±7.00 / 129.61 ms │     110.28 / 111.22 ±0.96 / 112.80 ms │ +1.05x faster │
│ QQuery 26 │        41.35 / 42.48 ±1.06 / 44.19 ms │        41.21 / 45.54 ±4.17 / 52.60 ms │  1.07x slower │
│ QQuery 27 │     669.76 / 674.83 ±3.45 / 678.92 ms │     672.23 / 681.12 ±8.22 / 695.36 ms │     no change │
│ QQuery 28 │  3024.02 / 3040.31 ±8.74 / 3048.95 ms │ 3002.37 / 3040.75 ±24.78 / 3079.61 ms │     no change │
│ QQuery 29 │       40.42 / 54.39 ±16.76 / 77.40 ms │        40.33 / 41.10 ±0.53 / 41.85 ms │ +1.32x faster │
│ QQuery 30 │     301.31 / 306.47 ±4.27 / 311.72 ms │     293.22 / 306.55 ±7.58 / 313.20 ms │     no change │
│ QQuery 31 │     290.61 / 295.37 ±5.06 / 305.04 ms │     278.15 / 287.97 ±7.57 / 300.62 ms │     no change │
│ QQuery 32 │   954.35 / 981.75 ±23.39 / 1022.63 ms │    937.50 / 948.73 ±12.92 / 971.62 ms │     no change │
│ QQuery 33 │ 1440.34 / 1478.84 ±32.21 / 1523.17 ms │ 1447.23 / 1473.63 ±21.57 / 1512.90 ms │     no change │
│ QQuery 34 │ 1473.82 / 1512.34 ±33.42 / 1552.74 ms │ 1452.38 / 1489.44 ±21.21 / 1516.71 ms │     no change │
│ QQuery 35 │    278.84 / 316.74 ±42.33 / 369.59 ms │    274.24 / 306.81 ±58.80 / 424.28 ms │     no change │
│ QQuery 36 │        65.74 / 69.17 ±4.10 / 76.45 ms │        70.18 / 74.76 ±4.07 / 81.35 ms │  1.08x slower │
│ QQuery 37 │        35.34 / 36.07 ±0.71 / 37.42 ms │        35.61 / 38.28 ±4.18 / 46.61 ms │  1.06x slower │
│ QQuery 38 │        45.16 / 49.76 ±2.74 / 52.86 ms │        39.83 / 43.91 ±4.31 / 52.25 ms │ +1.13x faster │
│ QQuery 39 │     145.60 / 150.23 ±4.59 / 158.26 ms │     144.34 / 148.48 ±2.85 / 151.93 ms │     no change │
│ QQuery 40 │        14.20 / 15.18 ±1.22 / 17.39 ms │        13.63 / 14.33 ±0.61 / 15.44 ms │ +1.06x faster │
│ QQuery 41 │        13.59 / 16.36 ±4.27 / 24.85 ms │        13.42 / 14.84 ±2.51 / 19.85 ms │ +1.10x faster │
│ QQuery 42 │        12.93 / 16.79 ±6.87 / 30.50 ms │        12.69 / 12.95 ±0.17 / 13.20 ms │ +1.30x faster │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                 ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                 │ 19762.87ms │
│ Total Time (feat_topk-rg-level-dynamic-pruning)   │ 19589.85ms │
│ Average Time (HEAD)                               │   459.60ms │
│ Average Time (feat_topk-rg-level-dynamic-pruning) │   455.58ms │
│ Queries Faster                                    │          7 │
│ Queries Slower                                    │          3 │
│ Queries with No Change                            │         33 │
│ Queries with Failure                              │          0 │
└───────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric	Value
Wall time	100.0s
Peak memory	9.5 GiB
Avg memory	4.1 GiB
CPU user	1008.9s
CPU sys	71.9s
Peak spill	0 B

clickbench_partitioned — branch

Metric	Value
Wall time	100.0s
Peak memory	11.8 GiB
Avg memory	4.6 GiB
CPU user	1004.5s
CPU sys	71.2s
Peak spill	0 B

zhuqi-lucas · 2026-06-17T03:36:03Z

@adriangb It works well now: the latest run has no failing queries.

adriangb

Thanks @zhuqi-lucas, I think this is ready!

zhuqi-lucas · 2026-06-21T12:41:00Z

> Thanks @zhuqi-lucas, I think this is ready!

Thank you @adriangb for the review. I added a follow-up issue for the fully-matched improvement:

#23067

github-actions Bot added labels core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), datasource (Changes to the datasource crate) May 22, 2026

zhuqi-lucas changed the title ~~feat(parquet): apply TopK threshold to row-group statistics mid-scan~~ feat(parquet): runtime row-group early stop via TopK dynamic filter May 22, 2026

zhuqi-lucas marked this pull request as ready for review May 22, 2026 05:46

Copilot AI review requested due to automatic review settings May 22, 2026 05:46

Copilot started reviewing on behalf of zhuqi-lucas May 22, 2026 05:46

Copilot AI reviewed May 22, 2026

Comment thread datafusion/datasource-parquet/src/opener/mod.rs Outdated

Comment thread datafusion/datasource-parquet/src/push_decoder.rs

zhuqi-lucas requested review from adriangb and alamb May 22, 2026 06:24

zhuqi-lucas mentioned this pull request Jun 16, 2026

ParquetPushDecoder: expose the next row-group index that try_next_reader will yield apache/arrow-rs#10148

Open

zhuqi-lucas and others added 2 commits June 17, 2026 11:06

Merge branch 'main' into feat/topk-rg-level-dynamic-pruning

608591f

This was referenced Jun 18, 2026

feat(parquet): add ParquetPushDecoder::peek_next_row_group() apache/arrow-rs#10158

Open

[EPIC] Sort Pushdown: skip sorts and skip IO for ORDER BY / TopK queries #23036

Open

[DISCUSSION] 2026 Q3-Q4 Roadmap Discussion #22882

Open

adriangb approved these changes Jun 21, 2026

adriangb changed the title ~~feat(parquet): runtime row-group early stop via TopK dynamic filter~~ feat(parquet): intra-file early stopping via statistics + dynamic filters Jun 21, 2026

zhuqi-lucas mentioned this pull request Jun 21, 2026

[Follow-up #22450] Per-RG fully_matched RowFilter skip · needs arrow-rs#10158 (peek_next_row_group) #23067

Open

8 tasks
