Add whole-script (batch) parse benchmark for time and memory #17

Merged: LucaCappelletti94 merged 1 commit into main from batch-parse-bench on Jun 7, 2026

@LucaCappelletti94 (Owner)

Parses each parser's whole accepted set as one script and reports time and peak memory per statement next to the single-statement numbers. A completeness guard drops any batch the parser does not finish. Adds a new `batch_parsing` bench and a membench `batch` mode, with batch columns in the explorer tables.

Issue #15 noted that sqlparser-rs parses a multi-statement script faster
per statement than it parses the same statements one at a time, because
parse_sql fills one growing Vec of large Statement values. This adds a
batch axis that measures exactly that, for both time and memory,
alongside the existing per-statement benchmarks.

Benchmarks:
- benches/batch_parsing.rs times each parser's whole accepted set parsed
  as one script, normalized per statement, into target/batch_dist/.
- membench gains a `batch` subcommand that measures the same scripts under
  the counting allocator, into target/batch_mem_dist/.
- BenchParser::can_batch, parse_batch, and measure_mem_batch in src/lib.rs
  back both. Only databend-common-ast lacks a multi-statement entry point
  and sits out the batch axis. join_batch (src/batch.rs) builds identical
  scripts for both benches.
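
As a rough illustration of the batch axis described above, the sketch below joins statements into one script, times a single parse of it, and normalizes the result per statement. It is not the code from src/batch.rs or benches/batch_parsing.rs: `join_batch_sketch`, `time_batch`, and `per_statement` are hypothetical names, and the `;\n` separator is an assumption about how a script might be assembled.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for `join_batch`: joins statements into one script.
/// Assumes each statement is non-empty; the real separator may differ.
pub fn join_batch_sketch(statements: &[&str]) -> String {
    let mut script = String::new();
    for stmt in statements {
        // Normalize away surrounding whitespace and trailing semicolons so
        // every statement ends with exactly one terminator.
        let body = stmt.trim().trim_end_matches(';').trim_end();
        script.push_str(body);
        script.push_str(";\n");
    }
    script
}

/// Times one whole-script parse. `parse` is any closure that returns how
/// many statements it parsed (an assumption made for this sketch).
pub fn time_batch<F: FnOnce(&str) -> usize>(script: &str, parse: F) -> (Duration, usize) {
    let start = Instant::now();
    let n_parsed = parse(script);
    (start.elapsed(), n_parsed)
}

/// Normalizes a whole-script time to a per-statement figure.
/// Returns `None` for an empty batch or a count that does not fit in `u32`.
pub fn per_statement(total: Duration, n_statements: usize) -> Option<Duration> {
    if n_statements == 0 || n_statements > u32::MAX as usize {
        return None;
    }
    Some(total / n_statements as u32)
}
```

Building the script in one shared helper is what lets the timing and memory benches measure byte-identical input, which is the property the description attributes to join_batch.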

Completeness guard: both summaries record n_parsed, and export drops any
batch datum whose parse did not consume the whole accepted set, so a
parser that bails out partway never shows a misleading fast number.
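
The guard can be pictured as a simple filter at export time. The sketch below is a minimal version under stated assumptions: only `n_parsed` is a name taken from the description, while `BatchDatum`, `n_accepted`, `mean_ns_per_statement`, and `drop_incomplete` are hypothetical and may not match src/export.rs.

```rust
/// Hypothetical summary record for one parser's batch run.
#[derive(Debug, Clone, PartialEq)]
pub struct BatchDatum {
    pub parser: String,
    /// Number of statements in the parser's accepted set (assumed field).
    pub n_accepted: usize,
    /// Number of statements the batch parse actually produced.
    pub n_parsed: usize,
    /// Whole-script time divided by statement count (assumed field).
    pub mean_ns_per_statement: f64,
}

impl BatchDatum {
    /// A datum is trustworthy only if the parse consumed the whole set.
    pub fn is_complete(&self) -> bool {
        self.n_accepted > 0 && self.n_parsed == self.n_accepted
    }
}

/// Keeps only complete batches, so a parser that stopped early cannot
/// report an artificially fast per-statement number.
pub fn drop_incomplete(data: Vec<BatchDatum>) -> Vec<BatchDatum> {
    data.into_iter().filter(BatchDatum::is_complete).collect()
}
```

With this shape, a parser that parsed 4 of 10 statements is removed from the batch columns entirely rather than shown with a time that covers less than half the work.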

Website: export merges the two batch summaries into a per-parser
ParserBatch on each dialect, and the Speed and Memory tables (dialect and
parser views) gain batch columns placed next to the comparable
single-statement mean, so the comparison is apples to apples.
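
One way to picture the merge step: the time and memory summaries are keyed by parser and folded into a single record each. This is a sketch only. `ParserBatch` is named in the description, but its fields here, the map-based inputs, and `merge_batch_summaries` are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape for the per-parser batch record; the real
/// `ParserBatch` may carry different fields.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ParserBatch {
    pub time_ns_per_statement: Option<f64>,
    pub peak_bytes_per_statement: Option<f64>,
}

/// Merges a time summary and a memory summary, both keyed by parser name.
/// A parser present in only one summary keeps `None` for the other metric.
pub fn merge_batch_summaries(
    time: &BTreeMap<String, f64>,
    memory: &BTreeMap<String, f64>,
) -> BTreeMap<String, ParserBatch> {
    let mut merged: BTreeMap<String, ParserBatch> = BTreeMap::new();
    for (parser, &ns) in time {
        merged.entry(parser.clone()).or_default().time_ns_per_statement = Some(ns);
    }
    for (parser, &bytes) in memory {
        merged.entry(parser.clone()).or_default().peak_bytes_per_statement = Some(bytes);
    }
    merged
}
```

Keeping each metric optional lets a table render an empty batch cell for a parser that sat out one of the two benches instead of inventing a value.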

One command: `cargo regen` (alias for `sqlbench regen`) runs the timing
benches, both memory benches, and export in order. The memory benches
need their own process for the global allocator, so the pipeline cannot
be a single binary, but regen hides that.
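
The separate-process constraint follows from how Rust installs allocators: `#[global_allocator]` applies to the whole binary, so a counting allocator would also sit under the timing benches if everything shared one executable. The sketch below shows a minimal counting allocator of that kind. It is not membench's implementation; `CountingAlloc`, `LIVE`, `PEAK`, and `measure_peak` are hypothetical names, and it assumes a single-threaded measurement.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently live, and the highest value that counter has reached.
static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical counting allocator: forwards to the system allocator and
/// tracks live and peak bytes.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Installed for the entire binary, which is why a memory bench built this
// way cannot share a process with benches that want the default allocator.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `work` and returns its result plus the peak bytes allocated above
/// the baseline that was live when the measurement started.
pub fn measure_peak<T, F: FnOnce() -> T>(work: F) -> (T, usize) {
    let baseline = LIVE.load(Ordering::Relaxed);
    PEAK.store(baseline, Ordering::Relaxed);
    let out = work();
    let peak = PEAK.load(Ordering::Relaxed);
    (out, peak.saturating_sub(baseline))
}
```

A batch memory figure could then be obtained by wrapping one whole-script parse in `measure_peak` and dividing the peak by the statement count, mirroring the per-statement normalization used for time.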

Docs: README and CONTRIBUTING document the batch axis, membench, and the
single-command workflow.
LucaCappelletti94 merged commit b3207e4 into main on Jun 7, 2026
7 checks passed
codecov Bot commented Jun 7, 2026

Codecov Report

❌ Patch coverage is 40.31008% with 154 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.53%. Comparing base (7f1a984) to head (bb0e7fd).
⚠️ Report is 9 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| benches/batch_parsing.rs | 17.85% | 69 Missing ⚠️ |
| membench/src/main.rs | 0.00% | 40 Missing ⚠️ |
| src/bin/sqlbench.rs | 0.00% | 27 Missing ⚠️ |
| src/export.rs | 74.57% | 15 Missing ⚠️ |
| web/src/components.rs | 0.00% | 2 Missing ⚠️ |
| src/lib.rs | 97.43% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #17      +/-   ##
==========================================
+ Coverage   46.20%   46.53%   +0.32%     
==========================================
  Files          14       17       +3     
  Lines        1649     1906     +257     
==========================================
+ Hits          762      887     +125     
- Misses        887     1019     +132     
