Add whole-script (batch) parse benchmark for time and memory #17
Merged
Issue #15 noted that sqlparser-rs parses a multi-statement script faster per statement than the sum of single-statement parses, because parse_sql fills one growing Vec of large Statement values. This adds a batch axis that measures exactly that, for both time and memory, alongside the existing per-statement benchmarks.

Benchmarks:

- benches/batch_parsing.rs times each parser's whole accepted set parsed as one script, normalized per statement, into target/batch_dist/.
- membench gains a `batch` subcommand for the same over the counting allocator, into target/batch_mem_dist/.
- BenchParser::can_batch, parse_batch, and measure_mem_batch in src/lib.rs back both.

Only databend-common-ast lacks a multi-statement entry point and sits out the batch axis. join_batch (src/batch.rs) builds identical scripts for both benches.

Completeness guard: both summaries record n_parsed, and export drops any batch datum whose parse did not consume the whole accepted set, so a parser that bails out partway never shows a misleading fast number.

Website: export merges the two batch summaries into a per-parser ParserBatch on each dialect, and the Speed and Memory tables (dialect and parser views) gain batch columns placed next to the comparable single-statement mean, so the comparison is apples to apples.

One command: `cargo regen` (alias for `sqlbench regen`) runs the timing benches, both memory benches, and export in order. The memory benches need their own process for the global allocator, so the pipeline cannot be a single binary, but regen hides that.

Docs: README and CONTRIBUTING document the batch axis, membench, and the single-command workflow.
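For readers who want the shape of the batch axis without opening the diff, here is a minimal sketch of the idea: join the accepted statements into one script, time a single parse of it, and report a per-statement figure only when the parse consumed the whole set. This is an illustration under stated assumptions, not the code in this PR. `BatchDatum`, `measure_batch`, and `toy_parse` are hypothetical names, the body of `join_batch` here is a guess at a reasonable joining rule (the real one lives in src/batch.rs), and the toy parser merely counts semicolon-separated chunks.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for join_batch: normalize each statement to end in
/// exactly one `;` and put one statement per line. The real joining rule in
/// src/batch.rs may differ.
fn join_batch(statements: &[&str]) -> String {
    let mut script = String::new();
    for stmt in statements {
        script.push_str(stmt.trim().trim_end_matches(';'));
        script.push_str(";\n");
    }
    script
}

/// One batch measurement (hypothetical type): how many statements went in,
/// how many the parser reported back, and the wall time of the single parse.
struct BatchDatum {
    n_accepted: usize,
    n_parsed: usize,
    total: Duration,
}

impl BatchDatum {
    /// Completeness guard: a per-statement time exists only if the parser
    /// consumed the whole accepted set. A partial parse yields None, so it
    /// can never show up as a misleadingly fast number.
    fn per_statement(&self) -> Option<Duration> {
        if self.n_accepted == 0 || self.n_parsed != self.n_accepted {
            return None;
        }
        Some(self.total / self.n_parsed as u32)
    }
}

/// Time one whole-script parse. `parse_batch` returns the number of
/// statements it parsed (an assumed signature, chosen for this sketch).
fn measure_batch<F>(statements: &[&str], parse_batch: F) -> BatchDatum
where
    F: Fn(&str) -> usize,
{
    let script = join_batch(statements);
    let start = Instant::now();
    let n_parsed = parse_batch(&script);
    let total = start.elapsed();
    BatchDatum {
        n_accepted: statements.len(),
        n_parsed,
        total,
    }
}

/// Toy "parser" used only for illustration: counts non-empty chunks
/// separated by `;`. It does not understand SQL.
fn toy_parse(script: &str) -> usize {
    script.split(';').filter(|s| !s.trim().is_empty()).count()
}
```

Calling `measure_batch(&["SELECT 1", "SELECT 2"], toy_parse).per_statement()` yields `Some(..)`, while a parser callback that reports fewer statements than were accepted yields `None`. A single timed parse is noisy; a real benchmark would repeat the measurement many times, which this sketch leaves out.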
Codecov Report ❌ Patch coverage is

Additional details and impacted files

| Coverage Diff | main | #17 | +/- |
|---|---|---|---|
| Coverage | 46.20% | 46.53% | +0.32% |
| Files | 14 | 17 | +3 |
| Lines | 1649 | 1906 | +257 |
| Hits | 762 | 887 | +125 |
| Misses | 887 | 1019 | +132 |
Parses each parser's whole accepted set as one script and reports time and peak memory per statement next to the single-statement numbers. A completeness guard drops any batch the parser does not finish. New batch_parsing bench and membench batch mode, with batch columns in the explorer tables.
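The peak-memory side rests on a counting global allocator, which is also why the memory benches need their own process: a Rust program can register only one `#[global_allocator]`. The sketch below shows the general technique using only the standard library. It is not the allocator membench ships; `CountingAlloc`, `peak_during`, and `peak_per_statement` are hypothetical names introduced for illustration.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Live heap bytes right now, and the highest value seen since the last reset.
static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical counting allocator: forwards to the system allocator and
/// tracks live and peak bytes. The default `realloc` goes through `alloc`
/// and `dealloc`, so the counters stay consistent without overriding it.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Only one global allocator can be registered per program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Run `work` and return its result plus the peak number of bytes allocated
/// above the baseline that was live when `work` started.
fn peak_during<T>(work: impl FnOnce() -> T) -> (T, usize) {
    let baseline = CURRENT.load(Ordering::Relaxed);
    PEAK.store(baseline, Ordering::Relaxed);
    let out = work();
    let peak = PEAK.load(Ordering::Relaxed);
    (out, peak.saturating_sub(baseline))
}

/// Normalize a batch peak to a per-statement figure. Returns None for an
/// empty batch rather than dividing by zero.
fn peak_per_statement(peak_bytes: usize, n_statements: usize) -> Option<usize> {
    if n_statements == 0 {
        None
    } else {
        Some(peak_bytes / n_statements)
    }
}
```

Wrapping a whole-script parse in `peak_during` and dividing by the statement count gives a per-statement peak that can sit beside the single-statement mean. The counters are process-wide, so the figure is only meaningful when nothing else allocates concurrently.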