LucaCappelletti94 · LucaCappelletti94 · Jun 7, 2026 · Jun 6, 2026
diff --git a/.cargo/config.toml b/.cargo/config.toml
@@ -0,0 +1,5 @@
+# `cargo regen` rebuilds every input to web/assets/bench.json with one command:
+# the timing benches, the two memory benches (a separate process each, since
+# they install a counting global allocator), and finally the export.
+[alias]
+regen = "run --release --bin sqlbench -- regen"
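
Review note: the `regen` subcommand behind this alias is not shown in this excerpt, so the following is only a minimal sketch of the "run each producer in its own process, in order, then export" idea. `Stage`, `STAGES`, `run_pipeline`, and `spawn_cargo` are hypothetical names, the stage commands are copied from the CONTRIBUTING.md hunk, and stopping at the first failing stage is an assumption rather than documented behaviour.

```rust
use std::process::Command;

/// One pipeline stage: a label plus the arguments passed to `cargo`.
/// (Hypothetical type, for illustration only.)
pub struct Stage {
    pub label: &'static str,
    pub cargo_args: &'static [&'static str],
}

/// Stages in the order the docs describe: timing, the two memory benches, export.
pub const STAGES: &[Stage] = &[
    Stage { label: "timing benches", cargo_args: &["bench"] },
    Stage { label: "per-statement memory", cargo_args: &["run", "--release", "-p", "membench"] },
    Stage { label: "batch memory", cargo_args: &["run", "--release", "-p", "membench", "--", "batch"] },
    Stage { label: "export", cargo_args: &["run", "--bin", "sqlbench", "--", "export"] },
];

/// Run every stage in order through `run`, stopping at the first failure
/// (assumed policy). `run` returns `true` on success; injecting it keeps the
/// ordering logic testable without spawning cargo.
pub fn run_pipeline(
    stages: &[Stage],
    mut run: impl FnMut(&Stage) -> bool,
) -> Result<(), String> {
    for stage in stages {
        if !run(stage) {
            return Err(format!("stage `{}` failed", stage.label));
        }
    }
    Ok(())
}

/// Runner that spawns `cargo <args>` as a child process, so each stage gets a
/// fresh process and therefore its own global allocator.
pub fn spawn_cargo(stage: &Stage) -> bool {
    Command::new("cargo")
        .args(stage.cargo_args)
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

Under these assumptions, `run_pipeline(STAGES, spawn_cargo)` would drive the whole sequence, while a test can pass a closure that only records the order.
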
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -17,11 +17,21 @@ No unsafe code is allowed (`unsafe_code = "forbid"`). Clippy runs with pedantic
 The site under `web/` is a Dioxus -> WASM app that renders a committed snapshot, `web/assets/bench.json`, produced by `sqlbench export`. CI (`.github/workflows/pages.yml`) only builds and deploys the committed crates, so regenerate the snapshot manually after changing the corpus or parsers:
 
 ```bash
-cargo bench                          # write target/bench_dist/ timings (long)
-cargo run --bin sqlbench -- export   # write web/assets/bench.json
-cd web && dx serve                   # preview at http://127.0.0.1:8080/sql_ast_benchmark/
+cargo regen          # one command: timing benches + memory benches + export (long)
+cd web && dx serve   # preview at http://127.0.0.1:8080/sql_ast_benchmark/
 ```
 
+`cargo regen` (alias in `.cargo/config.toml` for `cargo run --release --bin sqlbench -- regen`) runs the producers in order and ends with the export. The memory benches install a counting global allocator, so they each run in their own process, separate from the timing bench and from export. That is the only reason this is a pipeline rather than a single binary. To run a stage on its own:
+
+```bash
+cargo bench                              # write target/bench_dist/ + target/batch_dist/ timings
+cargo run --release -p membench          # write target/mem_dist/ per-statement memory
+cargo run --release -p membench -- batch # write target/batch_mem_dist/ whole-script memory
+cargo run --bin sqlbench -- export       # read all of the above, write web/assets/bench.json
+```
+
+`export` reads whatever timing, memory, and batch summaries are present under `target/` and warns (rather than fails) for any that are missing, so the memory and batch columns stay empty until their producers have been run.
+
 The charts are rendered in the browser from the JSON by the shared `viz` crate (plotters, SVG backend), so no chart images are committed.
 
 ## Coverage
@@ -31,4 +41,4 @@ tar --zstd -xf datasets.tar.zst   # coverage runs the bench in smoke mode, which
 cargo tarpaulin                    # LLVM engine, includes the bench
 ```
 
-`tarpaulin.toml` runs the benchmark in verify-only mode (`--test`) under the LLVM engine, since the benchmark is the main exercise of the `BenchParser` layer. With the corpus present it covers `benches/parsing.rs` and the dialect-mapping / accept / reprint paths in `src/lib.rs`.
+`tarpaulin.toml` runs the benchmark in verify-only mode (`--test`) under the LLVM engine, since the benchmark is the main exercise of the `BenchParser` layer. With the corpus present it covers `benches/parsing.rs` and `benches/batch_parsing.rs` (both in smoke mode) and the dialect-mapping / accept / reprint paths in `src/lib.rs`.
diff --git a/Cargo.toml b/Cargo.toml
@@ -51,6 +51,10 @@ syntect = { version = "5", default-features = false, features = ["default-fancy"
 name = "parsing"
 harness = false
 
+[[bench]]
+name = "batch_parsing"
+harness = false
+
 [[bin]]
 name = "sqlbench"
 path = "src/bin/sqlbench.rs"

diff --git a/README.md b/README.md
@@ -38,19 +38,29 @@ Per-parser repository metadata (stars, contributors, fuzzing, test and benchmark
 
 311,594 statements across 34 files and 13 dialects, committed compressed as `datasets.tar.zst` (5.3 MB) and unpacked to `datasets/{dialect}/{name}.txt`, one statement per line. The commands below extract it automatically on first use. All sources are openly licensed (Apache-2.0, MIT, BSD, public domain or CC-BY), drawn from each engine's own regression suites and official samples. Natural-language-with-embedded-SQL datasets are intentionally excluded.
 
-Correctness is defined per dialect. Dialects with a runnable engine are graded against that real database engine, run in Docker via testcontainers by the `oracle` crate: a statement is valid unless the engine reports a syntax error (a missing table or column still counts as parsed). The validity labels are computed once and committed under `oracle/labels`, so grading and CI need no Docker. That reference splits the corpus into valid and invalid and scores recall, false positives, round-trip, and fidelity. Dialects with no runnable engine (cloud services, heavy JVM engines) have no reference, so their statements count as provenance-valid (sourced from each engine's own suites) and the metric is acceptance rate. Speed is a per-statement parse-time distribution over every accepted statement, timed with an adaptive iteration count on a no-`catch_unwind` path.
+Correctness is defined per dialect. Dialects with a runnable engine are graded against that real database engine, run in Docker via testcontainers by the `oracle` crate: a statement is valid unless the engine reports a syntax error (a missing table or column still counts as parsed). The validity labels are computed once and committed under `oracle/labels`, so grading and CI need no Docker. That reference splits the corpus into valid and invalid and scores recall, false positives, round-trip, and fidelity. Dialects with no runnable engine (cloud services, heavy JVM engines) have no reference, so their statements count as provenance-valid (sourced from each engine's own suites) and the metric is acceptance rate. Speed is a per-statement parse-time distribution over every accepted statement, timed with an adaptive iteration count on a no-`catch_unwind` path. Memory is measured separately with a counting allocator, as peak live bytes and retained (AST) bytes per statement. A companion batch axis parses each parser's whole accepted set as one script and normalizes the time and memory by the statement count, showing what bulk parsing amortizes against parsing one statement at a time. A batch that does not parse the whole set (a parser that bails out partway) is dropped rather than reported, and parsers without a multi-statement entry point (databend-common-ast) sit out the batch axis.
 
 ## Running
 
-The corpus auto-extracts on first use, so just run:
+The corpus auto-extracts on first use. To rebuild the whole explorer snapshot (`web/assets/bench.json`) with one command:
+
+```bash
+cargo regen   # timing benches + memory benches + export, in order
+```
+
+That is an alias (see `.cargo/config.toml`) for `cargo run --release --bin sqlbench -- regen`. The memory measurement installs a counting global allocator, so it has to run in its own process, separate from the timing bench (which must stay on the default allocator for fair numbers). The `regen` command orchestrates that sequence so you do not have to. The individual steps, if you want to run one on its own:
 
 ```bash
 cargo run --release --bin sqlbench correctness --per-file    # per-file acceptance, every dialect
 cargo run --release --bin sqlbench correctness               # reference + provenance correctness
-cargo bench                                                  # parse-throughput, every dialect
+cargo bench                                                  # parse time (per-statement and batch), every dialect
+cargo run --release -p membench                              # per-statement memory (peak + retained bytes)
+cargo run --release -p membench -- batch                     # whole-script (batch) memory, per statement
 cargo run --release --bin sqlbench export                    # regenerate web/assets/bench.json for the explorer
 ```
 
+`cargo bench` runs both the per-statement (`parsing`) and whole-script (`batch_parsing`) timing benches. Pass `--bench batch_parsing` to run only the batch bench. `export` reads whatever the benches left under `target/` and warns rather than fails when a source is missing, so the memory and batch columns stay empty until their producers have run.
+
 Validity labels for the reference dialects are produced by the `oracle` crate (real engines in Docker via testcontainers) and committed under `oracle/labels`, so `correctness` and `export` need no Docker. Regenerate them with `cargo run --release -p oracle`.
 
 ### Requirements

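Review note: the batch axis described in the README boils down to one division and one comparison. The sketch below works through that arithmetic. The function names are hypothetical and not taken from the crate, and because the per-statement figure is a median while the batch figure is a mean over the script, the ratio is indicative rather than exact.

```rust
/// Per-statement cost in batch context: whole-script time divided by the
/// number of statements in the script. `None` for an empty batch.
pub fn ns_per_stmt(batch_ns: f64, n_accepted: usize) -> Option<f64> {
    if n_accepted == 0 {
        None
    } else {
        Some(batch_ns / n_accepted as f64)
    }
}

/// Ratio of the batch per-statement cost to the single-statement median.
/// Below 1.0 the batch path is cheaper per statement (it amortizes work);
/// above 1.0 the batch path costs more per statement.
pub fn batch_ratio(batch_ns: f64, n_accepted: usize, single_median_ns: f64) -> Option<f64> {
    if single_median_ns <= 0.0 {
        return None; // no meaningful baseline to compare against
    }
    ns_per_stmt(batch_ns, n_accepted).map(|per| per / single_median_ns)
}
```

For example, a 1,000-statement script parsed in 2 ms gives 2,000 ns per statement. Against a per-statement median of 2,500 ns, that is a ratio of 0.8.
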
diff --git a/benches/batch_parsing.rs b/benches/batch_parsing.rs
@@ -0,0 +1,245 @@
+//! Multi-dialect BATCH (whole-script) parse-time benchmark over the full
+//! `datasets/` corpus.
+//!
+//! Companion to `benches/parsing.rs`. Where `parsing` times each statement in
+//! isolation, this concatenates every statement a parser accepts in a dialect
+//! into one script and times parsing that whole script in a single call, then
+//! divides by the statement count to get a normalized per-statement cost.
+//! Comparing that figure with the per-statement median shows what a batch API
+//! pays or amortizes. This is the effect raised in issue #15:
+//! `Parser::parse_sql` grows a `Vec` of large `Statement` values, so bulk
+//! parsing can behave differently from many single-statement calls.
+//!
+//! Both axes are measured over the SAME accepted set (statements the parser
+//! parses in that dialect), so the two numbers are directly comparable.
+//!
+//! Only parsers with a multi-statement entry point take part (see
+//! `BenchParser::can_batch`); `databend-common-ast` parses one statement per
+//! call and is simply skipped here.
+//!
+//! Output (under `target/batch_dist/`), picked up by `sqlbench export` when
+//! present:
+//!   - `summary.csv` : per-pair statement count, statements the parser saw,
+//!     batch size in bytes, whole-script time, and time normalized per
+//!     statement.
+//!
+//! Full run:        `cargo bench --bench batch_parsing`
+//! Smoke (default): `cargo test` or `cargo bench --bench batch_parsing -- --test`
+//!
+//! The full run unpacks `datasets.tar.zst` automatically if `datasets/` is
+//! missing. The smoke path needs no corpus, so `cargo test` stays fast.
+
+use sql_ast_benchmark::batch::join_batch;
+use sql_ast_benchmark::datasets::Dialect;
+use sql_ast_benchmark::report::load_dialect;
+use sql_ast_benchmark::BenchParser;
+use std::fs;
+use std::hint::black_box;
+use std::io::Write as _;
+use std::time::Instant;
+
+/// Deep statements can exhaust the default stack inside recursive-descent
+/// parsers, and a stack overflow aborts the process, so time on a large stack.
+const WORKER_STACK: usize = 1024 * 1024 * 1024;
+
+const OUT_DIR: &str = "target/batch_dist";
+
+const DIALECTS: &[Dialect] = &[
+    Dialect::Postgresql,
+    Dialect::Sqlite,
+    Dialect::Mysql,
+    Dialect::Clickhouse,
+    Dialect::Duckdb,
+    Dialect::Hive,
+    Dialect::SparkSql,
+    Dialect::Trino,
+    Dialect::Tsql,
+    Dialect::Oracle,
+    Dialect::Bigquery,
+    Dialect::Redshift,
+    Dialect::Multi,
+];
+
+/// Whole-script parse time (ns/batch): adaptive iteration count so a short
+/// script still accumulates enough work per round, capped at 1,000 iterations
+/// because one batch call already does a lot. Best (min) of `ROUNDS` rounds.
+fn time_batch(mut f: impl FnMut() -> usize) -> f64 {
+    const TARGET_NS: u128 = 2_000_000; // aim for ~2 ms of work per round
+    const ROUNDS: usize = 5;
+
+    black_box(f()); // warm up
+    let probe = Instant::now();
+    black_box(f());
+    let single = probe.elapsed().as_nanos().max(1);
+    let iters = u64::try_from((TARGET_NS / single).clamp(1, 1_000)).unwrap_or(1);
+
+    let mut best = f64::MAX;
+    for _ in 0..ROUNDS {
+        let start = Instant::now();
+        for _ in 0..iters {
+            black_box(f());
+        }
+        let per = start.elapsed().as_nanos() as f64 / iters as f64;
+        best = best.min(per);
+    }
+    best
+}
+
+struct Row {
+    dialect: &'static str,
+    parser: &'static str,
+    /// Statements fed into the batch (the parser's accepted set).
+    n_accepted: usize,
+    /// Statements the parser reported parsing from the batch (coverage).
+    n_parsed: usize,
+    batch_bytes: usize,
+    /// Whole-script parse time (ns).
+    batch_ns: f64,
+    /// `batch_ns / n_accepted`: time per statement in batch context.
+    ns_per_stmt: f64,
+}
+
+/// Time one (parser, dialect) pair: build the accepted set, concatenate it into
+/// one script, time the whole-script parse, and normalize per statement.
+fn run_pair(parser: BenchParser, dialect: Dialect, stmts: &[String]) -> Row {
+    let accepted: Vec<&str> = stmts
+        .iter()
+        .filter(|s| parser.accepts(s, dialect) == Some(true))
+        .map(String::as_str)
+        .collect();
+
+    let mut row = Row {
+        dialect: dialect.dir_name(),
+        parser: parser.name(),
+        n_accepted: accepted.len(),
+        n_parsed: 0,
+        batch_bytes: 0,
+        batch_ns: 0.0,
+        ns_per_stmt: 0.0,
+    };
+    if accepted.is_empty() {
+        return row;
+    }
+
+    let batch = join_batch(&accepted);
+    row.batch_bytes = batch.len();
+    row.n_parsed = parser.parse_batch(&batch, dialect).unwrap_or(0);
+    row.batch_ns = time_batch(|| parser.parse_batch(&batch, dialect).unwrap_or(0));
+    row.ns_per_stmt = row.batch_ns / accepted.len() as f64;
+    row
+}
+
+/// Quick smoke check used by `cargo test`: every batch-capable parser parses a
+/// tiny multi-statement script per supported dialect without panicking. Needs
+/// no corpus, so it stays instant.
+fn smoke() {
+    std::panic::set_hook(Box::new(|_| {}));
+    let script = "SELECT 1;\nSELECT 2;\nSELECT 3";
+    for &dialect in DIALECTS {
+        for parser in BenchParser::all() {
+            if !parser.can_batch() || !parser.supports(dialect) {
+                continue;
+            }
+            black_box(parser.parse_batch(script, dialect));
+        }
+    }
+    println!("smoke ok");
+}
+
+fn main() {
+    // Match `benches/parsing.rs`: only an explicit `cargo bench` (which passes
+    // `--bench` and not `--test`) does the full, datasets-backed run. `cargo
+    // test` and a bare run take the fast smoke path, which needs no corpus.
+    let args: Vec<String> = std::env::args().collect();
+    let full_run = args.iter().any(|a| a == "--bench") && !args.iter().any(|a| a == "--test");
+    if !full_run {
+        smoke();
+        return;
+    }
+
+    // Acceptance checks are panic-guarded; suppress the default panic message so
+    // a caught panic does not spam stderr.
+    std::panic::set_hook(Box::new(|_| {}));
+
+    if let Err(e) = sql_ast_benchmark::datasets::ensure_corpus() {
+        eprintln!("ERROR: could not prepare datasets/: {e}");
+        std::process::exit(1);
+    }
+    fs::create_dir_all(OUT_DIR).expect("create out dir");
+
+    let mut summary = fs::File::create(format!("{OUT_DIR}/summary.csv")).expect("summary.csv");
+    writeln!(
+        summary,
+        "dialect,parser,n_accepted,n_parsed,batch_bytes,batch_ns,ns_per_stmt"
+    )
+    .unwrap();
+
+    let parsers = BenchParser::all();
+    let start_all = Instant::now();
+
+    for &dialect in DIALECTS {
+        let stmts = load_dialect(dialect);
+        if stmts.is_empty() {
+            continue;
+        }
+        for parser in &parsers {
+            let parser = *parser;
+            if !parser.can_batch() || !parser.supports(dialect) {
+                continue;
+            }
+            let job_start = Instant::now();
+            // Run on a large stack: deeply nested accepted statements can
+            // otherwise overflow the default stack and abort the process.
+            let result = std::thread::scope(|scope| {
+                std::thread::Builder::new()
+                    .stack_size(WORKER_STACK)
+                    .spawn_scoped(scope, || run_pair(parser, dialect, &stmts))
+                    .expect("spawn worker")
+                    .join()
+            });
+            let Ok(row) = result else {
+                eprintln!(
+                    "  [warn] {}/{} panicked, skipping pair",
+                    dialect.dir_name(),
+                    parser.name()
+                );
+                continue;
+            };
+
+            writeln!(
+                summary,
+                "{},{},{},{},{},{:.1},{:.1}",
+                row.dialect,
+                row.parser,
+                row.n_accepted,
+                row.n_parsed,
+                row.batch_bytes,
+                row.batch_ns,
+                row.ns_per_stmt,
+            )
+            .unwrap();
+            summary.flush().unwrap();
+
+            let coverage = if row.n_accepted == 0 {
+                0.0
+            } else {
+                100.0 * row.n_parsed as f64 / row.n_accepted as f64
+            };
+            println!(
+                "{:<11} {:<24} n={:>6} seen={:>6} ({:>3.0}%) batch={:>9.0}ns/stmt  ({:.1}s)",
+                row.dialect,
+                row.parser,
+                row.n_accepted,
+                row.n_parsed,
+                coverage,
+                row.ns_per_stmt,
+                job_start.elapsed().as_secs_f64(),
+            );
+        }
+    }
+
+    println!(
+        "\nDone in {:.1}s. summary.csv in {OUT_DIR}/",
+        start_all.elapsed().as_secs_f64()
+    );
+}
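
Review note: the README says a batch that does not parse the whole accepted set is dropped rather than reported, but the bench itself writes every row to `summary.csv`. A consumer could filter incomplete rows roughly as sketched below. The column order comes from the header this bench writes. `BatchRow`, `parse_row`, and `complete_rows` are hypothetical names, and treating "whole set" as `n_parsed == n_accepted` is an assumption, since the actual export logic is not part of this excerpt.

```rust
/// One data row of `target/batch_dist/summary.csv`, in header order.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct BatchRow {
    pub dialect: String,
    pub parser: String,
    pub n_accepted: usize,
    pub n_parsed: usize,
    pub batch_bytes: usize,
    pub batch_ns: f64,
    pub ns_per_stmt: f64,
}

/// Parse one data line; `None` if the field count or a number is malformed.
pub fn parse_row(line: &str) -> Option<BatchRow> {
    let fields: Vec<&str> = line.trim().split(',').collect();
    if fields.len() != 7 {
        return None;
    }
    Some(BatchRow {
        dialect: fields[0].to_string(),
        parser: fields[1].to_string(),
        n_accepted: fields[2].parse().ok()?,
        n_parsed: fields[3].parse().ok()?,
        batch_bytes: fields[4].parse().ok()?,
        batch_ns: fields[5].parse().ok()?,
        ns_per_stmt: fields[6].parse().ok()?,
    })
}

/// Keep only rows whose batch parsed the whole accepted set
/// (assumed criterion: `n_parsed == n_accepted`, and the set is non-empty).
pub fn complete_rows(csv: &str) -> Vec<BatchRow> {
    csv.lines()
        .skip(1) // skip the header line
        .filter_map(parse_row)
        .filter(|row| row.n_accepted > 0 && row.n_parsed == row.n_accepted)
        .collect()
}
```

Rows for pairs with an empty accepted set (all-zero rows from the early return in `run_pair`) are filtered out by the same check.
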