fix(benchmarks): dedicated WASM timing tolerance in regression guard by carlos-alm · Pull Request #1255 · optave/ops-codegraph-tool

carlos-alm · 2026-05-30T19:26:17Z

Summary

The v3.11.1 publish workflow kept failing the test:regression-guard gate on WASM timing metrics — one or two different metrics each run — because WASM wall-clock jitter on shared CI runners is large in percentage terms. PR #1248 already exempted three 3.11.1 metrics by hand; this PR replaces that whack-a-mole with a structural fix.

Adds WASM_TIMING_THRESHOLD (70%), applied to timing metrics measured under the WASM engine via an engine-aware thresholdFor(label, engine). The three benchmark suites (build/query/incremental) now pass engineKey through to assertNoRegressions.
WASM runs every parse/query through the tree-sitter-wasm interpreter (3–5× slower than native, dominated by interpreter + GC overhead), so identical ±10–20ms runner jitter lands as a much larger percentage swing. Observed +27–67% run-to-run on byte-identical code.
Native is the canary. It shares all extraction/resolution/query logic with WASM and keeps the strict 25%/50% thresholds, so a real algorithmic regression still trips it. The WASM widening still flags the 100–220% catastrophes the guard exists to catch.
Size metrics are excluded (SIZE_METRICS = DB bytes/file): they are engine-independent and deterministic, so they keep the strict threshold regardless of engine.
Removes the now-superseded 3.11.1:No-op rebuild and 3.11.1:Full build entries (both WASM-only timing trips). Keeps 3.11.1:fnDeps depth 3/5 (which trip the native engine too: 24.3→34.7, 24.7→34.7) and 3.11.1:DB bytes/file (a size metric); neither is covered by the WASM widening.
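
The threshold selection described above could look roughly like the sketch below. It is not the code from regression-guard.test.ts: the names WASM_TIMING_THRESHOLD, SIZE_METRICS and thresholdFor come from this description, while the Engine type, DEFAULT_THRESHOLD, NOISY_THRESHOLD, NOISY_METRICS and the assumption that the 50% figure applies to noisy metrics are illustrative guesses.

```typescript
// Hypothetical sketch of an engine-aware threshold lookup.
// Only WASM_TIMING_THRESHOLD, SIZE_METRICS and thresholdFor are named in the PR;
// everything else here is an assumption made for illustration.
type Engine = "native" | "wasm";

const DEFAULT_THRESHOLD = 0.25; // strict threshold from the PR text
const NOISY_THRESHOLD = 0.5; // assumption: the "50%" applies to noisy metrics
const WASM_TIMING_THRESHOLD = 0.7; // 70%, from the PR text

// Assumption: membership of this set is illustrative only.
const NOISY_METRICS = new Set<string>(["No-op rebuild"]);
// Size metrics are deterministic and engine-independent.
const SIZE_METRICS = new Set<string>(["DB bytes/file"]);

function thresholdFor(label: string, engine: Engine): number {
  // Size metrics keep the strict threshold regardless of engine.
  if (SIZE_METRICS.has(label)) return DEFAULT_THRESHOLD;
  // Any other metric measured under WASM is a timing metric: widen it.
  if (engine === "wasm") return WASM_TIMING_THRESHOLD;
  // Native keeps the original strict thresholds.
  return NOISY_METRICS.has(label) ? NOISY_THRESHOLD : DEFAULT_THRESHOLD;
}
```

Checking the size-metric set before the engine is what keeps DB bytes/file strict even on WASM runs.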

Test plan

RUN_REGRESSION_GUARD=1 vitest run on committed data — 17/17 pass
biome check — clean
Injected the publish-run 3.11.1 numbers (No-op 15→25, Full build 7664→9833, fnDeps/DB bytes): guard passes with the widening
Negative test: with the widening disabled, guard fails on exactly No-op rebuild + Full build (the original publish failures), confirming the tolerance is load-bearing
Re-run the publish workflow; regression-guard gate passes
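
The positive and negative checks in this test plan can be reproduced with a small stand-alone sketch. The two samples use the publish-run numbers quoted above; the Sample shape and the relativeIncrease and failingLabels helpers are hypothetical, and the sketch simplifies the non-widened case to a single 25% limit (the fnDeps and DB bytes/file entries are left out because they are handled by KNOWN_REGRESSIONS, not by the widening).

```typescript
// Hypothetical reproduction of the "with widening" / "without widening" check.
type Engine = "native" | "wasm";

interface Sample {
  label: string;
  engine: Engine;
  baseline: number;
  current: number;
}

const STRICT_THRESHOLD = 0.25; // simplification: one strict limit for all metrics
const WASM_TIMING_THRESHOLD = 0.7;

// Relative increase of the current value over the baseline (0.5 means +50%).
function relativeIncrease(baseline: number, current: number): number {
  return (current - baseline) / baseline;
}

// Labels whose increase exceeds the applicable limit.
function failingLabels(samples: Sample[], widenWasm: boolean): string[] {
  return samples
    .filter((s) => {
      const limit =
        widenWasm && s.engine === "wasm" ? WASM_TIMING_THRESHOLD : STRICT_THRESHOLD;
      return relativeIncrease(s.baseline, s.current) > limit;
    })
    .map((s) => s.label);
}

// WASM timing numbers from the failing publish run.
const publishRun: Sample[] = [
  { label: "No-op rebuild", engine: "wasm", baseline: 15, current: 25 },
  { label: "Full build", engine: "wasm", baseline: 7664, current: 9833 },
];

console.log(failingLabels(publishRun, true)); // widening enabled
console.log(failingLabels(publishRun, false)); // widening disabled
```

With the widening enabled both increases (about 67% and 28%) stay under 70%, so nothing fails; with it disabled both exceed 25%, which matches the two original publish failures.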

…oise

The publish-time regression guard tripped on two more WASM timing metrics that are CI runner noise, not real regressions:

- No-op rebuild (build): 15 → 25 (+67%), a 10ms delta at the noise floor on a sub-30ms NOISY metric; historical wasm range is 5–22ms.
- Full build (incremental): 7664 → 9833 (+28%); wasm full-build history spans 7.2s–14.0s, so 9.8s is inside the envelope.

Native figures did not trip and no build/incremental codepath changed between 3.10.0 and 3.11.1, confirming runner variance. Same shape and root cause as the existing 3.10.0/3.11.0 No-op rebuild and Full build exemptions. Remove once 3.12.0+ data is captured against a committed 3.11.x baseline.

claude · 2026-05-30T19:26:30Z

Claude encountered an error.

I'll analyze this and get back to you.

greptile-apps · 2026-05-30T19:28:05Z

Greptile Summary

This PR replaces the per-version whack-a-mole KNOWN_REGRESSIONS entries for WASM timing noise with a structural fix: a dedicated WASM_TIMING_THRESHOLD of 70% applied to all WASM timing metrics. Deterministic size metrics (DB bytes/file) are excluded and keep the strict 25% threshold regardless of engine.

Adds WASM_TIMING_THRESHOLD = 0.7, SIZE_METRICS, and an engine-aware thresholdFor(label, engine) that returns the appropriate tolerance. All three benchmark suites (build, query, incremental) now pass engineKey down to assertNoRegressions.
Cleans up KNOWN_REGRESSIONS: removes the WASM-only timing exemptions for 3.11.1:No-op rebuild and 3.11.1:Full build, retaining only entries that also trip the native engine (3.11.1:fnDeps depth 3/5) or are engine-independent size metrics (3.11.1:DB bytes/file).
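
For illustration, the retained exemptions could be modelled as a set of version:metric keys, as in the sketch below. The key format follows the way entries are written in this PR (for example 3.11.1:No-op rebuild), but the Set container, the exact key strings for the two fnDeps depths, and the isKnownRegression helper are assumptions, not the actual contents of the test file.

```typescript
// Hypothetical shape of the exemption list after this PR.
// Keys are "<version>:<metric label>"; exact strings are assumed.
const KNOWN_REGRESSIONS = new Set<string>([
  "3.11.1:fnDeps depth 3", // also trips the native engine
  "3.11.1:fnDeps depth 5", // also trips the native engine
  "3.11.1:DB bytes/file", // size metric, not covered by the WASM widening
]);

// True when a regression for this version and metric is explicitly exempted.
function isKnownRegression(version: string, label: string): boolean {
  return KNOWN_REGRESSIONS.has(`${version}:${label}`);
}

console.log(isKnownRegression("3.11.1", "DB bytes/file"));
console.log(isKnownRegression("3.11.1", "No-op rebuild"));
```

The two removed WASM-only entries no longer need a key here because the wider WASM timing threshold absorbs them.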

Confidence Score: 5/5

Safe to merge. The change is confined to the regression guard test file and only widens tolerances for WASM timing metrics — it cannot cause false negatives on native benchmarks, which retain the strict 25%/50% thresholds.

The logic in thresholdFor is straightforward: WASM timing metrics return 70%, size metrics fall through to the strict 25% regardless of engine, and native always uses the original thresholds. The three call-sites in the test suites all pass engineKey consistently.

No files require special attention.

Important Files Changed

Filename	Overview
tests/benchmarks/regression-guard.test.ts	Adds WASM_TIMING_THRESHOLD (70%), SIZE_METRICS exclusion set, and propagates engine key through thresholdFor/assertNoRegressions to structurally handle WASM timing jitter. Logic is consistent and well-guarded. No issues found.

Reviews (2): Last reviewed commit: "fix(benchmarks): add dedicated WASM timi..."

Replaces the per-version, per-metric KNOWN_REGRESSIONS whack-a-mole for WASM timing noise with a structural fix: timing metrics measured under the WASM engine get a wider WASM_TIMING_THRESHOLD (70%) via an engine-aware thresholdFor.

WASM wall-clock is 3-5x slower than native and dominated by interpreter + GC overhead, so identical shared-runner jitter lands as a far larger percentage swing (observed +27-67% run-to-run on byte-identical code). The native engine shares all extraction/resolution/query logic and keeps the strict 25%/50% thresholds, so it remains the canary for real regressions; the WASM widening still flags the 100-220% catastrophes the guard exists to catch.

Size metrics (DB bytes/file) are engine-independent and excluded via SIZE_METRICS so they keep the strict threshold.

Removes the now-superseded 3.11.1 No-op rebuild and Full build entries (both WASM-only timing trips). The remaining 3.11.x entries are kept: fnDeps depth 3/5 trip the native engine too (24.3->34.7, 24.7->34.7) and DB bytes/file is a size metric; neither is covered by the WASM widening.

Verified by injecting the publish-run 3.11.1 numbers: the guard passes with the widening and fails on exactly No-op rebuild + Full build without it.

carlos-alm changed the title ~~fix(benchmarks): exempt 3.11.1 No-op rebuild and Full build wasm CI noise~~ fix(benchmarks): dedicated WASM timing tolerance in regression guard May 30, 2026

carlos-alm merged commit 8b8b93c into main May 30, 2026
21 checks passed

carlos-alm deleted the fix/regression-guard-3.11.1-noop-fullbuild branch May 30, 2026 21:06

github-actions Bot locked and limited conversation to collaborators May 30, 2026

carlos-alm merged 2 commits into main from fix/regression-guard-3.11.1-noop-fullbuild
