fix(benchmarks): exempt 3.11.1 regression-guard entries that fail on publish by carlos-alm · Pull Request #1248 · optave/ops-codegraph-tool

carlos-alm · 2026-05-30T05:22:48Z

Summary

Adds 3.11.1:DB bytes/file, 3.11.1:fnDeps depth 3, and 3.11.1:fnDeps depth 5 to KNOWN_REGRESSIONS in regression-guard.test.ts

Why this fails on publish but not in CI

The per-PR benchmark gate runs with --version dev. assertNoRegressions has a dev-only fallback: when comparing dev against a baseline, KNOWN_REGRESSIONS entries keyed to the baseline version also exempt the metric. So the dev-vs-3.11.0 comparison was already covered by existing entries such as 3.11.0:fnDeps depth 3.

When publish.yml runs, it uses the real semver (3.11.1). The fallback doesn't fire for non-dev versions, and KNOWN_REGRESSIONS.has('3.11.1:fnDeps depth 3') is false → gate fails.
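The two lookup paths can be sketched as follows. This is a minimal illustration, not the actual assertNoRegressions(): the helper name `isExempt` and its parameters are hypothetical, and only the `<version>:<metric>` key shape and the dev-only fallback come from the description above.

```typescript
// Hypothetical sketch; the real assertNoRegressions() is not shown in this PR.
// Keys follow the "<version>:<metric>" shape used by KNOWN_REGRESSIONS.
const KNOWN_REGRESSIONS = new Set<string>([
  "3.11.0:fnDeps depth 3", // pre-existing entry that covers the per-PR (dev) gate
  "3.11.1:fnDeps depth 3", // entry added by this PR for the publish gate
]);

// Returns true when a regression in `metric` should be ignored.
function isExempt(
  version: string,
  baselineVersion: string,
  metric: string,
  known: ReadonlySet<string> = KNOWN_REGRESSIONS,
): boolean {
  // Direct match: the only path available to real semver runs such as publish.yml.
  if (known.has(`${version}:${metric}`)) return true;
  // Dev-only fallback: an entry keyed to the baseline release also exempts the metric.
  if (version === "dev" && known.has(`${baselineVersion}:${metric}`)) return true;
  return false;
}

console.log(isExempt("dev", "3.11.0", "fnDeps depth 3"));    // true (fallback)
console.log(isExempt("3.11.1", "3.10.0", "fnDeps depth 3")); // true (direct key)
```

Without the 3.11.1 entry, the second call returns false, which is the publish-time failure described above.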

Why the baseline is 3.10.0, not 3.11.0

3.11.0 has no query benchmark data in committed history, so findLatestPair skips it and falls back to 3.10.0. The 3.10.0 numbers predate the corpus-scope change from #1134 (resolution fixtures excluded from the build sweep), so DB bytes/file and fnDeps values look inflated against that older baseline.
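The skip-and-fall-back behaviour can be sketched like this. The `BenchmarkEntry` shape, the newest-first ordering, and the name `findLatestPairSketch` are assumptions for illustration; the real findLatestPair may read the history differently, and the numbers below are placeholders, not measurements.

```typescript
// Hypothetical shape of one release in the committed benchmark history.
interface BenchmarkEntry {
  version: string;
  query?: Record<string, number>; // undefined when no query benchmark was recorded
}

// Sketch: `history` is assumed to be ordered newest-first. Releases without
// query data are skipped, so the baseline can jump back more than one release.
function findLatestPairSketch(
  history: BenchmarkEntry[],
): [BenchmarkEntry, BenchmarkEntry] | undefined {
  const withQuery = history.filter((e) => e.query !== undefined);
  if (withQuery.length < 2) return undefined;
  return [withQuery[0], withQuery[1]]; // [current, baseline]
}

const pair = findLatestPairSketch([
  { version: "3.11.1", query: { "fnDeps depth 3": 1 } }, // placeholder value
  { version: "3.11.0" },                                 // no query data committed
  { version: "3.10.0", query: { "fnDeps depth 3": 1 } }, // placeholder value
]);
console.log(pair?.map((e) => e.version)); // [ '3.11.1', '3.10.0' ]
```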

Exemption rationale

| Entry | Root cause |
|---|---|
| `3.11.1:DB bytes/file` | Corpus denominator drop from #1134: ~745 files → ~607 files; total bytes constant, so the per-file ratio inflates |
| `3.11.1:fnDeps depth 3/5` | 3.10.0 baseline predates 3.11.0 steady-state; no fn_deps implementation change |
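The size of the DB bytes/file artifact follows from the table's approximate file counts alone. A small worked calculation, assuming total DB bytes are unchanged as stated above (the function name is illustrative only):

```typescript
// bytes/file = totalBytes / files. With totalBytes held fixed, the ratio
// scales by filesBefore / filesAfter, so the apparent growth is that ratio minus 1.
function bytesPerFileInflation(filesBefore: number, filesAfter: number): number {
  return filesBefore / filesAfter - 1;
}

// Approximate counts from the table: ~745 files before #1134, ~607 after.
const inflation = bytesPerFileInflation(745, 607);
console.log(`${(inflation * 100).toFixed(1)}%`); // prints 22.7%
```

So roughly a 23% jump in DB bytes/file can appear with no change in what is stored per file.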

All three entries carry a "remove once 3.12.0+ data confirms stable numbers against a 3.11.x baseline" note in their doc comments, so the stale-entry test will flag them automatically after 3.12.0 ships.
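A stale-entry check of this kind could look like the sketch below. The `minorGap > 1` condition is taken from the review flowchart; the helper names, the treatment of a newer major version, and which version the real test passes as `current` are all assumptions.

```typescript
// Hypothetical sketch of a stale-entry check for "<version>:<metric>" keys.
function parseMajorMinor(version: string): [number, number] {
  const [major, minor] = version.split(".").map(Number);
  return [major, minor];
}

function isStale(entry: string, current: string): boolean {
  const entryVersion = entry.slice(0, entry.indexOf(":")); // "3.11.1:DB bytes/file" -> "3.11.1"
  const [eMajor, eMinor] = parseMajorMinor(entryVersion);
  const [cMajor, cMinor] = parseMajorMinor(current);
  if (cMajor !== eMajor) return cMajor > eMajor; // assumption: any newer major is stale
  return cMinor - eMinor > 1; // the flowchart's minorGap > 1 condition
}

console.log(isStale("3.11.1:DB bytes/file", "3.11.2"));  // false
console.log(isStale("3.11.1:fnDeps depth 3", "3.13.0")); // true
```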

Test plan

RUN_REGRESSION_GUARD=1 npm run test:regression-guard — 17/17 pass locally
publish.yml regression-guard step passes on the next release run


…publish

3.11.0 has no query benchmark data in committed history, so findLatestPair falls back to 3.10.0 as the baseline for 3.11.1. The 3.10.0 numbers predate the corpus-scope change from #1134 (resolution fixtures excluded from the build sweep), making DB bytes/file and fnDeps depth 3/5 appear as regressions against the older baseline. The per-PR gate uses version 'dev', which triggers the assertNoRegressions baseline-version fallback so KNOWN_REGRESSIONS entries for the baseline release also apply — masking the failures in CI. Publish uses the real semver (3.11.1), so that fallback doesn't fire and the guard fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-05-30T05:23:00Z

Claude encountered an error.

I'll analyze this and get back to you.

greptile-apps · 2026-05-30T05:25:34Z

Greptile Summary

This PR exempts three 3.11.1-versioned metrics from the publish-time regression guard — DB bytes/file, fnDeps depth 3, and fnDeps depth 5 — and pins the claude-code-action GitHub Action to a commit SHA instead of the mutable @beta tag.

regression-guard.test.ts: Adds 3.11.1:DB bytes/file, 3.11.1:fnDeps depth 3, and 3.11.1:fnDeps depth 5 to KNOWN_REGRESSIONS with detailed doc comments explaining that the apparent regressions are measurement artifacts: the missing 3.11.0 build/query benchmark history causes findLatestPair to fall back to the 3.10.0 baseline, which predates #1134 (perf(bench): exclude resolution-benchmark fixtures from dogfooding sweep). The smaller post-#1134 file corpus inflates DB bytes/file, and the older baseline makes fnDeps values look inflated. The existing stale-entry test at line 593 will automatically surface these exemptions for pruning once a 3.12.0 baseline is committed.
claude.yml: Replaces the mutable @beta tag on anthropics/claude-code-action with a pinned commit SHA — a supply-chain hardening improvement unrelated to the benchmark fix.

Confidence Score: 5/5

Safe to merge — the changes are narrowly scoped exemptions with exhaustive root-cause documentation and a self-cleaning stale-entry mechanism already in place.

The three new KNOWN_REGRESSIONS entries directly address the described publish-gate failure, the exemption logic in assertNoRegressions is straightforward and correct, the stale-entry test will automatically surface these entries for pruning after 3.12.0, and the claude.yml SHA-pin is a supply-chain improvement with no downside. The only finding is a minor wording inconsistency in a doc comment.

The doc comment on the 3.11.1:DB bytes/file entry in regression-guard.test.ts references query benchmark data for a metric that lives in the build benchmark history — worth confirming which history file is actually missing 3.11.0 data before the entry is pruned at 3.12.0.

Important Files Changed

| Filename | Overview |
|---|---|
| tests/benchmarks/regression-guard.test.ts | Adds three well-documented KNOWN_REGRESSIONS entries for 3.11.1; logic is correct for the publish-time gate, with a minor doc-comment wording inconsistency (says "query benchmark data" for the build-benchmark-derived DB bytes/file metric). |
| .github/workflows/claude.yml | Pins claude-code-action from the mutable @beta tag to a specific commit SHA; standard supply-chain hardening, no functional change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["publish.yml runs\n(version = 3.11.1)"] --> B["regression-guard.test.ts\nassertNoRegressions()"]
    B --> C{"KNOWN_REGRESSIONS\n.has('3.11.1:metric')?"}
    C -- "Before this PR → false" --> D["❌ Gate fails"]
    C -- "After this PR → true" --> E["✅ Exempted"]
    F["Per-PR CI gate\n(version = 'dev')"] --> G["baseline fallback\n.has('3.11.0:metric')"]
    G --> H["✅ Already covered"]
    I["Stale-entry test"] --> J{"minorGap > 1?"}
    J -- "After 3.12.0" --> K["⚠️ 3.11.1 entries flagged → pruned"]

Reviews (3): Last reviewed commit: "fix(ci): pin claude-code-action to SHA d..."

@beta

@beta is a moving tag. Leaving the action unpinned let the automated-review job pick up a version whose default model (claude-sonnet-4-20250514) is deprecated; the API rejected it with 0 tokens, and the job reported CLAUDE_SUCCESS=false. Pinning to the SHA that @beta currently resolves to locks in the working version.
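The difference between a moving tag and a pinned ref can be checked mechanically. The sketch below is a hypothetical lint helper, not part of this PR: it treats a `uses:` reference as pinned only when the ref after `@` is a full 40-character lowercase hex commit SHA, which is the form GitHub recommends for third-party actions. The SHA in the example is a placeholder.

```typescript
// Hypothetical helper: is a workflow `uses:` reference pinned to a full commit SHA?
function isPinnedToSha(uses: string): boolean {
  const at = uses.lastIndexOf("@");
  if (at === -1) return false; // no ref at all
  return /^[0-9a-f]{40}$/.test(uses.slice(at + 1));
}

const placeholderSha = "0123456789abcdef0123456789abcdef01234567"; // not a real commit
console.log(isPinnedToSha("anthropics/claude-code-action@beta"));                  // false
console.log(isPinnedToSha(`anthropics/claude-code-action@${placeholderSha}`));     // true
```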

carlos-alm · 2026-05-30T05:47:21Z

@claude

claude · 2026-05-30T05:47:36Z

Claude encountered an error.

I'll analyze this and get back to you.

carlos-alm and others added 3 commits May 29, 2026 20:00

docs: prepare release notes for v3.11.1

a6fb799

fix(ci): pin claude-code-action to claude-sonnet-4-6 model

0996514

Default model claude-sonnet-4-20250514 is deprecated; API rejects it with 0 tokens causing automated-review to fail with CLAUDE_SUCCESS=false.

carlos-alm and others added 2 commits May 29, 2026 23:41

Merge branch 'main' into fix/claude-action-model

46646ce

carlos-alm merged commit d93b257 into main May 30, 2026
21 checks passed

carlos-alm deleted the fix/claude-action-model branch May 30, 2026 06:15

github-actions Bot locked and limited conversation to collaborators May 30, 2026

carlos-alm merged 5 commits into main from fix/claude-action-model
