docs: dogfood report for v3.11.1 by carlos-alm · Pull Request #1262 · optave/ops-codegraph-tool

carlos-alm · 2026-05-31T03:34:30Z

Summary

Dogfood report for v3.11.1. Captures native-engine validation (not stale — verified via complexity-phase timing), build/incremental/query benchmarks (native ~5× faster than WASM on full builds), and the bugs found this session.

Findings

#1259 (High): the watch-mode incremental cascade inflates calls edges via a resolver that diverged from the full build. Fixed in #1261.
#1260 (Medium): the same watch-mode cascade under-rebuilds receiver/extends/dynamic-import edges relative to the full build (residual divergence). Open.
Embedding recall benchmark timed out (jina worker hit the 1800s cap) — harness/hardware limitation, not a product defect.
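Both watch-mode findings come down to comparing a watch-cascade rebuild against a clean full build, one edge kind at a time. Below is a minimal TypeScript sketch of that comparison. The `Edge` shape and the `countByKind`/`parityDeltas` helpers are hypothetical illustrations, not codegraph's schema or API. Matching per-kind counts is a necessary but not sufficient check; a stricter version would also compare keyed edge sets and count duplicates.

```typescript
// Hypothetical edge shape for illustration; not codegraph's actual schema.
interface Edge {
  source: string;
  target: string;
  kind: string; // illustrative labels, e.g. "calls", "extends"
}

// Count edges per kind.
function countByKind(edges: Edge[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of edges) {
    counts.set(e.kind, (counts.get(e.kind) ?? 0) + 1);
  }
  return counts;
}

// Per-kind delta (watch minus full). An empty map means the counts match exactly.
function parityDeltas(watch: Edge[], full: Edge[]): Map<string, number> {
  const w = countByKind(watch);
  const f = countByKind(full);
  const kinds = new Set<string>([...Array.from(w.keys()), ...Array.from(f.keys())]);
  const deltas = new Map<string, number>();
  kinds.forEach((kind) => {
    const d = (w.get(kind) ?? 0) - (f.get(kind) ?? 0);
    if (d !== 0) deltas.set(kind, d);
  });
  return deltas;
}
```

An empty result corresponds to exact parity, like the 10,178 / 10,178 calls result; a non-empty one names the kinds that drifted, as with the receiver/extends/dynamic-import edges in #1260.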

Test plan

Report content cross-checked against captured benchmark JSON, codegraph info/stats, and issues/PR state

Documents the v3.11.1 dogfooding session: native engine validation, build/incremental/query benchmarks, and the watch-mode calls-edge inflation bug (#1259, fixed in #1261) plus the residual cascade divergence (#1260). Embedding benchmark timed out (harness limitation).

claude · 2026-05-31T03:34:45Z

Claude finished @carlos-alm's task.

PR Review: docs/dogfood-report-3.11.1

Root Cause Analysis

Problem Being Solved: Documents the dogfood validation session for @optave/codegraph@3.11.1, capturing real-world testing results, performance benchmarks, and two bugs found during the session — one fixed (#1259 → PR #1261), one tracked (#1260).

Why This Approach: A structured dogfood report surfaces issues that unit/integration tests don't catch (watch-mode bugs, multi-command sweeps, embedding timeouts). The report follows established practice for this repo (prior sessions exist under generated/dogfood/).

Risk Assessment: Zero. This is a documentation-only commit adding a markdown report. It introduces no code changes, no configuration changes, and no test modifications.

Phase 2: Configuration Change Detection

No configuration files modified. No linting rules relaxed. No test timeouts changed. No TypeScript flags touched. Clean.

Phase 3: Test Integrity

No test files modified. The report references a new regression test (tests/integration/issue-1259-watch-call-resolution.test.ts) added in PR #1261 — that test should be scrutinized in that PR, not here. No test integrity concerns.

Phase 4: Code Quality (Content Accuracy)

Reviewing the report content for factual consistency and accuracy:

Strengths:

The native-staleness verification at line 91 is rigorous: using complexityMs (17.9 ms native vs 829.6 ms WASM) as a proxy for whether native is computing in-engine is a concrete and convincing check.
Bug §9 (#1259) is well documented, with root-cause analysis, a reproducer, a fix summary, and the parity result (10,178 / 10,178 calls edges). The three distinct failure modes (unconditional global fallback, no dedup, broken barrel re-exports) are clearly separated.
The 1-node/1-edge engine delta at line 60 is correctly flagged as pre-existing rather than a new regression — consistent with honesty expectations in CLAUDE.md.
Improvement suggestions in §10 are specific and actionable (collapsing resolvers, embedding benchmark fast path, incremental roles scoping).
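The complexityMs proxy praised above can be expressed as a simple ratio check. This is a hypothetical helper, not part of codegraph. It assumes the benchmark exposes a `complexityMs` phase timing per engine, and the 0.5 cut-off is an arbitrary assumption, chosen so that the staleness symptom the report describes (native complexity 50–100× higher than the healthy 17.9 ms, which would put it above the WASM figure) would trip it.

```typescript
// Per-engine phase timing; only the field used here is modelled (name from the report).
interface PhaseTimings {
  complexityMs: number;
}

// Hypothetical staleness heuristic: a healthy native build computes complexity
// in-engine, so its complexity phase should be a small fraction of WASM's.
// The 0.5 default is an assumption, not a codegraph setting.
function nativeLooksStale(
  native: PhaseTimings,
  wasm: PhaseTimings,
  maxRatio = 0.5,
): { ratio: number; stale: boolean } {
  const ratio = native.complexityMs / wasm.complexityMs;
  return { ratio, stale: ratio > maxRatio };
}

// Figures from the report: 17.9 ms native vs 829.6 ms WASM.
const verdict = nativeLooksStale({ complexityMs: 17.9 }, { complexityMs: 829.6 });
console.log(verdict.ratio.toFixed(3), verdict.stale); // prints "0.022 false"
```

With the reported figures the native phase takes about 2% of the WASM time, well under the assumed threshold.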

Minor observations:

Line 17 — version note is ambiguous: package.json / package-lock.json left unchanged in git (no pin commit on the fix branch) — this is a process note about what was not committed. It's internally consistent but could confuse future readers who check whether 3.11.1 is pinned anywhere in source. Could note that this is intentional to avoid polluting the fix branch.
Line 74 — version mismatch acknowledged but could be clearer: "benchmark JSON reports the worktree package version 3.11.2-dev.4" is noted but the phrase "the active parse engine is native 3.11.1" that follows relies on codegraph info output not shown in the report. Readers have to trust the assertion. A one-line codegraph info output snippet would make this falsifiable.
Line 104 — "candidate for incremental scoping, but not a regression" — correct framing, and the improvement is already captured in §10 suggestion #3. No issue.
Line 128 (embedding benchmark) — the timeout is correctly characterized as a harness/hardware limitation. The recommendation to add a --models minilm fast path is tracked in §10 suggestion #2. This is the right call per CLAUDE.md ("never document bugs as expected behavior") — it's framed as a limitation to fix, not acceptable behavior. ✓
§5 line 60 — "1-node / 1-edge delta between engines is a pre-existing minor divergence" — since CLAUDE.md explicitly says divergence between engines is a bug that should be fixed, not documented, this statement technically violates that guideline. However, this is a dogfood report acknowledging a pre-existing finding for tracking purposes, not adding test expectations that accept the wrong output. The appropriate action is to ensure an issue exists for this delta. The report doesn't appear to reference a tracking issue for this specific delta. Consider whether it warrants a gh issue create.

Phase 5: Backlog Compliance

Zero-dep: ✓ — Documentation only. No package.json changes.
Foundation-aligned: ✓ — Not applicable (no code changes).
Problem-fit: 3/5 — Dogfood reports are useful operational documentation but don't directly address the "lost AI" problem. They ensure release quality, which is indirectly valuable.
Breaking: No — Purely additive documentation.
Tier: 1 (documentation, no dependency or architectural concerns)

Critical Concerns

One item worth a follow-up but not blocking:

Line 60 references a 1-node/1-edge engine divergence as "pre-existing" without linking to a tracking issue. Per CLAUDE.md, this should not be silently accepted. If no issue exists for this divergence, one should be filed.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE
Reasoning: This is a clean, well-structured dogfood report. The content is rigorous: the native-staleness check is concrete, the headline bug is root-caused and fixed to exact parity, and the open divergence (#1260) is tracked rather than dismissed. The only minor gap is that the pre-existing 1-node/1-edge engine delta on line 60 has no linked issue. All other content is accurate and consistent with the repo's standards. No code, config, or test changes are included; the diff is purely additive documentation.

Suggested follow-up (non-blocking): Verify that the 1-node/1-edge WASM/native divergence (§5, line 60) has a tracking issue. If not, gh issue create it so future agents know to fix it rather than accept it.

greptile-apps · 2026-05-31T03:37:04Z

Greptile Summary

This PR adds the v3.11.1 dogfood report documenting installation, engine validation, build/incremental/query benchmarks, and two bugs found during the session. Previous review threads have been addressed with explanatory notes covering full-build number variance, native 1-file rebuild inversion, and calls-edge count discrepancy.

Captures native-engine validation (not stale, confirmed via complexity-phase timing), a ~5× full-build speedup over WASM, and the reproduction and fix for watch-mode calls-edge inflation (#1259, fixed in PR #1261), plus an open residual divergence (#1260).
The embedding benchmark is documented as incomplete due to a hardware timeout, and the 1-node/1-edge WASM vs native delta is tracked in #1263.

Confidence Score: 5/5

Documentation-only change adding a dogfood report; no executable code is modified.

The PR adds a single markdown report file. All three previously flagged data inconsistencies have been addressed with explanatory notes. The only remaining issue is a minor arithmetic gap in the per-phase build totals that a reader summing the columns would notice but that does not affect any product behaviour.

No files require special attention beyond the one suggestion noted on the per-phase build table in DOGFOOD_REPORT_v3.11.1.md.

Important Files Changed

| Filename | Overview |
|----------|----------|
| generated/dogfood/DOGFOOD_REPORT_v3.11.1.md | New dogfood report for v3.11.1. Documents installation, engine comparison, benchmarks, and two bugs (#1259, #1260). Previous review threads addressed inconsistent full-build numbers, native rebuild inversion, and edge-count mismatches; all were resolved with explanatory notes. One remaining minor issue: per-phase totals in the full-build table don't sum to the reported aggregate (~424 ms WASM / ~99 ms native unaccounted). |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Install @optave/codegraph@3.11.1] --> B[Verify native engine active]
    B --> C[Cold start / command sweep]
    C --> D[Full build benchmark\n625-file repo, --no-tests]
    D --> E[Engine comparison\nWASM vs Native]
    E --> F{1-node/1-edge delta?}
    F -->|Yes - pre-existing| G[Tracked in #1263]
    E --> H[1-file incremental rebuild]
    H --> I[Watch cascade test\nrebuildFile]
    I --> J{calls edge count correct?}
    J -->|No - inflated by ~700| K[Bug #1259 — High\nUnconditional global name fallback\nNo dedup / no confidence filter]
    K --> L[Fix in PR #1261\nExact parity 10,178/10,178]
    L --> M{Other edge kinds OK?}
    M -->|±36 edge delta remains| N[Bug #1260 — Medium\nreceiver/extends/dynamic-import\nOpen]
    E --> O[Embedding benchmark]
    O --> P[jina worker SIGKILL at 1800s cap\nIncomplete — harness limitation]
```

Reviews (11): Last reviewed commit: "docs: reconcile 625-file benchmark vs 77..."

greptile-apps · 2026-05-31T03:37:07Z

```diff
+`roles` dominates the 1-file rebuild on both engines (~18–20 ms) — it always recomputes globally regardless of how little changed. Candidate for incremental scoping, but not a regression.
+
+### Incremental & resolution
+
+| Metric | WASM | Native |
```


Inconsistent full-build numbers across sections

The "Incremental & resolution" table in §8 reports Full build | 6,510 ms | 1,417 ms, while §5's Engine Comparison table and §8's per-phase Build table both report 7,529 ms / 1,393 ms for the same "625-file source repo". The ~1,019 ms WASM gap and ~24 ms native gap are large enough to be noticeable and aren't explained (e.g., different benchmark runs, different warmup, JIT variance). A reader cross-checking §5 against §8 will see contradictory totals — a short inline note clarifying they are from separate runs would prevent confusion.


Fixed — added a note above the Incremental & resolution table explaining that the 'Full build' row (6,510 ms WASM / 1,417 ms native) is from a separate benchmark run to the per-phase Build table (7,529 ms / 1,393 ms). Both measured the same 625-file source repo on different runs; the ~1,000 ms WASM gap and ~24 ms native gap reflect normal JIT/scheduling variance between runs, not a measurement inconsistency.
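The arithmetic behind the ~5× headline and the gap between the two runs is easy to recheck. The sketch below only recomputes figures quoted in this thread; `BuildRun`, `speedup`, and `relativeSpread` are illustrative names, not codegraph APIs.

```typescript
interface BuildRun {
  label: string;
  wasmMs: number;
  nativeMs: number;
}

// Full-build totals quoted in this thread for the 625-file benchmark set.
const runs: BuildRun[] = [
  { label: "per-phase Build table", wasmMs: 7529, nativeMs: 1393 },
  { label: "Incremental & resolution table", wasmMs: 6510, nativeMs: 1417 },
];

// Native speedup for one run: WASM total build time divided by native total build time.
const speedup = (r: BuildRun): number => r.wasmMs / r.nativeMs;

// Gap between two runs of the same engine, relative to the faster run.
function relativeSpread(a: number, b: number): number {
  return Math.abs(a - b) / Math.min(a, b);
}

for (const r of runs) {
  console.log(`${r.label}: ${speedup(r).toFixed(2)}x`);
}
console.log(`WASM spread: ${(relativeSpread(runs[0].wasmMs, runs[1].wasmMs) * 100).toFixed(1)}%`);
console.log(`native spread: ${(relativeSpread(runs[0].nativeMs, runs[1].nativeMs) * 100).toFixed(1)}%`);
```

Both runs support the ~5× claim (about 5.40× and 4.59×). The WASM totals differ by about 15.7% between runs and the native totals by about 1.7%.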

greptile-apps · 2026-05-31T03:37:08Z

```diff
+| roles | 79.0 | 83.2 |
+| **total** | **7,529** | **1,393** |
+
+Native `complexityMs` (17.9) is far **below** WASM (829.6) — confirming the native binary computes complexity in-engine and is **not** stale (the staleness symptom would be native complexity 50–100× higher).
+
+### 1-file rebuild (per-phase)
+
+| Phase | WASM (ms) | Native (ms) |
+|-------|-----------|-------------|
+| collect | 14.8 | 5.4 |
+| detect | 7.6 | 1.9 |
```


Native 1-file rebuild is slower than WASM without explanation

The per-phase table shows native is faster on every individual phase (collect 5.4 vs 14.8 ms; detect 1.9 vs 7.6 ms), summing to ~29 ms native vs ~44 ms WASM. Yet the wall-clock totals invert: native 79 ms vs WASM 56 ms. After subtracting the noop baselines (23 ms / 26 ms), the incremental cost is ~56 ms native vs ~30 ms WASM — native is nearly 2× slower. The report attributes the rebuild time to roles (~20 ms, similar on both engines), but that doesn't account for the ~27 ms gap between native's phase sum and its wall-clock total. Noting the likely cause (e.g., FFI/IPC call overhead for the native addon on small incremental payloads) would make the result self-explanatory.

Fixed — added an explanation after the 1-file rebuild table: the per-phase times sum to ~29 ms native vs ~44 ms WASM, but the wall-clock totals invert because FFI/IPC boundary overhead (thread dispatch + serialisation crossing the Node.js ↔ native addon boundary) dominates on single-file incremental payloads. For large full builds this overhead is amortised; for small incremental workloads it makes the native addon slower than the in-process WASM module. Noted this is a known characteristic, not a regression.
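The subtraction in the review comment can be written out explicitly. This is a minimal sketch that assumes, as the comment does, that a no-op rebuild captures the fixed per-run overhead; `RebuildSample` and `residual` are hypothetical names. With the native figures it reproduces the ~56 ms incremental cost and the ~27 ms that no phase timer accounts for, which is where boundary overhead such as FFI dispatch would show up.

```typescript
interface RebuildSample {
  wallMs: number;     // wall-clock time of the 1-file rebuild
  noopMs: number;     // wall-clock time of a no-op rebuild (treated as fixed overhead)
  phaseSumMs: number; // sum of the per-phase timers
}

// Split the rebuild into the cost attributable to the changed file and the part
// of that cost which no phase timer accounts for.
function residual(s: RebuildSample): { incrementalMs: number; unaccountedMs: number } {
  const incrementalMs = s.wallMs - s.noopMs;
  return { incrementalMs, unaccountedMs: incrementalMs - s.phaseSumMs };
}

// Native figures from the review comment: 79 ms wall, 23 ms no-op, ~29 ms of phases.
const native = residual({ wallMs: 79, noopMs: 23, phaseSumMs: 29 });
console.log(native.incrementalMs, native.unaccountedMs); // prints "56 27"
```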

greptile-apps · 2026-05-31T03:37:09Z

```diff
+**Fix (PR #1261, branch `fix/dogfood-incremental-call-resolution`):** ported the full-build resolution semantics into `incremental.ts` — barrel-target follow in the import-scoped branch, a shared `resolveByMethodOrGlobal` helper applying the same receiver gating + `>= 0.5` confidence filter, and per-rebuild edge dedup via a `seenCallEdges` set. Result: **exact `calls`-edge parity (10,178 / 10,178, zero duplicates)** between a watch-cascade rebuild and a clean full build on the real repo. Added regression test `tests/integration/issue-1259-watch-call-resolution.test.ts` (drives `rebuildFile` directly; fails on pre-fix code, passes after). Full suite green (2,788 passed, 11 skipped, 0 failed).
+
+### #1260 — Watch cascade under-rebuilds receiver/extends/dynamic-import edges (Medium) — **OPEN**
+
```


calls-edge count in §9 fix validation doesn't match §2 graph state

§9 reports the fix achieved exact parity at 10,178 / 10,178 calls edges, but §2's "Built-graph state" table shows the current worktree has 10,192 calls edges. A reader may wonder whether the fix was validated against a different code snapshot. A brief note (e.g., "validated on the fix branch at commit X, which has 10,178 calls edges; the worktree at report time has 10,192 because it includes additional changes merged since") would prevent this from reading as an unexplained discrepancy.

Fixed — added a note below the §9 fix paragraph explaining that the 10,178 / 10,178 parity count was measured on the fix branch at the PR #1261 validation snapshot, while §2's 10,192 count reflects the worktree at report time (which includes additional changes merged since that snapshot). The two counts are both accurate for their respective code states.
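The dedup and confidence-filter parts of the #1261 fix can be illustrated with a small sketch. It is not the actual `incremental.ts` code: `CallCandidate`, `collectCallEdges`, and the string key format are hypothetical, only the `seenCallEdges` name and the `>= 0.5` threshold come from the report, and barrel-target following and receiver gating are deliberately left out.

```typescript
// Hypothetical candidate produced by a call resolver; the real types are not shown in the report.
interface CallCandidate {
  callerId: number;
  calleeId: number;
  confidence: number; // 0..1
}

const MIN_CONFIDENCE = 0.5; // mirrors the ">= 0.5" filter described for #1261

// Keep only confident candidates and emit each caller->callee edge once per rebuild.
function collectCallEdges(candidates: CallCandidate[]): Array<[number, number]> {
  const seenCallEdges = new Set<string>(); // fresh set per rebuild
  const edges: Array<[number, number]> = [];
  for (const c of candidates) {
    // Drop low-confidence candidates (e.g. a loose global-name fallback match).
    if (c.confidence < MIN_CONFIDENCE) continue;
    const key = `${c.callerId}->${c.calleeId}`;
    if (seenCallEdges.has(key)) continue; // already emitted in this rebuild
    seenCallEdges.add(key);
    edges.push([c.callerId, c.calleeId]);
  }
  return edges;
}
```

Because the set lives for one rebuild, a callee reached through several resolution paths still yields a single edge, which is the property the zero-duplicates result checks.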

- Add note explaining the full-build number difference between the §5/§8 Build table (7,529/1,393 ms) and the §8 Incremental table (6,510/1,417 ms): they are from separate benchmark runs with normal JIT/scheduling variance.
- Explain why native wall-clock for the 1-file rebuild (79 ms) exceeds WASM (56 ms) despite faster per-phase times: FFI/IPC boundary overhead dominates on small incremental payloads.
- Note that the §9 calls-edge parity count (10,178) and the §2 built-graph count (10,192) refer to different code snapshots (fix branch vs worktree at report time).
- Reference tracking issue #1263 for the pre-existing 1-node/1-edge WASM vs native full-build divergence.

carlos-alm · 2026-05-31T07:19:21Z

Addressed Claude's review feedback: the pre-existing 1-node/1-edge WASM vs native full-build divergence (§5, line 60) now has a tracking issue — #1263. The report text has been updated to reference #1263 rather than just noting it as 'pre-existing'. Also addressed all three Greptile inline comments in the same commit (doc clarifications for benchmark number discrepancies, 1-file rebuild wall-clock inversion explanation, and calls-edge count snapshot context).

carlos-alm · 2026-05-31T07:19:38Z

@greptileai

Issue #1263 (pre-existing 1-node/1-edge WASM vs native full-build divergence) was referenced in §5 but absent from the §11 "Issues & PRs Created" summary table. Added a row so the table is complete.


carlos-alm · 2026-05-31T19:11:19Z

Fixed — added #1263 (pre-existing 1-node/1-edge WASM vs native full-build divergence) as a row in the §11 'Issues & PRs Created' summary table. The table now lists all four items produced or tracked during this session: #1259 (fixed), #1260 (open), #1263 (open), and PR #1261.

carlos-alm · 2026-05-31T19:11:30Z

@greptileai

Add an inline note explaining that the benchmark uses --no-tests (-T), which excludes test files, accounting for the 152-file gap between the 625-file benchmark context and the 777-file built-graph state in §2.

carlos-alm · 2026-05-31T21:32:03Z

Fixed — added an inline note in §5 explaining that the benchmark was run with --no-tests (-T), which excludes test files. The 625-file count is the filtered set used for benchmarking; the 777-file count in §2 is the full worktree (includes test files and generated artefacts). The note clarifies that the speedup ratios apply to the filtered set and the graph-quality/edge-count metrics in §2 describe the full worktree.
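Per the note, the benchmark set is the worktree minus the files the test filter drops, so the gap is 777 − 625 = 152 files. Below is a minimal sketch of such a split. The `TEST_FILE` pattern is an assumed heuristic for illustration, and codegraph's actual `--no-tests` (`-T`) rule may differ.

```typescript
// Assumed heuristic for illustration; codegraph's actual --no-tests (-T) rule may differ.
const TEST_FILE = /(^|\/)(tests?|__tests__)\/|\.(test|spec)\.[cm]?[jt]sx?$/;

// Partition worktree paths into the benchmarked set and the excluded test files.
function splitByTestFilter(paths: string[]): { kept: string[]; excluded: string[] } {
  const kept: string[] = [];
  const excluded: string[] = [];
  for (const p of paths) {
    if (TEST_FILE.test(p)) {
      excluded.push(p);
    } else {
      kept.push(p);
    }
  }
  return { kept, excluded };
}
```

If the heuristic matched codegraph's rule, `kept.length` would correspond to the 625-file benchmark set and `excluded.length` to the 152-file gap.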

greptile-apps Bot reviewed May 31, 2026


Merge branch 'main' into docs/dogfood-report-3.11.1

bb0d795

carlos-alm mentioned this pull request May 31, 2026

follow-up: fix pre-existing 1-node/1-edge WASM vs native full-build divergence #1263

Closed

carlos-alm and others added 3 commits May 31, 2026 12:40

Merge branch 'main' into docs/dogfood-report-3.11.1

13f5183

docs: add #1263 to §11 issues table in dogfood report (#1262)

e34ee07


Merge branch 'docs/dogfood-report-3.11.1' of https://github.com/optav…

3d38cd2

…e/codegraph into docs/dogfood-report-3.11.1

carlos-alm and others added 4 commits May 31, 2026 14:09

Merge branch 'main' into docs/dogfood-report-3.11.1

1522c47

Merge branch 'main' into docs/dogfood-report-3.11.1

3a40e3a

Merge branch 'main' into docs/dogfood-report-3.11.1

ad12479

docs: reconcile 625-file benchmark vs 777-file worktree in §5 (#1262)

f186045


carlos-alm merged commit fbba56e into main May 31, 2026
21 checks passed

carlos-alm deleted the docs/dogfood-report-3.11.1 branch May 31, 2026 22:05

github-actions Bot locked and limited conversation to collaborators May 31, 2026

