Skip to content

docs: dogfood report for v3.11.1#1262

Merged
carlos-alm merged 10 commits into
mainfrom
docs/dogfood-report-3.11.1
May 31, 2026
Merged

docs: dogfood report for v3.11.1#1262
carlos-alm merged 10 commits into
mainfrom
docs/dogfood-report-3.11.1

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

Dogfood report for v3.11.1. Captures native-engine validation (not stale — verified via complexity-phase timing), build/incremental/query benchmarks (native ~5× faster than WASM on full builds), and the bugs found this session.

Findings

Test plan

  • Report content cross-checked against captured benchmark JSON, codegraph info/stats, and issues/PR state

Documents the v3.11.1 dogfooding session: native engine validation,
build/incremental/query benchmarks, and the watch-mode calls-edge
inflation bug (#1259, fixed in #1261) plus the residual cascade
divergence (#1260). Embedding benchmark timed out (harness limitation).
@claude

claude Bot commented May 31, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review: docs/dogfood-report-3.11.1

  • Gather context and understand request
  • Read changed files
  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity
  • Phase 4: Code quality
  • Phase 5: Backlog compliance
  • Post final review

Root Cause Analysis

Problem Being Solved: Documents the dogfood validation session for @optave/codegraph@3.11.1, capturing real-world testing results, performance benchmarks, and two bugs found during the session — one fixed (#1259 → PR #1261), one tracked (#1260).

Why This Approach: A structured dogfood report surfaces issues that unit/integration tests don't catch (watch-mode bugs, multi-command sweeps, embedding timeouts). The report follows established practice for this repo (prior sessions exist under generated/dogfood/).

Risk Assessment: Zero. This is a documentation-only commit adding a markdown report. It introduces no code changes, no configuration changes, and no test modifications.


Phase 2: Configuration Change Detection

No configuration files modified. No linting rules relaxed. No test timeouts changed. No TypeScript flags touched. Clean.


Phase 3: Test Integrity

No test files modified. The report references a new regression test (tests/integration/issue-1259-watch-call-resolution.test.ts) added in PR #1261 — that test should be scrutinized in that PR, not here. No test integrity concerns.


Phase 4: Code Quality (Content Accuracy)

Reviewing the report content for factual consistency and accuracy:

Strengths:

  • The native-staleness verification at line 91 is rigorous: using complexityMs (17.9 ms native vs 829.6 ms WASM) as a proxy for whether native is computing in-engine is a concrete and convincing check.
  • Bug §9 (bug: watch-mode incremental cascade inflates 'calls' edges (resolver divergence from full build) #1259) is well-documented with root cause analysis, reproducer, fix summary, and parity result (10,178 / 10,178 calls edges). The three distinct failure modes (unconditional global fallback, no dedup, broken barrel re-exports) are clearly separated.
  • The 1-node/1-edge engine delta at line 60 is correctly flagged as pre-existing rather than a new regression — consistent with honesty expectations in CLAUDE.md.
  • Improvement suggestions in §10 are specific and actionable (collapsing resolvers, embedding benchmark fast path, incremental roles scoping).

Minor observations:

  1. Line 17 — version note is ambiguous: package.json / package-lock.json left unchanged in git (no pin commit on the fix branch) — this is a process note about what was not committed. It's internally consistent but could confuse future readers who check whether 3.11.1 is pinned anywhere in source. Could note that this is intentional to avoid polluting the fix branch.

  2. Line 74 — version mismatch acknowledged but could be clearer: "benchmark JSON reports the worktree package version 3.11.2-dev.4" is noted but the phrase "the active parse engine is native 3.11.1" that follows relies on codegraph info output not shown in the report. Readers have to trust the assertion. A one-line codegraph info output snippet would make this falsifiable.

  3. Line 104 — "candidate for incremental scoping, but not a regression" — correct framing, and the improvement is already captured in §10 suggestion Bump actions/github-script from 7 to 8 #3. No issue.

  4. Line 128 (embedding benchmark) — the timeout is correctly characterized as a harness/hardware limitation. The recommendation to add --models minilm fast path is tracked in §10 suggestion Bump actions/checkout from 4 to 6 #2. This is the right call per CLAUDE.md ("never document bugs as expected behavior") — it's framed as a limitation to fix, not acceptable behavior. ✓

  5. §5 line 60 — "1-node / 1-edge delta between engines is a pre-existing minor divergence" — since CLAUDE.md explicitly says divergence between engines is a bug that should be fixed, not documented, this statement technically violates that guideline. However, this is a dogfood report acknowledging a pre-existing finding for tracking purposes, not adding test expectations that accept the wrong output. The appropriate action is to ensure an issue exists for this delta. The report doesn't appear to reference a tracking issue for this specific delta. Consider whether it warrants a gh issue create.


Phase 5: Backlog Compliance

  • Zero-dep: ✓ — Documentation only. No package.json changes.
  • Foundation-aligned: ✓ — Not applicable (no code changes).
  • Problem-fit: 3/5 — Dogfood reports are useful operational documentation but don't directly address the "lost AI" problem. They ensure release quality, which is indirectly valuable.
  • Breaking: No — Purely additive documentation.
  • Tier: 1 (documentation, no dependency or architectural concerns)

Critical Concerns

One item worth a follow-up but not blocking:

  • Line 60 references a 1-node/1-edge engine divergence as "pre-existing" without linking to a tracking issue. Per CLAUDE.md, this should not be silently accepted. If no issue exists for this divergence, one should be filed.

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: This is a clean, well-structured dogfood report. The content is rigorous — the native-staleness check is concrete, the headline bug is root-caused and fixed to exact parity, and the open divergence (bug: watch-mode cascade under-rebuilds receiver/extends/dynamic-import edges vs full build #1260) is tracked rather than dismissed. The only minor gap is the pre-existing 1-node/1-edge engine delta on line 60 has no linked issue. All other content is accurate and consistent with the repo's standards. No code, config, or test changes are included — the diff is purely additive documentation.

Suggested follow-up (non-blocking): Verify that the 1-node/1-edge WASM/native divergence (§5, line 60) has a tracking issue. If not, gh issue create it so future agents know to fix it rather than accept it.

@greptile-apps

greptile-apps Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds the v3.11.1 dogfood report documenting installation, engine validation, build/incremental/query benchmarks, and two bugs found during the session. Previous review threads have been addressed with explanatory notes covering full-build number variance, native 1-file rebuild inversion, and calls-edge count discrepancy.

Confidence Score: 5/5

Documentation-only change adding a dogfood report; no executable code is modified.

The PR adds a single markdown report file. All three previously flagged data inconsistencies have been addressed with explanatory notes. The only remaining issue is a minor arithmetic gap in the per-phase build totals that a reader summing the columns would notice but that does not affect any product behaviour.

No files require special attention beyond the one suggestion noted on the per-phase build table in DOGFOOD_REPORT_v3.11.1.md.

Important Files Changed

Filename Overview
generated/dogfood/DOGFOOD_REPORT_v3.11.1.md New dogfood report for v3.11.1. Documents installation, engine comparison, benchmarks, and two bugs (#1259, #1260). Previous review threads addressed inconsistent full-build numbers, native rebuild inversion, and edge-count mismatches — all resolved with explanatory notes. One remaining minor issue: per-phase totals in the full-build table don't sum to the reported aggregate (~424 ms WASM / ~99 ms native unaccounted).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Install @optave/codegraph@3.11.1] --> B[Verify native engine active]
    B --> C[Cold start / command sweep]
    C --> D[Full build benchmark\n625-file repo, --no-tests]
    D --> E[Engine comparison\nWASM vs Native]
    E --> F{1-node/1-edge delta?}
    F -->|Yes - pre-existing| G[Tracked in #1263]
    E --> H[1-file incremental rebuild]
    H --> I[Watch cascade test\nrebuildFile]
    I --> J{calls edge count correct?}
    J -->|No - inflated by ~700| K[Bug #1259 — High\nUnconditional global name fallback\nNo dedup / no confidence filter]
    K --> L[Fix in PR #1261\nExact parity 10,178/10,178]
    L --> M{Other edge kinds OK?}
    M -->|±36 edge delta remains| N[Bug #1260 — Medium\nreceiver/extends/dynamic-import\nOpen]
    E --> O[Embedding benchmark]
    O --> P[jina worker SIGKILL at 1800s cap\nIncomplete — harness limitation]
Loading

Fix All in Claude Code

Reviews (11): Last reviewed commit: "docs: reconcile 625-file benchmark vs 77..." | Re-trigger Greptile

Comment on lines +104 to +108
`roles` dominates the 1-file rebuild on both engines (~18–20 ms) — it always recomputes globally regardless of how little changed. Candidate for incremental scoping, but not a regression.

### Incremental & resolution

| Metric | WASM | Native |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Inconsistent full-build numbers across sections

The "Incremental & resolution" table in §8 reports Full build | 6,510 ms | 1,417 ms, while §5's Engine Comparison table and §8's per-phase Build table both report 7,529 ms / 1,393 ms for the same "625-file source repo". The ~1,019 ms WASM gap and ~24 ms native gap are large enough to be noticeable and aren't explained (e.g., different benchmark runs, different warmup, JIT variance). A reader cross-checking §5 against §8 will see contradictory totals — a short inline note clarifying they are from separate runs would prevent confusion.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a note above the Incremental & resolution table explaining that the 'Full build' row (6,510 ms WASM / 1,417 ms native) is from a separate benchmark run to the per-phase Build table (7,529 ms / 1,393 ms). Both measured the same 625-file source repo on different runs; the ~1,000 ms WASM gap and ~24 ms native gap reflect normal JIT/scheduling variance between runs, not a measurement inconsistency.

Comment on lines +88 to +98
| roles | 79.0 | 83.2 |
| **total** | **7,529** | **1,393** |

Native `complexityMs` (17.9) is far **below** WASM (829.6) — confirming the native binary computes complexity in-engine and is **not** stale (the staleness symptom would be native complexity 50–100× higher).

### 1-file rebuild (per-phase)

| Phase | WASM (ms) | Native (ms) |
|-------|-----------|-------------|
| collect | 14.8 | 5.4 |
| detect | 7.6 | 1.9 |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Native 1-file rebuild is slower than WASM without explanation

The per-phase table shows native is faster on every individual phase (collect 5.4 vs 14.8 ms; detect 1.9 vs 7.6 ms), summing to ~29 ms native vs ~44 ms WASM. Yet the wall-clock totals invert: native 79 ms vs WASM 56 ms. After subtracting the noop baselines (23 ms / 26 ms), the incremental cost is ~56 ms native vs ~30 ms WASM — native is nearly 2× slower. The report attributes the rebuild time to roles (~20 ms, similar on both engines), but that doesn't account for the ~27 ms gap between native's phase sum and its wall-clock total. Noting the likely cause (e.g., FFI/IPC call overhead for the native addon on small incremental payloads) would make the result self-explanatory.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added an explanation after the 1-file rebuild table: the per-phase phase times sum to ~29 ms native vs ~44 ms WASM, but the wall-clock totals invert because FFI/IPC boundary overhead (thread dispatch + serialisation crossing the Node.js ↔ native addon boundary) dominates on single-file incremental payloads. For large full builds this overhead is amortised; for small incremental workloads it makes the native addon slower than the in-process WASM module. Noted this is a known characteristic, not a regression.

Comment on lines +142 to +145
**Fix (PR #1261, branch `fix/dogfood-incremental-call-resolution`):** ported the full-build resolution semantics into `incremental.ts` — barrel-target follow in the import-scoped branch, a shared `resolveByMethodOrGlobal` helper applying the same receiver gating + `>= 0.5` confidence filter, and per-rebuild edge dedup via a `seenCallEdges` set. Result: **exact `calls`-edge parity (10,178 / 10,178, zero duplicates)** between a watch-cascade rebuild and a clean full build on the real repo. Added regression test `tests/integration/issue-1259-watch-call-resolution.test.ts` (drives `rebuildFile` directly; fails on pre-fix code, passes after). Full suite green (2,788 passed, 11 skipped, 0 failed).

### #1260 — Watch cascade under-rebuilds receiver/extends/dynamic-import edges (Medium) — **OPEN**

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 calls-edge count in §9 fix validation doesn't match §2 graph state

§9 reports the fix achieved exact parity at 10,178 / 10,178 calls edges, but §2's "Built-graph state" table shows the current worktree has 10,192 calls edges. A reader may wonder whether the fix was validated against a different code snapshot. A brief note (e.g., "validated on the fix branch at commit X, which has 10,178 calls edges; the worktree at report time has 10,192 because it includes additional changes merged since") would prevent this from reading as an unexplained discrepancy.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a note below the §9 fix paragraph explaining that the 10,178 / 10,178 parity count was measured on the fix branch at the PR #1261 validation snapshot, while §2's 10,192 count reflects the worktree at report time (which includes additional changes merged since that snapshot). The two counts are both accurate for their respective code states.

)

- Add note explaining the full-build number difference between §5/§8 Build
  table (7,529/1,393 ms) and §8 Incremental table (6,510/1,417 ms) — they
  are from separate benchmark runs with normal JIT/scheduling variance.
- Explain why native wall-clock for 1-file rebuild (79 ms) exceeds WASM
  (56 ms) despite faster per-phase times: FFI/IPC boundary overhead
  dominates on small incremental payloads.
- Note that the §9 calls-edge parity count (10,178) and the §2 built-graph
  count (10,192) refer to different code snapshots (fix branch vs worktree
  at report time).
- Reference tracking issue #1263 for the pre-existing 1-node/1-edge
  WASM vs native full-build divergence.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Addressed Claude's review feedback: the pre-existing 1-node/1-edge WASM vs native full-build divergence (§5, line 60) now has a tracking issue — #1263. The report text has been updated to reference #1263 rather than just noting it as 'pre-existing'. Also addressed all three Greptile inline comments in the same commit (doc clarifications for benchmark number discrepancies, 1-file rebuild wall-clock inversion explanation, and calls-edge count snapshot context).

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

carlos-alm and others added 3 commits May 31, 2026 12:40
Issue #1263 (pre-existing 1-node/1-edge WASM vs native full-build
divergence) was referenced in §5 but absent from the §11 "Issues &
PRs Created" summary table. Added a row so the table is complete.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed — added #1263 (pre-existing 1-node/1-edge WASM vs native full-build divergence) as a row in the §11 'Issues & PRs Created' summary table. The table now lists all four items produced or tracked during this session: #1259 (fixed), #1260 (open), #1263 (open), and PR #1261.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

carlos-alm and others added 4 commits May 31, 2026 14:09
Add an inline note explaining that the benchmark uses --no-tests (-T),
which excludes test files, accounting for the 152-file gap between the
625-file benchmark context and the 777-file built-graph state in §2.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed — added an inline note in §5 explaining that the benchmark was run with --no-tests (-T), which excludes test files. The 625-file count is the filtered set used for benchmarking; the 777-file count in §2 is the full worktree (includes test files and generated artefacts). The note clarifies that the speedup ratios apply to the filtered set and the graph-quality/edge-count metrics in §2 describe the full worktree.

@carlos-alm carlos-alm merged commit fbba56e into main May 31, 2026
21 checks passed
@carlos-alm carlos-alm deleted the docs/dogfood-report-3.11.1 branch May 31, 2026 22:05
@github-actions github-actions Bot locked and limited conversation to collaborators May 31, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant