docs: dogfood report for v3.11.1 #1262
Claude finished @carlos-alm's task. PR Review: docs/dogfood-report-3.11.1
Root Cause Analysis
Problem Being Solved: Documents the dogfood validation session for
Why This Approach: A structured dogfood report surfaces issues that unit/integration tests don't catch (watch-mode bugs, multi-command sweeps, embedding timeouts). The report follows established practice for this repo (prior sessions exist under
Risk Assessment: Zero. This is a documentation-only commit adding a markdown report. It introduces no code changes, no configuration changes, and no test modifications.
Phase 2: Configuration Change Detection
No configuration files modified. No linting rules relaxed. No test timeouts changed. No TypeScript flags touched. Clean.
Phase 3: Test Integrity
No test files modified. The report references a new regression test (
Phase 4: Code Quality (Content Accuracy)
Reviewing the report content for factual consistency and accuracy:
Strengths:
Minor observations:
Phase 5: Backlog Compliance
Critical Concerns
One item worth a follow-up but not blocking:
Final Recommendation
Greptile Summary
This PR adds the v3.11.1 dogfood report documenting installation, engine validation, build/incremental/query benchmarks, and two bugs found during the session. Previous review threads have been addressed with explanatory notes covering full-build number variance, native 1-file rebuild inversion, and calls-edge count discrepancy.
Confidence Score: 5/5
Documentation-only change adding a dogfood report; no executable code is modified. The PR adds a single markdown report file. All three previously flagged data inconsistencies have been addressed with explanatory notes. The only remaining issue is a minor arithmetic gap in the per-phase build totals that a reader summing the columns would notice but that does not affect any product behaviour. No files require special attention beyond the one suggestion noted on the per-phase build table in DOGFOOD_REPORT_v3.11.1.md.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Install @optave/codegraph@3.11.1] --> B[Verify native engine active]
B --> C[Cold start / command sweep]
C --> D[Full build benchmark\n625-file repo, --no-tests]
D --> E[Engine comparison\nWASM vs Native]
E --> F{1-node/1-edge delta?}
F -->|Yes - pre-existing| G[Tracked in #1263]
E --> H[1-file incremental rebuild]
H --> I[Watch cascade test\nrebuildFile]
I --> J{calls edge count correct?}
J -->|No - inflated by ~700| K[Bug #1259 — High\nUnconditional global name fallback\nNo dedup / no confidence filter]
K --> L[Fix in PR #1261\nExact parity 10,178/10,178]
L --> M{Other edge kinds OK?}
M -->|±36 edge delta remains| N[Bug #1260 — Medium\nreceiver/extends/dynamic-import\nOpen]
E --> O[Embedding benchmark]
O --> P[jina worker SIGKILL at 1800s cap\nIncomplete — harness limitation]
Reviews (11): Last reviewed commit: "docs: reconcile 625-file benchmark vs 77..."
`roles` dominates the 1-file rebuild on both engines (~18–20 ms); it always recomputes globally regardless of how little changed. Candidate for incremental scoping, but not a regression.
### Incremental & resolution
| Metric | WASM | Native |
Inconsistent full-build numbers across sections
The "Incremental & resolution" table in §8 reports Full build | 6,510 ms | 1,417 ms, while §5's Engine Comparison table and §8's per-phase Build table both report 7,529 ms / 1,393 ms for the same "625-file source repo". The ~1,019 ms WASM gap and ~24 ms native gap are large enough to be noticeable and aren't explained (e.g., different benchmark runs, different warmup, JIT variance). A reader cross-checking §5 against §8 will see contradictory totals — a short inline note clarifying they are from separate runs would prevent confusion.
Fixed: added a note above the Incremental & resolution table explaining that the 'Full build' row (6,510 ms WASM / 1,417 ms native) comes from a separate benchmark run than the per-phase Build table (7,529 ms / 1,393 ms). Both runs measured the same 625-file source repo; the ~1,000 ms WASM gap and ~24 ms native gap reflect normal JIT/scheduling variance between runs, not a measurement inconsistency.
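To put the two run-to-run gaps in proportion, a minimal sketch that computes the absolute and relative difference from the figures quoted above. The `RunPair` shape and `relativeGap` helper are illustrative only, and normalising against the faster run is an arbitrary choice for this sketch, not something the report specifies.

```typescript
// Same metric measured on two separate benchmark runs (figures from this thread).
interface RunPair {
  label: string;
  runA: number; // ms, per-phase Build table total
  runB: number; // ms, "Full build" row of the Incremental & resolution table
}

// Absolute gap in ms, plus the gap as a percentage of the faster run.
function relativeGap({ runA, runB }: RunPair): { absMs: number; pct: number } {
  const absMs = Math.abs(runA - runB);
  const pct = (absMs / Math.min(runA, runB)) * 100;
  return { absMs, pct };
}

const pairs: RunPair[] = [
  { label: "WASM", runA: 7529, runB: 6510 },
  { label: "Native", runA: 1393, runB: 1417 },
];

for (const p of pairs) {
  const { absMs, pct } = relativeGap(p);
  console.log(`${p.label}: ${absMs} ms apart (${pct.toFixed(1)}%)`);
}
```

With these inputs the WASM runs differ by about 15.7% and the native runs by about 1.7%, so the WASM engine shows roughly ten times more relative run-to-run spread on this benchmark.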
| roles | 79.0 | 83.2 |
| **total** | **7,529** | **1,393** |
Native `complexityMs` (17.9) is far **below** WASM (829.6), confirming the native binary computes complexity in-engine and is **not** stale (the staleness symptom would be native complexity 50–100× higher).
### 1-file rebuild (per-phase)
| Phase | WASM (ms) | Native (ms) |
|-------|-----------|-------------|
| collect | 14.8 | 5.4 |
| detect | 7.6 | 1.9 |
Native 1-file rebuild is slower than WASM without explanation
The per-phase table shows native is faster on every individual phase (collect 5.4 vs 14.8 ms; detect 1.9 vs 7.6 ms), summing to ~29 ms native vs ~44 ms WASM. Yet the wall-clock totals invert: native 79 ms vs WASM 56 ms. After subtracting the noop baselines (23 ms native / 26 ms WASM), the incremental cost is ~56 ms native vs ~30 ms WASM, so native is nearly 2× slower. The report attributes the rebuild time to roles (~20 ms, similar on both engines), but that doesn't account for the ~27 ms gap between native's phase sum (~29 ms) and its baseline-subtracted cost (~56 ms). Noting the likely cause (e.g., FFI/IPC call overhead for the native addon on small incremental payloads) would make the result self-explanatory.
Fixed: added an explanation after the 1-file rebuild table. The per-phase times sum to ~29 ms native vs ~44 ms WASM, but the wall-clock totals invert because FFI/IPC boundary overhead (thread dispatch + serialisation crossing the Node.js ↔ native addon boundary) dominates on single-file incremental payloads. For large full builds this overhead is amortised; for small incremental workloads it makes the native addon slower than the in-process WASM module. Noted this is a known characteristic, not a regression.
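The reviewer's decomposition can be reproduced in a few lines. This is a sketch over the rounded figures quoted in this thread; the `RebuildSample` fields are illustrative names, not codegraph's benchmark output schema.

```typescript
// Illustrative shape; not codegraph's actual benchmark output.
interface RebuildSample {
  engine: "native" | "wasm";
  wallMs: number;     // wall-clock time of the 1-file rebuild
  noopMs: number;     // wall-clock time of a no-op rebuild (baseline)
  phaseSumMs: number; // approximate sum of the instrumented per-phase timings
}

interface Decomposition {
  incrementalMs: number; // wall-clock cost attributable to the changed file
  unaccountedMs: number; // part of that cost not covered by the phase timers
}

function decompose(s: RebuildSample): Decomposition {
  const incrementalMs = s.wallMs - s.noopMs;
  // Negative when the phase timers and the no-op baseline overlap,
  // i.e. some timed phase work also runs during the no-op rebuild.
  const unaccountedMs = incrementalMs - s.phaseSumMs;
  return { incrementalMs, unaccountedMs };
}

const native = decompose({ engine: "native", wallMs: 79, noopMs: 23, phaseSumMs: 29 });
const wasm = decompose({ engine: "wasm", wallMs: 56, noopMs: 26, phaseSumMs: 44 });

console.log(native, wasm);
console.log(`native/wasm incremental ratio: ${(native.incrementalMs / wasm.incrementalMs).toFixed(2)}`);
```

Native comes out at 56 ms incremental with 27 ms outside the instrumented phases, which is the portion the FFI/IPC explanation has to cover. WASM's phase sum exceeds its baseline-subtracted cost, so its unaccounted value is negative.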
**Fix (PR #1261, branch `fix/dogfood-incremental-call-resolution`):** ported the full-build resolution semantics into `incremental.ts`: barrel-target follow in the import-scoped branch, a shared `resolveByMethodOrGlobal` helper applying the same receiver gating + `>= 0.5` confidence filter, and per-rebuild edge dedup via a `seenCallEdges` set. Result: **exact `calls`-edge parity (10,178 / 10,178, zero duplicates)** between a watch-cascade rebuild and a clean full build on the real repo. Added regression test `tests/integration/issue-1259-watch-call-resolution.test.ts` (drives `rebuildFile` directly; fails on pre-fix code, passes after). Full suite green (2,788 passed, 11 skipped, 0 failed).
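A minimal sketch of the two mechanisms described above that remove inflated edges: a `>= 0.5` confidence filter and per-rebuild dedup through a `seenCallEdges` set. Only the threshold and the set's name come from the fix description; `CallCandidate`, `collectCallEdges` and the `caller->callee` key format are hypothetical, and receiver gating is omitted because its rules are not spelled out here. This is not the code in `incremental.ts` or `resolveByMethodOrGlobal`.

```typescript
// Hypothetical edge shape for illustration; not codegraph's actual types.
interface CallCandidate {
  callerId: number;
  calleeId: number;
  confidence: number; // resolver confidence in [0, 1]
}

// Threshold taken from the ">= 0.5" filter described in the fix.
const MIN_CONFIDENCE = 0.5;

// Filters and deduplicates candidate call edges for a single rebuild.
function collectCallEdges(candidates: readonly CallCandidate[]): CallCandidate[] {
  const seenCallEdges = new Set<string>(); // fresh set for each rebuild
  const edges: CallCandidate[] = [];
  for (const c of candidates) {
    if (c.confidence < MIN_CONFIDENCE) continue; // drop low-confidence matches
    const key = `${c.callerId}->${c.calleeId}`;
    if (seenCallEdges.has(key)) continue; // keep only the first edge per caller/callee pair
    seenCallEdges.add(key);
    edges.push(c);
  }
  return edges;
}

// Usage: the duplicate pair and the below-threshold guess are dropped.
const edges = collectCallEdges([
  { callerId: 1, calleeId: 2, confidence: 0.9 },
  { callerId: 1, calleeId: 2, confidence: 0.7 }, // duplicate pair
  { callerId: 1, calleeId: 3, confidence: 0.3 }, // below threshold
  { callerId: 2, calleeId: 3, confidence: 0.5 }, // kept: the filter is >= 0.5
]);
console.log(edges.length);
```

Keeping the first occurrence of a pair, rather than the highest-confidence one, is an arbitrary choice in this sketch. Either policy yields one edge per caller/callee pair, which is the property the parity check relies on.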
### #1260 - Watch cascade under-rebuilds receiver/extends/dynamic-import edges (Medium) - **OPEN**
calls-edge count in §9 fix validation doesn't match §2 graph state
§9 reports the fix achieved exact parity at 10,178 / 10,178 calls edges, but §2's "Built-graph state" table shows the current worktree has 10,192 calls edges. A reader may wonder whether the fix was validated against a different code snapshot. A brief note (e.g., "validated on the fix branch at commit X, which has 10,178 calls edges; the worktree at report time has 10,192 because it includes additional changes merged since") would prevent this from reading as an unexplained discrepancy.
Fixed — added a note below the §9 fix paragraph explaining that the 10,178 / 10,178 parity count was measured on the fix branch at the PR #1261 validation snapshot, while §2's 10,192 count reflects the worktree at report time (which includes additional changes merged since that snapshot). The two counts are both accurate for their respective code states.
- Add note explaining the full-build number difference between §5/§8 Build table (7,529/1,393 ms) and §8 Incremental table (6,510/1,417 ms): they are from separate benchmark runs with normal JIT/scheduling variance.
- Explain why native wall-clock for 1-file rebuild (79 ms) exceeds WASM (56 ms) despite faster per-phase times: FFI/IPC boundary overhead dominates on small incremental payloads.
- Note that the §9 calls-edge parity count (10,178) and the §2 built-graph count (10,192) refer to different code snapshots (fix branch vs worktree at report time).
- Reference tracking issue #1263 for the pre-existing 1-node/1-edge WASM vs native full-build divergence.
Addressed Claude's review feedback: the pre-existing 1-node/1-edge WASM vs native full-build divergence (§5, line 60) now has a tracking issue — #1263. The report text has been updated to reference #1263 rather than just noting it as 'pre-existing'. Also addressed all three Greptile inline comments in the same commit (doc clarifications for benchmark number discrepancies, 1-file rebuild wall-clock inversion explanation, and calls-edge count snapshot context). |
…e/codegraph into docs/dogfood-report-3.11.1
Add an inline note explaining that the benchmark uses --no-tests (-T), which excludes test files, accounting for the 152-file gap between the 625-file benchmark context and the 777-file built-graph state in §2.
Fixed: added an inline note in §5 explaining that the benchmark was run with
Summary
Dogfood report for v3.11.1. Captures native-engine validation (not stale — verified via complexity-phase timing), build/incremental/query benchmarks (native ~5× faster than WASM on full builds), and the bugs found this session.
Findings
`calls` edges via a resolver that diverged from the full build. Fixed in fix(watch): align incremental call resolver with full build #1261.
Test plan
codegraph info/stats, and issues/PR state