bench(jelly-micro): import 59 Jelly micro-test fixtures for JS coverage#1376
Conversation
|
Claude finished @carlos-alm's task —— View job PR Review: bench(jelly-micro): import 59 Jelly micro-test fixtures for JS coverageTasks:
Root Cause AnalysisProblem Being Solved: The existing benchmark suite had only 6 fixtures, providing insufficient signal for JS call-resolution quality. This PR imports 59 ground-truth call-graph fixtures from Jelly's academic micro-test corpus to give the benchmark 65 tests covering a much wider set of JS features (classes, closures, generators, async/await, prototypes, etc.). Why This Approach: Using Jelly's published micro-tests is a principled approach — these are academically authored, well-understood ground truth fixtures. The conversion script ( Risk Assessment: Mostly additive (test fixtures only), but there are specific correctness and integrity concerns detailed below. Phase 2: Configuration ChangesNo configuration files, linter rules, TypeScript settings, or test thresholds were modified. ✓ Phase 3: Test Integrity — CONCERNS FOUND1. Zero-threshold gate (non-blocking but worth flagging)
expect(recall).toBeGreaterThanOrEqual(0);This passes trivially for every fixture, including ones with 0% recall. The comment says "Soft gate: recall must be ≥ 0% (we don't gate yet — this benchmark is diagnostic)". That's a reasonable policy for a new baseline PR, but it means CI never actually gates on regression in this suite. A fixture that used to have 100% recall could drop to 0% with no CI failure. Consider adding a minimal per-test floor once the baseline stabilizes — even 2.
|
| # | Severity | Location | Issue |
|---|---|---|---|
| 1 | Medium | more1/more1.js vs more1/expected-edges.json |
Function names mismatch (_iterPlain vs iterPlain) — edges can never score as TP |
| 2 | Medium | spawn-cwd/expected-edges.json |
References index.js/lib3.js which don't exist in the fixture dir |
| 3 | Low | jelly-micro.test.ts:198 |
Zero-threshold CI gate — no actual regression protection |
| 4 | Low | generators/expected-edges.json:2 |
Wrong $schema path (3 levels up instead of 2) |
| 5 | Info | more1 |
Hand-authored fixture mixed in with Jelly imports, wrong provenance |
Final Recommendation
- Rating: ⭐⭐⭐☆☆ (3/5)
- Action: REQUEST CHANGES
- Reasoning: The fixture corpus is valuable and the approach is sound — using Jelly's academically-authored micro-tests as ground truth is exactly the right strategy. The test infrastructure is well-designed. However, bugs Bump actions/setup-node from 4 to 6 #1 and Bump actions/checkout from 4 to 6 #2 introduce silently incorrect baselines: the
more1name mismatch will cause allmore1edges to permanently read as FN (0% recall) even if codegraph correctly resolves them, and thespawn-cwdedges reference files that will never be in the graph. These aren't cosmetic — they corrupt the benchmark numbers for those fixtures. Fix the two data bugs before merging. The other items are low-severity and can be addressed as follow-on issues.
Codegraph Impact Analysis108 functions changed → 61 callers affected across 37 files
|
Adds 59 ground-truth call-graph fixtures from Jelly's micro-test corpus (github.com/cs-au-dk/jelly/tests/micro), giving the benchmark suite 65 total Jelly fixtures with 333 Jelly edges (129 named/scoreable). Baseline result (WASM engine): precision=76.7% recall=25.6% (TP=33 FP=10 FN=96 across 24 tests with named edges) Fix function name mismatch in more1 (iterPlain/iterSet had underscore prefix in source but not in expected-edges), clear spawn-cwd edges that reference missing files, and correct generators $schema path. Closes #1309
84add3b to
e5bac8e
Compare
|
Addressed the concerns from the automated review: Bug #1 (more1 name mismatch) — Fixed: Removed underscore prefixes from Bug #2 (spawn-cwd cross-process edges) — Fixed: Cleared the edges array in Issue #3 (zero-threshold CI gate) — Tracked: The Issue #4 (generators $schema path) — Fixed: Corrected Issue #5 (more1 provenance) — Tracked: Filed #1388 to track moving All fixes are squashed into the single commit and force-pushed to fix the commitlint |
|
Claude finished @carlos-alm's task —— View job Claude finished @carlos-alm's task —— View job Re-review: bench(jelly-micro): import 59 Jelly micro-test fixtures for JS coverageTasks:
All three medium/low bugs from the previous review have been correctly addressed: Bug #1 (more1 name mismatch) — ✅ Verified Bug #2 (spawn-cwd cross-process edges) — ✅ Verified Issue #4 (generators $schema path) — ✅ Verified Issues #3 and #5 — Appropriately deferred Rating: ✅ Approved (5/5) The two data-correctness bugs that would have silently corrupted the benchmark numbers are fixed. The fixture corpus is now a sound baseline. Ready to merge. |
Greptile SummaryAdds 59 ground-truth call-graph fixture pairs from Jelly's micro-test corpus into the benchmark suite, raising the total from 6 to 65 fixtures. The import script's
Confidence Score: 5/5Safe to merge — the change is purely additive fixture data plus a one-line path correction in the import script, with no modifications to production code paths. All 59 newly-added expected-edges.json manifests use valid schema values and the corrected ../../../expected-edges.schema.json path. The only logic change is a single constant fix in the import script. Previously flagged issues (schema path and missing companion lib files) are either resolved here or already tracked in a follow-up issue (#1391). No new defects are introduced. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[import-jelly-micro.mjs] -->|--src or --fetch| B[Collect .js + .json pairs]
B --> C[convertJellyGraph]
C --> D[buildNameMap regex scan]
D --> E[Map position to function name]
E --> F[Generate edges array]
F --> G[Write fixture dir]
G --> H[testname.js]
G --> I[expected-edges.json\n$schema: ../../../...]
I --> J[jelly-micro.test.ts]
J -->|namedExpected.length > 0| K[buildGraph fixtureDir]
K --> L[SQLite edges query]
L --> M[Recall / Precision scoring]
J -->|namedExpected.length == 0| N[Skip buildGraph\nrecord zeros]
Reviews (3): Last reviewed commit: "fix(jelly-micro): correct SCHEMA constan..." | Re-trigger Greptile |
| @@ -1,5 +1,5 @@ | |||
| { | |||
| "$schema": "../../../expected-edges.schema.json", | |||
| "$schema": "../../expected-edges.schema.json", | |||
There was a problem hiding this comment.
$schema path regressed from correct to broken
Before this PR, generators/expected-edges.json had "../../../expected-edges.schema.json", which correctly navigated 3 levels up to tests/benchmarks/resolution/expected-edges.schema.json. This PR changed it to "../../expected-edges.schema.json", which resolves to tests/benchmarks/resolution/fixtures/expected-edges.schema.json — a path that does not exist. All 59 newly-added expected-edges.json files make the same off-by-one error. IDE/editor schema validation will silently fail for the entire jelly-micro corpus. The pre-existing sibling file accessors3/expected-edges.json demonstrates the correct "../../../expected-edges.schema.json" pattern.
There was a problem hiding this comment.
Fixed — all 65 jelly-micro expected-edges.json files now use ../../../expected-edges.schema.json (3 levels up to reach resolution/). The root cause was that 64 files had ../../ which resolves to the non-existent fixtures/ directory. Committed in 0aaa45e.
| var x = new Function("console.log('WORLD')"); | ||
| x(); | ||
|
|
||
| require('./lib1'); |
There was a problem hiding this comment.
Missing dependency file
lib1 in fixture directory
eval.js calls require('./lib1') but no lib1.js is committed alongside it. The same issue affects require-extensions/require-extensions.js (require('./lib1.js')), and several client* fixtures that require lib2, lib3, lib4, lib5a/lib5b, and index.js. Currently this is harmless because all those fixtures have only anonymous/root edges and buildGraph is never called for them. However, as soon as a future author adds named edges to any of these manifests, buildGraph will be invoked against a directory with unresolvable require targets, likely causing it to fail or produce misleading results.
There was a problem hiding this comment.
Tracked in #1391. All affected fixtures (client1–client6, client9, client-this, eval, require-extensions) currently have zero named edges so buildGraph is never invoked for them — the namedExpected.length === 0 guard in jelly-micro.test.ts prevents it. The latent risk (a future author adding named edges without noticing the missing companion files) is captured in #1391 for follow-up.
…es (#1376) All jelly-micro expected-edges.json files live three directory levels below the schema file (resolution/fixtures/jelly-micro/<test>/), so the relative path must be ../../../expected-edges.schema.json — not ../../, which resolves to the non-existent resolution/fixtures/ directory. Only accessors3 had the correct path; the other 64 files are fixed here.
|
Addressed Greptile's second review findings: Issue 1 (SCHEMA constant in import-jelly-micro.mjs): Fixed — updated Issue 2 (fun fixture: variable |
Summary
github.com/cs-au-dk/jelly/tests/micro) intotests/benchmarks/resolution/fixtures/jelly-micro/How it was done
The existing
scripts/import-jelly-micro.mjsfetched.js+.jsonpairs fromtests/micro/via the GitHub API, converted Jelly's position-based call graphs to codegraph's name-basedexpected-edges.jsonformat, and formatted each file with Biome before committing.Per-feature highlights
The 0% categories (classes inheritance, prototype chains, spread, template literals) are the clearest gaps to address next.
Test plan
npx vitest run --config .jelly-bench.config.tspasses all 65 testsjelly-micro.test.ts(skips when fixture dir is absent — but fixtures are now committed)Closes #1309