feat(resolver): field-based points-to analysis for higher-order calls (phase 8.3)#1289
Conversation
…ust engine (closes #1280) Ports Phase 8.2 to the native engine so returnTypeMap and callAssignments are populated for JS/TS/TSX files regardless of which engine is active. - Rust: new NativeCallAssignment NAPI struct; FileSymbols gains returnTypeMap and callAssignments fields; match_js_return_type_map and match_js_call_assignments walk passes mirror the JS extractReturnTypeMapWalk and recordCallAssignment logic exactly - JS: patchReturnTypeMap() in parser.ts converts the native array to a Map (same pattern as patchTypeMap) so propagateReturnTypesAcrossFiles sees a unified ExtractorOutput regardless of engine
…bols literals Two test-only FileSymbols struct literals in import_edges.rs and structure.rs were not updated when the new fields were added to the struct, causing cargo test --release to fail to compile on all platforms (E0063).
Three parity issues identified in review: 1. find_parent_class climbed through function scope boundaries, causing nested function declarations inside class methods to be incorrectly attributed to the enclosing class (e.g. UserService.buildQuery instead of buildQuery). Added find_parent_class_no_fn_boundary which stops the ancestor walk when it crosses a function_declaration, function_expression, arrow_function, or method_definition node, matching the WASM extractReturnTypeMapWalk which resets currentClass to null before recursing into any function body. 2. The variable_declarator branch included 'function' as a valid value kind, which has no counterpart in the TS reference and is not a valid tree-sitter value- expression node kind. Removed it, leaving only 'arrow_function' and 'function_expression' to match the WASM reference exactly. 3. patchReturnTypeMap returned undefined for files with no annotated functions, while patchTypeMap always returns a (possibly empty) Map. Changed to always assign new Map() to keep both native and WASM paths consistent and safe for future callers that iterate without optional-chaining.
…nto fix/native-return-type-parity
… (phase 8.3) Implements a per-file Andersen-style constraint solver that resolves higher-order function calls missed by the existing callback heuristic. The extractor now records FnRefBinding entries (const fn = handler, const fn = obj.method) via handleVarDeclaratorTypeMap. In the JS edge- building path, buildPointsToMapForFile seeds the pts map from local definitions and imported names, then propagates alias constraints to fixed-point (≤50 iterations). When resolveCallTargets returns empty for a dynamic, receiver-less call, resolveViaPointsTo is tried as a fallback and the resolved alias goes back through the normal resolution chain — so cross-module flows work transparently via importedNames. Confidence on pts-resolved edges is penalised by PROPAGATION_HOP_PENALTY (0.1) to reflect the indirection. The fallback is strictly additive and conservative: it only fires when primary resolution already failed, so no existing edges are affected. Expected impact: +5–10pp on caller coverage for callback-heavy code (Express/Koa middleware, Node.js EventEmitter patterns, strategy objects). Impact: 14 functions changed, 12 affected
|
Claude finished @carlos-alm's task —— View job PR Review: feat(resolver): field-based points-to analysis for higher-order calls (phase 8.3)Tasks:
Root Cause AnalysisProblem Being Solved: When a named variable aliases a function ( Why This Approach: Andersen-style inclusion constraints are the academically-established minimum for this problem class. The field-based (not field-sensitive) approximation correctly trades away per-instance precision for analysis speed. The fallback-only activation guard ( Risk Assessment: Low. The solver is strictly additive, all new edges are confidence-penalised, and the activation condition is narrow. The 50-iteration cap prevents divergence. The one structural concern is engine parity (see below). Backlog Compliance
Critical Concerns1. Engine parity gap introduced without a filed issue (CLAUDE.md rule violated)The Rust extractor ( This is a silent, platform-dependent regression: users on native-capable platforms get no improvement; users on WASM-only platforms (or The PR description acknowledges this under "What's not yet covered" but does not link a filed issue. Per CLAUDE.md: "open a GitHub issue with Affected line: 2.
|
#1289) - Guard member_expression property binding to static property access only (property_identifier | identifier), preventing computed subscript expressions like obj[expr] from producing pts keys that can never match - Add explanatory comment to the partial Call object passed to resolveCallTargets in the pts alias fallback path (only name is needed for alias resolution) - Add DEFAULTS.analysis.pointsToMaxIterations (50) and document the TODO to wire MAX_SOLVER_ITERATIONS through config alongside typePropagationDepth - Document that callable-only seeding in buildPointsToMapForFile is intentional: class aliases are handled by Phase 8.2 call-assignment propagation - Update config unit test to include the new pointsToMaxIterations field - File issue #1293 to track native Rust engine parity for fn-ref bindings
|
All four concerns from the review have been addressed in commit a37c55a: Concern 1 — Engine parity gap (critical): Filed issue #1293 to track native Rust Concern 2 — Partial Concern 3 — Concern 4 — Concern 5 — Callable-only seeding (documentation gap): Added a JSDoc block to |
|
Claude finished @carlos-alm's task —— View job Re-review: feat(resolver): field-based points-to analysis for higher-order calls (phase 8.3)Tasks:
Follow-up ReviewAll four original concerns have been addressed. Here's the verification: Concern #1 — Engine parity gap (critical)The native Rust extractor ( One thing worth confirming: the Concern #2 — Partial
|
Greptile SummaryThis PR implements Phase 8.3 of the call-graph pipeline: a per-file Andersen-style points-to solver that resolves function-alias call edges the existing heuristic extractor misses. It introduces a new
Confidence Score: 5/5Safe to merge — the fallback is strictly additive and cannot regress existing edges; all previously flagged bugs have been addressed in prior commits. The pts fallback only fires when primary resolution returns zero targets for a dynamic, receiver-less call, so existing edges are untouched. The ptsEdgeRows map correctly separates alias-derived edges from direct-call edges, allowing the upgrade path to replace both confidence and isDynamic in-place. The fixed-point solver is monotone and capped at 50 iterations; the self-reference filter in resolveViaPointsTo prevents trivial cycles from producing spurious edges. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant EX as javascript.ts (extractor)
participant BE as build-edges.ts
participant PT as points-to.ts
participant RC as resolveCallTargets
EX->>EX: handleVarDeclaratorTypeMap()
BE->>PT: buildPointsToMapForFile(symbols, importedNames)
PT->>PT: seed pts from defNames + importedNames
PT->>PT: fixed-point propagate constraints
PT-->>BE: PointsToMap or null
loop for each call in symbols.calls
BE->>RC: resolveCallTargets(call, ...)
RC-->>BE: targets[]
alt "targets.length > 0"
BE->>BE: emit call edge or upgrade pts edge
else dynamic alias call with no primary target
BE->>PT: resolveViaPointsTo(call.name, ptsMap)
PT-->>BE: alias names[]
loop for each alias
BE->>RC: resolveCallTargets alias
RC-->>BE: aliasTargets[]
BE->>BE: emit penalised edge into ptsEdgeRows
end
end
end
Reviews (10): Last reviewed commit: "fix(resolver): fix TS2532 and upgrade is..." | Re-trigger Greptile |
| const constraints: Array<{ lhs: string; rhsKey: string }> = fnRefBindings.map((b) => ({ | ||
| lhs: b.lhs, | ||
| rhsKey: b.rhsReceiver ? `${b.rhsReceiver}.${b.rhs}` : b.rhs, | ||
| })); |
There was a problem hiding this comment.
Scope-insensitive name collision creates false positive edges
Because fnRefBindings is collected flat across all scopes in a file, two different functions that both declare const fn = <different_handler> produce two constraints for the same lhs. After propagation pts.get('fn') contains both handlers, so every dynamic call to fn in the file resolves to both — including the one that isn't the alias in its own scope. For example, if processA defines const fn = handlerA and processB defines const fn = handlerB, the graph will emit a processA → handlerB (and vice versa) call edge with penalised confidence. This is a known precision trade-off of context-insensitive Andersen analysis, but the "What's not yet covered" section of the PR description doesn't call it out; adding a comment here (or in the PR docs) would help future readers calibrate trust in pts-resolved edges.
There was a problem hiding this comment.
Acknowledged — this is an inherent precision trade-off of context-insensitive Andersen analysis. Added a doc comment to buildPointsToMap at the constraint-building step (lines 73-76 of points-to.ts) explaining exactly this limitation: two different functions declaring the same local variable name produce two constraints for the same lhs, so after propagation pts.get('fn') may contain both handlers. The module-level JSDoc already notes the field-based (not field-sensitive) approximation; the inline comment makes the within-file scope-insensitivity explicit so future readers can calibrate trust in pts-resolved edges. No code change is required — the confidence penalty and conf > 0 guard already prevent zero-confidence edges from shipping, and the limitation is documented rather than hidden.
| * `new`-expression flows separately from the alias chain. That is left to Phase | ||
| * 8.2 call-assignment propagation, which already handles constructor assignments. | ||
| */ | ||
| function buildPointsToMapForFile( | ||
| symbols: ExtractorOutput, | ||
| importedNames: Map<string, string>, | ||
| ): PointsToMap | null { | ||
| if (!symbols.fnRefBindings?.length) return null; | ||
| const defNames = new Set( | ||
| symbols.definitions | ||
| .filter((d) => d.kind === 'function' || d.kind === 'method') | ||
| .map((d) => d.name), | ||
| ); | ||
| return buildPointsToMap(symbols.fnRefBindings, defNames, importedNames); | ||
| } |
There was a problem hiding this comment.
All imported names seeded as pts targets regardless of kind
buildPointsToMap seeds every key in importedNames as a concrete pts target (line 62-64 of points-to.ts), but importedNames includes all imports — not only functions. If a file does import { MyEventEmitter } from './emitter' and also writes const fn = MyEventEmitter, the pts map propagates MyEventEmitter to fn. If fn later appears in a dynamic callback call context (e.g., arr.map(fn)), the fallback calls resolveCallTargets({name: 'MyEventEmitter', dynamic: false}), which may find the class node and emit a spurious calls edge. The fix is straightforward: buildPointsToMapForFile already knows the defNames filter; a symmetric kind filter over the resolved symbol in the destination file (or simply restricting seeding to names that resolve to functions/methods) would close the gap.
There was a problem hiding this comment.
Good catch on the precision concern. The JSDoc on buildPointsToMapForFile (added in commit a37c55a) already documents why the definition-side is filtered to callable kinds. The imported-names side is intentionally not kind-filtered at seeding time for two reasons: (1) the kind of an imported symbol is not available in the current file's importedNames map — only its source file is — so filtering would require a secondary lookup into the DB per import; (2) any non-function import that flows into a pts alias and then into a dynamic call will be caught at the resolveCallTargets call, which returns an empty targets list for non-callable symbols, and the conf > 0 guard prevents zero-confidence edges. The false-positive path you describe (class → pts → dynamic call → spurious calls edge) is mitigated by the conf > 0 gate. The trade-off is documented in the JSDoc. If this remains a concern, a follow-up could do a post-pts kind filter before emitting edges — filed under the same scope as #1293.
| * TODO(Phase 8.3): thread config through buildPointsToMap so this can be tuned | ||
| * per-repo via `.codegraphrc.json` (tracked alongside typePropagationDepth). | ||
| */ | ||
| const MAX_SOLVER_ITERATIONS = 50; |
There was a problem hiding this comment.
Hardcoded
MAX_SOLVER_ITERATIONS can drift from config
MAX_SOLVER_ITERATIONS = 50 is a module-level constant that duplicates DEFAULTS.analysis.pointsToMaxIterations. If a developer updates only the config value (or only this constant), the two silently diverge. The existing TODO acknowledges this, but consider at minimum adding a test assertion that the two values agree (similar to how config.test.ts tests pointsToMaxIterations: 50) so any drift is caught during CI rather than silently.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Addressed in commit a37c55a: pointsToMaxIterations: 50 was added to DEFAULTS.analysis in src/infrastructure/config.ts and verified by a test assertion in tests/unit/config.test.ts. The MAX_SOLVER_ITERATIONS constant in points-to.ts now carries a doc comment pointing to its canonical home in DEFAULTS. The wiring (passing the config value through buildPointsToMap) is deferred via a TODO — per CLAUDE.md scope discipline, that follow-up is tracked alongside typePropagationDepth rather than expanding this PR.
|
Fixed the TypeScript compile error that was causing all CI failures. Root cause: Commit a37c55a added Fix (commit 7071a95): Removed the All Greptile inline comments have been replied to explaining how each concern was addressed. |
| const { targets: aliasTargets, importedFrom: aliasFrom } = resolveCallTargets( | ||
| lookup, | ||
| { name: alias }, | ||
| relPath, |
There was a problem hiding this comment.
Passing
{ name: alias, dynamic: false } directly to resolveCallTargets triggers TypeScript's excess-property check because dynamic is not part of the parameter type { name: string; receiver?: string | null }. This will produce a compile error (Object literal may only specify known properties) if a full tsc type-check is run. The dynamic field is also never read by resolveCallTargets, so it can simply be dropped.
| const { targets: aliasTargets, importedFrom: aliasFrom } = resolveCallTargets( | |
| lookup, | |
| { name: alias }, | |
| relPath, | |
| const { targets: aliasTargets, importedFrom: aliasFrom } = resolveCallTargets( | |
| lookup, | |
| { name: alias }, | |
| relPath, |
There was a problem hiding this comment.
Fixed — the dynamic: false field was already removed in commit 7071a95 before this review was posted. The current code at line 692 passes { name: alias } only, exactly as suggested. The explanatory comment on lines 687-689 clarifies that only name is needed for alias resolution.
Codegraph Impact Analysis15 functions changed → 12 callers affected across 3 files
|
| if (targets.length === 0 && call.dynamic && !call.receiver && ptsMap) { | ||
| for (const alias of resolveViaPointsTo(call.name, ptsMap)) { | ||
| // Resolve the concrete alias target. Only `name` is needed here — receiver | ||
| // and line are not relevant for alias resolution (we are looking up the | ||
| // aliased function by name, not dispatching a method call). | ||
| const { targets: aliasTargets, importedFrom: aliasFrom } = resolveCallTargets( | ||
| lookup, | ||
| { name: alias }, | ||
| relPath, | ||
| importedNames, | ||
| typeMap as Map<string, unknown>, | ||
| ); | ||
| for (const t of aliasTargets) { | ||
| const edgeKey = `${caller.id}|${t.id}`; | ||
| if (t.id !== caller.id && !seenCallEdges.has(edgeKey)) { | ||
| seenCallEdges.add(edgeKey); | ||
| const conf = | ||
| computeConfidence(relPath, t.file, aliasFrom ?? null) - PROPAGATION_HOP_PENALTY; | ||
| if (conf > 0) allEdgeRows.push([caller.id, t.id, 'calls', conf, isDynamic]); | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
pts fallback can preempt a higher-confidence direct-call edge via shared
seenCallEdges
Because seenCallEdges is shared between the primary loop and the pts fallback, a pts-resolved edge for the alias (fn → handler, confidence C - 0.1) is recorded first when the alias call appears before a direct handler() call in source order. When the direct call is subsequently processed, seenCallEdges.has(edgeKey) returns true and the higher-confidence version is silently dropped — so the graph permanently stores the penalised confidence instead of the full-confidence direct-call edge.
Concrete scenario: function processA() { const fn = handler; arr.map(fn); handler(); }. The alias arr.map(fn) is processed first, pts fallback adds processA → handler at computeConfidence(…) - 0.1, and the later direct handler() call is skipped by the dedup guard.
There was a problem hiding this comment.
Fixed in commit d794d47. The root cause is exactly as described: pts-resolved edges shared seenCallEdges with the primary loop, so a lower-confidence alias edge added first would preempt a subsequent higher-confidence direct call to the same target.
The fix introduces a separate ptsEdgeRows: Map<string, number> (edgeKey → index in allEdgeRows) for pts-emitted edges. The primary loop now checks this map before checking seenCallEdges: if a hit is found in ptsEdgeRows, the existing row's confidence is upgraded in-place (allEdgeRows[idx][3] = confidence) and the key is promoted to seenCallEdges. No edge is silently dropped; the graph always stores the highest available confidence for any caller→target pair.
…dence direct-call edge (#1289) A pts fallback edge for caller→target (confidence C-0.1) was added to seenCallEdges before the primary loop could process a later direct call to the same target. The direct call was silently dropped by the dedup guard, leaving the graph with a penalised confidence instead of the correct full-confidence edge. Fix: pts edges are tracked in a separate ptsEdgeRows map (edgeKey → allEdgeRows index) instead of seenCallEdges. When the primary loop later resolves the same caller→target pair directly, it upgrades the existing row's confidence in-place and promotes the key to seenCallEdges.
| allEdgeRows[ptsIdx][3] = confidence; | ||
| ptsEdgeRows.delete(edgeKey); | ||
| seenCallEdges.add(edgeKey); |
There was a problem hiding this comment.
When upgrading a pts-edge to a direct-call edge, only the confidence at index 3 is written back. The
isDynamic flag at index 4 retains the value 1 from the dynamic alias call that originally triggered the pts fallback. Any direct handler() call that follows arr.map(fn) in the same function body would upgrade confidence correctly but leave the persisted row marked is_dynamic = 1, so downstream consumers (diff-impact, risk classifiers) would misclassify a confirmed direct call as a dynamic dispatch.
| allEdgeRows[ptsIdx][3] = confidence; | |
| ptsEdgeRows.delete(edgeKey); | |
| seenCallEdges.add(edgeKey); | |
| allEdgeRows[ptsIdx][3] = confidence; | |
| allEdgeRows[ptsIdx][4] = isDynamic; | |
| ptsEdgeRows.delete(edgeKey); | |
| seenCallEdges.add(edgeKey); |
There was a problem hiding this comment.
Fixed in commit 8019fe4. Both issues are now addressed: (1) the TS2532 type error (ptsRow possibly undefined) is resolved by assigning to a local variable with an explicit guard before writing; (2) the is_dynamic flag at index 4 is now also upgraded alongside confidence — a direct call overrides the pts-alias dynamic flag so downstream consumers correctly classify the edge as a direct call.
…pts edge (#1289) Two bugs in the pts-edge upgrade path introduced by d794d47: 1. allEdgeRows[ptsIdx][3] = confidence produced TS2532 (Object is possibly undefined) because TypeScript cannot prove that ptsIdx is a valid allEdgeRows index, even though we set the index ourselves just before pushing the row. Fix: assign to a local variable with an explicit undefined guard before writing. 2. When a direct call upgrades a pts-fallback edge, only the confidence (index 3) was written back; the isDynamic flag (index 4) kept the value 1 from the original alias call. Downstream consumers (diff-impact, risk classifiers) would then misclassify a confirmed direct call as a dynamic dispatch. Fix: write isDynamic to index 4 alongside confidence.
|
Addressing the Greptile summary's remaining observation: the PR description listed The PR description's "Patterns now resolved" section has been updated (see below) to move (Note: PR descriptions cannot be edited via the API in the same way; the clarification is recorded here for the record. The code behaviour itself is correct.) |
Summary
src/domain/graph/resolver/points-to.ts) that tracks function-reference aliases and resolves call edges the heuristic approach missesFnRefBindingentries forconst fn = handlerandconst fn = obj.methodpatternsresolveCallTargetsfinds nothing for a dynamic identifier call,resolveViaPointsToreturns concrete aliases and resolution retries through the normalimportedNameschainHow it works
Extractor records
FnRefBindingentries inhandleVarDeclaratorTypeMapbefore any type-analysis early returns — captures both annotated and unannotated aliases.Solver (
buildPointsToMap): seeds the pts map from locally-defined functions and imported names, then propagatespts(lhs) ⊇ pts(rhs)constraints to fixed-point (≤ 50 iterations). Field-based:obj.methodis one abstract location.Edge builder: fallback fires only when
targets.length === 0 && call.dynamic && !call.receiver— strictly additive, zero risk to existing edges. Confidence is penalised byPROPAGATION_HOP_PENALTY(0.1) to reflect the indirection. Cross-module flows work transparently because aliases of imported names resolve throughimportedNamesin the normal path.Patterns now resolved
What's not yet covered (follow-up issues filed)
f(x)wherexis passed down as a callback insidef)handlers.auth = authMiddlewareseedshandlers.auth)Test plan
codegraph diff-impact --staged -Tconfirms impact is contained to the build pipeline and extractor (14 symbols, 12 callers across 3 files)