feat(analyzers): syntactic IMPORTS + derived OVERRIDES + tree-sitter resolver name→def fix #698
…bors Nav baseline rebased onto the #681 (MCP tools) + #698 (analyzer IMPORTS/OVERRIDES edges) confluence, dropping the duplicate analyzer-edges copy (722c8a3, byte-identical to #698's ce4ecd9) and the T13/T14 onboarding-template edits (deferred). Adds get_neighbors (symbol-id neighbor expansion over CALLS/IMPORTS/EXTENDS/OVERRIDES), get_file_neighbors (file-level structural coupling), get_importers/get_overrides spikes, and a hybrid (BM25 + structural) ranking for search_code. Cherry-pick of 799b218. PR-split note: get_file_neighbors + _resolve_file + FILE_NEIGHBOR_RELS form a self-contained trailing block (PR N2); everything else is PR N1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3e8935f to 3d4bca3
Add language-agnostic File->File IMPORTS edges via per-analyzer import resolution (Python: dotted-module index) and derive OVERRIDES edges from the EXTENDS+DEFINES hierarchy. Wired into the analysis pipeline. Improves the graph for all consumers (HTTP API + MCP) and feeds search_code centrality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
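A minimal sketch of how a dotted-module index can turn import statements into File→File edges. It uses the standard-library `ast` module as a stand-in for the PR's tree-sitter based extraction, and `IMPORT_INDEX`, `imported_modules`, and `link_import_edges` are hypothetical names chosen for illustration, not the PR's actual API.

```python
import ast
from typing import Dict, List, Set, Tuple

# Hypothetical index: dotted module name -> file path (illustrative data only).
IMPORT_INDEX: Dict[str, str] = {
    "pkg": "pkg/__init__.py",
    "pkg.mod": "pkg/mod.py",
    "pkg.util": "pkg/util.py",
}


def imported_modules(source: str) -> List[str]:
    """Collect dotted module names from absolute import statements."""
    names: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            names.append(node.module)
    return names


def link_import_edges(
    files: Dict[str, str], index: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Return (importer_path, imported_path) pairs for imports found in the index."""
    edges: Set[Tuple[str, str]] = set()
    for path, source in files.items():
        for module in imported_modules(source):
            target = index.get(module)
            if target is not None and target != path:
                edges.add((path, target))
    return edges


FILES = {
    "app.py": "import pkg.mod\nfrom pkg.util import helper\nimport os\n",
    "pkg/mod.py": "from pkg import util\n",
}
EDGES = link_import_edges(FILES, IMPORT_INDEX)
```

Imports that are not in the index (here `os`) produce no edge, which keeps third-party and standard-library modules out of the graph in this toy model.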
The per-module symbol table in `_index_file` was built by zipping two independently-grouped `QueryCursor.captures()` lists (`@name` and `@def`). When `@def` positions shift relative to `@name` (e.g. decorated defs), the zip mis-pairs names with definitions, so imported-call resolution attaches CALLS edges to the wrong target, producing phantom edges to functions whose token never appears at the call site.

Fix: iterate per-match via a `_matches()` helper wrapping `QueryCursor.matches()`, which guarantees each match's `@name`/`@def` captures belong together. Applied across all four indexing loops (top-level funcs, classes, assigns, class methods).

Impact (deterministic graph-vs-jedi-oracle caller bench, n=40, paired, identical harness; only the resolver differs):
- uxarray CALLS macro-F1 0.178 → 0.713 (median 0.0 → 0.94)
- arkouda CALLS macro-F1 0.031 → 0.262

Adds two regression tests asserting each imported call resolves to the def whose name matches exactly (10 top-level functions, 8 classes).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
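The pairing bug can be illustrated with a toy model that does not call tree-sitter at all. Plain strings stand in for nodes, and the out-of-step ordering in `GROUPED` is an assumption made for the example rather than a reproduction of what `QueryCursor.captures()` actually returns.

```python
from typing import Dict, List

# Each dict models one query match; its captures belong together by construction.
MATCHES: List[Dict[str, str]] = [
    {"name": "load", "def": "def load(path): ..."},
    {"name": "save", "def": "@cached\ndef save(path): ..."},
    {"name": "close", "def": "def close(): ..."},
]

# Hypothetical grouped-by-capture view in which the decorated def is reported
# out of step with its name (the exact ordering is an assumption).
GROUPED: Dict[str, List[str]] = {
    "name": ["load", "save", "close"],
    "def": [
        "@cached\ndef save(path): ...",
        "def load(path): ...",
        "def close(): ...",
    ],
}


def table_by_zip(grouped: Dict[str, List[str]]) -> Dict[str, str]:
    """Old approach: pair the two lists positionally."""
    return dict(zip(grouped["name"], grouped["def"]))


def table_by_match(matches: List[Dict[str, str]]) -> Dict[str, str]:
    """Fixed approach: read both captures from the same match."""
    return {m["name"]: m["def"] for m in matches}


def mispaired(table: Dict[str, str]) -> List[str]:
    """Names whose recorded definition does not define that name."""
    return sorted(n for n, d in table.items() if f"def {n}(" not in d)
```

In this model `mispaired(table_by_zip(GROUPED))` flags `load` and `save`, while the per-match table has no mispaired entries, which is the property the regression tests described above assert on real resolver output.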
8fa2a43 to 0967507
Pull request overview
Improves graph quality and analyzer performance by adding syntactic Python IMPORTS edges, deriving OVERRIDES edges from existing hierarchy relations, and fixing tree-sitter resolver name→definition pairing to improve CALLS accuracy.
Changes:
- Add analyzer hooks + orchestrator wiring to create `IMPORTS(File→File)` edges after `first_pass`.
- Add `Graph.derive_overrides(max_depth=3)` to derive `OVERRIDES` edges after `second_pass`.
- Fix tree-sitter Python resolver indexing to iterate per-query match (avoids capture misalignment) and add regression tests.
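The OVERRIDES derivation can be sketched in plain Python over in-memory dictionaries. This is not the Cypher query from `api/graph.py`; `EXTENDS`, `DEFINES`, `ancestors`, and `derive_override_edges` are hypothetical names, and linking a method to every matching ancestor within `max_depth` is an assumption about the intended semantics.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Illustrative hierarchy: class -> direct parents, class -> defined method names.
EXTENDS: Dict[str, List[str]] = {"Animal": [], "Dog": ["Animal"], "Puppy": ["Dog"]}
DEFINES: Dict[str, Set[str]] = {
    "Animal": {"speak", "eat"},
    "Dog": {"speak"},
    "Puppy": {"speak", "play"},
}


def ancestors(
    cls: str, extends: Dict[str, List[str]], max_depth: int
) -> Iterator[Tuple[str, int]]:
    """Yield (ancestor, depth) breadth-first, so each ancestor gets its shortest depth."""
    seen = {cls}
    frontier = [cls]
    for depth in range(1, max_depth + 1):
        next_frontier: List[str] = []
        for current in frontier:
            for parent in extends.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    next_frontier.append(parent)
                    yield parent, depth
        frontier = next_frontier


def derive_override_edges(
    extends: Dict[str, List[str]], defines: Dict[str, Set[str]], max_depth: int = 3
) -> Set[Tuple[str, str, int]]:
    """Return (sub_method, super_method, depth) for same-named methods up the hierarchy."""
    edges: Set[Tuple[str, str, int]] = set()
    for sub, methods in defines.items():
        for sup, depth in ancestors(sub, extends, max_depth):
            for name in methods & defines.get(sup, set()):
                edges.add((f"{sub}.{name}", f"{sup}.{name}", depth))
    return edges


OVERRIDES = derive_override_edges(EXTENDS, DEFINES)
```

Lowering `max_depth` to 1 drops the `Puppy.speak` → `Animal.speak` edge in this model, which shows how the depth bound limits how far up the hierarchy the derivation looks.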
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| api/analyzers/analyzer.py | Adds base hooks (`needs_lsp`, import index + import resolution defaults) to support static resolvers and import linking. |
| api/analyzers/source_analyzer.py | Skips Python LSP startup when analyzer can resolve statically; wires `link_imports()` and `derive_overrides()` into analysis pipeline. |
| api/analyzers/python/analyzer.py | Adds env-var-selected tree-sitter resolver and implements Python-only syntactic import resolution (`build_import_index`/`resolve_imports`). |
| api/analyzers/python/ts_resolver.py | Implements project-wide tree-sitter Python symbol resolver; fixes capture pairing via per-match iteration. |
| api/graph.py | Adds `derive_overrides()` Cypher query to create OVERRIDES edges post-resolution. |
| tests/analyzers/test_ts_python_resolver.py | Adds resolver unit/regression tests and env-var integration tests for LSP bypass behavior. |
Comments suppressed due to low confidence (2)
tests/analyzers/test_ts_python_resolver.py:9
- `pytest` is imported but never used in this test module, which will fail `ruff check` (F401) if linting is enabled for tests.
import pytest
api/graph.py:508
- The `WITH DISTINCT sub, sup, length(x) AS depth` clause can yield multiple rows for the same `(sub,sup)` when there are multiple inheritance paths, and whichever row hits `MERGE` first will set `e.depth` (not necessarily the shortest distance). Aggregating to `min(length(x))` makes `depth` deterministic and reflects the closest override.
RETURN collect(callee)"""
res = self._query(q, {'func_id': func_id})
return res.result_set[0][0]
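The `min(length(x))` suggestion for `api/graph.py` can be illustrated outside Cypher. The sketch below is a plain-Python analogy with made-up rows: `first_seen_depths` mimics "whichever row arrives first wins", and `shortest_depths` mimics aggregating to the minimum path length. Both function names and the row data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (sub_method, super_method, path_length)

# Two inheritance paths of different lengths reach the same (sub, sup) pair.
ROWS = [
    ("D.run", "A.run", 3),
    ("D.run", "A.run", 2),
    ("D.run", "B.run", 1),
]


def first_seen_depths(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Keep the depth of the first row per pair; the result depends on row order."""
    depths: Dict[Tuple[str, str], int] = {}
    for sub, sup, depth in rows:
        depths.setdefault((sub, sup), depth)
    return depths


def shortest_depths(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Keep the minimum depth per pair; the result is independent of row order."""
    depths: Dict[Tuple[str, str], int] = {}
    for sub, sup, depth in rows:
        key = (sub, sup)
        depths[key] = min(depth, depths.get(key, depth))
    return depths
```

Reversing `ROWS` changes the output of `first_seen_depths` but not of `shortest_depths`, which is the determinism argument made in the comment.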
build_import_index received every analyzed file (all languages) via self.files, indexing them by dotted module name regardless of extension. A Python `import pkg.mod` could then resolve to a same-named non-Python file (e.g. pkg/mod.java) and create spurious IMPORTS edges. Restrict the index to .py files. Add a regression test asserting a .java sibling at the same dotted path is never indexed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
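A small sketch of the extension filter described in this commit message, assuming a hypothetical `build_index` helper that maps POSIX-style paths to dotted module names. The names `dotted_name`, `build_index`, and the `suffixes` parameter are illustrative and are not taken from the PR.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple


def dotted_name(path: str) -> str:
    """Convert 'pkg/mod.py' to 'pkg.mod' and 'pkg/__init__.py' to 'pkg'."""
    parts = list(PurePosixPath(path).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


def build_index(
    paths: Iterable[str], suffixes: Tuple[str, ...] = (".py",)
) -> Dict[str, str]:
    """Index files by dotted module name, keeping only the allowed suffixes."""
    index: Dict[str, str] = {}
    for path in paths:
        if PurePosixPath(path).suffix in suffixes:
            # First file seen for a dotted name wins in this sketch.
            index.setdefault(dotted_name(path), path)
    return index


PATHS = ["pkg/mod.java", "pkg/mod.py", "pkg/__init__.py"]
PY_ONLY = build_index(PATHS)
UNFILTERED = build_index(PATHS, suffixes=(".py", ".java"))
```

With the filter relaxed, `pkg.mod` resolves to the `.java` sibling in this model, which is the kind of spurious IMPORTS target the restriction to `.py` files avoids.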
Prerequisites (merge order)
Merge in order — this PR is stacked on:
- `TreeSitterAnalyzer` base class (T15)
Base: #691.
What
Three graph-quality improvements on top of the T18 tree-sitter resolver (#691):
- `build_import_index`/`resolve_imports` hooks added to the analyzer base class; `source_analyzer.link_imports` wires them after `first_pass` (once every file has a graph id).
- `graph.derive_overrides(max_depth=3)` derives OVERRIDES from EXTENDS + DEFINES via Cypher MERGE, run after `second_pass`.
- `ts_resolver.py::_index_file` built the per-module symbol table by `zip`-ing two independently-grouped `QueryCursor.captures()` lists (`@name`, `@def`). Decorated defs shift `@def` positions, mis-pairing names with defs, so imported-call resolution attached CALLS edges to the wrong target (phantom edges whose token never appears at the call site). Fixed via per-match iteration using a `_matches()` helper wrapping `QueryCursor.matches()`, applied across all four indexing loops (top-level funcs, classes, assigns, class methods). Adds 2 regression tests (10 top-level funcs, 8 classes) asserting each imported call resolves to the def whose name matches exactly; 16/16 resolver tests pass.
Why
IMPORTS/OVERRIDES edges improve coverage for all consumers and feed `search_code` hybrid-ranker centrality (cross-file in-degree). The resolver fix corrects CALLS-edge accuracy, the graph's core "who calls X" capability.
Impact (CALLS resolver fix)
Deterministic graph-vs-jedi-oracle caller-accuracy bench, n=40, paired (identical harness, only the resolver differs):
- uxarray CALLS macro-F1 0.178 → 0.713 (median 0.0 → 0.94)
- arkouda CALLS macro-F1 0.031 → 0.262
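For readers unfamiliar with the metric, here is a minimal sketch of a macro-averaged caller F1, assuming each benchmark item is a target symbol with an oracle set of callers and a graph-predicted set of callers. How the actual bench harness scores edge cases is not stated in this PR, so the empty-set handling below is an assumption, and all names and data are illustrative.

```python
from statistics import mean
from typing import Dict, Set


def f1(predicted: Set[str], expected: Set[str]) -> float:
    """F1 of one target's predicted callers against the oracle callers."""
    if not predicted and not expected:
        return 1.0  # assumption: agreeing on "no callers" counts as perfect
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def macro_f1(graph: Dict[str, Set[str]], oracle: Dict[str, Set[str]]) -> float:
    """Unweighted mean of per-target F1 over every target in the oracle."""
    return mean(f1(graph.get(target, set()), callers) for target, callers in oracle.items())


# Illustrative data only, not taken from the benchmark.
ORACLE = {"f": {"a", "b"}, "g": {"c"}}
GRAPH = {"f": {"a"}, "g": {"x"}}
SCORE = macro_f1(GRAPH, ORACLE)
```

Because the average is unweighted, a target whose callers are entirely wrong pulls the score down as much as any other target, which is why phantom CALLS edges have a large effect on this metric.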
Stack
Notes
Lineage note: the IMPORTS/OVERRIDES content was authored on the pre-T18 mcp-smoke-combined branch and cherry-picked here. The old jedi-era `resolve_path`/`resolve_type` artifacts from that lineage were intentionally dropped during the cherry-pick (T18 removed them); only the 5 new import-resolution methods + base hooks + `_extract_type_target` (from #691) remain.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>