refactor(core-rs): decompose pipeline, read_queries, edge_builders; collapse walker recursion#1232
Conversation
docs check acknowledged - Rust internal helper extraction, no user-facing changes
|
Claude finished @carlos-alm's task —— View job PR Review CompleteReviewing PR #1232: refactor(core-rs): decompose pipeline, read_queries, edge_builders; collapse walker recursion Review Checklist:
Phase 1: Root Cause Analysis ✅Problem Being Solved: This PR addresses code complexity issues in the Rust core engine, specifically targeting the two worst Halstead.bugs offenders (
Why This Approach: The approach appears justified as:
Risk Assessment: Low-medium risk since this is pure refactoring without functional changes, but substantial size (2029 additions, 1531 deletions) requires careful verification. Phase 2: Configuration Change Detection ✅✅ No configuration changes detected - This is purely a code refactoring:
Phase 3: Test Integrity ✅✅ No test files modified - Tests remain unchanged:
Phase 4: Code Quality Assessment ✅Positive Aspects:
Quality Metrics:
Architectural Improvements:
Phase 5: Backlog Compliance ✅Zero-dep: ✅ - No new runtime dependencies added
Problem-fit: 4/5 - Directly improves agent accuracy by:
Breaking: No - Pure internal refactoring with no API changes Tier: 1 (zero-dep ✅ + foundation-aligned ✅) Root Cause AnalysisProblem Being Solved: Technical debt in Rust core engine targeting specific Halstead.bugs complexity offenders Backlog Compliance
Critical ConcernsNone identified. This is well-structured technical debt cleanup. Final Recommendation
|
Codegraph Impact Analysis75 functions changed → 46 callers affected across 10 files
|
Greptile SummaryThis PR decomposes three large, high-Halstead-complexity modules (
Confidence Score: 5/5Pure structural refactor with no functional logic changes — all extracted helpers are faithful one-to-one decompositions of the original code, verified by tracing parameter flow and SQL construction through each extraction site. Every helper was cross-checked against the original inline logic: SQL parameter indexing, scope push/pop order in the dataflow walker, barrel-through edge dedup, .js→.ts/.tsx probe order, and BFS visited-set seeding all match exactly. The two concrete correctness fixes were already flagged and addressed in prior review cycles. No new defects introduced. No files require special attention — all extraction sites were verified to produce identical runtime behaviour. Important Files Changed
Reviews (6): Last reviewed commit: "Merge branch 'main' into refactor/titan-..." | Re-trigger Greptile |
| } | ||
|
|
||
| // Convert to FnDepsCallerNode for output (strip id) | ||
| let callers: Vec<FnDepsCallerNode> = callers_with_id.iter().map(|c| FnDepsCallerNode { | ||
| name: c.name.clone(), | ||
| kind: c.kind.clone(), | ||
| file: c.file.clone(), | ||
| line: c.line, | ||
| via_hierarchy: c.via_hierarchy.clone(), | ||
| }).collect(); | ||
|
|
||
| // BFS transitive callers — reuse callers_with_id as initial frontier | ||
| let transitive_callers = if depth > 1 { | ||
| let mut visited = HashSet::new(); | ||
| visited.insert(node.id); | ||
| let initial_frontier: Vec<CallerWithId> = callers_with_id.iter().map(|c| CallerWithId { | ||
| id: c.id, name: c.name.clone(), kind: c.kind.clone(), file: c.file.clone(), line: c.line, via_hierarchy: c.via_hierarchy.clone(), | ||
| }).collect(); | ||
| let mut frontier: Vec<CallerWithId> = initial_frontier; | ||
| let mut groups: Vec<FnDepsTransitiveGroup> = Vec::new(); | ||
|
|
||
| for d in 2..=depth { | ||
| let unvisited: Vec<&CallerWithId> = frontier.iter() | ||
| .filter(|f| !visited.contains(&f.id)) | ||
| .collect(); | ||
| for f in &unvisited { | ||
| visited.insert(f.id); | ||
| } | ||
| if unvisited.is_empty() { break; } | ||
|
|
||
| // Batch query: find all callers of the unvisited frontier | ||
| let mut next_frontier: Vec<CallerWithId> = Vec::new(); | ||
| let mut next_ids = HashSet::new(); | ||
| for f in &unvisited { | ||
| let mut stmt = conn.prepare_cached( | ||
| "SELECT n.id, n.name, n.kind, n.file, n.line \ | ||
| FROM edges e JOIN nodes n ON e.source_id = n.id \ | ||
| WHERE e.target_id = ?1 AND e.kind = 'calls'" | ||
| ).map_err(|e| napi::Error::from_reason(format!("fn_deps bfs prepare: {e}")))?; | ||
| let rows = stmt.query_map(params![f.id], |row| { | ||
| Ok(CallerWithId { | ||
| id: row.get("id")?, | ||
| name: row.get("name")?, | ||
| kind: row.get("kind")?, | ||
| file: row.get("file")?, | ||
| line: row.get("line")?, | ||
| via_hierarchy: None, | ||
| }) | ||
| }).map_err(|e| napi::Error::from_reason(format!("fn_deps bfs: {e}")))?; | ||
| let upstream: Vec<CallerWithId> = rows.collect::<Result<Vec<_>, _>>() | ||
| .map_err(|e| napi::Error::from_reason(format!("fn_deps bfs collect: {e}")))?; | ||
| for u in upstream { | ||
| if no_tests && is_test_file(&u.file) { continue; } | ||
| if !visited.contains(&u.id) && !next_ids.contains(&u.id) { | ||
| next_ids.insert(u.id); | ||
| next_frontier.push(u); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if !next_frontier.is_empty() { | ||
| groups.push(FnDepsTransitiveGroup { | ||
| depth: d as i32, | ||
| callers: next_frontier.iter().map(|n| FnDepsNode { | ||
| name: n.name.clone(), | ||
| kind: n.kind.clone(), | ||
| file: n.file.clone(), | ||
| line: n.line, | ||
| }).collect(), | ||
| }); | ||
| } | ||
| frontier = next_frontier; | ||
| } | ||
| groups | ||
| } else { | ||
| Vec::new() | ||
| }; | ||
|
|
||
| // File hash (cached) | ||
| let file_hash = if !file_hash_cache.contains_key(&node.file) { | ||
| let hash: Option<String> = conn.prepare_cached( | ||
| "SELECT hash FROM file_hashes WHERE file = ?1" | ||
| ).ok().and_then(|mut stmt| { | ||
| stmt.query_row(params![node.file], |row| row.get(0)).ok() | ||
| }); | ||
| file_hash_cache.insert(node.file.clone(), hash.clone()); | ||
| hash | ||
| } else { | ||
| file_hash_cache.get(&node.file).cloned().flatten() | ||
| }; | ||
| let callers: Vec<FnDepsCallerNode> = callers_with_id | ||
| .iter() | ||
| .map(|c| FnDepsCallerNode { | ||
| name: c.name.clone(), | ||
| kind: c.kind.clone(), | ||
| file: c.file.clone(), | ||
| line: c.line, | ||
| via_hierarchy: c.via_hierarchy.clone(), | ||
| }) | ||
| .collect(); | ||
|
|
||
| let initial_frontier: Vec<FnDepsCallerWithId> = callers_with_id | ||
| .iter() | ||
| .map(|c| FnDepsCallerWithId { | ||
| id: c.id, |
There was a problem hiding this comment.
Relevance score computed twice per comparison in sort
fn_deps_relevance_score is called twice per element pair inside sort_by — once for a and once for b — each time recomputing node.name.to_lowercase(). For n matched nodes this is O(n log n) score calls. The original closure had the same inefficiency, but Rust's sort_by_cached_key exists precisely for this pattern: it computes the key once per element, stores it, and reuses it for every comparison. Consider replacing the sort_by call with a manual pre-compute step.
There was a problem hiding this comment.
Fixed — pre-computed fn_deps_relevance_score once per element using a (score, node) vec before sorting. f64 doesn't implement Ord so sort_by_cached_key can't be used directly; the pre-compute achieves the same O(n) key cost. Commit 626f7a9.
| /// analysis scope. Returns `(do_analysis, analysis_ok)` so the caller can | ||
| /// compute `analysis_complete`. | ||
| fn run_analysis_persistence( | ||
| conn: &Connection, | ||
| file_symbols: &HashMap<String, FileSymbols>, | ||
| analysis_scope: Option<&Vec<String>>, | ||
| opts: &BuildOpts, | ||
| include_ast: bool, | ||
| include_dataflow: bool, | ||
| timing: &mut PipelineTiming, | ||
| ) -> (bool, bool) { | ||
| let include_complexity = opts.complexity.unwrap_or(true); | ||
| let include_cfg = opts.cfg.unwrap_or(true); | ||
| let do_analysis = include_ast || include_dataflow || include_cfg || include_complexity; | ||
| if !do_analysis { | ||
| return (false, true); | ||
| } | ||
|
|
||
| let analysis_file_set: HashSet<&str> = match analysis_scope { | ||
| Some(files) => files.iter().map(|s| s.as_str()).collect(), | ||
| None => file_symbols.keys().map(|s| s.as_str()).collect(), | ||
| }; | ||
|
|
||
| let node_id_map = build_analysis_node_map(conn, &analysis_file_set); | ||
| let mut analysis_ok = true; | ||
|
|
||
| if include_ast { | ||
| let t0 = Instant::now(); | ||
| let ast_batches = build_ast_batches(file_symbols, &analysis_file_set); | ||
| if ast_db::do_insert_ast_nodes(conn, &ast_batches).is_err() { | ||
| analysis_ok = false; | ||
| } | ||
| timing.ast_ms = t0.elapsed().as_secs_f64() * 1000.0; | ||
| } | ||
| if include_complexity { | ||
| let t0 = Instant::now(); | ||
| if !write_complexity(conn, file_symbols, &analysis_file_set, &node_id_map) { | ||
| analysis_ok = false; | ||
| } | ||
| timing.complexity_ms = t0.elapsed().as_secs_f64() * 1000.0; | ||
| } | ||
| if include_cfg { | ||
| let t0 = Instant::now(); | ||
| if !write_cfg(conn, file_symbols, &analysis_file_set, &node_id_map) { | ||
| analysis_ok = false; | ||
| } | ||
| timing.cfg_ms = t0.elapsed().as_secs_f64() * 1000.0; | ||
| } | ||
| if include_dataflow { | ||
| let t0 = Instant::now(); | ||
| if !write_dataflow(conn, file_symbols, &analysis_file_set) { | ||
| analysis_ok = false; | ||
| } | ||
| timing.dataflow_ms = t0.elapsed().as_secs_f64() * 1000.0; | ||
| } | ||
|
|
||
| (do_analysis, analysis_ok) |
There was a problem hiding this comment.
Opaque
(bool, bool) return type obscures caller intent
run_analysis_persistence returns (do_analysis, analysis_ok) as an untyped pair. At the call site — let (do_analysis, analysis_ok) = run_analysis_persistence(...) — the positional order must be known to interpret the result correctly; swapping the two fields would silently produce wrong analysis_complete values without any compiler feedback. A named struct (e.g., AnalysisResult { ran: bool, ok: bool }) or at minimum a type alias would make the contract self-documenting and prevent future callers from misreading the tuple order.
There was a problem hiding this comment.
Fixed — replaced the opaque (bool, bool) return with AnalysisPersistenceResult { ran: bool, ok: bool }. The caller now reads analysis.ran and analysis.ok by name, making the contract self-documenting and preventing silent field-swap bugs. While at it, also fixed the analysis_complete logic to !analysis.ran || analysis.ok so builds that opt out of analysis correctly report completion. Commit 626f7a9.
…res, fix analysis_complete (#1232) - Replace `run_analysis_persistence` -> `(bool, bool)` tuple with named `AnalysisPersistenceResult { ran, ok }` to prevent silent positional-swap bugs - Fix `analysis_complete: do_analysis && analysis_ok` to `!analysis.ran || analysis.ok` so builds that opt out of analysis (all flags false) correctly report completion - Pre-compute `fn_deps_relevance_score` once per element before sorting instead of calling it twice per pair in `sort_by`, avoiding O(n log n) redundant calls
|
All three Greptile findings addressed in commit 626f7a9:
|
…res, fix analysis_complete
- Replace `run_analysis_persistence` -> `(bool, bool)` tuple with named
`AnalysisPersistenceResult { ran, ok }` to prevent silent positional-swap bugs
- Fix `analysis_complete: do_analysis && analysis_ok` to `!analysis.ran || analysis.ok`
so builds that opt out of analysis (all flags false) correctly report completion
- Pre-compute `fn_deps_relevance_score` once per element before sorting instead
of calling it twice per pair in `sort_by`, avoiding O(n log n) redundant calls
… root When file_exists returns true but strip_prefix fails (candidate outside root_dir), the original code silently fell through to try the next extension. The refactored probe_js_to_ts_remap and probe_known_extensions were calling relativize_to_root which returned an absolute path immediately, stopping the probe loop early. Restore the original behaviour by inlining the strip_prefix check and continuing the loop when it fails.
626f7a9 to
a4a4dfa
Compare
|
Fixed the two import_resolution.rs issues flagged in the Greptile summary (commit 310090d):
|
Summary
read_queries.rs,build_pipeline.rs) from gauntletCommits
Context
Part of the Titan Paradigm cleanup pass (see
.codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #4 of 10 (mergeOrder position: 4).Caveats
Test plan
codegraph stats)