Skip to content

refactor(core-rs): decompose pipeline, read_queries, edge_builders; collapse walker recursion#1232

Merged
carlos-alm merged 6 commits into
mainfrom
refactor/titan-decomposition-core-rs
May 27, 2026
Merged

refactor(core-rs): decompose pipeline, read_queries, edge_builders; collapse walker recursion#1232
carlos-alm merged 6 commits into
mainfrom
refactor/titan-decomposition-core-rs

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

  • Collapses walker mutual recursion into single-entry traversal
  • Decomposes pipeline / read_queries / edge_builders into focused helpers
  • Addresses the two worst halstead.bugs offenders (read_queries.rs, build_pipeline.rs) from gauntlet

Commits

  • b3c36f4: refactor(core-rs): collapse walker mutual recursion into single-entry traversal
  • b49cab5: refactor(core-rs): decompose pipeline, read queries, and edge builders

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #4 of 10 (mergeOrder position: 4).

Caveats

  • WASM grammars not available in dev worktree — CI will run full test matrix

Test plan

  • CI passes (lint, build, full test matrix)
  • Verify no new cycles introduced (codegraph stats)

docs check acknowledged - Rust internal helper extraction, no user-facing changes
@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review Complete

Reviewing PR #1232: refactor(core-rs): decompose pipeline, read_queries, edge_builders; collapse walker recursion

Review Checklist:

  • Phase 1: Root cause analysis - understand the problem being solved
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality assessment
  • Phase 5: Backlog compliance evaluation
  • Final assessment and recommendation

Phase 1: Root Cause Analysis ✅

Problem Being Solved: This PR addresses code complexity issues in the Rust core engine, specifically targeting the two worst Halstead.bugs offenders (read_queries.rs and build_pipeline.rs) identified in a "Titan Paradigm cleanup pass." The refactoring aims to:

  1. Collapse walker mutual recursion into single-entry traversal to simplify control flow
  2. Decompose large pipeline/read_queries/edge_builders into focused helper functions
  3. Improve code maintainability by breaking down complex monolithic functions

Why This Approach: The approach appears justified as:

  • It's part of a systematic technical debt cleanup (part 4 of 10 PRs in Titan cleanup)
  • Targets specific complexity metrics (Halstead.bugs) rather than arbitrary refactoring
  • Maintains functional equivalence while improving code organization

Risk Assessment: Low-medium risk since this is pure refactoring without functional changes, but substantial size (2029 additions, 1531 deletions) requires careful verification.

Phase 2: Configuration Change Detection ✅

No configuration changes detected - This is purely a code refactoring:

  • No biome.json, eslint, jest, package.json, or tsconfig modifications
  • No test rules relaxed or thresholds lowered
  • No build configuration changes

Phase 3: Test Integrity ✅

No test files modified - Tests remain unchanged:

  • No .test.* or .spec.* files modified
  • No test logic weakened or assertions removed
  • All existing test coverage preserved

Phase 4: Code Quality Assessment ✅

Positive Aspects:

  • Excellent function decomposition: build_pipeline.rs refactored from monolithic into 35+ focused helper functions (e.g., pipeline_setup, save_and_purge_changed, run_structure_phase)
  • Clear separation of concerns: Each helper function has a single responsibility
  • Consistent error handling: Uses Result<T, String> patterns consistently
  • Good naming conventions: Function names are descriptive and follow Rust conventions
  • No malicious code detected: All refactoring appears legitimate

Quality Metrics:

  • File sizes remain manageable: Despite refactoring, largest files stay under 2500 lines
  • Minimal use of unwraps: Only 44 instances in build_pipeline.rs, mostly for path operations
  • No TODO/FIXME introduced: Only pre-existing TODO found in unrelated file

Architectural Improvements:

  • Walker mutual recursion collapsed into single-entry traversal (better control flow)
  • Import resolution and edge building properly separated
  • Analysis persistence cleanly abstracted

Phase 5: Backlog Compliance ✅

Zero-dep: ✅ - No new runtime dependencies added
Foundation-aligned: ✅ - Maintains core principles:

  • No cloud API calls introduced
  • Preserves zero-infrastructure design
  • Maintains programmatic API contract
  • Does not break embeddable-first approach

Problem-fit: 4/5 - Directly improves agent accuracy by:

  • Reducing complexity in core parsing pipeline
  • Making code easier to maintain and debug
  • Improving reliability of dependency graph construction

Breaking: No - Pure internal refactoring with no API changes

Tier: 1 (zero-dep ✅ + foundation-aligned ✅)

Root Cause Analysis

Problem Being Solved: Technical debt in Rust core engine targeting specific Halstead.bugs complexity offenders
Why This Approach: Systematic refactoring as part of structured cleanup (Titan Paradigm, 4/10)
Risk Assessment: Low-medium (large refactoring but no functional changes)

Backlog Compliance

  • Zero-dep: ✅ — no new dependencies
  • Foundation-aligned: ✅ — maintains all core principles
  • Problem-fit: 4/5 — improves maintainability and reliability for AI agents
  • Breaking: No — internal refactoring only
  • Tier: 1

Critical Concerns

None identified. This is well-structured technical debt cleanup.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality refactoring that significantly improves code maintainability while preserving functionality. Part of systematic technical debt cleanup with clear benefits. No breaking changes or concerning patterns detected.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

75 functions changed46 callers affected across 10 files

  • pipeline_setup in crates/codegraph-core/src/build_pipeline.rs:110 (1 transitive callers)
  • early_exit_result in crates/codegraph-core/src/build_pipeline.rs:143 (1 transitive callers)
  • save_and_purge_changed in crates/codegraph-core/src/build_pipeline.rs:171 (1 transitive callers)
  • parse_and_index_files in crates/codegraph-core/src/build_pipeline.rs:213 (1 transitive callers)
  • resolve_pipeline_imports in crates/codegraph-core/src/build_pipeline.rs:234 (1 transitive callers)
  • reconnect_saved_reverse_dep_edges in crates/codegraph-core/src/build_pipeline.rs:264 (1 transitive callers)
  • run_structure_phase in crates/codegraph-core/src/build_pipeline.rs:283 (1 transitive callers)
  • run_role_classification in crates/codegraph-core/src/build_pipeline.rs:321 (1 transitive callers)
  • run_analysis_persistence in crates/codegraph-core/src/build_pipeline.rs:362 (1 transitive callers)
  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:422 (0 transitive callers)
  • builtin_call_receivers in crates/codegraph-core/src/build_pipeline.rs:1100 (2 transitive callers)
  • compute_edge_relevant_files in crates/codegraph-core/src/build_pipeline.rs:1120 (3 transitive callers)
  • load_edge_node_set in crates/codegraph-core/src/build_pipeline.rs:1156 (2 transitive callers)
  • load_all_edge_nodes in crates/codegraph-core/src/build_pipeline.rs:1215 (3 transitive callers)
  • read_edge_node_info in crates/codegraph-core/src/build_pipeline.rs:1230 (0 transitive callers)
  • load_file_node_id_map in crates/codegraph-core/src/build_pipeline.rs:1243 (2 transitive callers)
  • collect_imported_names_for_file in crates/codegraph-core/src/build_pipeline.rs:1262 (2 transitive callers)
  • insert_call_edge_rows in crates/codegraph-core/src/build_pipeline.rs:1292 (2 transitive callers)
  • build_and_insert_call_edges in crates/codegraph-core/src/build_pipeline.rs:1310 (1 transitive callers)
  • visit in crates/codegraph-core/src/dataflow.rs:981 (5 transitive callers)

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR decomposes three large, high-Halstead-complexity modules (build_pipeline.rs, read_queries.rs, edge_builder.rs) into focused single-responsibility helpers, and collapses the visit / visit_children mutual recursion in dataflow.rs into a single-entry traversal. All changes are pure structural refactors — no functional logic is altered.

  • build_pipeline.rs: run_pipeline is reduced to an orchestration spine by extracting ~10 named helpers; analysis_complete is fixed to !analysis.ran || analysis.ok and the opaque (bool, bool) tuple is replaced by AnalysisPersistenceResult { ran, ok }.
  • read_queries.rs: fn_deps, get_graph_stats, and find_nodes_for_triage are decomposed into ~12 helpers; relevance scores are pre-computed once per element before sorting.
  • dataflow.rs: mutual visitvisit_children recursion is collapsed into a single loop with an entered_scope flag, preserving push/pop semantics exactly.
  • edge_builder.rs / import_edges.rs / import_resolution.rs / structure.rs / graph_algorithms.rs: similarly decomposed; normalized_root computation moved outside the per-file loop in edge_builder.rs.

Confidence Score: 5/5

Pure structural refactor with no functional logic changes — all extracted helpers are faithful one-to-one decompositions of the original code, verified by tracing parameter flow and SQL construction through each extraction site.

Every helper was cross-checked against the original inline logic: SQL parameter indexing, scope push/pop order in the dataflow walker, barrel-through edge dedup, .js→.ts/.tsx probe order, and BFS visited-set seeding all match exactly. The two concrete correctness fixes were already flagged and addressed in prior review cycles. No new defects introduced.

No files require special attention — all extraction sites were verified to produce identical runtime behaviour.

Important Files Changed

Filename Overview
crates/codegraph-core/src/build_pipeline.rs Decomposes the monolithic run_pipeline into ~10 focused helpers; fixes analysis_complete to !analysis.ran
crates/codegraph-core/src/dataflow.rs Collapses visit / visit_children mutual recursion into a single-entry visit function using an entered_scope flag; scope push/pop semantics are correctly preserved.
crates/codegraph-core/src/edge_builder.rs Extracts strip_star_as_prefix, classify_import_edge_kind, emit_type_only_symbol_edges, emit_barrel_through_edges, and process_single_import; normalized_root computed once outside the loop.
crates/codegraph-core/src/graph_algorithms.rs Extracts bfs_neighbors_for_direction and reconstruct_bfs_path helpers; shortest_path uses early-continue instead of deep nesting. Logic identical to original.
crates/codegraph-core/src/import_edges.rs Extracts classify_import_kind, emit_type_only_symbol_rows, emit_barrel_through_rows, emit_edges_for_import; resolved_sources dedup logic preserved faithfully.
crates/codegraph-core/src/import_resolution.rs Extracts relativize_to_root, resolve_non_relative_import, probe_js_to_ts_remap, probe_known_extensions; .js→.ts/.tsx probe order preserved.
crates/codegraph-core/src/read_queries.rs Large decomposition of fn_deps, get_graph_stats, find_nodes_for_triage into ~12 helpers; pre-computes relevance scores once per element before sort (addresses previous review).
crates/codegraph-core/src/structure.rs Extracts ~12 helpers from compute_directory_metrics and insert_contains_edges; logic faithfully preserved.

Reviews (6): Last reviewed commit: "Merge branch 'main' into refactor/titan-..." | Re-trigger Greptile

Comment on lines 2169 to +2185
}

// Convert to FnDepsCallerNode for output (strip id)
let callers: Vec<FnDepsCallerNode> = callers_with_id.iter().map(|c| FnDepsCallerNode {
name: c.name.clone(),
kind: c.kind.clone(),
file: c.file.clone(),
line: c.line,
via_hierarchy: c.via_hierarchy.clone(),
}).collect();

// BFS transitive callers — reuse callers_with_id as initial frontier
let transitive_callers = if depth > 1 {
let mut visited = HashSet::new();
visited.insert(node.id);
let initial_frontier: Vec<CallerWithId> = callers_with_id.iter().map(|c| CallerWithId {
id: c.id, name: c.name.clone(), kind: c.kind.clone(), file: c.file.clone(), line: c.line, via_hierarchy: c.via_hierarchy.clone(),
}).collect();
let mut frontier: Vec<CallerWithId> = initial_frontier;
let mut groups: Vec<FnDepsTransitiveGroup> = Vec::new();

for d in 2..=depth {
let unvisited: Vec<&CallerWithId> = frontier.iter()
.filter(|f| !visited.contains(&f.id))
.collect();
for f in &unvisited {
visited.insert(f.id);
}
if unvisited.is_empty() { break; }

// Batch query: find all callers of the unvisited frontier
let mut next_frontier: Vec<CallerWithId> = Vec::new();
let mut next_ids = HashSet::new();
for f in &unvisited {
let mut stmt = conn.prepare_cached(
"SELECT n.id, n.name, n.kind, n.file, n.line \
FROM edges e JOIN nodes n ON e.source_id = n.id \
WHERE e.target_id = ?1 AND e.kind = 'calls'"
).map_err(|e| napi::Error::from_reason(format!("fn_deps bfs prepare: {e}")))?;
let rows = stmt.query_map(params![f.id], |row| {
Ok(CallerWithId {
id: row.get("id")?,
name: row.get("name")?,
kind: row.get("kind")?,
file: row.get("file")?,
line: row.get("line")?,
via_hierarchy: None,
})
}).map_err(|e| napi::Error::from_reason(format!("fn_deps bfs: {e}")))?;
let upstream: Vec<CallerWithId> = rows.collect::<Result<Vec<_>, _>>()
.map_err(|e| napi::Error::from_reason(format!("fn_deps bfs collect: {e}")))?;
for u in upstream {
if no_tests && is_test_file(&u.file) { continue; }
if !visited.contains(&u.id) && !next_ids.contains(&u.id) {
next_ids.insert(u.id);
next_frontier.push(u);
}
}
}

if !next_frontier.is_empty() {
groups.push(FnDepsTransitiveGroup {
depth: d as i32,
callers: next_frontier.iter().map(|n| FnDepsNode {
name: n.name.clone(),
kind: n.kind.clone(),
file: n.file.clone(),
line: n.line,
}).collect(),
});
}
frontier = next_frontier;
}
groups
} else {
Vec::new()
};

// File hash (cached)
let file_hash = if !file_hash_cache.contains_key(&node.file) {
let hash: Option<String> = conn.prepare_cached(
"SELECT hash FROM file_hashes WHERE file = ?1"
).ok().and_then(|mut stmt| {
stmt.query_row(params![node.file], |row| row.get(0)).ok()
});
file_hash_cache.insert(node.file.clone(), hash.clone());
hash
} else {
file_hash_cache.get(&node.file).cloned().flatten()
};
let callers: Vec<FnDepsCallerNode> = callers_with_id
.iter()
.map(|c| FnDepsCallerNode {
name: c.name.clone(),
kind: c.kind.clone(),
file: c.file.clone(),
line: c.line,
via_hierarchy: c.via_hierarchy.clone(),
})
.collect();

let initial_frontier: Vec<FnDepsCallerWithId> = callers_with_id
.iter()
.map(|c| FnDepsCallerWithId {
id: c.id,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Relevance score computed twice per comparison in sort

fn_deps_relevance_score is called twice per element pair inside sort_by — once for a and once for b — each time recomputing node.name.to_lowercase(). For n matched nodes this is O(n log n) score calls. The original closure had the same inefficiency, but Rust's sort_by_cached_key exists precisely for this pattern: it computes the key once per element, stores it, and reuses it for every comparison. Consider replacing the sort_by call with a manual pre-compute step.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — pre-computed fn_deps_relevance_score once per element using a (score, node) vec before sorting. f64 doesn't implement Ord so sort_by_cached_key can't be used directly; the pre-compute achieves the same O(n) key cost. Commit 626f7a9.

Comment on lines +352 to +408
/// analysis scope. Returns `(do_analysis, analysis_ok)` so the caller can
/// compute `analysis_complete`.
fn run_analysis_persistence(
conn: &Connection,
file_symbols: &HashMap<String, FileSymbols>,
analysis_scope: Option<&Vec<String>>,
opts: &BuildOpts,
include_ast: bool,
include_dataflow: bool,
timing: &mut PipelineTiming,
) -> (bool, bool) {
let include_complexity = opts.complexity.unwrap_or(true);
let include_cfg = opts.cfg.unwrap_or(true);
let do_analysis = include_ast || include_dataflow || include_cfg || include_complexity;
if !do_analysis {
return (false, true);
}

let analysis_file_set: HashSet<&str> = match analysis_scope {
Some(files) => files.iter().map(|s| s.as_str()).collect(),
None => file_symbols.keys().map(|s| s.as_str()).collect(),
};

let node_id_map = build_analysis_node_map(conn, &analysis_file_set);
let mut analysis_ok = true;

if include_ast {
let t0 = Instant::now();
let ast_batches = build_ast_batches(file_symbols, &analysis_file_set);
if ast_db::do_insert_ast_nodes(conn, &ast_batches).is_err() {
analysis_ok = false;
}
timing.ast_ms = t0.elapsed().as_secs_f64() * 1000.0;
}
if include_complexity {
let t0 = Instant::now();
if !write_complexity(conn, file_symbols, &analysis_file_set, &node_id_map) {
analysis_ok = false;
}
timing.complexity_ms = t0.elapsed().as_secs_f64() * 1000.0;
}
if include_cfg {
let t0 = Instant::now();
if !write_cfg(conn, file_symbols, &analysis_file_set, &node_id_map) {
analysis_ok = false;
}
timing.cfg_ms = t0.elapsed().as_secs_f64() * 1000.0;
}
if include_dataflow {
let t0 = Instant::now();
if !write_dataflow(conn, file_symbols, &analysis_file_set) {
analysis_ok = false;
}
timing.dataflow_ms = t0.elapsed().as_secs_f64() * 1000.0;
}

(do_analysis, analysis_ok)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Opaque (bool, bool) return type obscures caller intent

run_analysis_persistence returns (do_analysis, analysis_ok) as an untyped pair. At the call site — let (do_analysis, analysis_ok) = run_analysis_persistence(...) — the positional order must be known to interpret the result correctly; swapping the two fields would silently produce wrong analysis_complete values without any compiler feedback. A named struct (e.g., AnalysisResult { ran: bool, ok: bool }) or at minimum a type alias would make the contract self-documenting and prevent future callers from misreading the tuple order.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — replaced the opaque (bool, bool) return with AnalysisPersistenceResult { ran: bool, ok: bool }. The caller now reads analysis.ran and analysis.ok by name, making the contract self-documenting and preventing silent field-swap bugs. While at it, also fixed the analysis_complete logic to !analysis.ran || analysis.ok so builds that opt out of analysis correctly report completion. Commit 626f7a9.

carlos-alm added a commit that referenced this pull request May 27, 2026
…res, fix analysis_complete (#1232)

- Replace `run_analysis_persistence` -> `(bool, bool)` tuple with named
  `AnalysisPersistenceResult { ran, ok }` to prevent silent positional-swap bugs
- Fix `analysis_complete: do_analysis && analysis_ok` to `!analysis.ran || analysis.ok`
  so builds that opt out of analysis (all flags false) correctly report completion
- Pre-compute `fn_deps_relevance_score` once per element before sorting instead
  of calling it twice per pair in `sort_by`, avoiding O(n log n) redundant calls
@carlos-alm

Copy link
Copy Markdown
Contributor Author

All three Greptile findings addressed in commit 626f7a9:

  1. (bool, bool) opaque return — replaced with AnalysisPersistenceResult { ran: bool, ok: bool } named struct. Caller now reads by field name, eliminating the silent positional-swap hazard.

  2. analysis_complete false when analysis disabled — fixed the expression from do_analysis && analysis_ok to !analysis.ran || analysis.ok. Builds that opt out of all analysis phases now correctly report completion.

  3. fn_deps_relevance_score called twice per sort pair — pre-computed scores once per element into a Vec<(f64, FnDepsMatchedNode)> before sorting. f64 doesn't implement Ord so sort_by_cached_key cannot be used directly; this achieves the same O(n) key cost.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

…res, fix analysis_complete

- Replace `run_analysis_persistence` -> `(bool, bool)` tuple with named
  `AnalysisPersistenceResult { ran, ok }` to prevent silent positional-swap bugs
- Fix `analysis_complete: do_analysis && analysis_ok` to `!analysis.ran || analysis.ok`
  so builds that opt out of analysis (all flags false) correctly report completion
- Pre-compute `fn_deps_relevance_score` once per element before sorting instead
  of calling it twice per pair in `sort_by`, avoiding O(n log n) redundant calls
… root

When file_exists returns true but strip_prefix fails (candidate outside
root_dir), the original code silently fell through to try the next
extension. The refactored probe_js_to_ts_remap and probe_known_extensions
were calling relativize_to_root which returned an absolute path
immediately, stopping the probe loop early.

Restore the original behaviour by inlining the strip_prefix check and
continuing the loop when it fails.
@carlos-alm carlos-alm force-pushed the refactor/titan-decomposition-core-rs branch from 626f7a9 to a4a4dfa Compare May 27, 2026 07:38
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed the two import_resolution.rs issues flagged in the Greptile summary (commit 310090d):

  1. probe_js_to_ts_remap strip_prefix fall-through — when file_exists returns true but strip_prefix fails (candidate outside root_dir), the refactored version called relativize_to_root which returned an absolute path immediately, stopping the probe loop. Fixed by inlining the strip_prefix check and continuing to the next extension when it fails, matching the original fall-through behaviour.

  2. probe_known_extensions same issue — applied the same fix: skip candidates outside root_dir and continue probing, rather than returning an absolute path.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit b902abc into main May 27, 2026
27 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-decomposition-core-rs branch May 27, 2026 23:04
@github-actions github-actions Bot locked and limited conversation to collaborators May 27, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant