Skip to content

refactor(extractors): shared helpers across language extractors (TS+Rust)#1230

Merged
carlos-alm merged 11 commits into
mainfrom
refactor/titan-abstraction-extractors
May 27, 2026
Merged

refactor(extractors): shared helpers across language extractors (TS+Rust)#1230
carlos-alm merged 11 commits into
mainfrom
refactor/titan-abstraction-extractors

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

  • Extracts shared iteration / identifier / symbol-collection helpers
  • Adopts them across 9 TS and 10 Rust extractors
  • Breaks the elixir param/map binding cycle in both engines
  • Adds two helper modules (helpers.ts, helpers.rs); no behavioural change

Commits

  • 9c8be55: refactor(extractors): extend shared helpers for identifier and symbol collection
  • 9c3d016: refactor(extractors): adopt shared helpers across language extractors
  • 5abe6ad: refactor(extractors): break elixir param/map binding cycle
  • 0d687c4: refactor(extractors-rs): extend shared helpers for identifier and symbol collection
  • f10fcab: refactor(extractors-rs): adopt shared helpers across language extractors
  • d9bbc8f: refactor(extractors-rs): break elixir param/map binding cycle

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #1 of 10 (mergeOrder position: 1).

Caveats

Test plan

  • CI passes (lint, build, full test matrix)
  • Verify no new cycles introduced (codegraph stats)

… collection

Adds shared utilities to src/extractors/helpers.ts in preparation for
adoption across language extractors (phase 2):

- nodeStartLine: companion to nodeEndLine for the ~108 hand-rolled
  startPosition.row + 1 literals scattered across extractors
- findFirstChildOfTypes: find first child matching any of N types
  (useful for grammar variants like string vs string_literal)
- iterChildren / PUNCTUATION_TOKENS: generator-based child iteration
  with punctuation skipping, used in elixir/gleam destructuring walks
- pushCall / pushImport: centralise Call/Import construction so line
  derivation stays consistent across extractors
- extractSimpleParameters / resolveParamName: uniform parameter
  extraction with optional type-map sink — collapses boilerplate in
  the ~16 per-language extractParams helpers

Phase 1 of the TS extractor refactor plan (sync.json clusters 1).
Additive only — no consumer adoption yet; existing helpers and
extractor behaviour unchanged. Consumers updated in phase 2.

docs check acknowledged: internal refactor, no doc updates needed.
Phase 2 of the TS extractor refactor plan (sync.json cluster 1). Adopts
the helpers extended in 9c8be55 (nodeStartLine, findFirstChildOfTypes,
pushCall, pushImport, extractSimpleParameters, stripQuotes) across six
language extractors:

- r.ts: drop local stripQuotes; use shared stripQuotes/pushCall/
  pushImport/findFirstChildOfTypes/nodeStartLine
- gleam.ts: use pushCall/pushImport/findFirstChildOfTypes/nodeStartLine;
  extract pushConstructor helper for the dual-branch data-constructor
  walk
- julia.ts: use pushCall/pushImport/nodeStartLine; collapse Julia param
  wrapper-type branches via JULIA_PARAM_WRAPPER_TYPES set
- java.ts: use pushCall/pushImport/nodeStartLine; collapse
  extractJavaParameters via extractSimpleParameters with typeMap sink;
  extract resolveJavaTypeText for the generic_type unwrap pattern
- gleam.ts and solidity.ts: extract qualifyWithParent helper in solidity
  to collapse 6 duplicated `parent ? \`\${parent}.\${name}\` : name`
  blocks
- solidity.ts: use pushCall/pushImport/findFirstChildOfTypes/
  nodeStartLine; collapse extractSolParams via extractSimpleParameters
- javascript.ts: bulk-replace 43 inline `XXX.startPosition.row + 1`
  literals with nodeStartLine() calls; replace one stray endPosition
  literal with nodeEndLine

Net -65 lines. No behaviour changes — only call-site collapsing onto the
shared helpers (semantics verified by careful inspection of each
replacement; pushImport's empty-names fallback matches the previous
ad-hoc defaults in each extractor).

docs check acknowledged: internal refactor, no doc updates needed.
Convert collectElixirParamIdentifiers from mutual-recursion with
collectElixirMapBindings into a single iterative worklist traversal.
Map/list/tuple/binary-operator dispatch is now done via three leaf
helpers that push child nodes onto the worklist instead of calling
back into the main function. This removes the function-level cycle
flagged by codegraph (9 -> 8 cycles) without changing extractor
semantics. docs check acknowledged: internal refactor only.
Phase 5 of the Rust extractor refactor plan (sync.json cluster 2). Adopts
the helpers extended in 0d687c4 (push_call, push_simple_call, push_import,
push_type_map_entry, extract_simple_parameters, match_c_family_type_map)
across eight language extractors:

- cpp.rs: collapse match_cpp_type_map to a one-line delegate of
  match_c_family_type_map; use push_import/push_simple_call/push_call
  for include and call sites
- cuda.rs: same delegation as cpp.rs; use push_import/push_simple_call/
  push_call across include and call_expression handlers
- java.rs: use push_type_map_entry for local-variable / formal-parameter
  bindings; use push_call/push_simple_call for method invocation and
  object creation; collapse extract_java_parameters to a one-shot
  extract_simple_parameters call; use push_import for import declaration
- javascript.rs: use push_simple_call for new_expression identifier
  branch; use push_type_map_entry for the confidence-0.9 type entries
- julia.rs: use push_simple_call/push_call across identifier and
  field_expression / scoped_identifier call branches
- objc.rs: use push_import for at_import; use push_call for c-call and
  message-expression handlers (drops redundant is_empty guards)
- r_lang.rs: use push_simple_call/push_call across identifier and
  namespace_operator call branches; use push_import for library/source
- solidity.rs: use push_call (drops redundant guard) for call sites;
  collapse extract_sol_params to a one-shot extract_simple_parameters

Net: -207 lines across 8 files, no behavior change. cargo check clean,
324 rust unit tests pass.

Pre-existing test failure: tests/engines/parity.test.ts has two failing
elixir cases unrelated to this commit (filed as #1227 — regression from
commit 5abe6ad in Phase 3).
Convert collect_elixir_param_identifiers from mutual-recursion with
collect_elixir_map_bindings into a single iterative worklist traversal.
Map/list/tuple/binary-operator dispatch is now done via three leaf
helpers (push_elixir_sequence_items, push_elixir_map_values,
push_elixir_binary_operator_operands) that push child nodes onto the
worklist instead of calling back into the main function. This removes
the function-level cycle flagged by codegraph (8 -> 7 cycles) and
mirrors the TS refactor in 5abe6ad without changing extractor
semantics. docs check acknowledged: internal refactor only.
@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


PR Review Complete ✅

Final Assessment

Root Cause Analysis

Problem Being Solved: Code deduplication - extractors had 108 hand-rolled startPosition.row + 1 literals and duplicated parameter/identifier collection logic across 19 language extractors (9 TS + 10 Rust).

Why This Approach: Creates shared helper modules (helpers.ts, helpers.rs) to centralize common patterns. The Elixir cycle fix converts mutual recursion to iterative worklist traversal - legitimate algorithmic improvement, not masking.

Risk Assessment: Low risk - pure internal refactoring with no behavior changes. Systematic cleanup of technical debt.

Backlog Compliance

  • Zero-dep: ✓ No new runtime dependencies
  • Foundation-aligned: ✓ Maintains always-current graph, zero-cost core, embeddable design
  • Problem-fit: 2/5 Infrastructure cleanup, doesn't directly address core "AI agents waste tokens" problem
  • Breaking: No - No external API changes
  • Tier: 1

Critical Concerns

None identified. This is a well-executed refactoring that:

Passes all safety checks: No test modifications, no configuration changes, no masking of real issues
Quality implementation: Proper error handling, consistent patterns across TS/Rust, defensive programming
Legitimate cycle fix: Converts problematic mutual recursion to clean iterative algorithm
Maintains dual-engine parity: Both engines adopt identical abstraction patterns

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5) - High quality internal refactoring
  • Action: APPROVE
  • Reasoning: Excellent code hygiene improvement. Eliminates 108+ duplicate literals, centralizes extraction patterns, and legitimately fixes a cycle issue. No red flags detected. This is the kind of foundational cleanup that enables future development.

Minor Note: Low problem-fit score (2/5) reflects that this is infrastructure work rather than user-facing features. That's appropriate for position #1 in a 10-PR cleanup series.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

141 functions changed153 callers affected across 18 files

  • match_cpp_type_map in crates/codegraph-core/src/extractors/cpp.rs:22 (0 transitive callers)
  • handle_cpp_preproc_include in crates/codegraph-core/src/extractors/cpp.rs:307 (1 transitive callers)
  • handle_cpp_call_expression in crates/codegraph-core/src/extractors/cpp.rs:323 (1 transitive callers)
  • match_cuda_type_map in crates/codegraph-core/src/extractors/cuda.rs:46 (0 transitive callers)
  • handle_cuda_preproc_include in crates/codegraph-core/src/extractors/cuda.rs:470 (1 transitive callers)
  • handle_cuda_call_expression in crates/codegraph-core/src/extractors/cuda.rs:493 (1 transitive callers)
  • collect_elixir_param_identifiers in crates/codegraph-core/src/extractors/elixir.rs:169 (3 transitive callers)
  • push_elixir_binary_operator_operands in crates/codegraph-core/src/extractors/elixir.rs:198 (3 transitive callers)
  • push_elixir_sequence_items in crates/codegraph-core/src/extractors/elixir.rs:223 (3 transitive callers)
  • push_elixir_map_values in crates/codegraph-core/src/extractors/elixir.rs:239 (3 transitive callers)
  • find_first_child_of_types in crates/codegraph-core/src/extractors/helpers.rs:49 (0 transitive callers)
  • iter_children in crates/codegraph-core/src/extractors/helpers.rs:74 (0 transitive callers)
  • push_call in crates/codegraph-core/src/extractors/helpers.rs:809 (21 transitive callers)
  • push_simple_call in crates/codegraph-core/src/extractors/helpers.rs:831 (14 transitive callers)
  • push_import in crates/codegraph-core/src/extractors/helpers.rs:846 (12 transitive callers)
  • ExtractParametersOptions<'a>.default in crates/codegraph-core/src/extractors/helpers.rs:894 (0 transitive callers)
  • resolve_param_name in crates/codegraph-core/src/extractors/helpers.rs:908 (6 transitive callers)
  • extract_simple_parameters in crates/codegraph-core/src/extractors/helpers.rs:929 (7 transitive callers)
  • push_type_map_entry in crates/codegraph-core/src/extractors/helpers.rs:963 (5 transitive callers)
  • match_c_family_type_map in crates/codegraph-core/src/extractors/helpers.rs:988 (2 transitive callers)

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces two new shared helper modules (helpers.ts, helpers.rs) and adopts them across 9 TypeScript and 10 Rust language extractors, replacing ~700 lines of duplicated boilerplate for call/import pushing, parameter extraction, and type-map recording. The Elixir extractors were simultaneously converted from mutual recursion to an iterative worklist to break the function-level cycle in the call graph.

  • Shared output helpers (pushCall, pushImport, extractSimpleParameters, push_type_map_entry, match_c_family_type_map) consolidate the duplicated startPosition.row + 1 / Call { name, line, ... } idioms that appeared in every extractor; both TS and Rust APIs are symmetric.
  • Elixir cycle-break: both collectElixirParamIdentifiers variants are now iterative (LIFO worklist); items are pushed in reverse source order so pop() yields them left-to-right, preserving document order for list/tuple/map pattern parameters.
  • Java handleJavaLocalVarDecl explicitly uses Map.set (last-wins) rather than the shared setTypeMapEntry (first-wins at equal confidence), deliberately preserving the pre-refactor semantics for method-scoped local variables.

Confidence Score: 5/5

Safe to merge; the refactoring is mechanical and the behavioural edge cases are correctly handled in the final commit.

Every output-push helper preserves existing null/empty guards. The Elixir iterative worklist correctly pushes children in reverse order so document order is maintained on pop. Java's handleJavaLocalVarDecl uses Map.set directly rather than setTypeMapEntry, intentionally keeping last-wins semantics for local variable declarations.

No files require special attention.

Important Files Changed

Filename Overview
src/extractors/helpers.ts New shared helpers: pushCall, pushImport, extractSimpleParameters, setTypeMapEntry, iterChildren. All well-typed; iterChildren/PUNCTUATION_TOKENS are defined but not yet consumed by any extractor in this PR.
crates/codegraph-core/src/extractors/helpers.rs New Rust helpers mirroring the TS side. iter_children and PUNCTUATION_TOKENS are defined but unused in this PR. match_c_family_type_map returns void (not bool), addressing the previously noted concern.
src/extractors/elixir.ts Iterative worklist replaces recursive descent. pushElixirSequenceItems correctly pushes in reverse; pushElixirMapValues collects then reverses — parameter document order is preserved.
crates/codegraph-core/src/extractors/elixir.rs Matching Rust iterative worklist. push_elixir_sequence_items uses (0..count).rev(); push_elixir_map_values collects in source order then reverses. Order semantics match the TS implementation.
src/extractors/java.ts extractJavaParameters delegates to extractSimpleParameters; handleJavaLocalVarDecl correctly uses direct Map.set (last-wins), restoring pre-refactor semantics. Introduces resolveJavaTypeText helper.
crates/codegraph-core/src/extractors/java.rs Delegates to push_type_map_entry, push_call, push_import, extract_simple_parameters. Rust type_map is a Vec so no first-wins/last-wins concern; behavior matches the previous append-on-match pattern.
src/extractors/gleam.ts findFirstChildOfTypes replaces chained findChild calls. Functionally equivalent for well-formed Gleam import nodes. pushConstructor extracts repeated ctor-name logic.
src/extractors/solidity.ts extractSolParams and handleCallExpression delegate to shared helpers. qualifyWithParent consolidates repeated findParentNode logic. Solidity-specific ['*'] fallback is explicitly preserved.
crates/codegraph-core/src/extractors/cpp.rs match_cpp_type_map delegates to match_c_family_type_map. Preproc include and call expression helpers adopt push_import / push_call.
crates/codegraph-core/src/extractors/cuda.rs Mirrors cpp.rs changes with unwrap_cuda_declarator. Same push_import / push_call cleanup.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph TS["TypeScript extractors"]
        ETS[elixir.ts] --> HTS
        GTS[gleam.ts] --> HTS
        JTS[java.ts] --> HTS
        JSTS[javascript.ts] --> HTS
        JULTS[julia.ts] --> HTS
        RTS[r.ts] --> HTS
        SOLTS[solidity.ts] --> HTS
        HTS[helpers.ts\npushCall / pushImport\nextractSimpleParameters\nsetTypeMapEntry\niterChildren]
    end

    subgraph RS["Rust extractors"]
        ERS[elixir.rs] --> HRS
        CPP[cpp.rs] --> HRS
        CU[cuda.rs] --> HRS
        JRS[java.rs] --> HRS
        JSRS[javascript.rs] --> HRS
        JULRS[julia.rs] --> HRS
        ORS[objc.rs] --> HRS
        RRS[r_lang.rs] --> HRS
        SOLRS[solidity.rs] --> HRS
        HRS[helpers.rs\npush_call / push_import\nextract_simple_parameters\npush_type_map_entry\nmatch_c_family_type_map]
    end

    HTS -. "mirrors" .-> HRS
Loading

Reviews (8): Last reviewed commit: "fix(extractors/java): use last-wins for ..." | Re-trigger Greptile

…er patterns

The iterative-worklist refactor in both the TS and Rust engines pushed
list/tuple children onto the LIFO stack in forward (left-to-right) order,
causing pop() to emit them in reversed document order.  For example, the
pattern {x, _y} was yielding ['_y', 'x'] instead of ['x', '_y'].

Fix: iterate children in reverse when pushing onto the stack so that
pop() restores the original document order.  The same reverse-push
pattern is applied to pushElixirMapValues / push_elixir_map_values for
multi-binding map patterns.

Also remove the unused `bool` return from `match_c_family_type_map`
(both current callers discard the value).

Fixes the CI failure introduced by the cycle-break refactor.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

All three Greptile concerns addressed in commit 4fc7913:

  1. TS pushElixirSequenceItems order reversal — fixed in src/extractors/elixir.ts: changed loop to iterate in reverse (i = childCount-1..0) so LIFO pop yields document order.

  2. Rust push_elixir_sequence_items / push_elixir_map_values order reversal — fixed in crates/codegraph-core/src/extractors/elixir.rs with the same reverse-iteration approach for sequences and collect-then-push-reverse for map values.

  3. Unused bool return from match_c_family_type_map — changed return type to () in crates/codegraph-core/src/extractors/helpers.rs. Both callers already discarded the value; removing it eliminates the noise.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Greptile feedback addressed in two commits:

  1. TS type error in pushElixirMapValues (7452e72): Added noUncheckedIndexedAccess guard for array index access — values[i] is now checked before push to satisfy TypeScript strict mode. This was the root cause of all CI failures (npm install was failing on build).

  2. typeMap write semantics inconsistency in java.ts (3daa2a1): Migrated handleJavaLocalVarDecl from typeMap.set (always-overwrites) to setTypeMapEntry (first-wins at equal confidence), aligning it with extractJavaParameters which already uses setTypeMapEntry via extractSimpleParameters. Both local-variable and parameter type writes now use consistent semantics.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

Comment thread src/extractors/java.ts
Comment on lines 311 to 320
function extractJavaParameters(
paramListNode: TreeSitterNode | null,
typeMap?: Map<string, TypeMapEntry>,
): SubDeclaration[] {
const params: SubDeclaration[] = [];
if (!paramListNode) return params;
for (let i = 0; i < paramListNode.childCount; i++) {
const param = paramListNode.child(i);
if (!param) continue;
if (param.type === 'formal_parameter' || param.type === 'spread_parameter') {
const nameNode = param.childForFieldName('name');
if (nameNode) {
params.push({ name: nameNode.text, kind: 'parameter', line: param.startPosition.row + 1 });
if (typeMap) {
const typeNode = param.childForFieldName('type');
if (typeNode) {
const typeName =
typeNode.type === 'generic_type' ? typeNode.child(0)?.text : typeNode.text;
if (typeName) typeMap.set(nameNode.text, { type: typeName, confidence: 0.9 });
}
}
}
}
}
return params;
return extractSimpleParameters(paramListNode, {
paramTypes: JAVA_PARAM_TYPES,
typeMap,
resolveType: resolveJavaTypeText,
});
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 typeMap write semantics silently changed from last-wins to first-wins

extractJavaParameters now delegates to extractSimpleParametersrecordParamTypesetTypeMapEntry, which only writes when confidence > existing.confidence. At the uniform confidence of 0.9, the first method processed wins and all later methods with a colliding parameter name are silently dropped. The old code used typeMap.set(nameNode.text, ...) which always overwrote, so the last method won.

handleJavaLocalVarDecl (line 277) was also changed from ctx.typeMap?.set(…) to setTypeMapEntry(…, 0.9), giving the same first-wins behaviour. If a parameter x: Foo is seen before a local x: Bar, the local variable type is now suppressed — the opposite of the previous behaviour. The PR claims no behavioural change, but this is a real semantic difference that can produce wrong type-resolution results in files with same-named parameters across methods.

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit abd318f. The handleJavaLocalVarDecl path now uses Map.set (last-wins) rather than setTypeMapEntry (first-wins at equal confidence), restoring the original pre-refactor behavior. Root cause: setTypeMapEntry at confidence 0.9 let the constructor parameter 'repo: UserRepository' block the local variable 'InMemoryUserRepository repo' in createDefault() from updating the typeMap. This caused repo.findById() etc. to incorrectly resolve to the interface method instead of the concrete implementation — 3 false-positive call edges, dropping Java precision from 70% to 66.7% on CI (all platforms). Local variable declarations are method-scoped and should override same-named entries from earlier scopes, so last-wins is the correct semantics here.

…LocalVarDecl

setTypeMapEntry (first-wins on tie) blocked local variable types from
overriding same-named constructor parameters at equal confidence (0.9).
In UserService, the constructor parameter 'repo: UserRepository' was
set first, preventing the local variable 'InMemoryUserRepository repo'
in createDefault() from updating the typeMap, which caused
repo.findById() etc. to resolve to the interface method (false positive)
instead of the concrete implementation.

Local variable declarations are method-scoped and should use last-wins
semantics so a concrete type assigned in one method doesn't get blocked
by an earlier abstract type from a different scope.
@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed CI failure (Java precision 66.7% < 70% threshold) in commit abd318f.

The root cause was the typeMap write-semantics change in handleJavaLocalVarDecl (commit 3daa2a1): switching from Map.set (last-wins) to setTypeMapEntry (first-wins at equal confidence) let the constructor parameter 'repo: UserRepository' block the local variable 'InMemoryUserRepository repo' in createDefault() from updating the typeMap. This caused repo.findById(), repo.save(), and repo.delete() in other methods to resolve to the interface methods (false positives) instead of the concrete implementation — 3 false-positive edges, dropping precision from 70% to 66.7% on all CI platforms.

Fix: reverted handleJavaLocalVarDecl to use direct Map.set (last-wins). Local variable declarations are method-scoped and must be able to override same-named entries from other scopes. extractJavaParameters (via extractSimpleParameters/setTypeMapEntry) is left unchanged since parameter-to-parameter collisions across different methods don't cause this particular failure.

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 3671144 into main May 27, 2026
31 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-abstraction-extractors branch May 27, 2026 21:49
@github-actions github-actions Bot locked and limited conversation to collaborators May 27, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant