refactor(domain): decompose parser, analysis, and search modules by carlos-alm · Pull Request #1236 · optave/ops-codegraph-tool

carlos-alm · 2026-05-27T00:33:01Z

Summary

Parser: extracts LANGUAGE_REGISTRY iteration + worker boundary helpers
Analysis: decomposes module-map; reduces complexity in fn-impact and dependencies
Search: decomposes generator; reduces complexity in semantic and hybrid search

Commits

d2eab30: refactor(parser): extract LANGUAGE_REGISTRY iteration and worker boundary helpers
6819cd6: refactor(analysis): decompose module-map and reduce complexity in fn-impact and dependencies
4f34404: refactor(search): decompose generator and reduce complexity in semantic and hybrid search

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #8 of 10 (mergeOrder position: 8).

Note: The plan listed PR #1 (extractors) as a dependency because the helper signatures landed there. The cherry-pick applied cleanly on top of main because the parser changes are independent of the extractor helpers. Review separately; merge order can still follow the plan if reviewers prefer.

Caveats

WASM grammars not available in dev worktree — CI will run full test matrix

Test plan

CI passes (lint, build, full test matrix)
Verify no new cycles introduced (codegraph stats)

…impact and dependencies

Split high-cognitive-complexity functions in the analysis domain into focused helpers. Worst functions per gauntlet (cog/cyc/maxNesting/halstead) are now below thresholds.

module-map.ts (statsData cog=31 -> below threshold):
- Extract buildStatsFromNative and buildStatsFromJs branches
- Share false-positive query and quality-score helpers between paths
- aggregateRolesFromNative pulls duplicated role-aggregation code out

fn-impact.ts (bfsTransitiveCallers cog=37 -> below threshold, impactAnalysisData cog=27 -> below threshold):
- Extract recordCaller, processFrontierNode, seedInterfaceImplementors
- Extract bfsImportDependents and groupDependentsByLevel

dependencies.ts (bfsShortestPath cog=29, bfsFilePath cog=30, buildTransitiveCallers cog=24 -> all below threshold):
- Extract buildNextCallerFrontier from buildTransitiveCallers
- Extract buildNeighborStmt + visitNeighbor; state collected in struct
- Extract visitFileNeighbor + reconstructFilePath

docs check acknowledged - internal helper extraction, no user-facing changes
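The "visitNeighbor; state collected in struct" pattern described in this commit message can be illustrated with a minimal sketch. This is not the code from dependencies.ts: the in-memory `Graph` type, the `BfsState` shape, and every function body below are assumptions made for illustration (the real helpers presumably read neighbors from the database via a prepared statement).

```typescript
// Hypothetical adjacency-list graph standing in for the real edge queries.
type Graph = Map<string, string[]>;

// All mutable BFS state lives in one struct so helpers take a single argument.
interface BfsState {
  visited: Set<string>;
  parent: Map<string, string>;
  queue: string[];
}

// Records one neighbor; returns true as soon as the target is reached.
function visitNeighborSketch(state: BfsState, from: string, neighbor: string, target: string): boolean {
  if (state.visited.has(neighbor)) return false;
  state.visited.add(neighbor);
  state.parent.set(neighbor, from);
  if (neighbor === target) return true;
  state.queue.push(neighbor);
  return false;
}

// Walks the parent links back from target to source.
function reconstructPathSketch(parent: Map<string, string>, source: string, target: string): string[] {
  const path = [target];
  let current = target;
  while (current !== source) {
    const previous = parent.get(current);
    if (previous === undefined) return [];
    current = previous;
    path.push(current);
  }
  return path.reverse();
}

function bfsShortestPathSketch(graph: Graph, source: string, target: string): string[] {
  if (source === target) return [source];
  const state: BfsState = { visited: new Set([source]), parent: new Map(), queue: [source] };
  while (state.queue.length > 0) {
    const node = state.queue.shift() as string;
    for (const neighbor of graph.get(node) ?? []) {
      if (visitNeighborSketch(state, node, neighbor, target)) {
        return reconstructPathSketch(state.parent, source, target);
      }
    }
  }
  return []; // target unreachable
}
```

With the loop body lifted into a helper, the outer function keeps only the queue loop and the early return, which is the kind of change that lowers a cognitive-complexity score without altering traversal order.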

claude · 2026-05-27T00:33:16Z

Claude finished @carlos-alm's task

Root Cause Analysis

Problem Being Solved: The Titan Paradigm cleanup pass decomposes complex functions to improve maintainability and reduce cognitive load across the parser, analysis, and search modules.

Why This Approach: The PR implements function extraction and complexity reduction as part of a systematic refactoring initiative. The changes decompose large modules by extracting helper functions and reducing cognitive complexity in hot-path functions.

Risk Assessment: Low risk refactoring - pure extraction with no semantic changes to existing functionality.

Phase 1: Root Cause Analysis ✅

This PR addresses code complexity as part of the "Titan Paradigm cleanup pass" - a systematic refactoring to reduce cognitive complexity in the codebase. Based on the Titan Paradigm documentation, this targets functions with complexity > 7 as failures that need decomposition.

The changes are legitimate refactoring - extracting helper functions to reduce complexity without changing behavior.

Phase 2: Configuration Change Detection ✅

No concerning configuration changes detected:

✅ No biome/linting rule modifications
✅ No test timeout or coverage threshold changes
✅ No build configuration modifications
✅ No TypeScript strict mode changes
✅ No module resolution changes

Phase 3: Test Integrity ✅

No test integrity concerns:

✅ No test files modified
✅ No assertions weakened or removed
✅ No test expectations lowered
✅ No eslint-disable comments added

The PR mentions "WASM grammars not available in dev worktree — CI will run full test matrix" which is expected for worktree environments.

Phase 4: Code Quality Assessment ✅

Positive refactoring patterns:

Parser module (parser.ts):
- ✅ Extracts collectBackfillPending, mergeAnalysisData helper functions
- ✅ Reduces complexity in ensureWasmTrees by decomposing into smaller functions
- ✅ Maintains same API contracts
Analysis modules:
- ✅ dependencies.ts: Extracts BFS helpers (buildNextCallerFrontier, visitNeighbor, visitFileNeighbor)
- ✅ fn-impact.ts: Decomposes bfsTransitiveCallers with helper functions
- ✅ module-map.ts: Extracts stats aggregation helpers for native fast-path
Search modules:
- ✅ generator.ts: Extracts embedding text preparation helpers
- ✅ semantic.ts: Better separation of concerns for multi-search RRF
- ✅ hybrid.ts: Cleaner fusion logic organization
Worker entry: Better visitor setup organization

No concerning patterns found:

✅ No new dependencies added
✅ No security issues introduced
✅ No performance regressions expected
✅ Maintains backward compatibility

Phase 5: Backlog Compliance

Zero-dep: ✅ - No new runtime dependencies added. Pure refactoring using existing tree-sitter/SQLite/AST infrastructure.

Foundation-aligned: ✅ - Fully compliant with all principles:

P1: No cloud API calls, maintains always-current analysis
P4: Zero-cost core maintained
P5: Programmatic API contract preserved
P8: No code generation, pure analysis tool

Problem-fit: 4/5 - This directly addresses the core problem by reducing cognitive complexity that causes AI agents to waste tokens re-orienting in complex functions. Decomposed functions are easier for AI to understand and modify correctly.

Breaking: No - Pure internal refactoring with no API signature changes, DB schema changes, or MCP tool contract modifications.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Final Recommendation

Rating: ⭐⭐⭐⭐☆ (4/5)
Action: APPROVE WITH CONDITIONS
Reasoning: High-quality refactoring that reduces complexity without changing behavior. Part of systematic Titan Paradigm cleanup. Minor concern about verifying no functional regressions in complex extraction logic.

Conditions for approval:

✅ CI passes with full test matrix
✅ No new cycles introduced (mentioned in test plan)

This is exactly the type of systematic complexity reduction that makes codebases more maintainable for both humans and AI agents.

greptile-apps · 2026-05-27T00:37:18Z

Greptile Summary

This PR decomposes three large domain modules into focused private helper functions — parser, analysis (module-map, fn-impact, dependencies), and search (generator, semantic, hybrid) — as part of the Titan Paradigm cleanup pass. The changes are pure refactors: no public API signatures change and no new functionality is introduced.

Parser: mergeAnalysisData is split into four single-purpose merge helpers; parseAndExtract and several visitor-builder functions are extracted from the monolithic handleParse; ingestNativeResults and backfillNativeDrops are pulled out of parseFilesAuto.
Analysis: BFS loop bodies in dependencies.ts become visitNeighbor/visitFileNeighbor/buildNextCallerFrontier; impact-analysis BFS is extracted into bfsImportDependents/groupDependentsByLevel; statsData in module-map.ts delegates to buildStatsFromNative/buildStatsFromJs.
Search: buildEmbeddings is decomposed into resolveRoot, loadNodesByFile, prepareEmbeddingTexts, and persistEmbeddings; fuseResults and multiSearchData in hybrid/semantic search receive similarly extracted helpers.
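For readers unfamiliar with the fusion step, here is a minimal sketch of reciprocal rank fusion, assuming that is what fuseResults and the "multi-search RRF" mentioned in the earlier review implement. The function name, the input shape (plain arrays of ids ordered best-first), and the constant k = 60 are illustrative assumptions, not taken from hybrid.ts or semantic.ts.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

// Each inner array is one ranked result list, best match first.
// RRF adds 1 / (k + rank) for every list an id appears in (rank is 1-based).
function fuseRankedListsSketch(rankedLists: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

An id that ranks well in several lists outscores one that tops a single list, which is why the per-item merge is a natural candidate for extraction into its own helper.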

Confidence Score: 5/5

Safe to merge — all extracted helpers are pure decompositions with no observable behavioral changes.

Each extracted function is a mechanical lift of existing inline code with identical logic. Tree disposal in wasm-worker-entry correctly propagates through both the parseAndExtract early-return paths and the handleParse finally block. BFS traversal semantics in dependencies and fn-impact are preserved exactly. The native/JS stat paths in module-map produce the same output shape. No public signatures changed.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/domain/analysis/dependencies.ts | BFS helpers extracted (buildNextCallerFrontier, visitNeighbor, visitFileNeighbor, reconstructFilePath, buildNeighborStmt); behavior is preserved: target-found early-return semantics match the old continue/break pattern exactly. |
| src/domain/analysis/fn-impact.ts | recordCaller, processFrontierNode, seedInterfaceImplementors, bfsImportDependents, groupDependentsByLevel extracted; resolveImplementors guard on expandImplementors call preserved correctly in processFrontierNode. |
| src/domain/analysis/module-map.ts | statsData body split into buildStatsFromNative and buildStatsFromJs; queryFalsePositiveRows/buildFalsePositiveWarnings/computeQualityScore deduplicate the formerly duplicated FP logic across both paths; NativeDatabase type imported. |
| src/domain/parser.ts | mergeAnalysisData split into mergeScalarMetadata/mergeAnalysisArrays/mergeTypeMap/mergeDefinitionAnalysis; ingestNativeResults and backfillNativeDrops extracted from parseFilesAuto; IMPORT_FIELD_RENAMES data table replaces 15 if-chains in patchImports. |
| src/domain/search/generator.ts | buildEmbeddings decomposed into resolveRoot, loadNodesByFile, prepareEmbeddingTexts, persistEmbeddings; node count log now computed from byFile map instead of raw nodes array, which produces the same value. |
| src/domain/search/search/hybrid.ts | fuseResults decomposed with createFusionEntry, mergeRankedItem, toHybridResult; type narrowing cast dropped in favour of direct property access on the existing RankedItem union. |
| src/domain/search/search/semantic.ts | rowVector, checkDimensionMismatch, warnOnSimilarQueries, rankRowsForQuery, fuseRankedHits extracted from multiSearchData; similarityWarnThreshold now read once into a variable before the loop; logic unchanged. |
| src/domain/wasm-worker-entry.ts | handleParse decomposed into parseAndExtract, runVisitorWalk, serializeExtractorOutput, disposeTree, and four visitor-builder functions; disposeTree correctly placed in both parseAndExtract early-return paths and handleParse finally block. |
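The parser.ts row mentions a data table replacing a chain of if-statements. A rough sketch of that general technique follows; the field names and the copy rule are hypothetical, since the actual contents of IMPORT_FIELD_RENAMES and patchImports are not shown on this page.

```typescript
// Hypothetical [sourceField, targetField] pairs. The real table's entries
// are not visible here; these names are placeholders.
const FIELD_RENAMES_SKETCH: ReadonlyArray<readonly [string, string]> = [
  ["type_only", "typeOnly"],
  ["dynamic_import", "dynamicImport"],
  ["wildcard_reexport", "wildcardReexport"],
];

// One loop over the table replaces one if-statement per field.
function patchRecordSketch(record: Record<string, unknown>): Record<string, unknown> {
  for (const [from, to] of FIELD_RENAMES_SKETCH) {
    // Copy only when the source exists and the target is not already set.
    if (record[from] !== undefined && record[to] === undefined) {
      record[to] = record[from];
    }
  }
  return record;
}
```

Adding a field then means adding one row to the table rather than another branch, so the function's complexity stays constant as the list grows.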

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph parser["parser.ts — parseFilesAuto"]
        PA[parseFilesAuto] --> IR[ingestNativeResults]
        PA --> BND[backfillNativeDrops]
        IR --> BTM[backfillTypeMapBatch]
    end

    subgraph wasm["wasm-worker-entry.ts — handleParse"]
        HP[handleParse] --> PAE[parseAndExtract]
        PAE -->|"null: extractor fails"| DT1[disposeTree]
        PAE -->|"success: tree + symbols"| HP2[handleParse try/finally]
        HP2 --> SV[setupVisitorsLocal]
        HP2 --> RVW[runVisitorWalk]
        HP2 --> SEO[serializeExtractorOutput]
        HP2 -->|"finally"| DT2[disposeTree]
    end

    subgraph analysis["analysis — statsData"]
        SD[statsData] --> jsSections["jsSections"]
        SD -->|"nativeDb present"| BSN[buildStatsFromNative]
        SD -->|"fallback"| BSJ[buildStatsFromJs]
        BSN --> QFP[queryFalsePositiveRows]
        BSN --> CQS[computeQualityScore]
        BSJ --> CMQ[computeQualityMetrics]
        BSJ --> CQS
    end

    subgraph search["search/generator.ts — buildEmbeddings"]
        BE[buildEmbeddings] --> RR[resolveRoot]
        BE --> LNF[loadNodesByFile]
        BE --> PET[prepareEmbeddingTexts]
        BE --> PE[persistEmbeddings]
    end

Reviews (4): Last reviewed commit: "Merge branch 'main' into refactor/titan-..."

greptile-apps · 2026-05-27T00:37:22Z

+  let symbols: ExtractorOutput | null;
+  try {
+    const query = _queries.get(entry.id);
+    // tree-sitter's Tree/Query are structurally compatible with
+    // TreeSitterTree/TreeSitterQuery at runtime — same cast style as
+    // parser.ts::wasmExtractSymbols (parser.ts:789).
+    symbols = entry.extractor(tree as any, filePath, query as any) ?? null;
+  } catch {
+    return null;
  }
+  if (!symbols) {
+    return null;
+  }
+  return { tree, symbols };


WASM tree memory leak when extractor fails

The refactor moved tree creation into parseAndExtract, but when the extractor throws (line 718) or returns null (line 722), the function returns null without calling disposeTree. In handleParse, the finally { disposeTree(tree) } block is only reached when parseAndExtract succeeds — the if (!parsed) return null exit on line 830 bypasses it entirely. In the old code, tree was scoped to handleParse and the outer finally block covered all exit paths. Files with unsupported or crashing extractors will now silently accumulate leaked WASM linear memory in long-running workers.

Fixed in ab0b7b3 — added disposeTree(tree) before each early-return null path in parseAndExtract (extractor-throws catch block and symbols-null guard). The tree is now always released when the extractor fails, preventing WASM linear memory accumulation in long-running workers.
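The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the code from wasm-worker-entry.ts: `DisposableTree` is a stand-in interface with a `delete()` method, and the `Sketch` functions are simplified so that ownership of the tree is easy to follow.

```typescript
// Stand-in for a WASM-backed syntax tree that must be freed explicitly.
interface DisposableTree {
  delete(): void;
}

type ExtractorSketch<T> = (tree: DisposableTree) => T | null;

function disposeTreeSketch(tree: DisposableTree | null): void {
  try {
    tree?.delete();
  } catch {
    // Disposal failures are ignored in this sketch.
  }
}

// Owns the tree until it returns successfully; every failure path frees it.
function parseAndExtractSketch<T>(
  tree: DisposableTree,
  extractor: ExtractorSketch<T>,
): { tree: DisposableTree; symbols: T } | null {
  let symbols: T | null;
  try {
    symbols = extractor(tree) ?? null;
  } catch {
    disposeTreeSketch(tree); // extractor threw
    return null;
  }
  if (symbols === null) {
    disposeTreeSketch(tree); // extractor produced nothing
    return null;
  }
  return { tree, symbols };
}

// On success the caller takes ownership and frees the tree in finally.
function handleParseSketch<T>(tree: DisposableTree, extractor: ExtractorSketch<T>): T | null {
  const parsed = parseAndExtractSketch(tree, extractor);
  if (!parsed) return null; // tree already disposed by parseAndExtractSketch
  try {
    return parsed.symbols;
  } finally {
    disposeTreeSketch(parsed.tree);
  }
}
```

The point of the pattern is that each exit path frees the tree exactly once: the helper handles its own failure exits, and the caller's finally block handles the success path.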

github-actions · 2026-05-27T00:40:52Z

Codegraph Impact Analysis

75 functions changed → 75 callers affected across 32 files

buildNextCallerFrontier in src/domain/analysis/dependencies.ts:64 (3 transitive callers)
buildTransitiveCallers in src/domain/analysis/dependencies.ts:84 (3 transitive callers)
buildNeighborStmt in src/domain/analysis/dependencies.ts:292 (3 transitive callers)
visitNeighbor in src/domain/analysis/dependencies.ts:311 (3 transitive callers)
bfsShortestPath in src/domain/analysis/dependencies.ts:341 (4 transitive callers)
visitFileNeighbor in src/domain/analysis/dependencies.ts:534 (3 transitive callers)
reconstructFilePath in src/domain/analysis/dependencies.ts:558 (3 transitive callers)
bfsFilePath in src/domain/analysis/dependencies.ts:574 (3 transitive callers)
recordCaller in src/domain/analysis/fn-impact.ts:87 (7 transitive callers)
processFrontierNode in src/domain/analysis/fn-impact.ts:106 (11 transitive callers)
seedInterfaceImplementors in src/domain/analysis/fn-impact.ts:127 (11 transitive callers)
bfsTransitiveCallers in src/domain/analysis/fn-impact.ts:143 (16 transitive callers)
bfsImportDependents in src/domain/analysis/fn-impact.ts:195 (3 transitive callers)
groupDependentsByLevel in src/domain/analysis/fn-impact.ts:227 (3 transitive callers)
impactAnalysisData in src/domain/analysis/fn-impact.ts:241 (2 transitive callers)
computeQualityMetrics in src/domain/analysis/module-map.ts:166 (3 transitive callers)
queryFalsePositiveRows in src/domain/analysis/module-map.ts:336 (5 transitive callers)
buildFalsePositiveWarnings in src/domain/analysis/module-map.ts:354 (5 transitive callers)
computeQualityScore in src/domain/analysis/module-map.ts:363 (5 transitive callers)
aggregateRolesFromNative in src/domain/analysis/module-map.ts:372 (3 transitive callers)

carlos-alm · 2026-05-27T05:18:08Z

@greptileai

carlos-alm added 3 commits May 26, 2026 18:32

52b9885: refactor(parser): extract LANGUAGE_REGISTRY iteration and worker boundary helpers
d837e89: refactor(search): decompose generator and reduce complexity in semantic and hybrid search

greptile-apps Bot reviewed May 27, 2026

carlos-alm added 2 commits May 26, 2026 23:14

fef599e: Merge remote-tracking branch 'origin/main' into refactor/titan-decomposition-domain
ab0b7b3: fix(worker): dispose WASM tree on extractor-failure paths in parseAndExtract (#1236)

When the extractor throws or returns null, the tree allocated by parser.parse is now disposed before returning null, preventing WASM linear memory accumulation in long-running workers.

b0f762c: Merge branch 'main' into refactor/titan-decomposition-domain

carlos-alm merged commit 8ccafa0 into main May 28, 2026
21 checks passed

carlos-alm deleted the refactor/titan-decomposition-domain branch May 28, 2026 03:52

github-actions Bot locked and limited conversation to collaborators May 28, 2026

carlos-alm merged 6 commits into main from refactor/titan-decomposition-domain
