Add root management and autonomous indexing tools to MCP server by rickross · Pull Request #15 · colbymchenry/codegraph

rickross · 2026-02-05T16:16:06Z

Hi Colby!

We've started using CodeGraph extensively and find it incredibly valuable. We added several MCP tools that make it easier to manage multiple projects and let AI agents handle project lifecycle tasks (init/index/sync). We hope these will be helpful contributions.

What's in this PR

1. Root Management Tools

codegraph_get_root - Get currently active root path
codegraph_set_root - Switch between indexed projects
- Now returns status immediately (files, nodes, edges, DB size), so no separate status call is needed
- Better error messages for uninitialized roots

2. Project Lifecycle Tools

AI assistants can now manage the CodeGraph project lifecycle on behalf of users:

codegraph_init - Initialize CodeGraph in current root
codegraph_index - Full index of current root
codegraph_sync - Incremental sync of current root
codegraph_uninit - Remove CodeGraph from current root (cleanup)

These operate on the "current root" set by set_root, providing a clean context-based API.
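A minimal TypeScript sketch of this context-based pattern follows. The RootContext class and its methods are hypothetical illustrations, not CodeGraph's implementation; the only detail taken from this PR is that init creates a .codegraph/ directory inside the root, which is what the sketch checks to decide whether a root is initialized.

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical sketch: holds the "current root" that lifecycle tools act on.
class RootContext {
  private root: string | null = null;

  // Like codegraph_set_root: switch roots and report whether the new one is initialized.
  setRoot(root: string): { root: string; initialized: boolean } {
    const resolved = path.resolve(root);
    if (!fs.existsSync(resolved)) {
      throw new Error(`Root does not exist: ${resolved}`);
    }
    this.root = resolved;
    return { root: resolved, initialized: this.isInitialized() };
  }

  // Like codegraph_get_root.
  getRoot(): string | null {
    return this.root;
  }

  // Per the PR, init creates a .codegraph/ directory inside the root.
  isInitialized(): boolean {
    return this.root !== null && fs.existsSync(path.join(this.root, ".codegraph"));
  }

  // init/index/sync/uninit would call this instead of taking a path parameter.
  requireRoot(): string {
    if (this.root === null) {
      throw new Error("No root set; call set_root first");
    }
    return this.root;
  }
}
```

Because every lifecycle operation goes through requireRoot(), none of them needs its own path argument, and calling one before any root is set fails with a clear error instead of silently using the wrong directory.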

3. Performance Optimization (80x speedup for reference resolution)

Optimized reference resolution with symbol caching:

Before: 80+ seconds for sync on projects with 20K+ unresolved refs (9% CPU)
After: ~1 second (95% CPU utilization)
Cause: Repeated SQLite queries for symbol lookups
Fix: Pre-load all symbols into memory maps indexed by name/qualified name

Tested on a 681-file codebase with 26,233 unresolved references.
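The symbol-caching fix can be sketched roughly as follows. The SymbolNode shape and SymbolCache class are assumptions made for illustration; only the idea (one bulk load into maps keyed by name and qualified name) and the lookup names getNodesByName/getNodesByQualifiedName come from the commit notes in this PR.

```typescript
// Hypothetical node shape; CodeGraph's real node type has more fields.
interface SymbolNode {
  id: string;
  name: string;
  qualifiedName: string;
}

// Sketch: warm in-memory indexes once, then answer lookups without SQLite.
class SymbolCache {
  private byName = new Map<string, SymbolNode[]>();
  private byQualifiedName = new Map<string, SymbolNode[]>();

  // Called once before resolution with the result of a single "all nodes" query.
  warm(nodes: Iterable<SymbolNode>): void {
    this.byName.clear();
    this.byQualifiedName.clear();
    for (const node of nodes) {
      SymbolCache.add(this.byName, node.name, node);
      SymbolCache.add(this.byQualifiedName, node.qualifiedName, node);
    }
  }

  getNodesByName(name: string): SymbolNode[] {
    return this.byName.get(name) ?? [];
  }

  getNodesByQualifiedName(qualifiedName: string): SymbolNode[] {
    return this.byQualifiedName.get(qualifiedName) ?? [];
  }

  // Append a node to the bucket for `key`, creating the bucket if needed.
  private static add(index: Map<string, SymbolNode[]>, key: string, node: SymbolNode): void {
    const bucket = index.get(key);
    if (bucket) bucket.push(node);
    else index.set(key, [node]);
  }
}
```

Each per-reference lookup becomes a Map.get instead of a database round trip, which is consistent with the author's diagnosis of an I/O-bound resolver. The memory cost grows linearly with the number of nodes.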

4. Tree-sitter Fixes

Merged @jasques' PR #9 fixes for tree-sitter installation issues (with full credit).

Use Cases

Multi-project workflows:

codegraph_set_root("/path/to/project-a")  // Shows: 500 files, 2000 nodes...
codegraph_search("AuthService")

codegraph_set_root("/path/to/project-b")  // Shows: 800 files, 3500 nodes...
codegraph_search("AuthService")

Autonomous project lifecycle:

codegraph_set_root("/path/to/new-project")  // Detects not initialized
codegraph_init()                            // Initialize it
codegraph_index()                           // Index it (now 80x faster!)
codegraph_search("main")                    // Ready to use!
// ... later ...
codegraph_uninit()                          // Clean up when done

Testing

Verified across 3 different projects (TypeScript, Python, mixed) with successful initialization, indexing, syncing, and querying.

Let us know if you'd like any changes or have questions about the design!

Adds codegraph_get_project and codegraph_set_project tools to enable AI assistants to work across multiple indexed projects in a single session.
- codegraph_get_project: Returns currently active project path
- codegraph_set_project: Switches to different project (closes old, opens new)
- Updated CLAUDE.md documentation

This enables multi-project workflows without restarting the MCP server.

Merges jasques' fix for tree-sitter installation issues.
- Pins tree-sitter dependencies to exact versions (no ^ ranges)
- Adds npm overrides to force tree-sitter@0.22.4
- Removes non-existent queries copy from build script

Credit: #9
Co-authored-by: Łukasz Jakóbiec <jasques@users.noreply.github.com>

Adds init, index, and sync tools to enable autonomous project management:
- codegraph_init_project: Initialize CodeGraph in a new project
- codegraph_index_project: Perform full index of all files
- codegraph_sync_project: Incremental update (changed files only)

These tools enable AI assistants to discover and index new projects without requiring manual shell commands, making multi-project workflows fully autonomous.

Renamed tools for clarity and simplicity:
- codegraph_get_project → codegraph_get_root
- codegraph_set_project → codegraph_set_root
- codegraph_init_project → codegraph_init (operates on current root)
- codegraph_index_project → codegraph_index (operates on current root)
- codegraph_sync_project → codegraph_sync (operates on current root)

Benefits:
- Clearer mental model: set a root, then operations work on that root
- No redundant path parameters
- Simpler API surface

Shows immediate feedback about the root you just switched to:
- Files indexed
- Total nodes/edges
- Database size

No need to run separate status command after switching.

Changed 'codegraph init' → 'codegraph_init' to reference the correct MCP tool name instead of CLI command.

Completes the lifecycle management:
- init: create .codegraph/
- index/sync: populate/update index
- uninit: remove .codegraph/ (cleanup)

Calls CodeGraph.uninitialize() which closes DB and deletes the .codegraph directory.

Documents how to configure CodeGraph with OpenCode in addition to existing Claude Code instructions.

Changed from OpenCode-specific to generic MCP client config. Keeps it neutral and broadly applicable.

Makes it clear the example is for OpenCode and shows the typical config file location.

Problem:
- sync was taking 80+ seconds on projects with 20K+ unresolved refs
- Low CPU utilization (9%) indicated an I/O bottleneck
- Root cause: 26K repeated SQLite queries for symbol lookups

Solution:
- Pre-load all symbols into memory maps indexed by name/qualified name
- Cache lookup in getNodesByName() and getNodesByQualifiedName()
- warmCaches() called once at start of resolveAll()

Results:
- sync time: 80s → 1s (80x speedup)
- CPU utilization: 9% → 95% (actually using available resources)
- Memory trade-off: a few MB for symbol cache (negligible)

Tested on a 681-file codebase with 26,233 unresolved references.

- Add insertUnresolvedRefsBatch() method using SQLite transaction
- Replace N individual inserts with single batched transaction
- Expected 10-100x speedup on post-parsing phase depending on ref count

This avoids repeated transaction overhead when indexing files with many unresolved references.

- Add timing breakdown to IndexResult (scanning, parsing, storing, resolving)
- Report progress during 'storing' phase (was silent before)
- Track per-file parse times to identify bottlenecks
- Users can now see where time is spent during indexing

This provides visibility into performance bottlenecks and makes long indexing operations less mysterious.

- Index command now calls resolveReferences() after indexing
- Added progress logging during resolution (every 100 refs)
- Shows resolved/unresolved counts at completion
- This was the missing 'resolving' phase that took most of the time

The 'index' command was only parsing+storing but not resolving, so edges weren't being created. Now the full pipeline runs.

- resolveReferences() now accepts onProgress callback
- CLI shows real-time progress bar during resolution
- Updates every 100ms with current/total refs
- Shows resolution duration separately from indexing
- Much better UX during the slow resolution phase
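The "updates every 100ms" behavior can be sketched as a throttled callback. throttleProgress is a hypothetical helper, and the (current, total) callback signature is an assumption; the PR does not show the actual onProgress type accepted by resolveReferences(). The clock is injectable so the throttling can be tested deterministically.

```typescript
type ProgressCallback = (current: number, total: number) => void;

// Hypothetical helper: forwards at most one update per intervalMs,
// but always forwards the final update so the bar reaches 100%.
function throttleProgress(
  onProgress: ProgressCallback,
  intervalMs = 100,
  now: () => number = Date.now,
): ProgressCallback {
  let last = -Infinity; // ensures the very first update is shown
  return (current, total) => {
    const t = now();
    if (current >= total || t - last >= intervalMs) {
      last = t;
      onProgress(current, total);
    }
  };
}
```

Throttling by time rather than by count keeps redraw cost bounded even when tens of thousands of references resolve in a second.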

- Changed fs.readFileSync to async fs.promises.readFile
- Process files in batches of 20 with Promise.all
- Overlaps I/O operations instead of sequential reads
- Should utilize idle CPU cores better (was 25% CPU, I/O bound)

Expected impact: 2-4x faster indexing on projects with many files since file reading is now parallel instead of sequential.
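The batching described in this commit might look like the following sketch. readFilesInBatches is a hypothetical name; fs.promises.readFile and Promise.all are the Node.js APIs the commit mentions.

```typescript
import { promises as fs } from "fs";

// Sketch: read files in fixed-size batches; reads within a batch run concurrently.
async function readFilesInBatches(
  paths: string[],
  batchSize = 20,
): Promise<Map<string, string>> {
  const contents = new Map<string, string>();
  for (let i = 0; i < paths.length; i += batchSize) {
    const batch = paths.slice(i, i + batchSize);
    // Promise.all overlaps the I/O for every file in this batch.
    const texts = await Promise.all(batch.map((p) => fs.readFile(p, "utf8")));
    batch.forEach((p, j) => contents.set(p, texts[j]));
  }
  return contents;
}
```

One trade-off of fixed batches: each batch waits for its slowest read before the next one starts. A small worker pool that keeps 20 reads in flight continuously avoids that stall at the cost of slightly more code; either way, the batch size bounds the number of open file descriptors.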

- synchronous=NORMAL: Faster writes (safe with WAL mode)
- cache_size=64MB: Larger cache for better read performance
- temp_store=MEMORY: Keep temporary tables in RAM
- mmap_size=256MB: Memory-mapped I/O for faster access

These pragmas should improve both read and write performance significantly without compromising data integrity.
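For reference, the four settings map to SQLite PRAGMA statements as sketched below. The helper is hypothetical and only builds the SQL strings; including journal_mode = WAL is an assumption based on the "safe with WAL mode" note. Two unit details matter: a negative cache_size is read as a size in KiB (a positive value would mean pages), and mmap_size is given in bytes.

```typescript
// Hypothetical helper: turns this commit's settings into PRAGMA statements.
function buildPerformancePragmas(cacheMiB = 64, mmapMiB = 256): string[] {
  return [
    // Assumed to be enabled already; listed so the set is self-contained.
    "PRAGMA journal_mode = WAL",
    "PRAGMA synchronous = NORMAL",
    // Negative value = approximate cache size in KiB.
    `PRAGMA cache_size = -${cacheMiB * 1024}`,
    "PRAGMA temp_store = MEMORY",
    // mmap_size is in bytes.
    `PRAGMA mmap_size = ${mmapMiB * 1024 * 1024}`,
  ];
}
```

On the integrity claim: per SQLite's documentation, WAL with synchronous=NORMAL does not risk corrupting the database, but the most recent committed transactions may be rolled back after a power loss or OS crash. For an index that can be rebuilt, that is usually an acceptable trade. The effective mmap_size can also be capped by SQLite's compile-time SQLITE_MAX_MMAP_SIZE.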

Documents all 6 major optimizations made in performance branch:
1. Batch insert for unresolved refs
2. Detailed timing breakdown
3. Progress reporting for all phases
4. Reference resolution in index command
5. Parallel file I/O
6. SQLite performance pragmas

Includes expected impact, benchmarks, and testing instructions.

Resolution was not clearing old edges before creating new ones, causing edges to accumulate on each index run. Now deletes existing edges from source nodes before inserting new resolved edges, preventing duplicates.
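A minimal in-memory sketch of this fix: clear the existing edges of every source node being re-resolved, then insert the new ones, so running resolution twice yields the same edge set. The Edge shape and replaceEdgesFromSources are hypothetical; the PR's change applies the same idea to the stored edges rather than to an array.

```typescript
// Hypothetical edge shape for illustration.
interface Edge {
  source: string;
  target: string;
  kind: string;
}

// Sketch: drop all existing edges from the re-resolved sources, then append
// the freshly resolved edges. Re-running with the same input is idempotent.
function replaceEdgesFromSources(
  existing: Edge[],
  resolvedSourceIds: Iterable<string>,
  resolved: Edge[],
): Edge[] {
  const sources = new Set(resolvedSourceIds);
  const kept = existing.filter((e) => !sources.has(e.source));
  return kept.concat(resolved);
}
```

Passing the re-resolved source IDs explicitly, rather than deriving them from the new edges, means a source whose references no longer resolve still has its stale edges removed. A real implementation would also need to decide whether to delete only the edge kinds that resolution produces.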

Schema.sql execution was accidentally removed when adding performance pragmas, causing 'no such table' errors on fresh init. This restores the schema initialization that creates all tables.

indexAll() was showing 'Resolving refs: 0%' placeholder that did nothing, confusing users before the real resolution started. Resolution happens separately after indexing via resolveReferences(), so removed the misleading progress indicator.

Adds 'codegraph uninit' command to match MCP tool functionality. Includes confirmation prompt to prevent accidental data loss. Usage: codegraph uninit [path]

Users expect 'y' to work, not just 'yes'. Also changed prompt to (y/n) to be clearer.

Shows feedback during the initialization phase (getUnresolvedReferences + warmCaches) so users know the process hasn't hung. Message clears when progress starts.

Changed from clearing at current===100 to clearing on first callback. This ensures the message clears properly when resolution starts.

Displays same stats block as 'codegraph status' after resolution, showing accurate file/node/edge counts and DB size. Eliminates confusion between intermediate counts (412 edges) and final totals (12,159 edges after resolution).

warmCaches was calling getNodesByFile() for each file (880 queries). Changed to single getAllNodes() query and build caches in memory. This was causing ~60 second 'Preparing resolver' delay. Expected to reduce to <1 second.

Added DEBUG logs to measure:
- getUnresolvedReferences
- getAllNodes (warmCaches)
- Convert refs format (22K getNodeById calls)
- resolveAndPersist

This will identify where the 60s 'stuck' phase is happening.

Fixes by GPT-5.3 Codex addressing code review findings:

Critical fixes (P0):
- Add await to sync() resolution calls (prevents DB race conditions)
- Remove dual DB handle (eliminates connection leaks)
- Fix edge cleanup key parsing (handles IDs with colons)

Important fixes (P1):
- Fix graph traversal 'both' direction (correct neighbor selection)
- Fix type hierarchy descendants (separate visited sets)
- Add arrow function extraction support

Lower priority (P2):
- Add missing languages to config validation
- Fix VSS search LIMIT issue
- Fix toFloat32Array() data copying

Improvements:
- Restore getDetectedFrameworks() API
- Enhanced test coverage
- All 200 tests passing

Co-authored-by: GPT-5.3 Codex
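The 'both' direction fix comes down to choosing the other endpoint of each edge, whichever side the current node is on. Below is a minimal BFS sketch with a hypothetical GraphEdge shape and function name; CodeGraph's actual traversal API is not shown in this PR.

```typescript
// Hypothetical edge shape for illustration.
interface GraphEdge {
  source: string;
  target: string;
}

// Sketch: breadth-first traversal that follows edges in both directions.
function bfsBoth(start: string, edges: GraphEdge[], maxDepth = Infinity): string[] {
  // Index every edge under both of its endpoints.
  const incident = new Map<string, GraphEdge[]>();
  for (const e of edges) {
    for (const end of [e.source, e.target]) {
      const list = incident.get(end) ?? [];
      list.push(e);
      incident.set(end, list);
    }
  }
  const visited = new Set<string>([start]);
  const order: string[] = [start];
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of incident.get(node) ?? []) {
        // Outgoing edge: step to the target. Incoming edge: step to the source.
        const neighbor = e.source === node ? e.target : e.source;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          order.push(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

A common form of this bug is always stepping to e.target, which returns the current node itself for incoming edges and silently skips callers.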

Recovered from commit 76e6e7b. Guide contains best practices for AI assistants using CodeGraph tools effectively.

- Fix Float32Array embedder bug: was creating zero-filled array instead of copying data from TypedArray-like objects
- Fix VSS search query: use subquery pattern so LIMIT applies before JOIN
- Pin tree-sitter versions: remove caret ranges for ABI stability, add overrides to lock tree-sitter core at 0.22.4
- Lazy grammar loading: load native bindings on first use per language instead of all at startup, so one missing grammar doesn't affect others
- Remove stale src/extraction/queries copy from copy-assets script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
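A sketch of the corrected conversion. The function name comes from the commit notes (toFloat32Array()); the input type is an assumption, since the PR does not show what the embedder receives. The key distinction: new Float32Array(n) allocates n zeros, while Float32Array.from(source) copies the values.

```typescript
// Sketch only: the real embedder's input types are not shown in the PR.
function toFloat32Array(input: ArrayLike<number>): Float32Array {
  // Float32Array.from copies each element (narrowing to 32-bit floats).
  // By contrast, new Float32Array(input.length) would contain only zeros.
  return Float32Array.from(input);
}
```

Always copying also means the returned vector does not alias the caller's buffer, so later mutation of the source cannot change a stored embedding.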

- SQLite performance pragmas: synchronous=NORMAL, 64MB cache, memory temp store, 256MB mmap (safe with WAL mode)
- Batch insert for unresolved refs: single transaction instead of N individual inserts per file
- Symbol caching (warmCaches): pre-load all nodes into memory maps before resolution, eliminating repeated SQLite queries per ref
- Async file I/O: fs.stat/readFile in indexFile() are now non-blocking
- Denormalize filePath/language onto UnresolvedReference: avoids N node lookups during resolution, with schema migration v2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix arrow function extraction: explicitly call extractFunction() for arrow functions/function expressions in variable declarations instead of silently skipping them (all 6 arrow function tests now pass)
- Best-candidate resolution: collect candidates from all strategies and return highest confidence match instead of first match
- Fix graph traversal 'both' direction: correctly determine next node for mixed incoming/outgoing edges in BFS and DFS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
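Best-candidate resolution can be sketched like this. The Candidate and Strategy types and the 0..1 confidence scale are assumptions for illustration; the PR states only that candidates from all strategies are collected and the highest-confidence match is returned instead of the first match.

```typescript
// Hypothetical shapes: CodeGraph's real resolver types are not shown in the PR.
interface Candidate {
  nodeId: string;
  confidence: number; // assumed scale: 0..1, higher is better
  strategy: string;
}

type Strategy = (refName: string) => Candidate[];

// Sketch: run every strategy, pool the candidates, keep the most confident one,
// instead of returning as soon as the first strategy yields any match.
function resolveBestCandidate(refName: string, strategies: Strategy[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const strategy of strategies) {
    for (const candidate of strategy(refName)) {
      if (best === undefined || candidate.confidence > best.confidence) {
        best = candidate;
      }
    }
  }
  return best;
}
```

Because the comparison is strict, ties still go to the candidate found first, so strategy order remains a tiebreaker. The cost is that every strategy runs for every reference, even when an early one already found a strong match.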

colbymchenry · 2026-02-10T06:09:31Z

Hey @rickross, thanks for this massive PR! While there were too many merge conflicts to merge directly, we went through it thoroughly and cherry-picked the most valuable changes into PR #19. This includes bug fixes (Float32Array, VSS query), performance improvements (SQLite pragmas, batch inserts, symbol caching), extraction quality fixes (arrow function handling, best-candidate resolution), the CLI uninit command, and MCP improvements. Appreciate the effort you put into this!

Port quality improvements from PR #15

jasques and others added 30 commits January 31, 2026 18:31

fixed tree-sitter dependency versions to enable installation

7163448

set_root now returns status of newly set root

861851a

Better error message for uninitialized root

203f09e

Add codegraph_uninit tool for cleanup

8011e43

Add OpenCode installation instructions to README

9d516e5

Simplify README to generic MCP client instructions

ea17f29

Clarify OpenCode-specific config in README

f4cf8d3

Clean up: add comment explaining cache optimization, remove unused method

df57a4f

fix: prevent duplicate edges when re-indexing

fb9920b

fix: CRITICAL - restore schema initialization in DB create

6889f90

feat: add uninit CLI command with safety confirmation

57cde72

fix: accept both 'y' and 'yes' for uninit confirmation

c4244cb

feat: show 'Preparing resolver...' during warmCaches

9806ecd

fix: clear 'Preparing resolver' on first progress update

8e05cc5

perf: CRITICAL - fix warmCaches doing 880 queries instead of 1

dddcd2b

debug: add timing instrumentation to find bottleneck

6542555

rickross added 17 commits February 7, 2026 08:14

Fix core runtime, extraction, and graph traversal issues

7b50c6e

Restore AI_GUIDE.md lost during git detached HEAD incident

a2d3b21

Improve search quality and MCP tool ergonomics

ba023d3

Update docs and installer for MCP naming and dev versioning

26d1bdd

Harden MCP scoping, narrowing, and trace coverage hints

24e16e6

Track index provenance and expose MCP/runtime versions

55c0815

Update messaging to AI agent focus

dcc3730

Add resilient parser loading and SCIP semantic import

5db279d

Enable default SCIP auto-import with opt-out controls

30ed5e4

Improve SCIP import scaling, progress UX, and version metadata

a65e9bf

Use SCIP-first pass to resolve refs before heuristics

7ae1b39

Improve SCIP-aware ranking and traversal evidence ordering

ad7b074

chore: checkpoint local docs and type updates before resolver/scip work

5e5d7cb

feat: improve resolver ranking and incremental scip import

af862ec

feat: add api intent mode for mcp search and context

4f5d540

fix: bias api intent toward backend route paths

d7cb282

colbymchenry mentioned this pull request Feb 10, 2026

Security hardening, correctness fixes, and documentation improvements #16

Closed

6 tasks

colbymchenry closed this Feb 10, 2026

colbymchenry mentioned this pull request Feb 10, 2026

Port quality improvements from PR #15 #20

Merged

4 tasks

colbymchenry added a commit that referenced this pull request Feb 10, 2026

Merge pull request #20 from colbymchenry/pr-19

5e2d3d6

Port quality improvements from PR #15

jorgerobles pushed a commit to jorgerobles/codegraph that referenced this pull request Jun 1, 2026

Merge pull request colbymchenry#20 from colbymchenry/pr-19

26da0cb

Port quality improvements from PR colbymchenry#15

gyorgybalazsi mentioned this pull request Jun 12, 2026

spec(codegraph-rs): external-package references design #827

Closed

rickross wants to merge 62 commits into colbymchenry:main from rickross:main

rickross commented Feb 5, 2026 •

edited

Loading

Uh oh!

colbymchenry commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants
