Add root management and autonomous indexing tools to MCP server #15
Closed
rickross wants to merge 62 commits into
Adds codegraph_get_project and codegraph_set_project tools so AI assistants can work across multiple indexed projects in a single session.
- codegraph_get_project: Returns the currently active project path
- codegraph_set_project: Switches to a different project (closes the old one, opens the new one)
- Updated CLAUDE.md documentation

This enables multi-project workflows without restarting the MCP server.
Merges jasques' fix for tree-sitter installation issues.
- Pins tree-sitter dependencies to exact versions (no ^ ranges)
- Adds npm overrides to force tree-sitter@0.22.4
- Removes the non-existent queries copy from the build script

Credit: #9
Co-authored-by: Łukasz Jakóbiec <jasques@users.noreply.github.com>
Adds init, index, and sync tools to enable autonomous project management:
- codegraph_init_project: Initialize CodeGraph in a new project
- codegraph_index_project: Perform a full index of all files
- codegraph_sync_project: Incremental update (changed files only)

These tools enable AI assistants to discover and index new projects without requiring manual shell commands, making multi-project workflows fully autonomous.
Renamed tools for clarity and simplicity:
- codegraph_get_project → codegraph_get_root
- codegraph_set_project → codegraph_set_root
- codegraph_init_project → codegraph_init (operates on current root)
- codegraph_index_project → codegraph_index (operates on current root)
- codegraph_sync_project → codegraph_sync (operates on current root)

Benefits:
- Clearer mental model: set a root, then operations work on that root
- No redundant path parameters
- Simpler API surface
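The "set a root, then operate on it" model can be sketched as one small stateful holder shared by all tool handlers. This is a minimal sketch, not the PR's code: `RootContext`, `ProjectHandle`, and `Opener` are hypothetical stand-ins for however the MCP server actually opens and closes a CodeGraph project.

```typescript
// Hypothetical minimal view of an opened project; the real object exposes far more.
interface ProjectHandle {
  readonly root: string;
  close(): void;
}

type Opener = (root: string) => ProjectHandle;

// Hypothetical holder for the "current root" used by every tool handler.
class RootContext {
  private current: ProjectHandle | null = null;
  private readonly open: Opener;

  constructor(open: Opener) {
    this.open = open;
  }

  // codegraph_get_root: report the active root, or null if none is set.
  getRoot(): string | null {
    return this.current ? this.current.root : null;
  }

  // codegraph_set_root: close the old project before opening the new one,
  // so at most one database handle is live at a time.
  setRoot(root: string): string {
    if (this.current) {
      this.current.close();
      this.current = null;
    }
    this.current = this.open(root);
    return this.current.root;
  }

  // codegraph_init / codegraph_index / codegraph_sync take no path argument;
  // they resolve the project through this accessor instead.
  requireProject(): ProjectHandle {
    if (!this.current) {
      throw new Error("No root set; call codegraph_set_root first");
    }
    return this.current;
  }
}
```

Keeping the path out of the lifecycle tools means a single `setRoot` call is the only place where project switching (and closing the previous handle) can happen.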
Shows immediate feedback about the root you just switched to:
- Files indexed
- Total nodes/edges
- Database size

No need to run a separate status command after switching.
Changed 'codegraph init' → 'codegraph_init' to reference the correct MCP tool name instead of the CLI command.
Completes the lifecycle management:
- init: create .codegraph/
- index/sync: populate/update index
- uninit: remove .codegraph/ (cleanup)

Calls CodeGraph.uninitialize(), which closes the DB and deletes the .codegraph directory.
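The uninit step boils down to "close, then delete". The sketch below is hypothetical: `uninitialize` here is a free function taking a `closeDb` callback, not the actual `CodeGraph.uninitialize()` method, and only the `.codegraph` directory name comes from the description above.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical sketch of uninit: close the database first (an open handle can
// block deletion on some platforms, notably Windows), then remove .codegraph/.
async function uninitialize(projectRoot: string, closeDb: () => void): Promise<string> {
  closeDb();
  const dataDir = path.join(projectRoot, ".codegraph");
  // recursive removes the whole tree; force makes a missing directory a no-op.
  await fs.rm(dataDir, { recursive: true, force: true });
  return dataDir;
}
```

Using `force: true` makes repeated uninit calls harmless, which suits a tool an AI assistant may invoke more than once.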
Documents how to configure CodeGraph with OpenCode in addition to existing Claude Code instructions.
Changed from OpenCode-specific to generic MCP client config. Keeps it neutral and broadly applicable.
Makes it clear the example is for OpenCode and shows the typical config file location.
Problem:
- sync was taking 80+ seconds on projects with 20K+ unresolved refs
- Low CPU utilization (9%) indicated an I/O bottleneck
- Root cause: 26K repeated SQLite queries for symbol lookups

Solution:
- Pre-load all symbols into in-memory maps indexed by name/qualified name
- Cache lookups in getNodesByName() and getNodesByQualifiedName()
- warmCaches() called once at the start of resolveAll()

Results:
- sync time: 80s → 1s (80x speedup)
- CPU utilization: 9% → 95% (actually using available resources)
- Memory trade-off: a few MB for the symbol cache (negligible)

Tested on a 681-file codebase with 26,233 unresolved references.
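A minimal sketch of the warm-cache idea, assuming all nodes can be fetched in one pass: it builds two name-keyed maps so each reference lookup becomes a `Map.get` instead of a SQLite query. `SymbolCache`, `warm`, and the trimmed `GraphNode` shape are hypothetical; only the lookup method names mirror the description.

```typescript
// Hypothetical, trimmed-down node shape; real CodeGraph nodes carry more fields.
interface GraphNode {
  id: string;
  name: string;
  qualifiedName: string;
  filePath: string;
}

// Appends a value to the bucket for `key`, creating the bucket if needed.
function pushInto<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const bucket = map.get(key);
  if (bucket) bucket.push(value);
  else map.set(key, [value]);
}

class SymbolCache {
  private byName = new Map<string, GraphNode[]>();
  private byQualifiedName = new Map<string, GraphNode[]>();

  // Called once before resolution with every node (e.g. from a single
  // "select all nodes" query), not once per file.
  warm(allNodes: Iterable<GraphNode>): void {
    this.byName.clear();
    this.byQualifiedName.clear();
    for (const node of allNodes) {
      pushInto(this.byName, node.name, node);
      pushInto(this.byQualifiedName, node.qualifiedName, node);
    }
  }

  // Returned arrays are shared with the cache; callers should not mutate them.
  getNodesByName(name: string): GraphNode[] {
    return this.byName.get(name) ?? [];
  }

  getNodesByQualifiedName(qualifiedName: string): GraphNode[] {
    return this.byQualifiedName.get(qualifiedName) ?? [];
  }
}
```

Names map to arrays because several symbols can share a short name (for example, two `run` methods in different files); picking among them is left to the resolver.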
- Add insertUnresolvedRefsBatch() method using a SQLite transaction
- Replace N individual inserts with a single batched transaction
- Expected 10-100x speedup on the post-parsing phase, depending on ref count

This avoids repeated transaction overhead when indexing files with many unresolved references.
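Wrapping the inserts in one transaction means SQLite commits once per batch instead of once per row. The sketch assumes a better-sqlite3-style handle (`prepare(sql).run(...)` and `transaction(fn)`, which runs `fn` between BEGIN and COMMIT and rolls back if it throws); the `unresolved_refs` table, its columns, and the `UnresolvedRef` shape are hypothetical, not CodeGraph's schema.

```typescript
// Minimal subset of a better-sqlite3-style API used by this sketch.
interface Statement {
  run(...params: unknown[]): unknown;
}
interface Database {
  prepare(sql: string): Statement;
  transaction<A extends unknown[]>(fn: (...args: A) => void): (...args: A) => void;
}

// Hypothetical row shape for an unresolved reference.
interface UnresolvedRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: string;
  line: number;
}

function insertUnresolvedRefsBatch(db: Database, refs: UnresolvedRef[]): void {
  if (refs.length === 0) return;
  // Prepare once and reuse the statement for every row.
  const stmt = db.prepare(
    "INSERT INTO unresolved_refs (from_node_id, reference_name, reference_kind, line) VALUES (?, ?, ?, ?)",
  );
  // All rows share one BEGIN/COMMIT instead of an implicit transaction each.
  const insertAll = db.transaction((rows: UnresolvedRef[]) => {
    for (const r of rows) {
      stmt.run(r.fromNodeId, r.referenceName, r.referenceKind, r.line);
    }
  });
  insertAll(refs);
}
```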
- Add timing breakdown to IndexResult (scanning, parsing, storing, resolving)
- Report progress during the 'storing' phase (was silent before)
- Track per-file parse times to identify bottlenecks
- Users can now see where time is spent during indexing

This provides visibility into performance bottlenecks and makes long indexing operations less mysterious.
- Index command now calls resolveReferences() after indexing
- Added progress logging during resolution (every 100 refs)
- Shows resolved/unresolved counts at completion
- This was the missing 'resolving' phase that took most of the time

The 'index' command was only parsing and storing, not resolving, so edges weren't being created. Now the full pipeline runs.
- resolveReferences() now accepts an onProgress callback
- CLI shows a real-time progress bar during resolution
- Updates every 100ms with current/total refs
- Shows resolution duration separately from indexing
- Much better UX during the slow resolution phase
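One way to get "updates every 100ms" without flooding the terminal is to throttle the callback itself. A minimal sketch; `throttleProgress` is a hypothetical helper, and the injectable `now` clock exists only to make it testable.

```typescript
type ProgressCallback = (current: number, total: number) => void;

// Wraps a progress callback so it fires at most once per `intervalMs`,
// but always for the final item, so the bar finishes at 100%.
function throttleProgress(
  onProgress: ProgressCallback,
  intervalMs = 100,
  now: () => number = Date.now,
): ProgressCallback {
  let last = -Infinity;
  return (current, total) => {
    const t = now();
    if (current === total || t - last >= intervalMs) {
      last = t;
      onProgress(current, total);
    }
  };
}
```

The resolver can then call the wrapped callback after every reference and let the wrapper decide when the CLI actually redraws.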
- Changed fs.readFileSync to async fs.promises.readFile
- Process files in batches of 20 with Promise.all
- Overlaps I/O operations instead of sequential reads
- Should utilize idle CPU cores better (was 25% CPU, I/O bound)

Expected impact: 2-4x faster indexing on projects with many files, since file reading is now parallel instead of sequential.
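The batching pattern described above can be sketched as follows. `readFilesInBatches` is a hypothetical name, and the `readFile` parameter (defaulting to `fs.promises.readFile`) is there only so the concurrency can be tested.

```typescript
import { promises as fs } from "node:fs";

// Reads files in fixed-size batches: up to `batchSize` reads are in flight
// at once, and the next batch starts only after the current one settles.
async function readFilesInBatches(
  paths: string[],
  batchSize = 20,
  readFile: (p: string) => Promise<string> = (p) => fs.readFile(p, "utf8"),
): Promise<Map<string, string>> {
  const contents = new Map<string, string>();
  for (let i = 0; i < paths.length; i += batchSize) {
    const batch = paths.slice(i, i + batchSize);
    const texts = await Promise.all(batch.map((p) => readFile(p)));
    batch.forEach((p, j) => contents.set(p, texts[j]));
  }
  return contents;
}
```

Fixed batches wait for the slowest file in each group before starting the next; a small worker pool that keeps 20 reads in flight would avoid that stall. `Promise.all` also rejects on the first failed read, so an indexer that should skip unreadable files would likely want `Promise.allSettled` instead.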
- synchronous=NORMAL: Faster writes (safe with WAL mode)
- cache_size=64MB: Larger cache for better read performance
- temp_store=MEMORY: Keep temporary tables in RAM
- mmap_size=256MB: Memory-mapped I/O for faster access

These pragmas should improve both read and write performance significantly without compromising data integrity.
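Expressed as the SQL statements they correspond to, the four settings look roughly like this. The `PERFORMANCE_PRAGMAS` constant and the `exec` callback are hypothetical wiring; `journal_mode = WAL` is included on the assumption (implied by "safe with WAL mode") that WAL is already in use. Note that SQLite's `cache_size` takes pages, or KiB when negative, and `mmap_size` takes bytes.

```typescript
// Per-connection pragmas matching the settings above; run on every open.
const PERFORMANCE_PRAGMAS: readonly string[] = [
  "PRAGMA journal_mode = WAL",    // assumed already enabled; NORMAL relies on it
  "PRAGMA synchronous = NORMAL",  // fewer fsyncs than FULL
  "PRAGMA cache_size = -65536",   // negative means KiB: 65536 KiB = 64 MiB
  "PRAGMA temp_store = MEMORY",   // temporary tables and indices kept in RAM
  "PRAGMA mmap_size = 268435456", // 256 MiB of memory-mapped I/O, in bytes
];

// `exec` stands in for whatever executes raw SQL on the connection.
function applyPragmas(exec: (sql: string) => void): void {
  for (const sql of PERFORMANCE_PRAGMAS) exec(sql);
}
```

Per the SQLite documentation, WAL with `synchronous = NORMAL` keeps the database consistent after a crash, but the most recent commits can be rolled back after a power loss. Pragmas also create no tables, so the schema script still has to run on a fresh database.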
Documents all 6 major optimizations made in the performance branch:
1. Batch insert for unresolved refs
2. Detailed timing breakdown
3. Progress reporting for all phases
4. Reference resolution in index command
5. Parallel file I/O
6. SQLite performance pragmas

Includes expected impact, benchmarks, and testing instructions.
Resolution was not clearing old edges before creating new ones, causing edges to accumulate on each index run. Now deletes existing edges from source nodes before inserting new resolved edges, preventing duplicates.
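The fix makes re-resolution idempotent: drop the old edges from the nodes being re-resolved, then insert the fresh ones. An in-memory sketch with a hypothetical `Edge` shape and `replaceEdgesFromSources` helper; the real change operates on database rows.

```typescript
// Hypothetical edge shape for illustration.
interface Edge {
  source: string;
  target: string;
  kind: string;
}

// Removes every existing edge whose source is being re-resolved, then appends
// the freshly resolved edges. `reResolved` is passed explicitly rather than
// derived from `fresh`, so a node that now resolves to zero edges still loses
// its stale ones.
function replaceEdgesFromSources(
  existing: readonly Edge[],
  reResolved: ReadonlySet<string>,
  fresh: readonly Edge[],
): Edge[] {
  const kept = existing.filter((e) => !reResolved.has(e.source));
  return kept.concat(fresh);
}
```

Running the same resolution twice now yields the same edge set instead of doubling it.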
Schema.sql execution was accidentally removed when adding performance pragmas, causing 'no such table' errors on fresh init. This restores the schema initialization that creates all tables.
indexAll() was showing 'Resolving refs: 0%' placeholder that did nothing, confusing users before the real resolution started. Resolution happens separately after indexing via resolveReferences(), so removed the misleading progress indicator.
Adds 'codegraph uninit' command to match the MCP tool functionality. Includes a confirmation prompt to prevent accidental data loss.

Usage: codegraph uninit [path]
Users expect 'y' to work, not just 'yes'. Also changed prompt to (y/n) to be clearer.
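A small sketch of the accepted answers, using Node's `readline/promises` for the prompt. `isConfirmed` and `confirm` are hypothetical helper names, not the CLI's actual functions.

```typescript
import * as readline from "node:readline/promises";

// Accepts "y" or "yes" in any case, ignoring surrounding whitespace;
// anything else, including an empty answer, counts as "no".
function isConfirmed(answer: string): boolean {
  const normalized = answer.trim().toLowerCase();
  return normalized === "y" || normalized === "yes";
}

// Prompts once on the terminal and always releases stdin afterwards.
async function confirm(question: string): Promise<boolean> {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  try {
    return isConfirmed(await rl.question(`${question} (y/n) `));
  } finally {
    rl.close();
  }
}
```

Treating an empty answer as "no" keeps the destructive default safe when someone just presses Enter.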
Shows feedback during the initialization phase (getUnresolvedReferences + warmCaches) so users know the process hasn't hung. Message clears when progress starts.
Changed from clearing at current===100 to clearing on first callback. This ensures the message clears properly when resolution starts.
Displays same stats block as 'codegraph status' after resolution, showing accurate file/node/edge counts and DB size. Eliminates confusion between intermediate counts (412 edges) and final totals (12,159 edges after resolution).
warmCaches was calling getNodesByFile() for each file (880 queries). Changed to a single getAllNodes() query, with the caches built in memory from its result. This was causing a ~60 second 'Preparing resolver' delay. Expected to reduce it to <1 second.
Added DEBUG logs to measure:
- getUnresolvedReferences
- getAllNodes (warmCaches)
- Convert refs format (22K getNodeById calls)
- resolveAndPersist

This will identify where the 60s 'stuck' phase is happening.
Fixes by GPT-5.3 Codex addressing code review findings:

Critical fixes (P0):
- Add await to sync() resolution calls (prevents DB race conditions)
- Remove dual DB handle (eliminates connection leaks)
- Fix edge cleanup key parsing (handles IDs with colons)

Important fixes (P1):
- Fix graph traversal 'both' direction (correct neighbor selection)
- Fix type hierarchy descendants (separate visited sets)
- Add arrow function extraction support

Lower priority (P2):
- Add missing languages to config validation
- Fix VSS search LIMIT issue
- Fix toFloat32Array() data copying

Improvements:
- Restore getDetectedFrameworks() API
- Enhanced test coverage
- All 200 tests passing

Co-authored-by: GPT-5.3 Codex
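The 'both' direction bug comes down to picking the neighbor: for an edge touching the current node, the next node is whichever endpoint is not the current one. Always taking `edge.target` silently walks back to the current node on incoming edges. A minimal BFS sketch with a hypothetical `Edge` shape and `bfsBoth` function, not CodeGraph's traversal code:

```typescript
interface Edge {
  source: string;
  target: string;
}

// Breadth-first traversal that follows edges in both directions.
function bfsBoth(start: string, edges: readonly Edge[], maxDepth = Infinity): string[] {
  // Index every edge under both of its endpoints.
  const adjacency = new Map<string, Edge[]>();
  for (const e of edges) {
    for (const endpoint of [e.source, e.target]) {
      const list = adjacency.get(endpoint);
      if (list) list.push(e);
      else adjacency.set(endpoint, [e]);
    }
  }

  const visited = new Set<string>([start]);
  const order: string[] = [start];
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const current of frontier) {
      for (const e of adjacency.get(current) ?? []) {
        // Outgoing edge: go to target. Incoming edge: go to source.
        const neighbor = e.source === current ? e.target : e.source;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          order.push(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

With edges a→b, c→a, b→d, starting at a reaches b (outgoing) and c (incoming) at depth 1, then d at depth 2; the target-only version would never reach c.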
Recovered from commit 76e6e7b. Guide contains best practices for AI assistants using CodeGraph tools effectively.
colbymchenry added a commit that referenced this pull request on Feb 10, 2026
- Fix Float32Array embedder bug: was creating a zero-filled array instead of copying data from TypedArray-like objects
- Fix VSS search query: use a subquery pattern so LIMIT applies before JOIN
- Pin tree-sitter versions: remove caret ranges for ABI stability, add overrides to lock tree-sitter core at 0.22.4
- Lazy grammar loading: load native bindings on first use per language instead of all at startup, so one missing grammar doesn't affect others
- Remove stale src/extraction/queries copy from copy-assets script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
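Lazy grammar loading can be sketched as a per-language memoized loader that records failures per language. `LazyGrammars` and the loader table are hypothetical; in a real build each loader would wrap the `require()` of a tree-sitter grammar package.

```typescript
// A loader defers loading one native grammar until it is first needed.
type GrammarLoader = () => unknown;

class LazyGrammars {
  private loaders: Map<string, GrammarLoader>;
  private cache = new Map<string, unknown>();
  private failures = new Map<string, Error>();

  constructor(loaders: Record<string, GrammarLoader>) {
    this.loaders = new Map(Object.entries(loaders));
  }

  // Loads the grammar on first request and caches it. A failed load is
  // remembered and rethrown only for that language, so one missing native
  // binding does not break parsing for the others.
  get(language: string): unknown {
    if (this.cache.has(language)) return this.cache.get(language);
    const failure = this.failures.get(language);
    if (failure) throw failure;
    const load = this.loaders.get(language);
    if (!load) throw new Error(`Unsupported language: ${language}`);
    try {
      const grammar = load();
      this.cache.set(language, grammar);
      return grammar;
    } catch (err) {
      const error = err instanceof Error ? err : new Error(String(err));
      this.failures.set(language, error);
      throw error;
    }
  }
}
```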
colbymchenry added a commit that referenced this pull request on Feb 10, 2026
- SQLite performance pragmas: synchronous=NORMAL, 64MB cache, memory temp store, 256MB mmap (safe with WAL mode)
- Batch insert for unresolved refs: single transaction instead of N individual inserts per file
- Symbol caching (warmCaches): pre-load all nodes into memory maps before resolution, eliminating repeated SQLite queries per ref
- Async file I/O: fs.stat/readFile in indexFile() are now non-blocking
- Denormalize filePath/language onto UnresolvedReference: avoids N node lookups during resolution, with schema migration v2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
colbymchenry added a commit that referenced this pull request on Feb 10, 2026
- Fix arrow function extraction: explicitly call extractFunction() for arrow functions/function expressions in variable declarations instead of silently skipping them (all 6 arrow function tests now pass)
- Best-candidate resolution: collect candidates from all strategies and return the highest-confidence match instead of the first match
- Fix graph traversal 'both' direction: correctly determine the next node for mixed incoming/outgoing edges in BFS and DFS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
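Best-candidate resolution replaces "first strategy that matches wins" with "every strategy proposes, highest confidence wins". A sketch with hypothetical `Candidate` and `Strategy` types; the confidence values in any real resolver come from its own heuristics.

```typescript
interface Candidate {
  nodeId: string;
  confidence: number;
  strategy: string;
}

// A strategy proposes zero or more candidate targets for a reference name.
type Strategy = (refName: string) => Candidate[];

// Runs every strategy and keeps the single highest-confidence candidate.
// Ties go to the earlier strategy because the comparison is strict.
function resolveBestCandidate(refName: string, strategies: readonly Strategy[]): Candidate | null {
  let best: Candidate | null = null;
  for (const strategy of strategies) {
    for (const c of strategy(refName)) {
      if (best === null || c.confidence > best.confidence) best = c;
    }
  }
  return best;
}
```

The trade-off is that every strategy always runs, which costs more per reference; with in-memory symbol caches that cost is small.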
Owner
Hey @rickross, thanks for this massive PR! While there were too many merge conflicts to merge directly, we went through it thoroughly and cherry-picked the most valuable changes into PR #19. This includes bug fixes (Float32Array, VSS query), performance improvements (SQLite pragmas, batch inserts, symbol caching), extraction quality fixes (arrow function handling, best-candidate resolution), the CLI uninit command, and MCP improvements. Appreciate the effort you put into this!
colbymchenry added a commit that referenced this pull request on Feb 10, 2026
Port quality improvements from PR #15
jorgerobles pushed four commits to jorgerobles/codegraph that referenced this pull request on Jun 1, 2026. Their messages repeat verbatim the three commit messages added by colbymchenry on Feb 10, 2026, followed by "Port quality improvements from PR colbymchenry#15".
Hi Colby!
We've started using CodeGraph extensively and find it incredibly valuable. We added several MCP tools that make it easier to manage multiple projects and let AI agents handle project lifecycle tasks (init/index/sync). We hope these will be helpful contributions.
What's in this PR
1. Root Management Tools
2. Project Lifecycle Tools
AI assistants can now manage CodeGraph project lifecycle on behalf of users:
These operate on the "current root" set by set_root, providing a clean, context-based API.
3. Performance Optimization (80x speedup for reference resolution)
Optimized reference resolution with symbol caching:
Tested on 681-file codebase with 26,233 unresolved references.
4. Tree-sitter Fixes
Merged the fixes from @jasques' PR #9 for tree-sitter installation issues (with full credit).
Use Cases
Multi-project workflows:
Autonomous project lifecycle:
Testing
Verified across 3 different projects (TypeScript, Python, mixed) with successful initialization, indexing, syncing, and querying.
Let us know if you'd like any changes or have questions about the design!