
feat(mcp): index_repo tool (T4 #652) #678

Merged
galshubeli merged 5 commits into staging from dvirdukhan/mcp-t4-index-repo
Jun 8, 2026

DvirDukhan commented May 27, 2026

Prerequisites (merge order)

None — branches directly off staging; can merge independently.


Closes #652.

Stacked on #677 (T3). First real MCP tool — agents can call index_repo(path_or_url, branch) over stdio.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Summary by CodeRabbit

  • New Features

    • Added index_repo tool for indexing code repositories from local paths or HTTP(S) git URLs
    • Tool returns graph metadata including node/edge counts and detected file extensions
    • Tools now auto-register on server startup, ensuring consistent availability across import methods
  • Tests

    • Added comprehensive test coverage for repository indexing and server registration

First real MCP tool. Wraps the existing Project / SourceAnalyzer
pipeline so AI agents can call `index_repo(path_or_url, branch)` over
stdio to populate code-graph for a repo.

- `api/mcp/tools/structural.py` (NEW) — registers `index_repo` on
  the shared FastMCP app. Accepts local paths or git URLs;
  auto-detects branch from local git checkouts via T17's
  `detect_branch`; honors `ALLOWED_ANALYSIS_DIR` for sandboxing.
  Non-git folders are handled by driving SourceAnalyzer directly
  (Project requires a git repo).
- `api/mcp/tools/__init__.py` (NEW) — package marker; importing it
  registers every tool module's `@app.tool()` decorators.
- `api/mcp/server.py` — imports tools at module load so both direct
  `from api.mcp.server import app` and `cgraph-mcp` stdio entry
  point see the same tool list.
- `tests/mcp/test_index_repo.py` (NEW) — 5 tests: local-path happy
  path, missing-path error, ALLOWED_ANALYSIS_DIR sandboxing,
  in-process app registration, JSON serialisability.
- `tests/mcp/test_scaffold.py` — replaced the "zero tools"
  assertion with a presence check for `index_repo` so it stays
  stable as T5-T8 / T11 add more tools.
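
The `ALLOWED_ANALYSIS_DIR` sandboxing listed above can be pictured roughly as follows. This is a minimal sketch, not the code in `structural.py`: the helper name `resolve_in_sandbox` and its error message are hypothetical, and it assumes that the variable, when set, names a directory every local path must resolve inside.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_in_sandbox(path: str, allowed_dir: Optional[str] = None) -> Path:
    """Resolve `path` and reject it if it falls outside the sandbox root.

    `allowed_dir` defaults to the ALLOWED_ANALYSIS_DIR environment variable.
    When neither is set, no restriction is applied (an assumption of this sketch).
    """
    resolved = Path(path).resolve()
    allowed = allowed_dir if allowed_dir is not None else os.environ.get("ALLOWED_ANALYSIS_DIR")
    if not allowed:
        return resolved
    root = Path(allowed).resolve()
    # Compare fully resolved paths so "../" segments and symlinks cannot escape the root.
    if resolved != root and root not in resolved.parents:
        raise ValueError(f"{resolved} is outside ALLOWED_ANALYSIS_DIR ({root})")
    return resolved
```

Resolving both sides before comparing is the important step; a plain string-prefix check would accept `/srv/allowed-other` for a root of `/srv/allowed`.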

Return shape:
  {project_name, branch, graph_name, num_nodes, num_edges,
   languages_detected, mode}

The `incremental` parameter is accepted now and will be forwarded once T18
lands; the current full-reindex path ignores it and always returns
`mode="full"`.
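
For illustration, the return shape and the "always full" behaviour could be modelled like this. It is a sketch under assumptions: the field names come from the list above, but the field types, the `IndexRepoResult` and `build_result` names, and the sample values (including the `demo:main` graph name) are hypothetical.

```python
import json
from typing import List, TypedDict


class IndexRepoResult(TypedDict):
    project_name: str
    branch: str
    graph_name: str
    num_nodes: int
    num_edges: int
    languages_detected: List[str]
    mode: str


def build_result(
    project_name: str,
    branch: str,
    graph_name: str,
    num_nodes: int,
    num_edges: int,
    languages_detected: List[str],
    incremental: bool = False,
) -> IndexRepoResult:
    # `incremental` is accepted for forward compatibility but not honored yet,
    # so the reported mode is "full" regardless of its value.
    return IndexRepoResult(
        project_name=project_name,
        branch=branch,
        graph_name=graph_name,
        num_nodes=num_nodes,
        num_edges=num_edges,
        languages_detected=languages_detected,
        mode="full",
    )


result = build_result("demo", "main", "demo:main", 12, 30, [".py"], incremental=True)
payload = json.dumps(result)  # plain str/int/list values serialise without a custom encoder
```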

All 8 tests pass against FalkorDB on 6390.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot commented May 27, 2026

Walkthrough

This PR implements the index_repo MCP tool for indexing code repositories into the code-graph. The tool auto-registers at import time, validates local paths and HTTP(S) URLs, routes git repositories through the existing Project abstraction and non-git folders through SourceAnalyzer, enforces sandbox restrictions via ALLOWED_ANALYSIS_DIR, auto-detects branches, and returns repository metadata including node/edge counts and detected languages.
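
The HTTP(S)-only validation described in the walkthrough might look something like the sketch below. The function name `classify_target` and the error wording are hypothetical; the only behaviour taken from this PR is that `http`/`https` URLs are accepted, SSH-style remotes are rejected with an actionable message, and anything else is treated as a local path.

```python
from urllib.parse import urlparse


def classify_target(path_or_url: str) -> str:
    """Return "url" for http(s) git URLs and "path" for local paths.

    Raises ValueError for other remote forms such as git@host:org/repo.git
    or ssh://... so the caller gets a clear message up front.
    """
    scheme = urlparse(path_or_url).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # scp-like remotes have no URL scheme, so they need their own check.
    if path_or_url.startswith("git@") or "://" in path_or_url:
        raise ValueError(
            f"Unsupported git URL {path_or_url!r}: only http(s) URLs are accepted; "
            "use the https clone URL or index a local checkout instead."
        )
    return "path"
```

Rejecting instead of rewriting keeps the caller in control of authentication, which matters for private repositories.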

Changes

Index Repo MCP Tool Implementation and Testing

| Layer | File(s) | Summary |
|---|---|---|
| Tool registration wiring | `api/mcp/server.py`, `api/mcp/tools/__init__.py` | FastMCP app auto-registers tools at import time by importing the local tools module after app creation, ensuring shared tool list across direct import and stdio entry points. |
| Index repo tool with validation and indexing logic | `api/mcp/tools/structural.py` | Implements index_repo with URL/path validation, HTTP(S)-only enforcement, local path resolution and sandboxing via ALLOWED_ANALYSIS_DIR, git vs non-git routing, automatic branch detection from HEAD, fallback for repos without remotes, and metadata computation with graph statistics and language detection. Runs indexing in executor to keep MCP loop responsive. |
| FalkorDB reachability fixture | `tests/mcp/conftest.py` | Adds require_falkordb() fixture that checks FalkorDB connectivity and skips dependent tests with host/port information when database is unreachable. |
| Comprehensive tool tests | `tests/mcp/test_index_repo.py` | Tests local non-git indexing with metadata validation, missing path error handling, ALLOWED_ANALYSIS_DIR sandbox enforcement, tool registration and input schema verification via app.list_tools(), JSON serialization, SSH URL rejection with actionable error, and local git repos without remote as fallback. |
| Scaffold test update | `tests/mcp/test_scaffold.py` | Updates stdio server smoke test to assert tool registration success by verifying index_repo appears in tool list after handshake, replacing zero-tools assertion. |
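
The "runs indexing in executor" point can be illustrated with plain `asyncio`. This is only a sketch of the general technique: `blocking_index` is a hypothetical stand-in for the synchronous indexing pipeline, and the heartbeat coroutine exists purely to show that the event loop keeps running while the blocking call is in flight.

```python
import asyncio
import time


def blocking_index(path: str) -> dict:
    """Hypothetical stand-in for a synchronous, slow indexing call."""
    time.sleep(0.05)
    return {"path": path, "mode": "full"}


async def index_repo_async(path: str) -> dict:
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_index, path)


async def main() -> list:
    ticks = []

    async def heartbeat() -> None:
        # Keeps getting scheduled because the blocking work runs on a worker thread.
        for _ in range(3):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.01)

    result, _ = await asyncio.gather(index_repo_async("/tmp/demo"), heartbeat())
    return [result, len(ticks)]


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling `blocking_index` directly inside the coroutine would stall every other task on the loop for the duration of the index run.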

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • gkorland
  • galshubeli

🐰 A tool to index the code so bold,
With branches and paths that it will hold,
From git repos deep to folders plain,
The graph grows sharp with data's rain.
FastMCP rings the registration bell,
And all the tests run oh so well! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.19% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(mcp): index_repo tool (T4 #652)' clearly and specifically describes the main change: implementing the index_repo MCP tool. |
| Linked Issues check | ✅ Passed | The PR fully implements all coding objectives from #652: index_repo tool with correct signature, branch auto-detection, Project wrapping via executor, tool registration, and comprehensive test coverage including unit, integration, and protocol round-trip tests. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to #652 requirements: new MCP tool registration, index_repo implementation with required parameters, test fixtures, and necessary supporting files (`tools/__init__.py`, `conftest.py`). No unrelated changes detected. |


Copilot AI left a comment

Pull request overview

Adds the first real MCP tool (index_repo), which lets MCP clients index a repository or directory into FalkorDB so that subsequent tools can query a branch-scoped graph, and wires tool registration into MCP server startup.

Changes:

  • Introduces api/mcp/tools/structural.py with an index_repo(path_or_url, branch, incremental, ignore) FastMCP tool and supporting helpers.
  • Registers MCP tools on server import (api/mcp/server.py) via the new api/mcp/tools package.
  • Updates MCP scaffold smoke test expectations and adds initial index_repo tests.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| `api/mcp/tools/structural.py` | Implements and registers the index_repo MCP tool and response payload helpers. |
| `api/mcp/tools/__init__.py` | Imports tool modules to trigger registration on import. |
| `api/mcp/server.py` | Imports the tools package after app creation so stdio startup exposes registered tools. |
| `tests/mcp/test_scaffold.py` | Updates scaffold handshake test to expect at least index_repo in list_tools(). |
| `tests/mcp/test_index_repo.py` | Adds tests for index_repo shape/errors/registration/JSON-serializability. |
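
The register-on-import pattern summarised per file above can be shown with a tiny stand-in registry. This is not FastMCP: `ToolRegistry` is a hypothetical class that only mimics the `@app.tool()` decorator shape and a `list_tools()` call, and the single-file layout collapses what the PR spreads across `server.py` and the `tools` package.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical stand-in for an app object that collects tools via a decorator."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}

    def tool(self) -> Callable[[Callable], Callable]:
        def decorator(fn: Callable) -> Callable:
            # Registration is a side effect of the decorator running at import time.
            self._tools[fn.__name__] = fn
            return fn

        return decorator

    def list_tools(self) -> List[str]:
        return sorted(self._tools)


# "server" part: the app object is created first.
app = ToolRegistry()


# "tools" part: in a real layout this lives in a module imported after `app` exists,
# so merely importing the server is enough to populate the tool list.
@app.tool()
def index_repo(path_or_url: str, branch: str = "") -> dict:
    return {"target": path_or_url, "branch": branch}
```

Because registration happens at import, a test can assert on `app.list_tools()` without starting the stdio transport.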

DvirDukhan and others added 2 commits June 8, 2026 13:52
- num_nodes: count nodes generically (MATCH (n)) instead of summing only
  File+Class+Function, so non-Python repos (Method/Interface/Enum/
  Constructor nodes) aren't under-reported; symmetric with _count_edges
  (galshubeli, Copilot).
- Reject non-http(s) git URLs (git@.../ssh://...) with an actionable error
  instead of letting Project.from_git_repository raise a confusing
  "invalid url" from validators.url(); align the docstring/description to
  advertise http(s) URLs only. Silently rewriting to https would change
  auth semantics for private repos (Copilot).
- Local git checkout with no configured remote: fall back to
  Project(url=None) instead of crashing on remotes[0] IndexError, so
  remote-less repos can still be indexed (galshubeli, Copilot).
- Document that `incremental` is accepted but not yet honored (full
  reindex always); mode is always "full" until T18 (Copilot).
- Skip DB-backed index_repo tests when FalkorDB is unreachable via a new
  require_falkordb fixture, keeping the unit subset runnable in dev/CI
  without the service (Copilot).
- Add tests for the SSH-URL rejection and the no-remote git path (the
  .git branch previously had no coverage).
- Fix a pre-existing assertion referencing expected_contract["counts_min"]
  (the fixture key is "counts") so the local-path test passes when
  FalkorDB is reachable.
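
A reachability probe of the kind the `require_falkordb` fixture relies on could be as small as the sketch below. The function name `falkordb_reachable` and the TCP-connect approach are assumptions; the actual fixture may check connectivity differently. In a pytest fixture, a `False` result would typically lead to `pytest.skip(...)` with the host and port in the message.

```python
import socket


def falkordb_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```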

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
api/mcp/tools/structural.py (1)

50-54: 💤 Low value

Broad exception handling is acceptable for metadata helpers.

Ruff flags the generic Exception catches (BLE001) at lines 53, 67, and 82. These are in best-effort metadata collection helpers that return safe defaults (0 or []) on failure, ensuring a query error doesn't crash the tool. More specific exception types (e.g., FalkorDB-specific errors) would be marginally better but require dependency on the graph client's exception hierarchy. Given the defensive intent and low risk, the current broad catches are acceptable.

Also applies to: 63-69, 78-83

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/mcp/tools/structural.py` around lines 50 - 54, The generic Exception
catches around the metadata helper queries (the try/except blocks that call
graph.g.query and return safe defaults like int(rows[0][0]) or []) are
intentionally broad; update those except clauses by adding a short comment
explaining the defensive intent and suppressing the Ruff BLE001 warning for
these specific handlers (e.g., add an inline noqa or pragma referencing BLE001)
so tooling stops flagging them while keeping the existing safe-default behavior
for functions that query the graph (the blocks using graph.g.query and returning
0 or []).
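
As a rough picture of what that suggestion amounts to, the sketch below keeps the broad catch, documents the intent, and suppresses BLE001 inline. The helper name `count_nodes` and the two stub graph classes are hypothetical, and the stub's `query` returns rows directly, which may differ from the real graph client's result object.

```python
class OkGraph:
    """Hypothetical stub whose query returns one row with one count column."""

    def query(self, cypher: str):
        return [[42]]


class FailingGraph:
    """Hypothetical stub that simulates a lost database connection."""

    def query(self, cypher: str):
        raise RuntimeError("connection lost")


def count_nodes(graph) -> int:
    """Best-effort node count; returns 0 rather than failing the whole tool call."""
    try:
        rows = graph.query("MATCH (n) RETURN count(n)")
        return int(rows[0][0])
    except Exception:  # noqa: BLE001 - metadata is best-effort, fall back to a safe default
        return 0
```

The trade-off is that a genuine bug in the query also yields 0, so logging the exception before returning would be a reasonable addition.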

Source: Linters/SAST tools

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dbafd9e0-6d40-429a-bb27-064808c2b0fd

📥 Commits

Reviewing files that changed from the base of the PR and between c4cc07c and cc841b6.

📒 Files selected for processing (6)
  • api/mcp/server.py
  • api/mcp/tools/__init__.py
  • api/mcp/tools/structural.py
  • tests/mcp/conftest.py
  • tests/mcp/test_index_repo.py
  • tests/mcp/test_scaffold.py

galshubeli merged commit cabe9a1 into staging Jun 8, 2026
11 of 13 checks passed
galshubeli deleted the dvirdukhan/mcp-t4-index-repo branch June 8, 2026 13:49
Development

Successfully merging this pull request may close these issues.

[MCP T4] index_repo MCP tool

3 participants