Skip to content

falkordblite mcp latest#706

Merged
DvirDukhan merged 8 commits into
stagingfrom
naseem/falkordblite-mcp-latest
Jun 9, 2026
Merged

falkordblite mcp latest#706
DvirDukhan merged 8 commits into
stagingfrom
naseem/falkordblite-mcp-latest

Conversation

@Naseem77

@Naseem77 Naseem77 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional embedded database backend for lighter deployments without external server dependencies.
    • Enhanced code search with free-text hybrid ranking for better relevance.
    • Added new symbol discovery and code navigation tools for structural exploration.
    • Added context-size benchmark tool for performance evaluation.
  • Documentation

    • Updated setup instructions with embedded database configuration and TCP exposure options.
    • Expanded environment variable documentation for new database backend settings.
  • Refactor

    • Centralized database connection logic for improved maintainability.
  • Tests

    • Added comprehensive test coverage for new code navigation tools.

DvirDukhan and others added 8 commits June 8, 2026 17:50
…bors

Nav baseline rebased onto the #681 (MCP tools) + #698 (analyzer IMPORTS/OVERRIDES
edges) confluence, dropping the duplicate analyzer-edges copy (722c8a3, byte-identical
to #698's ce4ecd9) and the T13/T14 onboarding-template edits (deferred).

Adds get_neighbors (symbol-id neighbor expansion over CALLS/IMPORTS/EXTENDS/OVERRIDES),
get_file_neighbors (file-level structural coupling), get_importers/get_overrides spikes,
and a hybrid (BM25 + structural) ranking for search_code. Cherry-pick of 799b218.

PR-split note: get_file_neighbors + _resolve_file + FILE_NEIGHBOR_RELS form a
self-contained trailing block (PR N2); everything else is PR N1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
structural.py:
- search_code: relevance floor — drop files with zero raw lexical signal
  (name-exact + path-overlap + raw BM25) so a nonsense query returns [] instead
  of centrality-ranked noise.
- _hybrid_components: deterministic representative-symbol selection via a stable
  sort key (exact query-id match, then src_start, name, src_end) — no longer
  depends on FalkorDB row order; fixes nondeterministic snippet/name.
- find_path: keep the node label (a path is bare nodes with no per-hop relation,
  so the label is the only type signal); restores the edge-leak guard.
- get_file_neighbors: include file_id in both early-return shapes for a
  consistent schema.

tests:
- _find_id resolves a symbol id directly from the graph (search_code is
  file-oriented and no longer returns per-symbol ids) in test_query_tools.py and
  test_impact_analysis.py, with a uniqueness assertion.
- search_code tests use the file-oriented query= API.

53/53 mcp tests pass against live FalkorDB; ruff clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Bridges human-readable names to the integer symbol_id that get_neighbors,
impact_analysis and find_path require (those take an id, not a name; search_code
returns files). Accepts a simple name or dotted qualname (last segment used);
optional file arg flags/orders in-file matches first so a wrong file filter
degrades visibly rather than silently routing to an arbitrary same-named symbol.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Benchmarking showed the MCP ask tool failing on every call (the spawned server
env carries only FalkorDB coordinates, no LLM key) and, when keyed, returning
File-level fuzzy matches rather than the structural answers the nav workflow
needs — an LLM round-trip per call for no signal. Expose only the deterministic
structural tools; GraphRAG ask stays on the HTTP /api/chat path. #681's ask
implementation is left intact (re-enable by re-adding the import).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Validate/coerce limit: reject non-integers, accept stringified ints, and
  clamp to 1..FIND_SYMBOL_DB_LIMIT so a negative value no longer flips
  rows[:limit] into a surprising tail slice (_clamp_find_symbol_limit).
- Reject empty symbol names up front (empty, whitespace, or a trailing-dot
  qualname) instead of silently querying for "".
- Always include file_match in the response (False when no file filter is
  requested) so the documented shape is stable, and sort deterministically
  (in-file matches first, then file/line/name/id) so ordering no longer
  depends on FalkorDB row order.
- Add tests/mcp/test_find_symbol.py covering simple + dotted-name resolution,
  file disambiguation + ordering, response-shape stability, deterministic
  order, and limit/empty-name validation.

70/70 mcp tests pass against live FalkorDB; ruff clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR introduces FalkorDB Lite embedded backend support and expands MCP structural tools. It centralizes database connection logic via factory functions in a new api/db.py module, then refactors all connection creation across the codebase to use these factories. The MCP tools layer gains hybrid search ranking for code queries, three new tools (symbol resolution, neighbor traversal, file coupling), and comprehensive test coverage.

Changes

FalkorDB Lite Backend & Connection Abstraction

Layer / File(s) Summary
Backend abstraction module and configuration
api/db.py, pyproject.toml, .env.template
New api/db.py exports factory functions (create_falkordb(), create_async_falkordb(), create_redis_connection(), create_async_redis_connection(), graphrag_connection_kwargs()) that select between embedded FalkorDBLite and remote FalkorDB based on CODE_GRAPH_DB_BACKEND. Adds optional light dependency group for falkordblite>=0.10.0. Environment template documents new backend selector and lite-specific configuration.
CLI and setup documentation
api/cli.py, .env.template, README.md
ensure-db CLI now detects and initializes lite backend, emitting JSON success/error payloads. .env.template adds CODE_GRAPH_DB_BACKEND, FALKORDB_LITE_PATH, and optional FALKORDB_LITE_HOST/FALKORDB_LITE_PORT. README.md documents Option C setup with uv sync --extra light and environment variable guidance.
Connection factory adoption
api/git_utils/git_graph.py, api/graph.py, api/info.py, api/llm.py, api/migrations/per_branch.py
Refactored all database connection instantiation to use factory functions instead of inline environment-based construction, removing duplicate connection logic across modules. Sync and async paths now consistently route through centralized factories.
Graph module cleanup
api/graph.py
Parameter Graph.get_sub_graph(l) renamed to limit and wired into Cypher query; set_file_coverage no longer stores unused _query return value; AsyncGraphQuery.close() ensured to await async cleanup.

MCP Structural Tools Expansion

Layer / File(s) Summary
Hybrid search ranking infrastructure
api/mcp/tools/structural.py
Introduces _hybrid_rank() combining exact identifier matches, BM25 scoring over tokenized text, path token overlap, call-graph centrality, and path penalties (tests/vendor/legacy dirs). Adds _relativize() for path normalization and _read_snippet() for bounded file-line extraction; extends _node_summary() to support relativized paths, optional labels, and lazy snippet payloads.
New structural tools
api/mcp/tools/structural.py
Adds find_symbol(name, project, file, branch, limit) for Function/Class name resolution with file-scoped matching; get_neighbors(symbol_id, project, relation, direction, branch, limit) for unified bidirectional relation traversal; get_file_neighbors(file, project, branch, limit) for discovering structurally coupled files via aggregated 1-hop relations. All return snippets and maintain deterministic ordering.
Refactored code search
api/mcp/tools/structural.py
Replaces prefix-based search_code(prefix) with free-text search_code(query) using hybrid ranking; returns files with scores and lazy snippets. Updates impact_analysis and find_path to use enhanced _node_summary() with relativized paths and optional labels. Consolidates per-edge tools into unified get_neighbors.
MCP tool surface clarification
api/mcp/tools/__init__.py
Documents that GraphRAG-backed ask tool is excluded from MCP surface (available only on HTTP /api/chat); only deterministic structural tools are exposed via MCP protocol.
find_symbol test suite
tests/mcp/test_find_symbol.py
Comprehensive tests for input validation (empty/whitespace/dotted-name rejection, limit clamping/coercion), symbol resolution, dotted qualname equivalence, file_match flag behavior (in-file vs. global candidates), deterministic ordering, and MCP server registration.
Query tools test suite
tests/mcp/test_query_tools.py
Tests refactored search_code as free-text query with file-level hit matching, JSON serializability, and repo-relative path validation. Verifies get_neighbors equivalence to legacy per-edge tools. Tests get_file_neighbors cross-file dependency discovery, path vs. file-id consistency, and empty results. Updates _find_id helper to query FalkorDB directly.
Impact analysis test update
tests/mcp/test_impact_analysis.py
Refactors _find_id test helper to resolve symbol IDs via direct FalkorDB Cypher query instead of search_code, ensuring uniqueness and proper resource cleanup.

MCP Context Benchmark

Layer / File(s) Summary
MCP context benchmark
benchmarks/mcp_context_benchmark.py
New deterministic benchmark comparing MCP-style CodeGraph context (via find_symbol, get_neighbors, get_file_neighbors) vs. plain source-file context. Includes fixed test targets, token estimation, per-question deltas, and JSON summary reporting. CLI accepts --project, --branch, --root.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • FalkorDB/code-graph#677: Adds the shared MCP sample_project fixture and test infrastructure (conftest.py, expected.yaml) that the structural tool tests in this PR depend on.
  • FalkorDB/code-graph#679: Modifies search_code tool in api/mcp/tools/structural.py with a different prefix-search approach; this PR's hybrid free-text implementation directly overlaps.
  • FalkorDB/code-graph#675: Updates api/graph.py Graph initialization with per-branch naming; this PR refactors the same initialization to use create_falkordb() factories.

Suggested reviewers

  • gkorland
  • galshubeli

🐰 Hybrid ranking hops through the code graph,
Lite backends nestle in local caches,
Symbols resolve with deterministic grace,
While neighbors gather 'round the structured place—
A framework built for swift and clear embrace.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'falkordblite mcp latest' is vague and lacks specificity about the actual changes made in the pull request. Provide a more descriptive title that clearly summarizes the main changes, such as 'Add FalkorDBLite embedded database backend support with MCP tools' or 'Integrate FalkorDBLite embedded backend and MCP structural query tools'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch naseem/falkordblite-mcp-latest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
api/git_utils/git_graph.py (1)

197-198: ⚠️ Potential issue | 🔴 Critical

Check AsyncGitGraph.close()—callers still invoke it.

AsyncGitGraph is instantiated in api/index.py and tests/index.py, and both paths explicitly do finally: await git_graph.close(). Since those direct calls still exist, removing AsyncGitGraph.close() would break callers unless the method wasn’t actually removed (or the call sites were updated accordingly).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/git_utils/git_graph.py` around lines 197 - 198, AsyncGitGraph.close() was
removed/changed but callers in api/index.py and tests/index.py still call
finally: await git_graph.close(); restore compatibility by re-adding an async
close(self) -> None method on class AsyncGitGraph that simply awaits the
underlying resource shutdown (e.g., await self.db.aclose()) or by providing an
equivalent async wrapper method with the same name so existing call sites
continue to work; ensure the method signature remains async def close(self) ->
None and references self.db.aclose() to properly close resources.
🧹 Nitpick comments (5)
api/mcp/tools/structural.py (1)

133-133: 💤 Low value

Consider adding strict=True to zip() for safety.

Ruff flags this zip() call without an explicit strict= parameter. Since docs is built directly from files on line 121, the lengths always match, but making this explicit prevents silent bugs if the construction ever changes.

🔒 Proposed fix
-    for f, d in zip(files, docs):
+    for f, d in zip(files, docs, strict=True):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/mcp/tools/structural.py` at line 133, The zip iteration over files and
docs should be made explicit about length expectations to prevent silent
mismatches: update the loop using zip(files, docs) (the for loop that iterates
variables f and d) to call zip with strict=True so mismatched lengths raise an
error; locate the loop in structural.py where files and docs are paired and add
the strict=True argument to the zip() call.

Source: Linters/SAST tools

benchmarks/mcp_context_benchmark.py (3)

30-32: ⚡ Quick win

Move logging configuration after imports to avoid masking import-time warnings.

Calling logging.disable(logging.CRITICAL) before importing api.mcp.tools.structural will suppress any warnings or errors that might occur during module initialization. This makes debugging harder if the MCP tools fail to import correctly.

♻️ Suggested refactor
 from __future__ import annotations
 
 import argparse
 import asyncio
 import json
 import logging
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
 
-logging.disable(logging.CRITICAL)
-
-from api.mcp.tools.structural import find_symbol, get_file_neighbors, get_neighbors  # noqa: E402
+from api.mcp.tools.structural import find_symbol, get_file_neighbors, get_neighbors

Then in main(), add logging.disable(logging.CRITICAL) at the beginning:

def main() -> None:
    logging.disable(logging.CRITICAL)
    parser = argparse.ArgumentParser(description=__doc__)
    # ... rest of main
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/mcp_context_benchmark.py` around lines 30 - 32, Move the
logging.disable(logging.CRITICAL) call so it does not run before imports: remove
the top-level logging.disable from the module and instead call
logging.disable(logging.CRITICAL) at the start of main() (the function that
parses args and runs the benchmark). This ensures imports like
api.mcp.tools.structural (and functions find_symbol, get_file_neighbors,
get_neighbors) can emit import-time warnings/errors while still silencing logs
during actual benchmark runtime.

109-109: 💤 Low value

Consider defensive dictionary access for symbol_id.

The code assumes symbols[0] contains a "symbol_id" key. While the internal MCP tool contract likely guarantees this, defensive access with .get() or a try/except block would make the benchmark more robust against API changes.

🛡️ Optional defensive access
-    symbol_id = symbols[0]["symbol_id"]
+    symbol_id = symbols[0].get("symbol_id")
+    if not symbol_id:
+        return payload
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/mcp_context_benchmark.py` at line 109, The line setting symbol_id
from symbols[0]["symbol_id"] should be made defensive: verify symbols is a
non-empty list and use dict.get or a try/except to read "symbol_id" from
symbols[0] (e.g., check if symbols and symbols[0].get("symbol_id") is not None),
and if missing either raise a clear ValueError or handle the case (skip/abort)
so symbol_id is never accessed from a missing key; update the symbol_id
assignment and any downstream uses to rely on this validated value (symbols,
symbol_id).

143-185: ⚖️ Poor tradeoff

Consider adding error handling for individual question failures.

The benchmark will stop and print no output if any graph_context() or read_plain_context() call raises an exception. Wrapping the per-question logic in a try/except block would allow the benchmark to continue and report partial results, improving the user experience when some targets fail.

Optional error handling
     for question in QUESTIONS:
-        if question.target not in graph_contexts:
-            graph_contexts[question.target] = compact_json(
-                await graph_context(question, project, branch)
-            )
-        graph_text = graph_contexts[question.target]
-        plain_text = read_plain_context(root, question.target.file)
-        graph_tokens = estimate_tokens(graph_text)
-        plain_tokens = estimate_tokens(plain_text)
-        graph_total += graph_tokens
-        plain_total += plain_tokens
-        rows.append(
-            {
-                "question": question.text,
-                "symbol": question.target.symbol,
-                "file": question.target.file,
-                "graph_tokens": graph_tokens,
-                "plain_tokens": plain_tokens,
-                "delta_tokens": plain_tokens - graph_tokens,
-                "reduction_pct": round((1 - graph_tokens / plain_tokens) * 100, 1)
-                if plain_tokens
-                else 0,
-            }
-        )
+        try:
+            if question.target not in graph_contexts:
+                graph_contexts[question.target] = compact_json(
+                    await graph_context(question, project, branch)
+                )
+            graph_text = graph_contexts[question.target]
+            plain_text = read_plain_context(root, question.target.file)
+            graph_tokens = estimate_tokens(graph_text)
+            plain_tokens = estimate_tokens(plain_text)
+            graph_total += graph_tokens
+            plain_total += plain_tokens
+            rows.append(
+                {
+                    "question": question.text,
+                    "symbol": question.target.symbol,
+                    "file": question.target.file,
+                    "graph_tokens": graph_tokens,
+                    "plain_tokens": plain_tokens,
+                    "delta_tokens": plain_tokens - graph_tokens,
+                    "reduction_pct": round((1 - graph_tokens / plain_tokens) * 100, 1)
+                    if plain_tokens
+                    else 0,
+                }
+            )
+        except Exception as e:
+            print(f"Warning: Skipped question due to error: {e}", file=sys.stderr)
+            continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/mcp_context_benchmark.py` around lines 143 - 185, The per-question
loop in run() can abort the whole benchmark if graph_context() or
read_plain_context() raises; wrap the body that computes graph_text/plain_text,
token estimates and rows.append in a try/except around each question (using
question from QUESTIONS and the graph_contexts cache) and on exception log or
record the error (e.g., include an entry in rows with question/target and an
error field or a skipped flag) then continue; ensure graph_total/plain_total and
question counts used in the summary reflect only successfully processed items or
include skipped counts so the final summary still prints partial results even if
some targets fail.
api/db.py (1)

82-85: Downgrade teardown risk: close() is async in the pinned lite dependency, but keep a safe fallback wrapper anyway.

The current create_async_falkordb() fallback client.aclose = client.close runs only when aclose is missing; for the repo’s light extra (falkordblite>=0.10.0,<1.0.0), upstream AsyncFalkorDB.close()/aclose() are both async coroutines, so await db.aclose() should work as-is. Still, the fallback is brittle across versions—wrap close() to await any returned awaitable (as in the proposed diff) to make lite teardown resilient if a particular build omits aclose or implements close() differently.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/db.py` around lines 82 - 85, In create_async_falkordb(), replace the
brittle direct assignment client.aclose = client.close with a wrapper that makes
client.aclose an async function which calls client.close() and awaits the result
if it's awaitable; this ensures await db.aclose() works whether close() is sync,
returns an awaitable, or is already async—locate the create_async_falkordb
function and update the fallback logic where client.aclose is set.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@api/cli.py`:
- Line 71: The broad catch `except Exception as e:` in the CLI startup should
either be narrowed to the specific exceptions you expect (e.g., SystemExit,
KeyboardInterrupt, ValueError, or startup-related errors) inside the CLI
entrypoint (where `except Exception as e:` is used) or, if you intentionally
need a catch-all to guarantee JSON output, annotate that line with `# noqa:
BLE001` and add a one-sentence rationale comment explaining why the broad catch
is required (e.g., "Ensure any startup error is serialized to JSON for the CLI
contract"). Update the handler to re-raise critical exceptions when appropriate
or convert them to the JSON error response and ensure the comment references the
decision.

In `@benchmarks/mcp_context_benchmark.py`:
- Around line 90-94: The function read_plain_context currently may call
path.read_text() on a non-existent file and has redundant path logic; update
read_plain_context to (1) resolve path deterministically: choose root.parent /
file_hint if file_hint startswith f"{root.name}/" else root / file_hint, (2) if
that path does not exist try fallback root.parent / file_hint only if it
differs, and (3) if the final path still does not exist handle the error by
raising a clear FileNotFoundError (or return a controlled empty string) that
includes the resolved path, root, and file_hint so callers can handle the
missing file gracefully; refer to read_plain_context, path, root, and file_hint
when making changes.

---

Outside diff comments:
In `@api/git_utils/git_graph.py`:
- Around line 197-198: AsyncGitGraph.close() was removed/changed but callers in
api/index.py and tests/index.py still call finally: await git_graph.close();
restore compatibility by re-adding an async close(self) -> None method on class
AsyncGitGraph that simply awaits the underlying resource shutdown (e.g., await
self.db.aclose()) or by providing an equivalent async wrapper method with the
same name so existing call sites continue to work; ensure the method signature
remains async def close(self) -> None and references self.db.aclose() to
properly close resources.

---

Nitpick comments:
In `@api/db.py`:
- Around line 82-85: In create_async_falkordb(), replace the brittle direct
assignment client.aclose = client.close with a wrapper that makes client.aclose
an async function which calls client.close() and awaits the result if it's
awaitable; this ensures await db.aclose() works whether close() is sync, returns
an awaitable, or is already async—locate the create_async_falkordb function and
update the fallback logic where client.aclose is set.

In `@api/mcp/tools/structural.py`:
- Line 133: The zip iteration over files and docs should be made explicit about
length expectations to prevent silent mismatches: update the loop using
zip(files, docs) (the for loop that iterates variables f and d) to call zip with
strict=True so mismatched lengths raise an error; locate the loop in
structural.py where files and docs are paired and add the strict=True argument
to the zip() call.

In `@benchmarks/mcp_context_benchmark.py`:
- Around line 30-32: Move the logging.disable(logging.CRITICAL) call so it does
not run before imports: remove the top-level logging.disable from the module and
instead call logging.disable(logging.CRITICAL) at the start of main() (the
function that parses args and runs the benchmark). This ensures imports like
api.mcp.tools.structural (and functions find_symbol, get_file_neighbors,
get_neighbors) can emit import-time warnings/errors while still silencing logs
during actual benchmark runtime.
- Line 109: The line setting symbol_id from symbols[0]["symbol_id"] should be
made defensive: verify symbols is a non-empty list and use dict.get or a
try/except to read "symbol_id" from symbols[0] (e.g., check if symbols and
symbols[0].get("symbol_id") is not None), and if missing either raise a clear
ValueError or handle the case (skip/abort) so symbol_id is never accessed from a
missing key; update the symbol_id assignment and any downstream uses to rely on
this validated value (symbols, symbol_id).
- Around line 143-185: The per-question loop in run() can abort the whole
benchmark if graph_context() or read_plain_context() raises; wrap the body that
computes graph_text/plain_text, token estimates and rows.append in a try/except
around each question (using question from QUESTIONS and the graph_contexts
cache) and on exception log or record the error (e.g., include an entry in rows
with question/target and an error field or a skipped flag) then continue; ensure
graph_total/plain_total and question counts used in the summary reflect only
successfully processed items or include skipped counts so the final summary
still prints partial results even if some targets fail.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f236285a-56e6-440c-b729-9439d51eb095

📥 Commits

Reviewing files that changed from the base of the PR and between 126c672 and 8ec9744.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .env.template
  • README.md
  • api/cli.py
  • api/db.py
  • api/git_utils/git_graph.py
  • api/graph.py
  • api/info.py
  • api/llm.py
  • api/mcp/tools/__init__.py
  • api/mcp/tools/structural.py
  • api/migrations/per_branch.py
  • benchmarks/mcp_context_benchmark.py
  • pyproject.toml
  • tests/mcp/test_find_symbol.py
  • tests/mcp/test_impact_analysis.py
  • tests/mcp/test_query_tools.py

Comment thread api/cli.py
try:
db = create_falkordb()
db.connection.ping()
except Exception as e:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle BLE001 explicitly for the intentional catch-all path.

If this broad catch is intentional to guarantee JSON output, annotate it (# noqa: BLE001) with a short rationale; otherwise narrow to expected startup exceptions.

As per coding guidelines, "Run Ruff linter for Python code to check style and quality."

🧰 Tools
🪛 Ruff (0.15.15)

[warning] 71-71: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@api/cli.py` at line 71, The broad catch `except Exception as e:` in the CLI
startup should either be narrowed to the specific exceptions you expect (e.g.,
SystemExit, KeyboardInterrupt, ValueError, or startup-related errors) inside the
CLI entrypoint (where `except Exception as e:` is used) or, if you intentionally
need a catch-all to guarantee JSON output, annotate that line with `# noqa:
BLE001` and add a one-sentence rationale comment explaining why the broad catch
is required (e.g., "Ensure any startup error is serialized to JSON for the CLI
contract"). Update the handler to re-raise critical exceptions when appropriate
or convert them to the JSON error response and ensure the comment references the
decision.

Sources: Coding guidelines, Linters/SAST tools

Comment on lines +90 to +94
def read_plain_context(root: Path, file_hint: str) -> str:
path = root.parent / file_hint if file_hint.startswith(f"{root.name}/") else root / file_hint
if not path.exists():
path = root.parent / file_hint
return path.read_text(errors="replace")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add error handling for missing files and simplify path resolution logic.

If the final path does not exist, path.read_text() will raise an unhandled FileNotFoundError, causing the benchmark to crash. Additionally, the path resolution logic with the fallback on lines 92-93 is complex and difficult to reason about—the fallback root.parent / file_hint might produce the same path as the first branch of the conditional.

🛡️ Proposed fix with error handling and simplified logic
 def read_plain_context(root: Path, file_hint: str) -> str:
-    path = root.parent / file_hint if file_hint.startswith(f"{root.name}/") else root / file_hint
-    if not path.exists():
-        path = root.parent / file_hint
-    return path.read_text(errors="replace")
+    """Read plain file content, trying multiple path resolutions."""
+    candidates = [
+        root.parent / file_hint,
+        root / file_hint,
+    ]
+    for path in candidates:
+        if path.exists():
+            return path.read_text(errors="replace")
+    # Return empty string or raise a clearer error
+    return f"# File not found: {file_hint}\n"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/mcp_context_benchmark.py` around lines 90 - 94, The function
read_plain_context currently may call path.read_text() on a non-existent file
and has redundant path logic; update read_plain_context to (1) resolve path
deterministically: choose root.parent / file_hint if file_hint startswith
f"{root.name}/" else root / file_hint, (2) if that path does not exist try
fallback root.parent / file_hint only if it differs, and (3) if the final path
still does not exist handle the error by raising a clear FileNotFoundError (or
return a controlled empty string) that includes the resolved path, root, and
file_hint so callers can handle the missing file gracefully; refer to
read_plain_context, path, root, and file_hint when making changes.

@DvirDukhan DvirDukhan merged commit c9d6948 into staging Jun 9, 2026
14 of 16 checks passed
@DvirDukhan DvirDukhan deleted the naseem/falkordblite-mcp-latest branch June 9, 2026 09:50
@DvirDukhan DvirDukhan restored the naseem/falkordblite-mcp-latest branch June 9, 2026 12:41
@DvirDukhan DvirDukhan requested a review from Copilot June 9, 2026 12:42

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an optional embedded FalkorDBLite backend for lightweight deployments, and significantly expands/reshapes the MCP “structural” navigation surface (file-oriented search, symbol resolution, unified neighbor traversal, and file-level coupling), with updated docs, tests, and a benchmarking script.

Changes:

  • Add optional light extra (falkordblite) and centralize DB/Redis connection creation behind a new backend-selection module.
  • Refactor/extend MCP structural tools: search_code becomes free-text + hybrid-ranked and file-oriented; add find_symbol, get_neighbors, and get_file_neighbors; add path relativization + snippets.
  • Update README/.env template plus expand MCP test coverage and add a deterministic context-size benchmark.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
uv.lock Locks the new optional falkordblite dependency and its transitive deps.
pyproject.toml Adds light optional extra for FalkorDBLite.
api/db.py New backend-selection + connection factory layer (FalkorDB vs FalkorDBLite, sync/async, Redis compat, GraphRAG host/port).
api/graph.py Switches graph connections to centralized factories; minor cleanup/rename.
api/info.py Routes Redis metadata operations through centralized connection factories.
api/llm.py Uses centralized GraphRAG connection kwargs (supports Lite with TCP exposure requirement).
api/git_utils/git_graph.py Switches git-graph DB connections to centralized factories.
api/migrations/per_branch.py Centralizes migration DB connection creation via create_falkordb().
api/cli.py Adds Lite-aware ensure-db behavior (initializes embedded backend).
api/mcp/tools/structural.py Major MCP tool changes: hybrid free-text file ranking, new tools, snippets, relativized paths, unified neighbors, file-level coupling.
api/mcp/tools/__init__.py Clarifies MCP surface intentionally excludes GraphRAG-backed ask.
tests/mcp/test_query_tools.py Updates tests for new search_code contract; adds coverage for _relativize, get_neighbors, get_file_neighbors.
tests/mcp/test_impact_analysis.py Updates symbol-id resolution to graph query since search_code is file-oriented.
tests/mcp/test_find_symbol.py New test suite for find_symbol tool behavior/ordering/validation.
benchmarks/mcp_context_benchmark.py New deterministic benchmark comparing “graph context” vs “plain file context”.
README.md Documents Lite backend setup and new env vars.
.env.template Adds Lite backend configuration variables and comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +879 to +903
rels = [relation] if isinstance(relation, str) else list(relation)
direction = (direction or "OUT").upper()
if direction == "BOTH":
dirs = ["OUT", "IN"]
elif direction in ("IN", "OUT"):
dirs = [direction]
else:
raise ValueError(
f"direction must be 'IN', 'OUT', or 'BOTH', got: {direction!r}"
)

seen: set[Any] = set()
out: list[dict[str, Any]] = []
for d in dirs:
for rel in rels:
rows = await _neighbors_payload(project, branch, symbol_id, rel, d, limit)
for row in rows:
key = (row.get("id"), row.get("relation"), row.get("direction"))
if key in seen:
continue
seen.add(key)
out.append(row)
if len(out) >= limit:
return out
return out
Comment on lines +189 to +203
files_res = await g._query("MATCH (f:File) RETURN f.path, ID(f)")
for row in files_res.result_set:
ap = row[0]
if not ap:
continue
rp = rel(ap)
abs_of[rp] = ap
file_id_of[rp] = row[1]
pt = _tokenize(rp.replace("/", " "))
pathtok[rp] = pt
bodytok[rp].extend(pt)

sym_res = await g._query(
"MATCH (n) WHERE n:Function OR n:Class "
"RETURN n.name, n.path, n.doc, n.src_start, n.src_end"
Comment thread api/db.py
Comment on lines +113 to +118
def create_async_redis_connection() -> Any:
"""Create an async Redis-compatible connection for metadata operations."""
if is_lite_backend():
return create_async_falkordb().connection

import redis.asyncio as aioredis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants