bench(parity): cg HTTP and cg-mcp share the same 8-verb surface by DvirDukhan · Pull Request #696 · FalkorDB/code-graph

DvirDukhan · 2026-05-29T18:03:51Z

Prerequisites (merge order)

Merge in order — this PR is stacked on:

bench: 3-config SWE-bench harness (baseline/lsp/code_graph_mcp) #693 — 4-config SWE-bench harness
feat(api): /api/v2/* MCP-parity endpoints #695 — /api/v2/* MCP-parity endpoints

Base: #695.

Runtime prerequisite: exercises the shared 8-verb surface, so the MCP nav product stack (#701/#702) must be deployed/indexed to run the parity arm.

Summary

Pairs with #api-v2 (the /api/v2/* MCP-parity endpoints). With those endpoints in place, the SWE-bench harness can now run the HTTP-transport sibling (cg) on the same verb surface as the stdio-MCP sibling (cg-mcp), so a head-to-head benchmark measures transport overhead rather than API-surface differences.

Changes

bench/agents/code_graph_adapter.py — add v2 client methods on CodeGraphClient that POST to the new /api/v2/* endpoints (search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path_v2, ask_v2). Existing UI-shaped methods kept for back-compat with tests/test_cli.py.
bench/cli/cg.py — rewrite to expose the 8 MCP-style verbs (index_repo, search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path, ask) alongside the legacy UI verbs. Mirrors cg_mcp.py's _compact_list / _strip_worktree_prefix helpers so token compaction is byte-identical between transports.
bench/runners/mini_runner.py — INSTANCE_TEMPLATE_CODE_GRAPH now documents the new verb surface. The cg track exports PROJECT_NAME + BRANCH like the MCP track, and indexes via /api/analyze_folder with explicit branch=_default so both tracks share the code:<project>:<branch> graph namespace.
bench/tools/code_graph/system_preamble.md — rewritten to mirror bench/tools/code_graph_mcp/system_preamble.md verb-for-verb.
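As a rough illustration of the adapter change described above, here is a minimal sketch of a client that POSTs one verb at a time to `/api/v2/*`. The class name `V2ClientSketch`, the `/api/v2/<verb>` path layout, the JSON payload fields, and the injectable `transport` argument are all assumptions made for this example; they are not taken from `CodeGraphClient` itself.

```python
import json
from typing import Any, Callable, Dict, List, Tuple
from urllib import request

# Verb names are taken from the PR description; everything else is assumed.
V2_VERBS = (
    "search_code", "get_callers", "get_callees",
    "get_dependencies", "impact_analysis", "find_path", "ask",
)

Transport = Callable[[str, bytes], bytes]


def http_post(url: str, body: bytes) -> bytes:
    """Default transport: POST JSON bytes and return the raw response body."""
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with request.urlopen(req) as resp:
        return resp.read()


class V2ClientSketch:
    """Hypothetical stand-in for the v2 methods added to CodeGraphClient."""

    def __init__(self, base_url: str, project: str, branch: str = "_default",
                 transport: Transport = http_post) -> None:
        self.base_url = base_url.rstrip("/")
        self.project = project
        self.branch = branch
        self.transport = transport

    def graph_namespace(self) -> str:
        # Both tracks are described as sharing code:<project>:<branch>.
        return f"code:{self.project}:{self.branch}"

    def call(self, verb: str, **params: Any) -> Dict[str, Any]:
        if verb not in V2_VERBS:
            raise ValueError(f"unknown v2 verb: {verb}")
        # Assumed payload shape: project and branch plus verb-specific params.
        payload = {"project": self.project, "branch": self.branch, **params}
        body = json.dumps(payload, sort_keys=True).encode("utf-8")
        raw = self.transport(f"{self.base_url}/api/v2/{verb}", body)
        return json.loads(raw)


# Usage with a fake transport, so the sketch runs without a server.
recorded: List[Tuple[str, Dict[str, Any]]] = []


def fake_transport(url: str, body: bytes) -> bytes:
    recorded.append((url, json.loads(body)))
    return b'{"results": []}'


client = V2ClientSketch("http://localhost:5000", "pytest", transport=fake_transport)
response = client.call("get_callers", symbol="foo")
```

Injecting the transport keeps the request-building logic testable without a running backend, which is convenient when the goal is to compare transports rather than servers.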

Validation

Parity verified byte-for-byte on a pre-indexed pytest-6202 graph: cg search_code, get_callers, get_callees, and impact_analysis return output identical to their cg-mcp equivalents (a 1 KB payload was diffed). All 27 existing bench + CLI tests still pass.
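A byte-for-byte check of this kind could be scripted roughly as follows. This is only a sketch: `parity_report` and `digest` are hypothetical helper names, and how the two outputs are captured from `cg` and `cg-mcp` is left out because the exact invocations are not given here.

```python
import difflib
import hashlib
from typing import List


def digest(data: bytes) -> str:
    """SHA-256 of the raw bytes, useful for logging a short parity fingerprint."""
    return hashlib.sha256(data).hexdigest()


def parity_report(cg_out: bytes, mcp_out: bytes) -> List[str]:
    """Return [] when the outputs are byte-identical, else lines describing the gap."""
    if cg_out == mcp_out:
        return []
    a = cg_out.decode("utf-8", errors="replace").splitlines(keepends=True)
    b = mcp_out.decode("utf-8", errors="replace").splitlines(keepends=True)
    diff = list(difflib.unified_diff(a, b, fromfile="cg", tofile="cg-mcp"))
    # Different invalid bytes can decode to the same replacement text,
    # so fall back to digests when the textual diff comes out empty.
    return diff or [f"binary mismatch: {digest(cg_out)} != {digest(mcp_out)}"]


same = parity_report(b"foo.py:10 bar\n", b"foo.py:10 bar\n")
different = parity_report(b"foo.py:10 bar\n", b"foo.py:11 bar\n")
```

Comparing raw bytes first, rather than parsed JSON, matches the stated goal: any difference in whitespace or key order would change the token count the agent sees.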

Stacked

Base: dvirdukhan/api-v2-mcp-parity (needs the v2 endpoints).

Pairs with #api-v2 (api/v2/* MCP-parity endpoints). With those endpoints in place, the bench harness can now run the HTTP-transport sibling (cg) on the same verb surface as the stdio-MCP sibling (cg-mcp), so a head-to-head benchmark measures *transport overhead* rather than API-surface differences.

Changes:

* bench/agents/code_graph_adapter.py: add v2 client methods on CodeGraphClient that POST to the new /api/v2/* endpoints (search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path_v2, ask_v2). Existing UI-shaped methods (graph_entities, get_neighbors, find_paths, ...) kept for back-compat with tests/test_cli.py.
* bench/cli/cg.py: rewrite to expose the 8 MCP-style verbs (index_repo, search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path, ask) alongside the legacy UI verbs. Mirrors cg_mcp.py's _compact_list / _strip_worktree_prefix helpers so token compaction is byte-identical between transports.
* bench/runners/mini_runner.py: INSTANCE_TEMPLATE_CODE_GRAPH now documents the new verb surface. The cg track exports PROJECT_NAME + BRANCH like the MCP track, and indexes via /api/analyze_folder with explicit branch=_default so both tracks share the code:<project>:<branch> graph namespace.
* bench/tools/code_graph/system_preamble.md: rewritten to mirror bench/tools/code_graph_mcp/system_preamble.md verb-for-verb.

Parity verified byte-for-byte on a pre-indexed pytest-6202 graph: cg search_code/get_callers/get_callees/impact_analysis returns identical output to the cg-mcp equivalents (1 KB payload diff'd). All 27 existing bench + CLI tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-29T18:04:00Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


Iter3 root-cause: with the verb surfaces and tool outputs now byte-identical between the HTTP (cg) and MCP (cg-mcp) tracks, the remaining token gap traced entirely to reading strategy. On 2/10 instances the agent fell into a 19x full-file `cat` loop instead of reading the bounded span the graph already pointed at, inflating input tokens 3-4x on those instances.

Both preambles now explicitly forbid `cat`-ing a whole source file and require `sed -n 'START,ENDp'` anchored on the graph's line number. This attacks the actual token driver and applies equally to both transports so a head-to-head stays apples-to-apples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
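To make the bounded-read rule concrete, the sketch below builds the `sed -n 'START,ENDp'` command from a line number reported by the graph. The helper name `bounded_read_command` and the default window sizes (`before`, `after`) are assumptions for illustration; the preambles only specify the `sed -n` form anchored on the graph's line number.

```python
import shlex


def bounded_read_command(path: str, line: int, before: int = 5, after: int = 40) -> str:
    """Build a sed command that prints a window around `line` instead of the whole file."""
    if line < 1:
        raise ValueError("line numbers are 1-based")
    start = max(1, line - before)  # clamp so the span never starts before line 1
    end = line + after             # sed stops at end of file if this overshoots
    return f"sed -n '{start},{end}p' {shlex.quote(path)}"


print(bounded_read_command("src/_pytest/python.py", 100, before=5, after=20))
# prints: sed -n '95,120p' src/_pytest/python.py
```

The point of the window is that the cost of a read scales with the span the graph pointed at, not with file length, which is where the repeated full-file reads were spending tokens.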

sample_instances() was called with only `stage` (size from STAGE_SIZES), then the result was sliced `[:limit]`. That let --limit shrink the sample below the stage size but never grow it, so `--stage calibration --limit 40` silently ran just 10 instances.

Pass n=args.limit straight into sample_instances so the limit sets the exact sample size (falling back to the stage size when unset). Because random.sample is prefix-stable for our seed, the n=10 calibration set stays a subset of the n=40 set, so existing trajectories/indexed graphs still resume-skip cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
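The before and after behaviour can be reproduced with a small stand-in. This is a sketch under stated assumptions: the real `sample_instances` signature, the seed, and the instance pool are not shown in this commit message, so the versions below are hypothetical, and only the calibration size of 10 is implied by the text.

```python
import random
from typing import List, Optional, Sequence

STAGE_SIZES = {"calibration": 10}  # only this stage size is implied by the message
SEED = 0                           # hypothetical seed


def sample_instances(pool: Sequence[str], stage: str,
                     n: Optional[int] = None, seed: int = SEED) -> List[str]:
    """Draw a seeded sample; `n` overrides the stage size when provided."""
    size = STAGE_SIZES[stage] if n is None else n
    return random.Random(seed).sample(list(pool), size)


def select_buggy(pool: Sequence[str], stage: str, limit: Optional[int]) -> List[str]:
    # Old flow: sample at the stage size, then slice. Slicing can only shrink.
    picked = sample_instances(pool, stage)
    return picked[:limit] if limit is not None else picked


def select_fixed(pool: Sequence[str], stage: str, limit: Optional[int]) -> List[str]:
    # New flow: the limit is the sample size; None falls back to the stage size.
    return sample_instances(pool, stage, n=limit)


pool = [f"instance-{i}" for i in range(500)]
buggy = select_buggy(pool, "calibration", 40)
fixed = select_fixed(pool, "calibration", 40)
default = select_fixed(pool, "calibration", None)

print(len(buggy), len(fixed), len(default))  # prints: 10 40 10
# Whether the smaller draw is a prefix of the larger one depends on the
# interpreter's random.sample implementation and the pool size, so check it.
print("n=10 is a prefix of n=40:", fixed[:10] == default)
```

Checking the prefix property explicitly, rather than assuming it, is worthwhile because resume-skipping relies on it and it is not a documented guarantee of `random.sample`.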

…ace)

Cascades the corrected #695 (which removed /api/v2/ask) and aligns the benchmark's parity surface with the now-ask-less MCP/HTTP tool set. Removes the GraphRAG `ask` verb from both transports so cg and cg-mcp expose the same 7 structural verbs (index_repo, search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path):

- bench/agents/code_graph_adapter.py: drop ask_v2() (was POST /api/v2/ask)
- bench/agents/code_graph_mcp_adapter.py: drop ask() (was call_tool("ask"))
- bench/cli/cg.py, cg_mcp.py: drop the `ask` subcommand + handler + docs
- scripts/mcp_smoke.py: drop "ask" from the expected MCP tool set
- system_preamble.md / tools.yaml / AGENTS.md: 8 -> 7 verbs

Tests: tests/mcp (54) and bench suites (40) pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
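A surface check in the spirit of the smoke-test change might look like the following. The verb names come from the commit message; the function `surface_mismatch` and its report keys are hypothetical and are not the contents of scripts/mcp_smoke.py.

```python
from typing import Dict, Iterable, List

# The 7 structural verbs listed in the commit message.
STRUCTURAL_VERBS = frozenset({
    "index_repo", "search_code", "get_callers", "get_callees",
    "get_dependencies", "impact_analysis", "find_path",
})


def surface_mismatch(cg_verbs: Iterable[str], mcp_verbs: Iterable[str]) -> Dict[str, List[str]]:
    """Report verbs that differ between transports or from the expected set."""
    cg, mcp = set(cg_verbs), set(mcp_verbs)
    return {
        "only_cg": sorted(cg - mcp),
        "only_mcp": sorted(mcp - cg),
        "unexpected": sorted((cg | mcp) - STRUCTURAL_VERBS),
        "missing": sorted(STRUCTURAL_VERBS - (cg & mcp)),
    }


aligned = surface_mismatch(STRUCTURAL_VERBS, STRUCTURAL_VERBS)
stale = surface_mismatch(STRUCTURAL_VERBS, STRUCTURAL_VERBS | {"ask"})
print(stale["only_mcp"])  # prints: ['ask']
```

Reporting one-sided verbs separately from unexpected ones makes it clear whether a failure is a parity gap between transports or drift from the agreed surface.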

DvirDukhan · 2026-06-09T11:27:15Z

Closing: we're standardizing the code-graph benchmark on the MCP arm (the real-world agent integration), so HTTP↔MCP bench parity isn't needed. The MCP arm in #693 is aligned to the live consolidated tool surface (get_neighbors/find_symbol/get_file_neighbors). Branch preserved if we revisit.

DvirDukhan and others added 2 commits May 30, 2026 07:58

DvirDukhan mentioned this pull request Jun 6, 2026

feat(mcp): hybrid search_code + get_neighbors + get_file_neighbors #701

Merged

DvirDukhan closed this Jun 9, 2026

This was referenced Jun 9, 2026

feat(api): /api/v2/* MCP-parity endpoints #695

Closed

bench: 3-config SWE-bench harness (baseline/lsp/code_graph_mcp) #693

Open

DvirDukhan wants to merge 4 commits into dvirdukhan/api-v2-mcp-parity from dvirdukhan/bench-mcp-parity
