Migrate backend from Flask to FastAPI#609
Conversation
Remove @public_access decorator from analyze_repo and switch_commit endpoints so that all three mutating endpoints (analyze_folder, analyze_repo, switch_commit) require a valid token even when CODE_GRAPH_PUBLIC=1. Read-only endpoints remain publicly accessible. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughReplaced Flask with FastAPI (exported Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(135,206,235,0.5)
participant Client
end
rect rgba(144,238,144,0.5)
participant API as FastAPI
participant Auth as Auth Dependency
end
rect rgba(255,228,196,0.5)
participant Graph as Graph Service
participant Git as Git Service
end
Client->>API: HTTP request to /api/*
API->>Auth: resolve dependency (public_or_auth / token_required)
Auth-->>API: allow or raise HTTPException
alt allowed
API->>Graph: query or mutate graph
Graph-->>API: data/result
API->>Git: repo operations (analyze, switch, list_commits)
Git-->>API: repo response
API-->>Client: JSONResponse or FileResponse
else rejected
API-->>Client: HTTP 401/422 response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Replace Flask with FastAPI + Uvicorn in pyproject.toml - Rewrite api/index.py: Pydantic request models, Depends() auth, FileResponse SPA serving, JSONResponse error responses - Auth decorators become dependency injection: public_or_auth (public endpoints), token_required (mutating endpoints) - Rewrite tests/index.py: FastAPI app factory, explicit imports - Update 7 endpoint test files: TestClient, .json() method, /api/ URL prefix - Update Makefile, start.sh, CI workflow: flask run -> uvicorn api.index:app - Endpoints remain synchronous (FastAPI auto-threads blocking calls) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (5)
api/index.py (1)
119-121: Uselogging.exceptionfor automatic traceback capture.When logging exceptions,
logging.exceptionautomatically includes the traceback, making debugging easier.♻️ Suggested improvement
except Exception as e: - logging.error("Error retrieving sub-graph for repo '%s': %s", repo, e) + logging.exception("Error retrieving sub-graph for repo '%s'", repo) return JSONResponse({"status": "Internal server error"}, status_code=500)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/index.py` around lines 119 - 121, Replace the logging.error call in the except block that handles exceptions from retrieving the sub-graph (the except Exception as e: handler) with logging.exception so the traceback is captured automatically; keep the existing message/context (including repo) and the subsequent return of JSONResponse({"status": "Internal server error"}, status_code=500) unchanged.tests/endpoints/test_list_commits.py (1)
26-28: Minor: Avoid callingresponse.json()multiple times.Each call to
response.json()re-parses the response body. Cache the result for cleaner code:♻️ Suggested improvement
# Start with an empty DB response = client.post("/api/list_commits", json={ "repo": "git_repo" }) - status = response.json()["status"] - commits = response.json()["commits"] + data = response.json() + status = data["status"] + commits = data["commits"]Also applies to: 44-46
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/endpoints/test_list_commits.py` around lines 26 - 28, The test calls response.json() multiple times (e.g., in test_list_commits where response is used to get "status" and "commits"); change to parse once into a variable (e.g., data = response.json()) and then use data["status"] and data["commits"]; apply the same change for the other occurrence around the references at lines 44-46 so each response is parsed only once.tests/index.py (1)
39-43: Unusedauthorizationparameter inpublic_access.The
authorizationparameter is received but never used. If this dependency is meant to optionally validate tokens in public mode, the logic should be updated. Otherwise, consider removing the parameter.♻️ Suggested fix
-def public_access(authorization: str | None = Header(None)): +def public_access(): """Test-specific: requires CODE_GRAPH_PUBLIC=1 (no token fallback).""" public = os.environ.get("CODE_GRAPH_PUBLIC", "0") if public != "1": raise HTTPException(status_code=401, detail="Unauthorized")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/index.py` around lines 39 - 43, The function public_access has an unused authorization parameter (typed str | None from Header) — either remove the parameter from the signature and any FastAPI dependency usage if no token handling is intended, or implement the intended validation: when CODE_GRAPH_PUBLIC == "1" accept requests without a token but when an authorization header is present validate it (reuse existing token validation helpers) and raise HTTPException(401) on failure; update the public_access function accordingly (referencing the authorization parameter and the CODE_GRAPH_PUBLIC check).tests/endpoints/test_list_repos.py (1)
23-31: Consider explicit auth testing to improve test robustness.This test calls
/api/list_reposwithout anAuthorizationheader. The endpoint usestoken_required, but the test passes becauseSECRET_TOKENis unset in the test environment. WhenSECRET_TOKENisNone,_verify_token(None)returnsTrue, bypassing authentication.While this works in the current test setup, the test doesn't explicitly verify authentication behavior. If
SECRET_TOKENis ever set during testing, this test would fail unexpectedly.Consider adding an explicit test for the authenticated case to improve robustness:
def test_list_repos_with_auth(client): """Verify endpoint accepts valid authentication token.""" import os os.environ["SECRET_TOKEN"] = "test-secret" response = client.get("/api/list_repos", headers={"Authorization": "Bearer test-secret"}) assert response.status_code == 200 del os.environ["SECRET_TOKEN"]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/endpoints/test_list_repos.py` around lines 23 - 31, The current test_list_repos calls the /api/list_repos endpoint without an Authorization header and relies on SECRET_TOKEN being unset, which masks authentication behavior; add an explicit authenticated test (e.g., test_list_repos_with_auth) that sets os.environ["SECRET_TOKEN"]="test-secret", calls client.get("/api/list_repos", headers={"Authorization":"Bearer test-secret"}), asserts a 200 response and expected body, and then deletes the env var to clean up; also consider adding a negative test that sets SECRET_TOKEN and calls without the header to assert a 401 using the token_required/_verify_token flow.tests/endpoints/test_find_paths.py (1)
36-42: Pin the 422 to the rejected field.Right now these checks pass for any validation error on the request body. Asserting that the error location ends in
src/destwould keep the contract specific and make future schema regressions easier to diagnose.Suggested assertion tightening
response = client.post("/api/find_paths", json={"repo": "GraphRAG-SDK", "src": "invalid", "dest": 0}) assert response.status_code == 422 + assert response.json()["detail"][0]["loc"][-1] == "src" response = client.post("/api/find_paths", json={"repo": "GraphRAG-SDK", "src": 0, "dest": "invalid"}) assert response.status_code == 422 + assert response.json()["detail"][0]["loc"][-1] == "dest"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/endpoints/test_find_paths.py` around lines 36 - 42, Tighten the test assertions to ensure the 422 is due to the specific field validation by checking the response JSON validation error location: after posting to client.post("/api/find_paths", json={"repo": "GraphRAG-SDK", "src": "invalid", "dest": 0}) inspect response.json() for the error entry whose "loc" ends with "src" and assert that, and similarly for the second request assert the validation error "loc" ends with "dest" so the test targets the exact rejected field (update the assertions around response and response.json() in tests/endpoints/test_find_paths.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/index.py`:
- Around line 252-260: The serve_spa handler uses the untrusted full_path joined
to STATIC_DIR causing a path traversal vulnerability; either replace this manual
handler with FastAPI's StaticFiles mount (app.mount("/",
StaticFiles(directory=STATIC_DIR, html=True), name="spa")) to get secure static
serving, or if keeping serve_spa, resolve and validate the requested path by
calling (STATIC_DIR / full_path).resolve() and ensure it startswith
STATIC_DIR.resolve() before returning FileResponse (fall back to INDEX_HTML or
404); reference the serve_spa function, STATIC_DIR, INDEX_HTML, and FileResponse
when making the change.
- Around line 199-214: The analyze_folder endpoint uses user-supplied data.path
directly (see AnalyzeFolderRequest and analyze_folder) which allows path
traversal; fix by resolving and normalizing the path (use
Path(data.path).resolve() or os.path.realpath) and verifying it is a subpath of
an allowed base directory (compare resolved path.parents or use
os.path.commonpath against a configured BASE_DIR/project root) before calling
os.path.isdir or passing the path to SourceAnalyzer.analyze_local_folder; if the
check fails return the existing 400 response. Also document/obtain the BASE_DIR
constant used for the comparison.
- Around line 191-196: The current exception handling around ask(data.repo,
data.msg) leaks internal error details by returning str(e) to the client;
instead, log the full exception internally (using the existing logger or Python
logging.exception) inside the except block and return a generic error message
(e.g., "Internal server error") in the JSONResponse. Update the try/except that
wraps call to ask(...) and replace the returned str(e) with a non-sensitive
message while ensuring the exception is recorded via logging.exception or
processLogger.error with the exception context.
In `@tests/endpoints/test_repo_info.py`:
- Around line 21-26: The test currently uses the test app wiring and doesn't
exercise the real auth dependency; update test_repo_info to exercise the
production route wiring and also cover the private/token case: instantiate the
real app returned by create_app() (or otherwise use the app that wires the route
with Depends(public_or_auth)) and call POST /api/repo_info with no Authorization
header to assert public access, then call it again against the token-required
wiring (or by providing a valid token via token_required) to assert the endpoint
enforces auth in private mode—use the existing test function name
test_repo_info, the app factory create_app, and reference the dependency
public_or_auth/token_required and endpoint "/api/repo_info" to locate where to
change the client setup and requests.
In `@tests/index.py`:
- Around line 155-159: The current test endpoint calls ask(data.repo, data.msg)
and returns str(e) to clients on exception, leaking internal details; update the
except block in tests/index.py to log the full exception internally (use an
existing logger or logging.exception/logger.error with the exception) and return
a generic error JSONResponse like {"status":"error","response":"Internal server
error"} with status_code=500, referencing the ask call and the JSONResponse
return to locate the change.
- Around line 161-173: The analyze_folder endpoint uses data.path directly
causing a path traversal risk; before checking os.path.isdir or passing the path
to SourceAnalyzer.analyze_local_folder, resolve and validate the path: create a
Path from data.path, call .resolve(strict=False) (or canonicalize) and verify it
is a subpath of an allowed base directory (or of the project workspace) and that
the resolved path is an existing directory; if validation fails return the same
400 JSONResponse. Update references in the analyze_folder handler
(AnalyzeFolderRequest, analyze_folder, SourceAnalyzer.analyze_local_folder) to
use the resolved/validated Path instance instead of the raw input string.
- Around line 175-186: Tests currently mount analyze_repo and switch_commit with
public_access instead of the production token_required, which bypasses auth;
update the test setup so the test app uses the same auth behavior as production
by either importing the production FastAPI app from api.index (so analyze_repo
and switch_commit use token_required) or, if keeping a test app, remove
public_access from those route decorators and instead use FastAPI
dependency_overrides to replace token_required with a test mock; reference the
analyze_repo and switch_commit route definitions and the public_access and
token_required dependency functions when making the change so tests exercise the
real auth logic.
---
Nitpick comments:
In `@api/index.py`:
- Around line 119-121: Replace the logging.error call in the except block that
handles exceptions from retrieving the sub-graph (the except Exception as e:
handler) with logging.exception so the traceback is captured automatically; keep
the existing message/context (including repo) and the subsequent return of
JSONResponse({"status": "Internal server error"}, status_code=500) unchanged.
In `@tests/endpoints/test_find_paths.py`:
- Around line 36-42: Tighten the test assertions to ensure the 422 is due to the
specific field validation by checking the response JSON validation error
location: after posting to client.post("/api/find_paths", json={"repo":
"GraphRAG-SDK", "src": "invalid", "dest": 0}) inspect response.json() for the
error entry whose "loc" ends with "src" and assert that, and similarly for the
second request assert the validation error "loc" ends with "dest" so the test
targets the exact rejected field (update the assertions around response and
response.json() in tests/endpoints/test_find_paths.py).
In `@tests/endpoints/test_list_commits.py`:
- Around line 26-28: The test calls response.json() multiple times (e.g., in
test_list_commits where response is used to get "status" and "commits"); change
to parse once into a variable (e.g., data = response.json()) and then use
data["status"] and data["commits"]; apply the same change for the other
occurrence around the references at lines 44-46 so each response is parsed only
once.
In `@tests/endpoints/test_list_repos.py`:
- Around line 23-31: The current test_list_repos calls the /api/list_repos
endpoint without an Authorization header and relies on SECRET_TOKEN being unset,
which masks authentication behavior; add an explicit authenticated test (e.g.,
test_list_repos_with_auth) that sets os.environ["SECRET_TOKEN"]="test-secret",
calls client.get("/api/list_repos", headers={"Authorization":"Bearer
test-secret"}), asserts a 200 response and expected body, and then deletes the
env var to clean up; also consider adding a negative test that sets SECRET_TOKEN
and calls without the header to assert a 401 using the
token_required/_verify_token flow.
In `@tests/index.py`:
- Around line 39-43: The function public_access has an unused authorization
parameter (typed str | None from Header) — either remove the parameter from the
signature and any FastAPI dependency usage if no token handling is intended, or
implement the intended validation: when CODE_GRAPH_PUBLIC == "1" accept requests
without a token but when an authorization header is present validate it (reuse
existing token validation helpers) and raise HTTPException(401) on failure;
update the public_access function accordingly (referencing the authorization
parameter and the CODE_GRAPH_PUBLIC check).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 72958536-c9a5-4d09-a3dc-2673bd0d48a7
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
.github/workflows/playwright.ymlMakefileapi/index.pypyproject.tomlstart.shtests/endpoints/test_auto_complete.pytests/endpoints/test_find_paths.pytests/endpoints/test_get_neighbors.pytests/endpoints/test_graph_entities.pytests/endpoints/test_list_commits.pytests/endpoints/test_list_repos.pytests/endpoints/test_repo_info.pytests/index.py
There was a problem hiding this comment.
Pull request overview
This PR migrates the backend runtime from Flask to FastAPI and tightens authentication so that mutating endpoints consistently require a valid token, even when running in public mode (CODE_GRAPH_PUBLIC=1).
Changes:
- Replace Flask app/decorators with FastAPI routes + dependency-based auth (
public_or_authvstoken_required) and add SPA static serving. - Ensure
/api/analyze_repoand/api/switch_commitrequire authentication (no public-mode bypass). - Update scripts, CI, and endpoint tests to use Uvicorn/TestClient and
/api/*paths; add FastAPI/Uvicorn/httpx dependencies.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
api/index.py |
Migrates API to FastAPI, centralizes auth via dependencies, adds SPA catch-all route. |
tests/index.py |
Reworks test app factory to FastAPI + Pydantic models and dependency-based auth. |
tests/endpoints/test_*.py |
Switches endpoint tests to Starlette TestClient and /api/* routes. |
pyproject.toml |
Replaces Flask dependency with FastAPI + Uvicorn; adds httpx for tests. |
uv.lock |
Locks new FastAPI/Uvicorn dependency tree; removes Flask-related packages. |
Makefile |
Updates dev/prod run targets to use Uvicorn. |
start.sh |
Updates container startup to Uvicorn. |
.github/workflows/playwright.yml |
Updates CI server startup step from Flask to Uvicorn. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- serve_spa: resolve path and verify it stays within STATIC_DIR - chat endpoint: log exception server-side, return generic error to client - Apply same fixes to test app Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6dc37b7 to
0704421
Compare
Match production auth by using token_required (not public_access) for analyze_repo and switch_commit in the test app. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gging - serve_spa: replace str().startswith() with Path.is_relative_to() to prevent prefix-collision path traversal (e.g. dist_evil matching dist) - analyze_folder: add ALLOWED_ANALYSIS_DIR containment check to prevent traversal outside allowed base directory - Upgrade logging.error to logging.exception in graph_entities and chat handlers for full traceback capture - Remove unused public_access function from tests/index.py - Mirror analyze_folder path validation in test app - Add auth tests for repo_info (public_or_auth) and list_repos endpoints - Tighten test_find_paths 422 assertions to verify field-specific validation - Parse response.json() once per response in test_list_commits Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/index.py (1)
49-79: Consider extracting shared Pydantic models to avoid duplication.The request models (
RepoRequest,NeighborsRequest, etc.) are duplicated betweenapi/index.pyandtests/index.py. Consider extracting them to a shared module likeapi/models.pyto reduce maintenance burden.♻️ Suggested structure
Create
api/models.py:from pydantic import BaseModel class RepoRequest(BaseModel): repo: str class NeighborsRequest(BaseModel): repo: str node_ids: list[int] # ... other modelsThen import in both files:
from api.models import RepoRequest, NeighborsRequest, ...🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/index.py` around lines 49 - 79, Extract the duplicated Pydantic request models (RepoRequest, NeighborsRequest, AutoCompleteRequest, FindPathsRequest, ChatRequest, AnalyzeFolderRequest, AnalyzeRepoRequest, SwitchCommitRequest) into a single shared module (e.g., api.models) and replace the local class definitions in tests/index.py and api/index.py with imports from that module; create api.models containing the same BaseModel subclasses, update imports in both files to `from api.models import ...`, and ensure any type annotations (e.g., list[int]) remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/index.py`:
- Around line 49-79: Extract the duplicated Pydantic request models
(RepoRequest, NeighborsRequest, AutoCompleteRequest, FindPathsRequest,
ChatRequest, AnalyzeFolderRequest, AnalyzeRepoRequest, SwitchCommitRequest) into
a single shared module (e.g., api.models) and replace the local class
definitions in tests/index.py and api/index.py with imports from that module;
create api.models containing the same BaseModel subclasses, update imports in
both files to `from api.models import ...`, and ensure any type annotations
(e.g., list[int]) remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b364e30c-9715-45ca-8666-4bee15953662
📒 Files selected for processing (6)
api/index.pytests/endpoints/test_find_paths.pytests/endpoints/test_list_commits.pytests/endpoints/test_list_repos.pytests/endpoints/test_repo_info.pytests/index.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/endpoints/test_find_paths.py
- tests/endpoints/test_repo_info.py
Use 'import api.index' consistently instead of mixing with 'from api.index import'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/endpoints/test_repo_info.py (1)
15-15: Trailing whitespace.Lines 15 and 28 have trailing spaces after
["status"].🧹 Suggested cleanup
- status = response.json()["status"] + status = response.json()["status"]- status = data["status"] + status = data["status"]Also applies to: 28-28
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/endpoints/test_repo_info.py` at line 15, Remove the trailing whitespace characters after the expression response.json()["status"] in the status assignment(s) (the variable status) so the lines contain no extra spaces; update both occurrences where status = response.json()["status"] appears to eliminate the trailing space(s).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/endpoints/test_list_repos.py`:
- Around line 53-61: The test test_list_repos_with_auth currently only asserts
response.status_code != 401 which can false-pass; change it to assert a concrete
success response (e.g., response.status_code == 200) and validate the returned
payload shape (e.g., response.json() is a list or contains expected keys) after
calling client.get("/api/list_repos") with the monkeypatched
api.index.SECRET_TOKEN. To avoid flaky failures from backend/data access, before
the request monkeypatch the endpoint's datastore call used by the
/api/list_repos route (the function in api.index that queries repos) to return a
deterministic result (e.g., empty list or a small fake repo list), then assert
the explicit status and JSON contents.
---
Nitpick comments:
In `@tests/endpoints/test_repo_info.py`:
- Line 15: Remove the trailing whitespace characters after the expression
response.json()["status"] in the status assignment(s) (the variable status) so
the lines contain no extra spaces; update both occurrences where status =
response.json()["status"] appears to eliminate the trailing space(s).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 382eb1c3-cc3a-44a4-84fd-694687e636d1
📒 Files selected for processing (2)
tests/endpoints/test_list_repos.pytests/endpoints/test_repo_info.py
- test_list_repos_with_auth: monkeypatch get_repos for deterministic result, assert 200 status and exact payload instead of just != 401 - test_repo_info.py: remove trailing whitespace on status assignments - test_list_repos.py: remove trailing whitespace on status assignment Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
fix #608
Summary
Remove
@public_accessdecorator fromanalyze_repoandswitch_commitso that all three mutating endpoints always require a valid token, even whenCODE_GRAPH_PUBLIC=1.Changes
POST /api/analyze_repo— removed@public_access(clones repos & writes to DB)POST /api/switch_commit— removed@public_access(replays graph mutations)POST /api/analyze_folder— already required token (no change)The 8 read-only endpoints remain publicly accessible in public mode.
Summary by CodeRabbit