Security hardening, correctness fixes, and documentation improvements by MO2k4 · Pull Request #16 · colbymchenry/codegraph

MO2k4 · 2026-02-06T07:01:12Z

Hey @colbymchenry — first off, thank you for building CodeGraph. The idea of a local-first semantic code graph that plugs straight into Claude Code is genuinely useful, and the tree-sitter + SQLite foundation is solid. I've been using it on a few projects and wanted to give back, so I spent some time doing a thorough security and correctness audit. Here's what came out of it.

Summary

This PR is the result of a deep security and correctness audit of the CodeGraph codebase. It addresses 19 issues, adds comprehensive test coverage for all fixes and resolves all npm audit vulnerabilities.

Key stats:

18 files changed, ~2,000 lines added, ~650 removed
246 tests passing (up from 239), 0 npm audit vulnerabilities (down from 5)
3 new test files: security.test.ts (33 tests), installer.test.ts (8 tests), worktree tests in sync.test.ts (6 tests)

Changes

Fix git worktree support for hooks (`src/sync/git-hooks.ts`)

The GitHooksManager constructor hard-coded .git as a directory, which broke in git worktrees where .git is a file containing a gitdir: pointer. Hooks would fail to install or detect in worktree checkouts.

Fix: Added resolveGitDir() that reads the .git file, follows the gitdir: pointer (supporting both absolute and relative paths), and navigates up from worktrees/<name> to the main repo's .git directory. Hooks are now correctly installed in the shared hooks directory.

Cross-process file locking for database writes (`src/utils.ts`, `src/index.ts`)

The in-memory indexMutex only protected a single process. Multiple concurrent processes (git hook background syncs, CLI commands, MCP server) could all write to the same SQLite database simultaneously, causing SQLITE_BUSY errors or data corruption.

Fix: Added a FileLock class that uses filesystem-level advisory locking via fs.writeFileSync(path, pid, { flag: 'wx' }) for atomic lock creation. Stale locks are detected by checking if the owning PID is still alive. All write operations (indexAll, indexFiles, sync) are now wrapped with fileLock.withLockAsync(), and the lock is released in close().

Path traversal vulnerability in MCP file reads (`src/context/index.ts`)

path.join(this.projectRoot, node.filePath) did not prevent directory escape. A tampered database entry with ../../etc/passwd in file_path would cause reads outside the project root.

Fix: Changed path.join to path.resolve and added validation that the resolved path starts with the project root directory. Blocked attempts are logged as warnings.

Input validation on MCP tool parameters (`src/mcp/tools.ts`)

All MCP handlers cast args without type or bounds checks. Unbounded depth on codegraph_impact could exhaust memory. Non-string query values could cause runtime errors.

Fix: Added validateString() and validateNumber() helper methods. All 6 tool handlers now validate input types and clamp bounds (limit: 1–100, depth: 1–10, maxNodes: 1–100). Invalid inputs return structured error results instead of crashing.

Interactive confirmation before global npm install (`src/installer/index.ts`)

The installer ran npm install -g silently without user confirmation, which is a supply chain risk.

Fix: Added an explicit user confirmation prompt before running the global install.

Pinned tree-sitter-liquid dependency (`package.json`)

tree-sitter-liquid tracked the default branch of a GitHub repo with no version pin. This is a native C add-on compiled during install — an unpinned dependency is a supply chain risk.

Fix: Pinned to specific commit hash d6ebde3974742cd1b61b55d1d94aab1dacb41056.

Atomic writes for installer config files (`src/installer/config-writer.ts`)

fs.writeFileSync() on ~/.claude.json and other config files could leave them partially written on crash.

Fix: Added atomicWriteFileSync() helper that writes to a temp file (.tmp.<pid>) then uses fs.renameSync() for an atomic swap. All config file writes now use this helper.

Replace hand-rolled glob-to-regex with picomatch (`src/extraction/index.ts`, `src/config.ts`)

Custom glob-to-regex conversion via new RegExp() was vulnerable to catastrophic backtracking (ReDoS) with crafted patterns in .codegraph/config.json.

Fix: Replaced both matchesGlob() in extraction and matchesPattern() in config with picomatch.isMatch(). Added picomatch as a production dependency.

Fix silent error swallowing in installer (`src/installer/config-writer.ts`)

readJsonFile() returned {} for any failure — couldn't distinguish "file doesn't exist" from "file is corrupted". A corrupted ~/.claude.json would be silently discarded.

Fix: readJsonFile() now distinguishes ENOENT (returns {}) from parse errors (warns the user and creates a .backup of the corrupted file before overwriting).

Add error boundaries around JSON.parse on DB values (`src/db/queries.ts`)

JSON.parse() on decorators, typeParameters, and metadata columns had no per-field error handling. A single corrupted row would crash the entire process.

Fix: Added safeJsonParse() helper that returns undefined on failure and logs a warning with the row context for debugging. All 5 bare JSON.parse() calls in query result mapping now use this helper.

Fix resource cleanup on error paths (`src/bin/codegraph.ts`, `src/mcp/index.ts`)

CLI commands didn't call cg.close() on error. The MCP server's stop() method could skip cleanup if cg.close() threw.

Fix: All CLI command handlers now use let cg: CodeGraph | null = null with try/catch/finally blocks that call cg?.close() in the finally path. The MCP server stop() wraps cg.close() in its own try/catch.

Add symlink cycle detection in directory scanning (`src/extraction/index.ts`)

fs.readdirSync() followed symlinks without cycle detection. A symlink to a parent directory would cause an infinite loop.

Fix: Added visitedDirs Set tracking real paths via fs.realpathSync(). Already-visited directories are skipped. Symlinks to files are handled correctly (indexed if they match include patterns), and broken symlinks are logged and skipped.

Fix CLAUDE.md section replacement regex (`src/installer/config-writer.ts`)

The regex \n## [^#] used to find the next section header incorrectly matched ### Subtopic lines, potentially overwriting unrelated content after a CodeGraph section with subsections.

Fix: Changed to negative lookahead \n## (?!#) which correctly matches ## Header but not ### Subsection. The installer now properly preserves content after the CodeGraph section even when nested headers are present.

Qualify "100% local" claim in README

The @xenova/transformers library downloads a ~100MB embedding model from Hugging Face on first use. This is now documented with a footnote on the tagline and in the "100% Local" feature description.

Remove non-existent `gitHooksEnabled` config option from README

This option was listed in the README configuration table but was never implemented in the actual config schema (src/types.ts). Git hooks are controlled via codegraph init --no-hooks and codegraph hooks install/remove CLI commands.

Document and surface large file skip behavior

Files exceeding maxFileSize (default 1MB) were silently skipped during indexing with no indication to the user. The CLI now reports how many files were skipped due to size limits after indexing, with guidance to adjust maxFileSize in config. Added documentation in the README troubleshooting section.

Improve git hook failure visibility (`src/sync/git-hooks.ts`, `src/bin/codegraph.ts`)

The post-commit hook redirected all stderr to /dev/null, making sync failures completely invisible. Now errors are appended to .codegraph/sync.log. The codegraph hooks status command also checks this log and shows recent errors.

Add worktree test coverage (`tests/sync.test.ts`)

The resolveGitDir() worktree path calculation had no test coverage. Added 6 tests covering: absolute and relative gitdir paths, regular repos, missing .git, invalid .git file content, and hook installation from a worktree targeting the main repo's hooks directory.

Document benchmark methodology (README)

Added a collapsible section explaining how the headline token/tool-call numbers were measured, and how to run the precision/recall evaluation suite against the included fixture projects.

Upgrade vitest to v4.0.18 (`package.json`)

Resolved all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.

Test plan

All 246 tests pass across 10 test files
npm audit reports 0 vulnerabilities
npm run build compiles cleanly with no TypeScript errors
Manual verification: run codegraph init --index on a sample project
Manual verification: run the MCP server with Claude Code and test tool calls
Manual verification: create a symlink cycle in a project directory and confirm indexing completes without hanging

Introduce FileLock class that uses a PID-tracked lock file (.codegraph/codegraph.lock) to prevent concurrent database writes from multiple processes (git hooks, CLI, MCP server). The lock detects stale locks by checking if the owning PID is still alive, and uses O_EXCL (wx flag) for atomic lock creation to prevent race conditions. Write operations (indexAll, indexFiles, sync) now acquire the file lock before the in-process mutex. The lock is also released on close().

Validate that resolved file paths stay within the project root before reading. Previously, a tampered database with "../" sequences in file_path could cause reads outside the project directory via the codegraph_node and codegraph_context MCP tools. Now uses path.resolve() and verifies the result starts with the project root path before proceeding with file reads.

All MCP tool handlers now validate parameter types and bounds before processing. String parameters (query, symbol, task) are checked for type and non-emptiness. Numeric parameters (limit, depth, maxNodes) are validated with Number.isFinite and clamped to safe ranges (limit: 1-100, depth: 1-10) to prevent resource exhaustion from unbounded graph traversals.

The installer now prompts the user for confirmation before running npm install -g. Previously, this ran silently which is a supply chain risk. If the user declines, the installer skips the global install and uses npx as a fallback.

Pin the GitHub dependency to commit d6ebde39 instead of tracking the default branch. This prevents a compromised upstream repo from injecting malicious code via the native C add-on compiled during npm install.

Replace direct fs.writeFileSync calls with an atomicWriteFileSync helper that writes to a temporary file then renames it into place. This prevents config file corruption (claude.json, settings.json, CLAUDE.md) if the process crashes mid-write, since rename is atomic on POSIX systems.

Cover FileLock (acquire/release, stale lock detection, withLock, error cleanup), path traversal prevention in ContextBuilder, MCP input validation (type checks, empty strings, bounds clamping for all tool handlers), and atomic write correctness. 22 new tests.

The custom glob-to-regex conversion using new RegExp() was vulnerable to catastrophic backtracking (ReDoS) with crafted patterns. Replace both instances (src/extraction/index.ts matchesGlob and src/config.ts matchesPattern) with picomatch, a well-tested glob matching library. Adds tests verifying standard glob patterns, complex brace expansion, dot files, and a ReDoS pattern that completes in <100ms.

Distinguish between missing files (return {}) and corrupted files (warn user, create .backup copy, then return {}). Previously all errors were silently ignored, making it impossible to diagnose corrupted claude.json or settings.json files. Adds installer tests covering corruption handling, backup creation, and preservation of existing config fields.

Wrap all JSON.parse calls on database column values (decorators, typeParameters, metadata, errors) with safeJsonParse() that catches parse errors and returns undefined instead of crashing the process. Logs a warning with the column context for debugging. Adds tests that inject malformed JSON into nodes, edges, and file records and verify graceful degradation.

All CLI commands (init, index, sync, status, query, context, hooks) now use try/finally blocks to ensure cg.close() is called even when errors occur or process.exit() is invoked. Previously, error paths leaked database connections. MCP server stop() now wraps cg.close() in try/finally to ensure transport cleanup happens even if database close fails.

Track visited real paths (via fs.realpathSync) during directory traversal to prevent infinite loops from symlinks pointing to parent directories. Properly handle symlinks to directories (follow once), symlinks to files (index them), and broken symlinks (skip gracefully). Adds tests for symlink cycles, valid directory symlinks, and broken symlink handling.

The unmarked section detection regex /\n## [^#]/ incorrectly matched ### subsection headers, causing content after CodeGraph subsections to be overwritten. Changed to /\n## (?!#)/ (negative lookahead) so only h2-level headers terminate the section boundary. Adds tests for sections with ### subsections and for standard section replacement.

Semantic search uses @xenova/transformers which downloads a ~100MB embedding model on first use. This is now documented in the tagline footnote and the 100% Local feature description.

This option was documented but never implemented in the config schema. Git hooks are controlled via the `codegraph init --no-hooks` CLI flag and `codegraph hooks install/remove` commands instead.

Files exceeding maxFileSize (default 1MB) are now reported in the CLI output after indexing, with guidance to adjust the config. Added documentation in the README troubleshooting section and config table.

Previously, the post-commit hook redirected all stderr to /dev/null, making sync failures invisible. Now errors are appended to .codegraph/sync.log. The 'codegraph hooks status' command also shows recent errors from this log file.

Tests cover absolute and relative gitdir paths, regular repos, missing .git, invalid .git file content, and hook installation from a worktree targeting the main repo's hooks directory.

Explains how the headline token/tool-call numbers were measured and how to run the precision/recall evaluation suite against the included TypeScript and Python fixture projects.

Upgrades vitest from ^2.0.0 to ^4.0.18, which resolves all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.

These files were added prematurely and contain inaccuracies (e.g., referencing non-existent gitHooksEnabled config option, undocumented CLI flags). Removing until they can be properly aligned with the actual codebase.

rickross · 2026-02-06T11:53:20Z

Hi Martin, this is truly outstanding security work! I spent some time on this project yesterday and have made several additions and improvements. We made it use multiple workers, so it has sped up by quite a bit.

We've just merged your PR into our fork, and with some effort have gotten things aligned. Because we had added and changed so much, it was a little tricky to get everything merged. We have reached out to @colbymchenry but haven't heard back from him yet, so we don't know if he will want to fold in our changes into his main branch.

I would love for you to take a look at https://github.com/rickross/codegraph and check that the key elements of your PR are correctly merged. I hope you will, and you can reach me directly at rickross@gmail.com if you wish.

MO2k4 · 2026-02-06T16:53:56Z

That is very cool, I will definitely take a deeper look on monday, but first weekend is calling 😊

@rickross Did you encounter any issues or why did you remove those changes?

…idation, and logging - Fix console.log in vectors/search.ts that corrupts MCP JSON-RPC stdout stream - Add shared isPathWithinRoot() utility and apply it to indexFile, ResolutionContext, and context builder to prevent path traversal via ../ sequences - Add symlink boundary check in scanDirectory to block symlinks escaping project root - Reject import paths that resolve outside project root in import-resolver - Validate ensureSubdirectory name against .. and path separators - Use url.fileURLToPath() for proper MCP URI parsing instead of naive regex - Bound findPath BFS to 10,000 visited nodes; change default maxDepth from Infinity to 20 - Replace hand-rolled regex with picomatch in findByQualifiedName to eliminate ReDoS - Harden FTS5 sanitization to strip :, ^ and filter AND/OR/NOT/NEAR operators - Use safeJsonParse consistently for unresolved_refs.candidates column - Validate MCP kind parameter against NodeKind enum server-side - Validate config with validateConfig() after Object.assign in init/initSync/updateConfig

rickross · 2026-02-09T12:08:05Z

That is very cool, I will definitely take a deeper look on monday, but first weekend is calling 😊

@rickross Did you encounter any issues or why did you remove those changes?

Hi Martin, I'm not sure which changes you mean? I spent quite a while on enhancing codegraph this weekend so my set of AI agents could get enhanced value out of using it on our several open codebases. The result has now diverged somewhat from @colbymchenry original version, and it is quite a bit faster and has begun producing better resolutions. I will likely continue to enhance this as I find the time. I'm still hoping to hear from @colbymchenry but now will just forge ahead, I guess.

…locking Implements security improvements inspired by PR #16 (credit: MO2k4): - Add validatePathWithinRoot() to prevent path traversal attacks in extraction and context building - Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges - Use atomic writes (temp file + rename) for config saves - Add symlink cycle detection in directory scanning to prevent infinite loops - Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks to handle corrupted database metadata gracefully - Add cross-process FileLock for DB write operations (indexAll, indexFiles, sync) to prevent concurrent writes from CLI, MCP server, and git hooks - Remove unused path import from context/index.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

colbymchenry · 2026-02-10T05:24:40Z

@MO2k4 I made some significant updates to CodeGraph over the last few weeks. Adding support for Windows systems, Dart, and among other things. I setup another PR here: #19 giving you credit and mentioning your ideas. Thank you for the great ideas!

colbymchenry · 2026-02-10T06:08:58Z

Hey @MO2k4, thanks for this PR! We've incorporated your security hardening work (path traversal validation, input clamping, safe JSON parsing, file locking) into the codebase. It's been merged into main via PR #19, which also cherry-picked improvements from PR #15. Appreciate the contribution!

MO2k4 added 22 commits February 6, 2026 07:43

Add comprehensive documentation and fix git worktree support

8f91cd2

Require user confirmation before global npm install

57045d3

The installer now prompts the user for confirmation before running npm install -g. Previously, this ran silently which is a supply chain risk. If the user declines, the installer skips the global install and uses npx as a fallback.

Pin tree-sitter-liquid to specific commit hash

3a6bdd6

Pin the GitHub dependency to commit d6ebde39 instead of tracking the default branch. This prevents a compromised upstream repo from injecting malicious code via the native C add-on compiled during npm install.

Qualify '100% local' claim with model download note

754f195

Semantic search uses @xenova/transformers which downloads a ~100MB embedding model on first use. This is now documented in the tagline footnote and the 100% Local feature description.

Remove non-existent gitHooksEnabled from README config table

140aaef

This option was documented but never implemented in the config schema. Git hooks are controlled via the `codegraph init --no-hooks` CLI flag and `codegraph hooks install/remove` commands instead.

Document and surface large file skip behavior during indexing

011c594

Files exceeding maxFileSize (default 1MB) are now reported in the CLI output after indexing, with guidance to adjust the config. Added documentation in the README troubleshooting section and config table.

Log git hook sync errors to .codegraph/sync.log

0428702

Previously, the post-commit hook redirected all stderr to /dev/null, making sync failures invisible. Now errors are appended to .codegraph/sync.log. The 'codegraph hooks status' command also shows recent errors from this log file.

Add worktree test coverage for git hooks manager

5ca1b9e

Tests cover absolute and relative gitdir paths, regular repos, missing .git, invalid .git file content, and hook installation from a worktree targeting the main repo's hooks directory.

Document benchmark methodology and how to run evaluation suite

45d1a2f

Explains how the headline token/tool-call numbers were measured and how to run the precision/recall evaluation suite against the included TypeScript and Python fixture projects.

Upgrade vitest to v4.0.18 to resolve audit vulnerabilities

fa0c8a9

Upgrades vitest from ^2.0.0 to ^4.0.18, which resolves all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.

Remove USAGE_GUIDE.md and QUICK_REFERENCE.md

271ebbd

These files were added prematurely and contain inaccuracies (e.g., referencing non-existent gitHooksEnabled config option, undocumented CLI flags). Removing until they can be properly aligned with the actual codebase.

colbymchenry mentioned this pull request Feb 10, 2026

Security hardening: path validation, input clamping, safe JSON, file locking #19

Merged

4 tasks

colbymchenry closed this Feb 10, 2026

antonygiomarxdev mentioned this pull request May 25, 2026

Extraer ResultStore: escrituras SQLite detrás de interfaz antonygiomarxdev/citadel#15

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Security hardening, correctness fixes, and documentation improvements#16

Security hardening, correctness fixes, and documentation improvements#16
MO2k4 wants to merge 23 commits into
colbymchenry:mainfrom
MO2k4:security-hardening-and-docs-improvements

MO2k4 commented Feb 6, 2026

Uh oh!

rickross commented Feb 6, 2026

Uh oh!

MO2k4 commented Feb 6, 2026 •

edited

Loading

Uh oh!

rickross commented Feb 9, 2026

Uh oh!

colbymchenry commented Feb 10, 2026

Uh oh!

colbymchenry commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

MO2k4 commented Feb 6, 2026

Summary

Changes

Fix git worktree support for hooks (src/sync/git-hooks.ts)

Cross-process file locking for database writes (src/utils.ts, src/index.ts)

Path traversal vulnerability in MCP file reads (src/context/index.ts)

Input validation on MCP tool parameters (src/mcp/tools.ts)

Interactive confirmation before global npm install (src/installer/index.ts)

Pinned tree-sitter-liquid dependency (package.json)

Atomic writes for installer config files (src/installer/config-writer.ts)

Replace hand-rolled glob-to-regex with picomatch (src/extraction/index.ts, src/config.ts)

Fix silent error swallowing in installer (src/installer/config-writer.ts)

Add error boundaries around JSON.parse on DB values (src/db/queries.ts)

Fix resource cleanup on error paths (src/bin/codegraph.ts, src/mcp/index.ts)

Add symlink cycle detection in directory scanning (src/extraction/index.ts)

Fix CLAUDE.md section replacement regex (src/installer/config-writer.ts)

Qualify "100% local" claim in README

Remove non-existent gitHooksEnabled config option from README

Document and surface large file skip behavior

Improve git hook failure visibility (src/sync/git-hooks.ts, src/bin/codegraph.ts)

Add worktree test coverage (__tests__/sync.test.ts)

Document benchmark methodology (README)

Upgrade vitest to v4.0.18 (package.json)

Test plan

Uh oh!

rickross commented Feb 6, 2026

Uh oh!

MO2k4 commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rickross commented Feb 9, 2026

Uh oh!

colbymchenry commented Feb 10, 2026

Uh oh!

colbymchenry commented Feb 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Fix git worktree support for hooks (`src/sync/git-hooks.ts`)

Cross-process file locking for database writes (`src/utils.ts`, `src/index.ts`)

Path traversal vulnerability in MCP file reads (`src/context/index.ts`)

Input validation on MCP tool parameters (`src/mcp/tools.ts`)

Interactive confirmation before global npm install (`src/installer/index.ts`)

Pinned tree-sitter-liquid dependency (`package.json`)

Atomic writes for installer config files (`src/installer/config-writer.ts`)

Replace hand-rolled glob-to-regex with picomatch (`src/extraction/index.ts`, `src/config.ts`)

Fix silent error swallowing in installer (`src/installer/config-writer.ts`)

Add error boundaries around JSON.parse on DB values (`src/db/queries.ts`)

Fix resource cleanup on error paths (`src/bin/codegraph.ts`, `src/mcp/index.ts`)

Add symlink cycle detection in directory scanning (`src/extraction/index.ts`)

Fix CLAUDE.md section replacement regex (`src/installer/config-writer.ts`)

Remove non-existent `gitHooksEnabled` config option from README

Improve git hook failure visibility (`src/sync/git-hooks.ts`, `src/bin/codegraph.ts`)

Add worktree test coverage (`tests/sync.test.ts`)

Upgrade vitest to v4.0.18 (`package.json`)

MO2k4 commented Feb 6, 2026 •

edited

Loading