Security hardening, correctness fixes, and documentation improvements#16
Security hardening, correctness fixes, and documentation improvements#16MO2k4 wants to merge 23 commits into
Conversation
Introduce FileLock class that uses a PID-tracked lock file (.codegraph/codegraph.lock) to prevent concurrent database writes from multiple processes (git hooks, CLI, MCP server). The lock detects stale locks by checking if the owning PID is still alive, and uses O_EXCL (wx flag) for atomic lock creation to prevent race conditions. Write operations (indexAll, indexFiles, sync) now acquire the file lock before the in-process mutex. The lock is also released on close().
Validate that resolved file paths stay within the project root before reading. Previously, a tampered database with "../" sequences in file_path could cause reads outside the project directory via the codegraph_node and codegraph_context MCP tools. Now uses path.resolve() and verifies the result starts with the project root path before proceeding with file reads.
All MCP tool handlers now validate parameter types and bounds before processing. String parameters (query, symbol, task) are checked for type and non-emptiness. Numeric parameters (limit, depth, maxNodes) are validated with Number.isFinite and clamped to safe ranges (limit: 1-100, depth: 1-10) to prevent resource exhaustion from unbounded graph traversals.
The installer now prompts the user for confirmation before running npm install -g. Previously, this ran silently which is a supply chain risk. If the user declines, the installer skips the global install and uses npx as a fallback.
Pin the GitHub dependency to commit d6ebde39 instead of tracking the default branch. This prevents a compromised upstream repo from injecting malicious code via the native C add-on compiled during npm install.
Replace direct fs.writeFileSync calls with an atomicWriteFileSync helper that writes to a temporary file then renames it into place. This prevents config file corruption (claude.json, settings.json, CLAUDE.md) if the process crashes mid-write, since rename is atomic on POSIX systems.
Cover FileLock (acquire/release, stale lock detection, withLock, error cleanup), path traversal prevention in ContextBuilder, MCP input validation (type checks, empty strings, bounds clamping for all tool handlers), and atomic write correctness. 22 new tests.
The custom glob-to-regex conversion using new RegExp() was vulnerable to catastrophic backtracking (ReDoS) with crafted patterns. Replace both instances (src/extraction/index.ts matchesGlob and src/config.ts matchesPattern) with picomatch, a well-tested glob matching library. Adds tests verifying standard glob patterns, complex brace expansion, dot files, and a ReDoS pattern that completes in <100ms.
Distinguish between missing files (return {}) and corrupted files
(warn user, create .backup copy, then return {}). Previously all
errors were silently ignored, making it impossible to diagnose
corrupted claude.json or settings.json files.
Adds installer tests covering corruption handling, backup creation,
and preservation of existing config fields.
Wrap all JSON.parse calls on database column values (decorators, typeParameters, metadata, errors) with safeJsonParse() that catches parse errors and returns undefined instead of crashing the process. Logs a warning with the column context for debugging. Adds tests that inject malformed JSON into nodes, edges, and file records and verify graceful degradation.
All CLI commands (init, index, sync, status, query, context, hooks) now use try/finally blocks to ensure cg.close() is called even when errors occur or process.exit() is invoked. Previously, error paths leaked database connections. MCP server stop() now wraps cg.close() in try/finally to ensure transport cleanup happens even if database close fails.
Track visited real paths (via fs.realpathSync) during directory traversal to prevent infinite loops from symlinks pointing to parent directories. Properly handle symlinks to directories (follow once), symlinks to files (index them), and broken symlinks (skip gracefully). Adds tests for symlink cycles, valid directory symlinks, and broken symlink handling.
The unmarked section detection regex /\n## [^#]/ incorrectly matched ### subsection headers, causing content after CodeGraph subsections to be overwritten. Changed to /\n## (?!#)/ (negative lookahead) so only h2-level headers terminate the section boundary. Adds tests for sections with ### subsections and for standard section replacement.
Semantic search uses @xenova/transformers which downloads a ~100MB embedding model on first use. This is now documented in the tagline footnote and the 100% Local feature description.
This option was documented but never implemented in the config schema. Git hooks are controlled via the `codegraph init --no-hooks` CLI flag and `codegraph hooks install/remove` commands instead.
Files exceeding maxFileSize (default 1MB) are now reported in the CLI output after indexing, with guidance to adjust the config. Added documentation in the README troubleshooting section and config table.
Previously, the post-commit hook redirected all stderr to /dev/null, making sync failures invisible. Now errors are appended to .codegraph/sync.log. The 'codegraph hooks status' command also shows recent errors from this log file.
Tests cover absolute and relative gitdir paths, regular repos, missing .git, invalid .git file content, and hook installation from a worktree targeting the main repo's hooks directory.
Explains how the headline token/tool-call numbers were measured and how to run the precision/recall evaluation suite against the included TypeScript and Python fixture projects.
Upgrades vitest from ^2.0.0 to ^4.0.18, which resolves all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.
These files were added prematurely and contain inaccuracies (e.g., referencing non-existent gitHooksEnabled config option, undocumented CLI flags). Removing until they can be properly aligned with the actual codebase.
|
Hi Martin, this is truly outstanding security work! I spent some time on this project yesterday and have made several additions and improvements. We made it use multiple workers, so it has sped up by quite a bit. We've just merged your PR into our fork, and with some effort have gotten things aligned. Because we had added and changed so much, it was a little tricky to get everything merged. We have reached out to @colbymchenry but haven't heard back from him yet, so we don't know if he will want to fold in our changes into his main branch. I would love for you to take a look at https://github.com/rickross/codegraph and check that the key elements of your PR are correctly merged. I hope you will, and you can reach me directly at rickross@gmail.com if you wish. |
|
That is very cool, I will definitely take a deeper look on monday, but first weekend is calling 😊 @rickross Did you encounter any issues or why did you remove those changes? |
…idation, and logging - Fix console.log in vectors/search.ts that corrupts MCP JSON-RPC stdout stream - Add shared isPathWithinRoot() utility and apply it to indexFile, ResolutionContext, and context builder to prevent path traversal via ../ sequences - Add symlink boundary check in scanDirectory to block symlinks escaping project root - Reject import paths that resolve outside project root in import-resolver - Validate ensureSubdirectory name against .. and path separators - Use url.fileURLToPath() for proper MCP URI parsing instead of naive regex - Bound findPath BFS to 10,000 visited nodes; change default maxDepth from Infinity to 20 - Replace hand-rolled regex with picomatch in findByQualifiedName to eliminate ReDoS - Harden FTS5 sanitization to strip :, ^ and filter AND/OR/NOT/NEAR operators - Use safeJsonParse consistently for unresolved_refs.candidates column - Validate MCP kind parameter against NodeKind enum server-side - Validate config with validateConfig() after Object.assign in init/initSync/updateConfig
Hi Martin, I'm not sure which changes you mean? I spent quite a while on enhancing codegraph this weekend so my set of AI agents could get enhanced value out of using it on our several open codebases. The result has now diverged somewhat from @colbymchenry original version, and it is quite a bit faster and has begun producing better resolutions. I will likely continue to enhance this as I find the time. I'm still hoping to hear from @colbymchenry but now will just forge ahead, I guess. |
…locking Implements security improvements inspired by PR #16 (credit: MO2k4): - Add validatePathWithinRoot() to prevent path traversal attacks in extraction and context building - Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges - Use atomic writes (temp file + rename) for config saves - Add symlink cycle detection in directory scanning to prevent infinite loops - Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks to handle corrupted database metadata gracefully - Add cross-process FileLock for DB write operations (indexAll, indexFiles, sync) to prevent concurrent writes from CLI, MCP server, and git hooks - Remove unused path import from context/index.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hey @colbymchenry — first off, thank you for building CodeGraph. The idea of a local-first semantic code graph that plugs straight into Claude Code is genuinely useful, and the tree-sitter + SQLite foundation is solid. I've been using it on a few projects and wanted to give back, so I spent some time doing a thorough security and correctness audit. Here's what came out of it.
Summary
This PR is the result of a deep security and correctness audit of the CodeGraph codebase. It addresses 19 issues, adds comprehensive test coverage for all fixes and resolves all
npm auditvulnerabilities.Key stats:
security.test.ts(33 tests),installer.test.ts(8 tests), worktree tests insync.test.ts(6 tests)Changes
Fix git worktree support for hooks (
src/sync/git-hooks.ts)The
GitHooksManagerconstructor hard-coded.gitas a directory, which broke in git worktrees where.gitis a file containing agitdir:pointer. Hooks would fail to install or detect in worktree checkouts.Fix: Added
resolveGitDir()that reads the.gitfile, follows thegitdir:pointer (supporting both absolute and relative paths), and navigates up fromworktrees/<name>to the main repo's.gitdirectory. Hooks are now correctly installed in the shared hooks directory.Cross-process file locking for database writes (
src/utils.ts,src/index.ts)The in-memory
indexMutexonly protected a single process. Multiple concurrent processes (git hook background syncs, CLI commands, MCP server) could all write to the same SQLite database simultaneously, causingSQLITE_BUSYerrors or data corruption.Fix: Added a
FileLockclass that uses filesystem-level advisory locking viafs.writeFileSync(path, pid, { flag: 'wx' })for atomic lock creation. Stale locks are detected by checking if the owning PID is still alive. All write operations (indexAll,indexFiles,sync) are now wrapped withfileLock.withLockAsync(), and the lock is released inclose().Path traversal vulnerability in MCP file reads (
src/context/index.ts)path.join(this.projectRoot, node.filePath)did not prevent directory escape. A tampered database entry with../../etc/passwdinfile_pathwould cause reads outside the project root.Fix: Changed
path.jointopath.resolveand added validation that the resolved path starts with the project root directory. Blocked attempts are logged as warnings.Input validation on MCP tool parameters (
src/mcp/tools.ts)All MCP handlers cast
argswithout type or bounds checks. Unboundeddepthoncodegraph_impactcould exhaust memory. Non-stringqueryvalues could cause runtime errors.Fix: Added
validateString()andvalidateNumber()helper methods. All 6 tool handlers now validate input types and clamp bounds (limit: 1–100, depth: 1–10, maxNodes: 1–100). Invalid inputs return structured error results instead of crashing.Interactive confirmation before global npm install (
src/installer/index.ts)The installer ran
npm install -gsilently without user confirmation, which is a supply chain risk.Fix: Added an explicit user confirmation prompt before running the global install.
Pinned tree-sitter-liquid dependency (
package.json)tree-sitter-liquidtracked the default branch of a GitHub repo with no version pin. This is a native C add-on compiled during install — an unpinned dependency is a supply chain risk.Fix: Pinned to specific commit hash
d6ebde3974742cd1b61b55d1d94aab1dacb41056.Atomic writes for installer config files (
src/installer/config-writer.ts)fs.writeFileSync()on~/.claude.jsonand other config files could leave them partially written on crash.Fix: Added
atomicWriteFileSync()helper that writes to a temp file (.tmp.<pid>) then usesfs.renameSync()for an atomic swap. All config file writes now use this helper.Replace hand-rolled glob-to-regex with picomatch (
src/extraction/index.ts,src/config.ts)Custom glob-to-regex conversion via
new RegExp()was vulnerable to catastrophic backtracking (ReDoS) with crafted patterns in.codegraph/config.json.Fix: Replaced both
matchesGlob()in extraction andmatchesPattern()in config withpicomatch.isMatch(). Addedpicomatchas a production dependency.Fix silent error swallowing in installer (
src/installer/config-writer.ts)readJsonFile()returned{}for any failure — couldn't distinguish "file doesn't exist" from "file is corrupted". A corrupted~/.claude.jsonwould be silently discarded.Fix:
readJsonFile()now distinguishesENOENT(returns{}) from parse errors (warns the user and creates a.backupof the corrupted file before overwriting).Add error boundaries around JSON.parse on DB values (
src/db/queries.ts)JSON.parse()ondecorators,typeParameters, andmetadatacolumns had no per-field error handling. A single corrupted row would crash the entire process.Fix: Added
safeJsonParse()helper that returnsundefinedon failure and logs a warning with the row context for debugging. All 5 bareJSON.parse()calls in query result mapping now use this helper.Fix resource cleanup on error paths (
src/bin/codegraph.ts,src/mcp/index.ts)CLI commands didn't call
cg.close()on error. The MCP server'sstop()method could skip cleanup ifcg.close()threw.Fix: All CLI command handlers now use
let cg: CodeGraph | null = nullwithtry/catch/finallyblocks that callcg?.close()in thefinallypath. The MCP serverstop()wrapscg.close()in its own try/catch.Add symlink cycle detection in directory scanning (
src/extraction/index.ts)fs.readdirSync()followed symlinks without cycle detection. A symlink to a parent directory would cause an infinite loop.Fix: Added
visitedDirsSet tracking real paths viafs.realpathSync(). Already-visited directories are skipped. Symlinks to files are handled correctly (indexed if they match include patterns), and broken symlinks are logged and skipped.Fix CLAUDE.md section replacement regex (
src/installer/config-writer.ts)The regex
\n## [^#]used to find the next section header incorrectly matched### Subtopiclines, potentially overwriting unrelated content after a CodeGraph section with subsections.Fix: Changed to negative lookahead
\n## (?!#)which correctly matches## Headerbut not### Subsection. The installer now properly preserves content after the CodeGraph section even when nested headers are present.Qualify "100% local" claim in README
The
@xenova/transformerslibrary downloads a ~100MB embedding model from Hugging Face on first use. This is now documented with a footnote on the tagline and in the "100% Local" feature description.Remove non-existent
gitHooksEnabledconfig option from READMEThis option was listed in the README configuration table but was never implemented in the actual config schema (
src/types.ts). Git hooks are controlled viacodegraph init --no-hooksandcodegraph hooks install/removeCLI commands.Document and surface large file skip behavior
Files exceeding
maxFileSize(default 1MB) were silently skipped during indexing with no indication to the user. The CLI now reports how many files were skipped due to size limits after indexing, with guidance to adjustmaxFileSizein config. Added documentation in the README troubleshooting section.Improve git hook failure visibility (
src/sync/git-hooks.ts,src/bin/codegraph.ts)The post-commit hook redirected all stderr to
/dev/null, making sync failures completely invisible. Now errors are appended to.codegraph/sync.log. Thecodegraph hooks statuscommand also checks this log and shows recent errors.Add worktree test coverage (
__tests__/sync.test.ts)The
resolveGitDir()worktree path calculation had no test coverage. Added 6 tests covering: absolute and relativegitdirpaths, regular repos, missing.git, invalid.gitfile content, and hook installation from a worktree targeting the main repo's hooks directory.Document benchmark methodology (README)
Added a collapsible section explaining how the headline token/tool-call numbers were measured, and how to run the precision/recall evaluation suite against the included fixture projects.
Upgrade vitest to v4.0.18 (
package.json)Resolved all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99).
npm auditnow reports 0 vulnerabilities.Test plan
npm auditreports 0 vulnerabilitiesnpm run buildcompiles cleanly with no TypeScript errorscodegraph init --indexon a sample project