Skip to content

Security hardening, correctness fixes, and documentation improvements#16

Closed
MO2k4 wants to merge 23 commits into
colbymchenry:mainfrom
MO2k4:security-hardening-and-docs-improvements
Closed

Security hardening, correctness fixes, and documentation improvements#16
MO2k4 wants to merge 23 commits into
colbymchenry:mainfrom
MO2k4:security-hardening-and-docs-improvements

Conversation

@MO2k4

@MO2k4 MO2k4 commented Feb 6, 2026

Copy link
Copy Markdown
Contributor

Hey @colbymchenry — first off, thank you for building CodeGraph. The idea of a local-first semantic code graph that plugs straight into Claude Code is genuinely useful, and the tree-sitter + SQLite foundation is solid. I've been using it on a few projects and wanted to give back, so I spent some time doing a thorough security and correctness audit. Here's what came out of it.

Summary

This PR is the result of a deep security and correctness audit of the CodeGraph codebase. It addresses 19 issues, adds comprehensive test coverage for all fixes and resolves all npm audit vulnerabilities.

Key stats:

  • 18 files changed, ~2,000 lines added, ~650 removed
  • 246 tests passing (up from 239), 0 npm audit vulnerabilities (down from 5)
  • 3 new test files: security.test.ts (33 tests), installer.test.ts (8 tests), worktree tests in sync.test.ts (6 tests)

Changes

Fix git worktree support for hooks (src/sync/git-hooks.ts)

The GitHooksManager constructor hard-coded .git as a directory, which broke in git worktrees where .git is a file containing a gitdir: pointer. Hooks would fail to install or detect in worktree checkouts.

Fix: Added resolveGitDir() that reads the .git file, follows the gitdir: pointer (supporting both absolute and relative paths), and navigates up from worktrees/<name> to the main repo's .git directory. Hooks are now correctly installed in the shared hooks directory.

Cross-process file locking for database writes (src/utils.ts, src/index.ts)

The in-memory indexMutex only protected a single process. Multiple concurrent processes (git hook background syncs, CLI commands, MCP server) could all write to the same SQLite database simultaneously, causing SQLITE_BUSY errors or data corruption.

Fix: Added a FileLock class that uses filesystem-level advisory locking via fs.writeFileSync(path, pid, { flag: 'wx' }) for atomic lock creation. Stale locks are detected by checking if the owning PID is still alive. All write operations (indexAll, indexFiles, sync) are now wrapped with fileLock.withLockAsync(), and the lock is released in close().

Path traversal vulnerability in MCP file reads (src/context/index.ts)

path.join(this.projectRoot, node.filePath) did not prevent directory escape. A tampered database entry with ../../etc/passwd in file_path would cause reads outside the project root.

Fix: Changed path.join to path.resolve and added validation that the resolved path starts with the project root directory. Blocked attempts are logged as warnings.

Input validation on MCP tool parameters (src/mcp/tools.ts)

All MCP handlers cast args without type or bounds checks. Unbounded depth on codegraph_impact could exhaust memory. Non-string query values could cause runtime errors.

Fix: Added validateString() and validateNumber() helper methods. All 6 tool handlers now validate input types and clamp bounds (limit: 1–100, depth: 1–10, maxNodes: 1–100). Invalid inputs return structured error results instead of crashing.

Interactive confirmation before global npm install (src/installer/index.ts)

The installer ran npm install -g silently without user confirmation, which is a supply chain risk.

Fix: Added an explicit user confirmation prompt before running the global install.

Pinned tree-sitter-liquid dependency (package.json)

tree-sitter-liquid tracked the default branch of a GitHub repo with no version pin. This is a native C add-on compiled during install — an unpinned dependency is a supply chain risk.

Fix: Pinned to specific commit hash d6ebde3974742cd1b61b55d1d94aab1dacb41056.

Atomic writes for installer config files (src/installer/config-writer.ts)

fs.writeFileSync() on ~/.claude.json and other config files could leave them partially written on crash.

Fix: Added atomicWriteFileSync() helper that writes to a temp file (.tmp.<pid>) then uses fs.renameSync() for an atomic swap. All config file writes now use this helper.

Replace hand-rolled glob-to-regex with picomatch (src/extraction/index.ts, src/config.ts)

Custom glob-to-regex conversion via new RegExp() was vulnerable to catastrophic backtracking (ReDoS) with crafted patterns in .codegraph/config.json.

Fix: Replaced both matchesGlob() in extraction and matchesPattern() in config with picomatch.isMatch(). Added picomatch as a production dependency.

Fix silent error swallowing in installer (src/installer/config-writer.ts)

readJsonFile() returned {} for any failure — couldn't distinguish "file doesn't exist" from "file is corrupted". A corrupted ~/.claude.json would be silently discarded.

Fix: readJsonFile() now distinguishes ENOENT (returns {}) from parse errors (warns the user and creates a .backup of the corrupted file before overwriting).

Add error boundaries around JSON.parse on DB values (src/db/queries.ts)

JSON.parse() on decorators, typeParameters, and metadata columns had no per-field error handling. A single corrupted row would crash the entire process.

Fix: Added safeJsonParse() helper that returns undefined on failure and logs a warning with the row context for debugging. All 5 bare JSON.parse() calls in query result mapping now use this helper.

Fix resource cleanup on error paths (src/bin/codegraph.ts, src/mcp/index.ts)

CLI commands didn't call cg.close() on error. The MCP server's stop() method could skip cleanup if cg.close() threw.

Fix: All CLI command handlers now use let cg: CodeGraph | null = null with try/catch/finally blocks that call cg?.close() in the finally path. The MCP server stop() wraps cg.close() in its own try/catch.

Add symlink cycle detection in directory scanning (src/extraction/index.ts)

fs.readdirSync() followed symlinks without cycle detection. A symlink to a parent directory would cause an infinite loop.

Fix: Added visitedDirs Set tracking real paths via fs.realpathSync(). Already-visited directories are skipped. Symlinks to files are handled correctly (indexed if they match include patterns), and broken symlinks are logged and skipped.

Fix CLAUDE.md section replacement regex (src/installer/config-writer.ts)

The regex \n## [^#] used to find the next section header incorrectly matched ### Subtopic lines, potentially overwriting unrelated content after a CodeGraph section with subsections.

Fix: Changed to negative lookahead \n## (?!#) which correctly matches ## Header but not ### Subsection. The installer now properly preserves content after the CodeGraph section even when nested headers are present.

Qualify "100% local" claim in README

The @xenova/transformers library downloads a ~100MB embedding model from Hugging Face on first use. This is now documented with a footnote on the tagline and in the "100% Local" feature description.

Remove non-existent gitHooksEnabled config option from README

This option was listed in the README configuration table but was never implemented in the actual config schema (src/types.ts). Git hooks are controlled via codegraph init --no-hooks and codegraph hooks install/remove CLI commands.

Document and surface large file skip behavior

Files exceeding maxFileSize (default 1MB) were silently skipped during indexing with no indication to the user. The CLI now reports how many files were skipped due to size limits after indexing, with guidance to adjust maxFileSize in config. Added documentation in the README troubleshooting section.

Improve git hook failure visibility (src/sync/git-hooks.ts, src/bin/codegraph.ts)

The post-commit hook redirected all stderr to /dev/null, making sync failures completely invisible. Now errors are appended to .codegraph/sync.log. The codegraph hooks status command also checks this log and shows recent errors.

Add worktree test coverage (__tests__/sync.test.ts)

The resolveGitDir() worktree path calculation had no test coverage. Added 6 tests covering: absolute and relative gitdir paths, regular repos, missing .git, invalid .git file content, and hook installation from a worktree targeting the main repo's hooks directory.

Document benchmark methodology (README)

Added a collapsible section explaining how the headline token/tool-call numbers were measured, and how to run the precision/recall evaluation suite against the included fixture projects.

Upgrade vitest to v4.0.18 (package.json)

Resolved all 5 moderate severity vulnerabilities in the esbuild/vite dependency chain (GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.


Test plan

  • All 246 tests pass across 10 test files
  • npm audit reports 0 vulnerabilities
  • npm run build compiles cleanly with no TypeScript errors
  • Manual verification: run codegraph init --index on a sample project
  • Manual verification: run the MCP server with Claude Code and test tool calls
  • Manual verification: create a symlink cycle in a project directory and confirm indexing completes without hanging

MO2k4 added 22 commits February 6, 2026 07:43
Introduce FileLock class that uses a PID-tracked lock file
(.codegraph/codegraph.lock) to prevent concurrent database writes
from multiple processes (git hooks, CLI, MCP server).

The lock detects stale locks by checking if the owning PID is still
alive, and uses O_EXCL (wx flag) for atomic lock creation to prevent
race conditions.

Write operations (indexAll, indexFiles, sync) now acquire the file lock
before the in-process mutex. The lock is also released on close().
Validate that resolved file paths stay within the project root
before reading. Previously, a tampered database with "../" sequences
in file_path could cause reads outside the project directory via
the codegraph_node and codegraph_context MCP tools.

Now uses path.resolve() and verifies the result starts with the
project root path before proceeding with file reads.
All MCP tool handlers now validate parameter types and bounds before
processing. String parameters (query, symbol, task) are checked for
type and non-emptiness. Numeric parameters (limit, depth, maxNodes)
are validated with Number.isFinite and clamped to safe ranges
(limit: 1-100, depth: 1-10) to prevent resource exhaustion from
unbounded graph traversals.
The installer now prompts the user for confirmation before running
npm install -g. Previously, this ran silently which is a supply
chain risk. If the user declines, the installer skips the global
install and uses npx as a fallback.
Pin the GitHub dependency to commit d6ebde39 instead of tracking
the default branch. This prevents a compromised upstream repo from
injecting malicious code via the native C add-on compiled during
npm install.
Replace direct fs.writeFileSync calls with an atomicWriteFileSync
helper that writes to a temporary file then renames it into place.
This prevents config file corruption (claude.json, settings.json,
CLAUDE.md) if the process crashes mid-write, since rename is atomic
on POSIX systems.
Cover FileLock (acquire/release, stale lock detection, withLock,
error cleanup), path traversal prevention in ContextBuilder, MCP
input validation (type checks, empty strings, bounds clamping for
all tool handlers), and atomic write correctness.

22 new tests.
The custom glob-to-regex conversion using new RegExp() was vulnerable
to catastrophic backtracking (ReDoS) with crafted patterns. Replace
both instances (src/extraction/index.ts matchesGlob and src/config.ts
matchesPattern) with picomatch, a well-tested glob matching library.

Adds tests verifying standard glob patterns, complex brace expansion,
dot files, and a ReDoS pattern that completes in <100ms.
Distinguish between missing files (return {}) and corrupted files
(warn user, create .backup copy, then return {}). Previously all
errors were silently ignored, making it impossible to diagnose
corrupted claude.json or settings.json files.

Adds installer tests covering corruption handling, backup creation,
and preservation of existing config fields.
Wrap all JSON.parse calls on database column values (decorators,
typeParameters, metadata, errors) with safeJsonParse() that catches
parse errors and returns undefined instead of crashing the process.
Logs a warning with the column context for debugging.

Adds tests that inject malformed JSON into nodes, edges, and file
records and verify graceful degradation.
All CLI commands (init, index, sync, status, query, context, hooks)
now use try/finally blocks to ensure cg.close() is called even when
errors occur or process.exit() is invoked. Previously, error paths
leaked database connections.

MCP server stop() now wraps cg.close() in try/finally to ensure
transport cleanup happens even if database close fails.
Track visited real paths (via fs.realpathSync) during directory
traversal to prevent infinite loops from symlinks pointing to parent
directories. Properly handle symlinks to directories (follow once),
symlinks to files (index them), and broken symlinks (skip gracefully).

Adds tests for symlink cycles, valid directory symlinks, and broken
symlink handling.
The unmarked section detection regex /\n## [^#]/ incorrectly matched
### subsection headers, causing content after CodeGraph subsections
to be overwritten. Changed to /\n## (?!#)/ (negative lookahead) so
only h2-level headers terminate the section boundary.

Adds tests for sections with ### subsections and for standard
section replacement.
Semantic search uses @xenova/transformers which downloads a ~100MB
embedding model on first use. This is now documented in the tagline
footnote and the 100% Local feature description.
This option was documented but never implemented in the config schema.
Git hooks are controlled via the `codegraph init --no-hooks` CLI flag
and `codegraph hooks install/remove` commands instead.
Files exceeding maxFileSize (default 1MB) are now reported in the CLI
output after indexing, with guidance to adjust the config. Added
documentation in the README troubleshooting section and config table.
Previously, the post-commit hook redirected all stderr to /dev/null,
making sync failures invisible. Now errors are appended to
.codegraph/sync.log. The 'codegraph hooks status' command also shows
recent errors from this log file.
Tests cover absolute and relative gitdir paths, regular repos, missing
.git, invalid .git file content, and hook installation from a worktree
targeting the main repo's hooks directory.
Explains how the headline token/tool-call numbers were measured and
how to run the precision/recall evaluation suite against the included
TypeScript and Python fixture projects.
Upgrades vitest from ^2.0.0 to ^4.0.18, which resolves all 5 moderate
severity vulnerabilities in the esbuild/vite dependency chain
(GHSA-67mh-4wv8-2f99). npm audit now reports 0 vulnerabilities.
These files were added prematurely and contain inaccuracies
(e.g., referencing non-existent gitHooksEnabled config option,
undocumented CLI flags). Removing until they can be properly
aligned with the actual codebase.
@rickross

rickross commented Feb 6, 2026

Copy link
Copy Markdown

Hi Martin, this is truly outstanding security work! I spent some time on this project yesterday and have made several additions and improvements. We made it use multiple workers, so it has sped up by quite a bit.

We've just merged your PR into our fork, and with some effort have gotten things aligned. Because we had added and changed so much, it was a little tricky to get everything merged. We have reached out to @colbymchenry but haven't heard back from him yet, so we don't know if he will want to fold in our changes into his main branch.

I would love for you to take a look at https://github.com/rickross/codegraph and check that the key elements of your PR are correctly merged. I hope you will, and you can reach me directly at rickross@gmail.com if you wish.

@MO2k4

MO2k4 commented Feb 6, 2026

Copy link
Copy Markdown
Contributor Author

That is very cool, I will definitely take a deeper look on monday, but first weekend is calling 😊

@rickross Did you encounter any issues or why did you remove those changes?

…idation, and logging

- Fix console.log in vectors/search.ts that corrupts MCP JSON-RPC stdout stream
- Add shared isPathWithinRoot() utility and apply it to indexFile, ResolutionContext,
  and context builder to prevent path traversal via ../ sequences
- Add symlink boundary check in scanDirectory to block symlinks escaping project root
- Reject import paths that resolve outside project root in import-resolver
- Validate ensureSubdirectory name against .. and path separators
- Use url.fileURLToPath() for proper MCP URI parsing instead of naive regex
- Bound findPath BFS to 10,000 visited nodes; change default maxDepth from Infinity to 20
- Replace hand-rolled regex with picomatch in findByQualifiedName to eliminate ReDoS
- Harden FTS5 sanitization to strip :, ^ and filter AND/OR/NOT/NEAR operators
- Use safeJsonParse consistently for unresolved_refs.candidates column
- Validate MCP kind parameter against NodeKind enum server-side
- Validate config with validateConfig() after Object.assign in init/initSync/updateConfig
@rickross

rickross commented Feb 9, 2026

Copy link
Copy Markdown

That is very cool, I will definitely take a deeper look on monday, but first weekend is calling 😊

@rickross Did you encounter any issues or why did you remove those changes?

Hi Martin, I'm not sure which changes you mean? I spent quite a while on enhancing codegraph this weekend so my set of AI agents could get enhanced value out of using it on our several open codebases. The result has now diverged somewhat from @colbymchenry original version, and it is quite a bit faster and has begun producing better resolutions. I will likely continue to enhance this as I find the time. I'm still hoping to hear from @colbymchenry but now will just forge ahead, I guess.

colbymchenry added a commit that referenced this pull request Feb 10, 2026
…locking

Implements security improvements inspired by PR #16 (credit: MO2k4):

- Add validatePathWithinRoot() to prevent path traversal attacks in
  extraction and context building
- Clamp MCP tool inputs (limit, depth, maxDepth) to sane ranges
- Use atomic writes (temp file + rename) for config saves
- Add symlink cycle detection in directory scanning to prevent infinite loops
- Replace all JSON.parse calls in db/queries.ts with safeJsonParse fallbacks
  to handle corrupted database metadata gracefully
- Add cross-process FileLock for DB write operations (indexAll, indexFiles,
  sync) to prevent concurrent writes from CLI, MCP server, and git hooks
- Remove unused path import from context/index.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@colbymchenry

Copy link
Copy Markdown
Owner

@MO2k4 I made some significant updates to CodeGraph over the last few weeks. Adding support for Windows systems, Dart, and among other things. I setup another PR here: #19 giving you credit and mentioning your ideas. Thank you for the great ideas!

@colbymchenry

Copy link
Copy Markdown
Owner

Hey @MO2k4, thanks for this PR! We've incorporated your security hardening work (path traversal validation, input clamping, safe JSON parsing, file locking) into the codebase. It's been merged into main via PR #19, which also cherry-picked improvements from PR #15. Appreciate the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants