Skip to content

refactor(scripts): separate config from execution in benchmarking scripts#1235

Merged
carlos-alm merged 7 commits into
mainfrom
refactor/titan-decomposition-scripts
May 28, 2026
Merged

refactor(scripts): separate config from execution in benchmarking scripts#1235
carlos-alm merged 7 commits into
mainfrom
refactor/titan-decomposition-scripts

Conversation

@carlos-alm

Copy link
Copy Markdown
Contributor

Summary

  • Separates config from execution in scripts/token-benchmark.ts
  • Extracts scripts/lib/bench-config.ts helper module
  • Addresses the worst scripts/ offenders from gauntlet

Commits

  • 7f99ece: refactor(scripts): separate config from execution in benchmarking scripts

Context

Part of the Titan Paradigm cleanup pass (see .codegraph/titan/TITAN_REPORT.md). Merge order: this PR is #7 of 10 (mergeOrder position: 7).

Caveats

  • WASM grammars not available in dev worktree — CI will run full test matrix

Test plan

  • CI passes (lint, build, full test matrix)
  • Verify no new cycles introduced (codegraph stats)

@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Claude finished @carlos-alm's task —— View job


Code Review Analysis

Analyzing PR #1235: refactor(scripts): separate config from execution in benchmarking scripts

Todo List

  • Read changed files to understand the refactoring
  • Phase 1: Root Cause Analysis
  • Phase 2: Configuration Change Detection
  • Phase 3: Test Integrity Check
  • Phase 4: Code Quality Assessment
  • Phase 5: Backlog Compliance Evaluation
  • Provide structured review feedback and rating

Root Cause Analysis

Problem Being Solved: This PR extracts common benchmark configuration logic from token-benchmark.ts into a shared scripts/lib/bench-config.ts module. The refactoring addresses technical debt identified in the Titan Paradigm cleanup (PR #7 of 10).

Why This Approach: The approach is justified - token-benchmark.ts originally contained 583 lines of mixed config/execution logic. Separating concerns into a dedicated config module allows reuse across the 4 benchmark scripts mentioned in comments. This follows good separation of concerns principles.

Risk Assessment: Low risk. This is a pure refactoring with no behavioral changes. The extracted config logic maintains identical functionality while improving code organization and enabling reuse.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies added
  • Foundation-aligned: ✓ — Pure refactoring that improves code organization without violating any principles
  • Problem-fit: 3 — Useful but doesn't directly address the core "lost AI" problem. This is internal cleanup
  • Breaking: No — Pure refactoring with no external API changes
  • Tier: 1

Code Quality Assessment

✅ Strengths

  1. Clean separation of concerns: Configuration logic properly extracted from execution logic
  2. Security-conscious: Strong input validation with PKG_NAME_RE and PKG_VERSION_RE patterns to prevent shell injection
  3. Error handling: Robust error handling with retries, exponential backoff, and proper cleanup
  4. Documentation: Well-documented functions with clear JSDoc comments
  5. Type safety: Full TypeScript with proper type annotations
  6. Consistent patterns: Follows established codebase patterns for CLI argument parsing

🔍 Technical Details

  • Line reduction: Net reduction of 22 lines (285 deletions vs 263 additions) shows effective consolidation
  • Reusability: The extracted bench-config.ts can be used by the other 3 benchmark scripts
  • Shell safety: Proper handling of Windows npm.cmd vs Unix npm with NPM_SHELL constant
  • Resource management: Proper cleanup functions for temporary directories in npm mode

⚠️ Minor Observations

  1. Pattern validation: The regex patterns PKG_NAME_RE and PKG_VERSION_RE are conservative, which is good for security
  2. Error messages: Clear and actionable error messages throughout
  3. Platform handling: Proper Windows/Unix compatibility for npm commands

Configuration Change Detection

No concerning configuration changes detected. This is pure code refactoring without any relaxed linting rules, disabled checks, or weakened validation.

Test Integrity

No test modifications or integrity concerns. The refactoring maintains existing behavior and doesn't modify any test files or expectations.

Critical Concerns

None identified. This is a well-executed refactoring that improves code organization without introducing risks.

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: This is a solid refactoring that reduces technical debt, follows security best practices, and enables code reuse across benchmark scripts. The separation of concerns is clean, error handling is robust, and there are no behavioral changes. Minor deduction only because this is internal cleanup rather than user-facing functionality.

@greptile-apps

greptile-apps Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR refactors scripts/token-benchmark.ts and scripts/lib/bench-config.ts by extracting the large resolveBenchmarkSource function into focused helpers and splitting the monolithic main loop into dedicated composable functions.

  • bench-config.ts: Decomposed into repoRoot, resolveBenchmarkSourceLocal, npmInstallWithRetries, installCodegraphPackage, detectNativePlatformKey, installNativePackage, and installTransformers; shared retry logic eliminates duplication between codegraph and native package installs.
  • token-benchmark.ts: The ~180-line main loop is replaced by runSessionsForMode, medianForRuns, computeSavings, runIssueExperiment, and computeAggregate, each with a single clear responsibility.

Confidence Score: 5/5

Safe to merge — this is a pure refactoring that preserves all existing behavior with no logic changes introduced.

The refactoring extracts helper functions from two large scripts without altering any control flow, error handling, or output format. Retry logic, cleanup paths, and temp-dir lifecycle are all preserved one-to-one from the original. The only finding is an unused safeVersion field in the installCodegraphPackage return type, which has no runtime impact.

No files require special attention.

Important Files Changed

Filename Overview
scripts/lib/bench-config.ts Decomposed resolveBenchmarkSource into focused helper functions; retry logic unified via npmInstallWithRetries; behavior preserved from original.
scripts/token-benchmark.ts Extracted runSessionsForMode, medianForRuns, computeSavings, runIssueExperiment, and computeAggregate from main; logic is equivalent to the original inline code.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[resolveBenchmarkSource] --> B{--npm flag?}
    B -- No --> C[resolveBenchmarkSourceLocal]
    C --> C1[repoRoot / read package.json / return srcDir + noop cleanup]
    B -- Yes --> D[installCodegraphPackage]
    D --> D1[assertSafePkgVersion / mkdtemp / write package.json]
    D1 --> D2[npmInstallWithRetries]
    D2 -->|success| D3[read installed package.json / return tmpDir pkgDir installedPkg]
    D2 -->|all retries failed| D4[rmSync tmpDir / throw Error]
    D3 --> E[installNativePackage]
    E --> E1[detectNativePlatformKey]
    E1 --> E2[npmInstallWithRetries native pkg]
    E2 -->|error| E3[log warning / swallow]
    D3 --> F[installTransformers]
    F --> F1[execFileSync hf install]
    F1 -->|error| F2[log warning / swallow]
    E --> G[prefer dist/ over src/ / return version srcDir cleanup]
    F --> G
Loading

Fix All in Claude Code

Reviews (6): Last reviewed commit: "Merge branch 'main' into refactor/titan-..." | Re-trigger Greptile

Comment thread scripts/lib/bench-config.ts Outdated
fs.writeFileSync(path.join(tmpDir, 'package.json'), JSON.stringify({ private: true }));

try {
await npmInstallWithRetries(`@optave/codegraph@${safeVersion}`, tmpDir, 5, 'Attempt');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Redundant "Attempt attempt N" log message

npmInstallWithRetries formats its retry log as ${label} attempt ${attempt}. When called with label = 'Attempt' the output becomes " Attempt attempt 1 failed, retrying in 15s..." — doubling the word. The original inline code just printed " Attempt 1 failed...". The native-package call uses 'Native install', which composes cleanly (" Native install attempt 1 failed..."), so using a noun phrase like 'Codegraph install' (or an empty label with a different template) would match the original intent for this call.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — extracted maxRetries = 5 as a named constant and changed the label from 'Attempt' to 'Codegraph install' so the log reads "Codegraph install attempt 1 failed..." instead of "Attempt attempt 1 failed...".

Comment on lines +193 to +198
try {
await npmInstallWithRetries(`@optave/codegraph@${safeVersion}`, tmpDir, 5, 'Attempt');
} catch (err) {
fs.rmSync(tmpDir, { recursive: true, force: true });
throw new Error(`Failed to install @optave/codegraph@${safeVersion} after 5 attempts: ${(err as Error).message}`);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded retry count in error message diverges from the call site

The error string says "after 5 attempts" but the actual limit is supplied as a parameter to npmInstallWithRetries. If someone changes the call site (e.g., to 3 retries for faster CI), the error message will silently lie. Capture the count from a named constant so the call and the message stay in sync.

Suggested change
try {
await npmInstallWithRetries(`@optave/codegraph@${safeVersion}`, tmpDir, 5, 'Attempt');
} catch (err) {
fs.rmSync(tmpDir, { recursive: true, force: true });
throw new Error(`Failed to install @optave/codegraph@${safeVersion} after 5 attempts: ${(err as Error).message}`);
}
const maxRetries = 5;
try {
await npmInstallWithRetries(`@optave/codegraph@${safeVersion}`, tmpDir, maxRetries, 'Codegraph install');
} catch (err) {
fs.rmSync(tmpDir, { recursive: true, force: true });
throw new Error(`Failed to install @optave/codegraph@${safeVersion} after ${maxRetries} attempts: ${(err as Error).message}`);
}

Fix in Claude Code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — extracted const maxRetries = 5 at the call site so the error message uses ${maxRetries} rather than a hardcoded literal. Both issues addressed in the same commit.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

Codegraph Impact Analysis

18 functions changed48 callers affected across 13 files

  • repoRoot in scripts/lib/bench-config.ts:129 (13 transitive callers)
  • resolveBenchmarkSourceLocal in scripts/lib/bench-config.ts:138 (25 transitive callers)
  • cleanup in scripts/lib/bench-config.ts:152 (4 transitive callers)
  • npmInstallWithRetries in scripts/lib/bench-config.ts:160 (13 transitive callers)
  • installCodegraphPackage in scripts/lib/bench-config.ts:186 (25 transitive callers)
  • detectNativePlatformKey in scripts/lib/bench-config.ts:207 (12 transitive callers)
  • installNativePackage in scripts/lib/bench-config.ts:227 (25 transitive callers)
  • installTransformers in scripts/lib/bench-config.ts:261 (25 transitive callers)
  • resolveBenchmarkSource in scripts/lib/bench-config.ts:289 (27 transitive callers)
  • runSessionsForMode in scripts/token-benchmark.ts:411 (3 transitive callers)
  • medianForRuns in scripts/token-benchmark.ts:432 (3 transitive callers)
  • medianOf in scripts/token-benchmark.ts:435 (3 transitive callers)
  • computeSavings in scripts/token-benchmark.ts:449 (3 transitive callers)
  • runIssueExperiment in scripts/token-benchmark.ts:468 (2 transitive callers)
  • computeAggregate in scripts/token-benchmark.ts:501 (2 transitive callers)
  • sum in scripts/token-benchmark.ts:507 (3 transitive callers)
  • pct in scripts/token-benchmark.ts:512 (14 transitive callers)
  • main in scripts/token-benchmark.ts:530 (1 transitive callers)

@carlos-alm

Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm

Copy link
Copy Markdown
Contributor Author

Fixed the remaining inconsistency flagged in the summary: extracted const maxRetries = 5 in installNativePackage so it matches the pattern already used in installCodegraphPackage.

@greptileai

@carlos-alm carlos-alm merged commit 35e1dfb into main May 28, 2026
21 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-decomposition-scripts branch May 28, 2026 03:06
@github-actions github-actions Bot locked and limited conversation to collaborators May 28, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant