test: add Hypothesis fuzz coverage and macos-latest CI matrix#75
test: add Hypothesis fuzz coverage and macos-latest CI matrix#75clean6378-max-it wants to merge 6 commits into
Conversation
Add tests/test_parser_fuzz.py with strategies for malformed/truncated JSONL, unknown record types, missing/extra fields, deep nesting, long lines, empty lines, and null bytes. Harden parse_session against adversarial inputs found by fuzz (non-dict JSON values, unhashable type keys, non-str metadata fields). Extend CI matrix to macos-latest for pytest, integration-tests, js-tests, and lint-and-audit. Document Ubuntu + Windows + macOS CI in CONTRIBUTING.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR hardens the JSONL parser against malformed inputs with defensive type coercion and validation, adds comprehensive Hypothesis fuzz testing to verify crash resilience, extends CI matrix coverage to macOS, and updates deployment tooling for cross-platform Node compatibility. ChangesParser Robustness and Cross-Platform CI
Sequence Diagram(s)sequenceDiagram
participant GHA as GitHub Actions
participant Matrix as os matrix<br/>(ubuntu/windows/macos)
participant NPMInstall as npm ci
participant RollupDetect as Rollup platform<br/>detection
participant RollupInstall as native package<br/>install
GHA->>Matrix: expand to macos-latest
Matrix->>NPMInstall: run on each platform
NPMInstall->>RollupDetect: invoke post-install step
RollupDetect->>RollupInstall: select matching native binary
RollupInstall-->>Matrix: complete with platform binary
sequenceDiagram
participant Hypothesis as Hypothesis engine
participant Strategy as fuzz strategies
participant TempFile as temp JSONL
participant Parser as parse_session
participant Result as SessionDict
Hypothesis->>Strategy: generate adversarial inputs
Strategy->>TempFile: write malformed/fuzzed JSONL
TempFile->>Parser: invoke with temp path
Parser->>Result: return without unhandled exception
Result-->>Hypothesis: validate session structure
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
requirements-dev.txt (1)
8-8: Update: Hypothesis>=6.100.0 is compatible with current releases and shows no known advisories
- PyPI latest Hypothesis is 6.155.2; the
hypothesis>=6.100.0lower bound will allow installing current versions unless pinned elsewhere.- No security advisories were returned for Hypothesis from the queried GitHub security feed (PIP ecosystem).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements-dev.txt` at line 8, Update the dev requirements to allow modern Hypothesis releases by ensuring the dependency line for hypothesis uses the relaxed lower bound "hypothesis>=6.100.0" (package name: hypothesis) so current PyPI versions (e.g., 6.155.2) can be installed; modify requirements-dev.txt to replace any stricter pin for hypothesis with this >=6.100.0 bound and commit the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_parser_fuzz.py`:
- Around line 22-23: The helper _fuzz_jsonl_path uses tempfile.mkdtemp() which
leaks directories; replace it by using pytest's tmp_path fixture: remove or
change _fuzz_jsonl_path to accept a Path parameter (e.g., tmp_path) and return
tmp_path / name, then update each test that currently calls _fuzz_jsonl_path to
accept tmp_path in its signature and call the helper with tmp_path (or inline
use tmp_path / name) so temporary directories are managed and cleaned up by
pytest.
---
Nitpick comments:
In `@requirements-dev.txt`:
- Line 8: Update the dev requirements to allow modern Hypothesis releases by
ensuring the dependency line for hypothesis uses the relaxed lower bound
"hypothesis>=6.100.0" (package name: hypothesis) so current PyPI versions (e.g.,
6.155.2) can be installed; modify requirements-dev.txt to replace any stricter
pin for hypothesis with this >=6.100.0 bound and commit the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ce05ebaf-529d-411f-b105-1071ec83a6c6
📒 Files selected for processing (7)
.github/workflows/ci.ymlCONTRIBUTING.mdpyproject.tomlrequirements-dev.txttests/conftest.pytests/test_parser_fuzz.pyutils/jsonl_parser.py
Install platform Rollup binary after npm ci; drop Linux-only optionalDep. Use pytest tmp_path in test_parser_fuzz.py instead of leaking mkdtemp dirs.
|
@CodeRabbit. please review this PR in more depth |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -e
echo "=== tests/test_parser_fuzz.py ==="
cat tests/test_parser_fuzz.py
echo ""
echo "=== utils/jsonl_parser.py ==="
cat utils/jsonl_parser.py
echo ""
echo "=== tests/conftest.py ==="
cat tests/conftest.py
echo ""
echo "=== .github/workflows/ci.yml ==="
cat .github/workflows/ci.yml
echo ""
echo "=== pyproject.toml ==="
cat pyproject.toml
echo ""
echo "=== requirements-dev.txt ==="
cat requirements-dev.txt
echo ""
echo "=== package.json ==="
cat package.json
echo ""
echo "=== CONTRIBUTING.md ==="
cat CONTRIBUTING.mdLength of output: 40248
Deep Review: PR
|
| Source | max_examples |
deadline |
Effective in CI? |
|---|---|---|---|
pyproject.toml [tool.hypothesis] |
200 | default | ❌ overridden |
conftest.py CI profile (max_examples=100, deadline=None) |
100 | None | ❌ overridden by decorator |
FUZZ_SETTINGS decorator on every test |
200 | 5000 ms | ✅ wins |
The conftest.py CI profile is completely overridden by @FUZZ_SETTINGS applied to every test. Each fuzz test will always run 200 examples, even on CI. The effort put into conftest.py and the CI-profile setup is wasted.
Fix: Either remove @FUZZ_SETTINGS from tests and rely on the profile + pyproject.toml, or use settings(parent=settings.get_profile("ci"), ...) in CI-aware settings, or don't apply deadline / max_examples in the decorator and let them come from the profile:
# Option A: remove the decorator entirely and rely on conftest.py profile
`@given`(st.lists(st.text(min_size=0, max_size=500), min_size=0, max_size=30))
def test_raw_line_soup_does_not_crash(tmp_path: Path, lines: list[str]) -> None: ...
# Option B: only suppress health checks (not override counts/deadlines)
FUZZ_SETTINGS = settings(suppress_health_check=[HealthCheck.function_scoped_fixture])2. Token arithmetic TypeError — unhandled exception under fuzz
In utils/jsonl_parser.py, _process_assistant uses or 0 for token counting:
metadata["total_input_tokens"] += usage.get("input_tokens") or 0If usage["input_tokens"] is a truthy non-numeric value (e.g., the string "five", or a non-empty list), then:
"five" or 0 → "five"
int += "five" → TypeError: unsupported operand type(s) for +=: 'int' and 'str'
This is reachable via test_structured_entries_with_fuzzed_fields: the strategy can produce msg["usage"] = {"input_tokens": "non-empty-string"} (via st.dictionaries(st.text(), _json_value)). The same issue affects output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and the two ephemeral cache fields in cache_creation.
Fix — coerce with int():
def _safe_int(val: Any) -> int:
try:
return int(val) if isinstance(val, (int, float)) else 0
except (TypeError, ValueError):
return 0
metadata["total_input_tokens"] += _safe_int(usage.get("input_tokens"))
metadata["total_output_tokens"] += _safe_int(usage.get("output_tokens"))
# ... and the four other token fields🟡 Medium
3. ALLOWED_EXCEPTIONS — dead code branch
ALLOWED_EXCEPTIONS: tuple[type[BaseException], ...] = ()
def _parse_file_without_crash(path: str) -> None:
try:
parse_session(path)
except Exception as exc:
if ALLOWED_EXCEPTIONS and isinstance(exc, ALLOWED_EXCEPTIONS): # always False
return
raise AssertionError(...) from excif ALLOWED_EXCEPTIONS on an empty tuple short-circuits to False immediately — the inner branch is never reachable. Either populate the tuple with actually-allowed exceptions or remove the conditional entirely to simplify the helper.
4. sys.path.insert anti-pattern in the test file
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))No other test file in the project does this (pytest resolves the path via testpaths = ["tests"] in pyproject.toml). This can shadow installed packages and makes the intent unclear. If conftest.py already sets up the path correctly for all other tests, this line is unnecessary and should be removed.
5. test_unknown_record_type_is_graceful — imprecise assertion
assert len(session["messages"]) >= 1The JSONL has exactly 2 lines. The "totally-new-claude-record" type falls through all elif branches in parse_session and produces no message entry; only the "user" entry generates one. The count should be exactly 1. >= 1 would pass even if, say, the unknown-type path accidentally started inserting extra messages.
# More precise:
assert len(session["messages"]) == 16. CI profile deadline=None vs decorator deadline=5000 inconsistency
Even if the CI profile were effective, it sets deadline=None (disable per-example timeout) while FUZZ_SETTINGS sets deadline=5000 ms. The intent appears contradictory — CI should be less strict on timing, not more.
🟢 Low / Suggestions
7. _fuzz_jsonl_path is a trivial one-liner
def _fuzz_jsonl_path(tmp_path: Path, name: str) -> Path:
return tmp_path / nameThis adds no logic. Inline it (or keep it only if you plan to add validation logic later).
8. test_truncated_json_line — truncation logic is off-by-meaning
half = json.dumps(prefix)[: max(1, len(prefix) // 2)]This truncates the JSON-encoded representation of prefix. For short or empty prefix, len(prefix) // 2 = 0 → max(1, 0) = 1, yielding just " (one double-quote). While this does exercise malformed input, it doesn't closely model real-world truncated writes (partial JSON objects mid-key or mid-value). A closer model would be directly slicing a serialized JSON object:
full_line = json.dumps({"type": "user", "message": {"content": prefix}})
truncated = full_line[: max(1, len(full_line) // 2)]9. Rollup case fallback exits with exit 1 for unknown platforms
If a linux-arm64 or other non-listed runner is ever added to the matrix, the CI job will fail at the Rollup installation step rather than continuing. Consider a softer fallback:
*) echo "Warning: no pre-built Rollup binary for $(node -p "process.platform + '-' + process.arch"), skipping."; exit 0 ;;10. Three coverage artifacts without a merge step
integration-tests now uploads coverage-report-ubuntu-latest, coverage-report-windows-latest, and coverage-report-macos-latest. If a Codecov upload or coverage comparison is added later, merging them will require extra steps. Consider whether macOS/Windows coverage needs to be uploaded at all (the Ubuntu artifact is sufficient for line-coverage reporting).
Summary Table
| # | Severity | File | Issue |
|---|---|---|---|
| 1 | 🔴 High | conftest.py, test_parser_fuzz.py, pyproject.toml |
Per-test @FUZZ_SETTINGS overrides CI profile — max_examples=200 always runs in CI |
| 2 | 🔴 High | utils/jsonl_parser.py |
Token += usage.get(key) or 0 crashes on truthy non-int values (e.g., strings from fuzz) |
| 3 | 🟡 Medium | tests/test_parser_fuzz.py |
ALLOWED_EXCEPTIONS = () makes inner exception branch dead code |
| 4 | 🟡 Medium | tests/test_parser_fuzz.py |
sys.path.insert anti-pattern — unnecessary in this project |
| 5 | 🟡 Medium | tests/test_parser_fuzz.py |
>= 1 assertion too loose in test_unknown_record_type_is_graceful |
| 6 | 🟡 Medium | conftest.py, test_parser_fuzz.py |
CI profile deadline=None vs. decorator deadline=5000 are contradictory |
| 7 | 🟢 Low | tests/test_parser_fuzz.py |
_fuzz_jsonl_path is a trivial wrapper — inline it |
| 8 | 🟢 Low | tests/test_parser_fuzz.py |
Truncation in test_truncated_json_line cuts JSON-encoded bytes, not a JSON object |
| 9 | 🟢 Low | .github/workflows/ci.yml |
Rollup case exits 1 for unknown platforms instead of graceful skip |
| 10 | 🟢 Low | .github/workflows/ci.yml |
3 OS coverage artifacts uploaded with no consumer to merge them |
The hardening changes in utils/jsonl_parser.py (skipping non-dict lines, stringifying type keys, guarding model/tier/stop-reason to non-empty strings) are solid improvements. Issue #2 above is the most important remaining gap — it's exactly the class of crash the fuzz suite is meant to prevent, and it's reachable with the existing strategies.
|
CR #4 (sys.path anti-pattern): factually wrong — tests/test_jsonl_parser.py:8 does the identical sys.path.insert, so it's the existing project convention. Removing it risks import breakage. |
Add _safe_int() so non-numeric usage fields coerce to 0 instead of raising TypeError during token accumulation; apply to all token fields. Drop max_examples/deadline from the test decorator so the conftest ci/dev profile actually governs fuzz runtime. Inline _fuzz_jsonl_path, tighten truncation model and unknown-type assertion, simplify ALLOWED_EXCEPTIONS, and note macOS in the CONTRIBUTING PR checklist.
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Verify the user's claim about sys.path.insert in test_jsonl_parser.py
head -15 tests/test_jsonl_parser.py
echo ""
echo "=== Check pre-existing coverage artifact naming in ci.yml ==="
grep -n "coverage-report" .github/workflows/ci.ymlLength of output: 752
Apologies for the noise on those three. 🐇 (´• ω •`) ✏️ Learnings added
|
Guard _safe_int against NaN/Infinity (json.loads accepts these literals; int(nan)/int(inf) raise), broaden fuzz floats to cover them, and add an explicit non-finite regression test. Revert entry_counts to truthiness so empty-string types stay skipped, wrap service_tier in the message payload, and drop the no-op [tool.hypothesis] block from pyproject.toml.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@utils/jsonl_parser.py`:
- Around line 50-53: The helper _safe_int currently returns negative ints or
converts floats to ints without clamping, so update _safe_int to ensure any
returned numeric value is non-negative by clamping negatives to 0 (for
isinstance(val, int) and after converting floats with math.isfinite). Locate
_safe_int in utils/jsonl_parser.py and change both the int branch and the float
branch to return max(0, int(value)) (or 0 for non-finite floats); also search
the file for the same int/float handling patterns (the similar blocks noted in
the review) and apply the same non-negative clamping to those functions/branches
so token/accounting counters can never go below zero.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a9cae128-7856-4b1b-a828-9c9efb71d22b
📒 Files selected for processing (2)
tests/test_parser_fuzz.pyutils/jsonl_parser.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_parser_fuzz.py
Negative usage token values from adversarial JSONL must not reduce session metadata totals; apply max(0, ...) in _safe_int and add regression test.
|
@CodeRabbit. do you think my last commits since @timon0305 's comments addressed all? Currently they are not resolved yet so I am asking you to check thosee again and if they are addressed ,I will resolve t hose off clearly. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== utils/jsonl_parser.py (relevant sections) ==="
grep -n "_safe_int\|or 0\|input_tokens\|output_tokens\|cache_read\|cache_creation" utils/jsonl_parser.py
echo ""
echo "=== tests/test_parser_fuzz.py (full) ==="
cat tests/test_parser_fuzz.py
echo ""
echo "=== tests/conftest.py ==="
cat tests/conftest.py
echo ""
echo "=== pyproject.toml ==="
cat pyproject.tomlLength of output: 16679 🏁 Script executed: #!/bin/bash
echo "=== _safe_int implementation ==="
sed -n '40,60p' utils/jsonl_parser.pyLength of output: 901
Status of open review items
One remaining minor note (issue
|
closes #71
Summary
parse_session(malformed/truncated JSONL, unknown types, adversarial fields)utils/jsonl_parser.pyso fuzzed inputs do not raise unhandled exceptionsmacos-latest(pytest, integration, vitest, lint-and-audit)Sprint items
Closes Chen W2 #4, #6 (single PR)
Test plan
pytest tests/test_parser_fuzz.py -vpytest -qmypy -p api -p utils -p modelsruff check .npm testmacos-latestmatrix legs green for pytest, integration-tests, js-tests, lint-and-auditSummary by CodeRabbit
Bug Fixes
Tests
Chores