refactor(models): narrow Any-typed parse boundary with TypedDict unions#72
Conversation
Replace NotRequired[Any] on MessageDict.tool_result and data with per-tool and per-record unions, type guards, and dispatch-aligned narrowing so mypy strict constrains schema drift without weakening tests.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds TypedDict unions and TypeGuard predicates for tool results and record data, narrows MessageDict fields to those unions, and updates tool dispatch, JSONL parsing, and markdown rendering to use and validate the new typed contracts. ChangesType-narrowing refactor from Any to structured unions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
models/__init__.py (1)
3-16:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix import sorting to pass ruff check.
The import block is flagged as unsorted by the pipeline. The imports should be in alphabetical order by module name.
📦 Proposed fix
from models.errors import ErrorResponse from models.export import ExportStateDict from models.project import ProjectDict, ProjectSessionRowDict, SessionListItemDict -from models.search import SearchHitDict from models.record_data import RecordDataUnion +from models.search import SearchHitDict from models.session import ( MessageDict, QuickSessionInfoDict, SessionDict, SessionMetadataDict, ToolUseDict, ) -from models.tool_results import ToolResultUnion from models.stats import FilesTouchedDict, SessionStatsDict +from models.tool_results import ToolResultUnion🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/__init__.py` around lines 3 - 16, The import block in models/__init__.py is unsorted causing Ruff to fail; reorder the import lines alphabetically by module name so modules appear as: errors (ErrorResponse), export (ExportStateDict), project (ProjectDict, ProjectSessionRowDict, SessionListItemDict), record_data (RecordDataUnion), search (SearchHitDict), session (MessageDict, QuickSessionInfoDict, SessionDict, SessionMetadataDict, ToolUseDict), stats (FilesTouchedDict, SessionStatsDict), tool_results (ToolResultUnion); update the file to reflect that alphabetical module ordering while keeping the same imported symbols.Source: Pipeline failures
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/record_data.py`:
- Line 3: The import line in record_data.py is unsorted and fails ruff; reorder
the imports alphabetically (by module/member name) so the from typing import ...
line lists members in sorted order (e.g., ensure Literal, TypeGuard, TypedDict
are alphabetized) or split into multiple lines per project style so ruff
import-sorting passes; update the import of Literal, TypeGuard, TypedDict
accordingly in the file.
In `@models/tool_results.py`:
- Line 7: The import block in models/tool_results.py is unsorted; reorder the
imported names in the from typing import ... line so they are alphabetically
sorted (e.g., ensure Literal, TypeGuard, TypedDict are in the correct
alphabetical order) to satisfy the ruff sorting rule and rerun the linter; keep
the single from typing import ... statement and do not add or remove imports.
---
Outside diff comments:
In `@models/__init__.py`:
- Around line 3-16: The import block in models/__init__.py is unsorted causing
Ruff to fail; reorder the import lines alphabetically by module name so modules
appear as: errors (ErrorResponse), export (ExportStateDict), project
(ProjectDict, ProjectSessionRowDict, SessionListItemDict), record_data
(RecordDataUnion), search (SearchHitDict), session (MessageDict,
QuickSessionInfoDict, SessionDict, SessionMetadataDict, ToolUseDict), stats
(FilesTouchedDict, SessionStatsDict), tool_results (ToolResultUnion); update the
file to reflect that alphabetical module ordering while keeping the same
imported symbols.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 588107cc-972b-4a26-b183-6be6a98fbc80
📒 Files selected for processing (7)
models/__init__.pymodels/record_data.pymodels/session.pymodels/tool_results.pyutils/jsonl_parser.pyutils/md_exporter.pyutils/tool_dispatch.py
|
@CodeRabbit , Can you reeviewe this PR fully? |
|
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@models/record_data.py`:
- Around line 6-8: The TypedDict BashProgressDataDict is currently total (all
keys required) which makes the TypeGuard is_bash_progress_data unsafe — change
the TypedDict to allow missing keys by declaring it as non-total (e.g., class
BashProgressDataDict(TypedDict, total=False): with the same type:
Literal["bash_progress"] and output: str), and/or mark output as
NotRequired[str]; additionally ensure the TypeGuard function
is_bash_progress_data either checks for the presence of "output" (e.g., "output"
in data) before narrowing or leave the TypedDict non-total so mypy won't assume
output exists after the type guard.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4b2ee96e-d606-435b-9762-a7866daf186b
📒 Files selected for processing (7)
models/__init__.pymodels/record_data.pymodels/session.pymodels/tool_results.pyutils/jsonl_parser.pyutils/md_exporter.pyutils/tool_dispatch.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
models/record_data.py (1)
46-47: Don’t add extra TypeGuards for progress variants unless code actually narrows ondata["type"]
utils/jsonl_parser.py:_process_progressstoresdata: RecordDataUnionand derives onlyprogress_type = str(data.get("type", "")); there are no consumption sites that branch ondata["type"]forhook_progress,agent_progress, orsummary. Repo-wide, those discriminator strings only appear inmodels/record_data.pyTypedDict definitions (andcompact_boundaryis handled as a systemsubtype, not as recorddata.type). Also,is_bash_progress_data(models/record_data.py:46-47) is currently unused—add its intended consumption site or remove/defer it.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/record_data.py` around lines 46 - 47, The is_bash_progress_data TypeGuard is currently unused; either remove this TypeGuard from models/record_data.py or actually use it to narrow progress records where code branches on data["type"]. If you choose removal, delete the is_bash_progress_data function and any unused TypedDict-only discriminators; if you choose to keep it, update utils/jsonl_parser.py::_process_progress to inspect and branch on data["type"] (e.g., call is_bash_progress_data(data) and handle the bash_progress shape) so the TypeGuard is consumed and provides real narrowing.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@models/record_data.py`:
- Around line 46-47: The is_bash_progress_data TypeGuard is currently unused;
either remove this TypeGuard from models/record_data.py or actually use it to
narrow progress records where code branches on data["type"]. If you choose
removal, delete the is_bash_progress_data function and any unused TypedDict-only
discriminators; if you choose to keep it, update
utils/jsonl_parser.py::_process_progress to inspect and branch on data["type"]
(e.g., call is_bash_progress_data(data) and handle the bash_progress shape) so
the TypeGuard is consumed and provides real narrowing.
timon0305
left a comment
There was a problem hiding this comment.
Solid parse-boundary refactor — mypy strict is green, the isinstance-guarded narrowing in the parser/exporter is the right shape, and no # type: ignore crept in. Main thing: the per-tool TypeGuards in tool_results.py are unused and duplicate tool_dispatch.py's predicates (drift risk) — wire them in or drop them. Two minor notes inline.
…view) Remove duplicate _tool_result_pred_* predicates; dispatch registry uses TypeGuards from models/tool_results.py as single source of truth. Drop unused is_bash_progress_data; document ToolNameLiteral | str intent.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
utils/tool_dispatch.py (1)
148-156: ⚡ Quick winUse
.get()for consistency with the defensive pattern.Line 152 accesses
tr["task"]with bracket notation, which will raiseKeyErrorif the key is missing. Although the predicateis_task_retrieval_tool_resultguarantees"task"is present, every other builder in this file uses.get()beforeisinstancechecks (see lines 78, 104, 120, 183). Using bracket notation here is inconsistent and fragile if the predicate or dispatch logic changes.♻️ Recommended fix for consistency
def _tool_result_build_task_retrieval( tr: ToolResultDict, base: dict[str, object] ) -> dict[str, object]: result = dict(base) - task_obj = tr["task"] if isinstance(tr["task"], dict) else {} + raw_task = tr.get("task") + task_obj = raw_task if isinstance(raw_task, dict) else {} result["result_type"] = "task" result["retrieval_status"] = tr.get("retrieval_status") result["task_id"] = task_obj.get("task_id") return result🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@utils/tool_dispatch.py` around lines 148 - 156, The function _tool_result_build_task_retrieval uses bracket access tr["task"] which can raise KeyError and is inconsistent with other builders; change it to use tr.get("task") in the isinstance check and assignment (e.g., task_obj = tr.get("task") if isinstance(tr.get("task"), dict) else {}) so it follows the existing defensive pattern and preserves result_type, retrieval_status, and task_id behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@utils/tool_dispatch.py`:
- Around line 148-156: The function _tool_result_build_task_retrieval uses
bracket access tr["task"] which can raise KeyError and is inconsistent with
other builders; change it to use tr.get("task") in the isinstance check and
assignment (e.g., task_obj = tr.get("task") if isinstance(tr.get("task"), dict)
else {}) so it follows the existing defensive pattern and preserves result_type,
retrieval_status, and task_id behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e2384d6a-8575-46d5-8c3c-2feb489b014f
📒 Files selected for processing (6)
docs/architecture.mdmodels/record_data.pymodels/session.pymodels/tool_results.pytests/test_real_session_fixtures.pyutils/tool_dispatch.py
✅ Files skipped from review due to trivial changes (3)
- docs/architecture.md
- tests/test_real_session_fixtures.py
- models/record_data.py
🚧 Files skipped from review as they are similar to previous changes (1)
- models/session.py
|
Do you think reviewer's concerns are addressed correctly? @CodeRabbit |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Get all reviews on PR `#72`
gh pr view 72 --repo cppalliance/claude-code-chat-browser --json reviews,comments
# Get review comments (inline)
gh api repos/cppalliance/claude-code-chat-browser/pulls/72/comments --paginate | jq '[.[] | {id, path, line, body, user: .user.login, diff_hunk}]'
# Get issue-level comments
gh api repos/cppalliance/claude-code-chat-browser/issues/72/comments --paginate | jq '[.[] | {id, body, user: .user.login}]'Length of output: 94047
✅ Fully Addressed
|
|
Docstring coverage is enough, and others are resolved, @CodeRabbit |
|
Tip For best results, initiate chat on the files or code changes.
At this point the PR looks clean: the architectural concerns (dead TypeGuards, unsafe narrowing, import ordering, The only remaining step is getting ✏️ Learnings added
|
closes #68
Summary
Replaces
NotRequired[Any]onMessageDict.tool_resultandMessageDict.datawith typed unions so mypy strict constrains the JSONL parse boundary without weakening tests or adding# type: ignore.Rebased onto Monday's ruff/pip-audit CI merge (#70).
Problem
tool_resultanddataare the highest-risk fields in the Claude Code JSONL format — structurally complex and prone to upstream drift.Anylet schema ambiguity flow silently into rendering, export, and search.Changes
models/tool_results.py(new) — per-toolTypedDicts for all 15_TOOL_RESULT_DISPATCHshapes (Bash, Read, Write, Edit, Glob, Grep, Task variants, TodoWrite, WebFetch, WebSearch, UserInput, Plan), plusToolResultUnion,ToolNameLiteral, andTypeGuardhelpersmodels/record_data.py(new) —RecordDataUnionfor progressdatapayloads (bash_progress,hook_progress,agent_progress,summary,compact_boundary) withdict[str, object]fallbackmodels/session.py—tool_result: ToolResultUnion | None,data: RecordDataUnion; addsToolUseDictandMessageUsageDictutils/tool_dispatch.py— predicates/builders typed withToolResultDict;_parse_tool_resultacceptsToolResultUnion | Noneutils/jsonl_parser.py—is_tool_result_dictnarrowing at image-extraction site; typedToolUseDictassembly in_process_assistantutils/md_exporter.py—_render_tool_useconsumesToolUseDictwith explicitdict[str, object]narrowingUnknown upstream shapes use
dict[str, object]instead ofAnyto force explicit narrowing at use sites.Acceptance criteria
tool_resulttyped with union covering all known tool result shapes from fixturesdatatyped with union covering known record typesTypeGuard/Literal(ToolNameLiteral,is_tool_result_dict)# type: ignorein changed production codemypy -p api -p utils -p modelspassesruff check .passes (rebased on feat: add ruff and pip-audit CI gates, fix style violations, and add … #70)Test plan