fix(agent-core): realign mid-history interrupted tool calls on resume#1027
Conversation
ContextMemory.finishResume only closed the trailing open tool exchange (pendingToolResultIds describes the tail by invariant), so a tool call left unresolved mid-history would keep hasOpenToolExchange true through replay, stranding every later message in deferredMessages and aligning only the final exchange. Close pending tool calls in place at each replayed step.begin boundary (extracted as closePendingToolResults, shared with finishResume). The deferral avalanche never starts, so later messages keep their recorded order. No-op for healthy histories: the invariant guarantees no pending at a live step boundary. Known limitation: an apply_compaction recorded while an exchange is open (legacy/corrupt sessions) can still slice the owning assistant step away and leave an orphaned result; this is pre-existing, not a regression, and is better fixed at the compaction source in a follow-up.
🦋 Changeset detectedLatest commit: 8ed41c4 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
commit: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5fb8cf6e2c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // earlier step were interrupted (the invariant guarantees this never | ||
| // happens live, so this is a no-op outside replay). Close them in place | ||
| // before opening the new step so mid-history gaps stay aligned. | ||
| this.closePendingToolResults(); |
There was a problem hiding this comment.
Mirror mid-history closure in wire transcripts
This adds the new replay-time behavior to close a pending tool call before the next step.begin, but the daemon message reducer in packages/agent-core/src/services/message/transcript.ts still treats step.begin as a plain assistant append and never clears pendingToolResultIds. Because MessageService._getTranscriptEntries resumes first and then merges context.history.slice(transcript.foldedLength), a session matching the new mid-history interruption case will make /messages include the later assistant from the wire, then append the live tail from the wrong folded index, duplicating/reordering the later turn and omitting the synthetic tool error. Please mirror this closePendingToolResults() behavior in reduceWireRecords when reducing a new step.begin.
Useful? React with 👍 / 👎.
reduceWireRecords folds the wire log the same way ContextMemory folds history, and MessageService merges context.history.slice(foldedLength) onto the reduced entries. The replay-time closePendingToolResults added for mid-history interruptions is NOT re-persisted to the wire log, so the reducer's step.begin must reconstruct it: synthesize the interrupted tool result in place and flush the deferred message. Otherwise foldedLength undercounts and the live-tail merge duplicates/reorders the later turn and omits the synthetic error.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9c52224635
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Mirrors ContextMemory.closePendingToolResults: a tool call left open when a | ||
| // new step begins was interrupted, so synthesize its error result in place. | ||
| // ContextMemory does this during replay, where the synthetic result is NOT | ||
| // re-persisted to the wire log, so the reducer must reconstruct it to keep | ||
| // `foldedLength` aligned with the live folded history. |
There was a problem hiding this comment.
Remove the service-layer helper narration
packages/agent-core/src/services/AGENTS.md says the service layer should default to no comments and must not add block/paragraph comments on internal helpers; this new five-line helper comment narrates what closePendingToolResults does rather than documenting a hidden constraint. Please reduce it to at most one short WHY line or let the helper/test names carry the explanation so new service code does not keep expanding the over-commented style that guide forbids.
Useful? React with 👍 / 👎.
Per services/AGENTS.md: no block docstrings on internal helpers; one short WHY line max.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8518dc585
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // earlier step were interrupted (the invariant guarantees this never | ||
| // happens live, so this is a no-op outside replay). Close them in place | ||
| // before opening the new step so mid-history gaps stay aligned. | ||
| this.closePendingToolResults(); |
There was a problem hiding this comment.
Skip previously persisted interrupted results
When a damaged session has already been resumed by the previous tail-only logic, finishResume will have appended the synthetic interrupted tool.result at the end of wire.jsonl. On the next resume this new step.begin path synthesizes an in-place result for the same toolCallId, but the old tail record is still replayed later because tool.result appends even when the id is no longer pending, leaving a duplicate/orphan tool message in live history (and the mirrored /messages reducer has the same failure mode). This affects the legacy logs this repair is meant to recover, so the replay needs to detect and skip the previously persisted interrupted result for an already-closed id.
Useful? React with 👍 / 👎.
A legacy session repaired by the old tail-only finishResume has a synthetic interrupted tool.result persisted at the log tail. On the next resume the new in-place step.begin closure already answers that id, so replaying the tail copy appended a duplicate/orphan (the tool.result handler pushed unconditionally); the wire reducer had the same flaw, desyncing foldedLength and duplicating the later turn in /messages. Guard both reducers: skip a tool.result whose id is not in pendingToolResultIds. In a well-formed stream every result's id is pending (added by its tool.call), so this only drops stale duplicates and orphans (call compacted away / missing), which would otherwise make an invalid request anyway. With in-place closure running during replay, finishResume already only writes the trailing step's open calls, so no new mid-history results are persisted. Two rendering fixtures appended bare tool.results without a tool.call; made them well-formed. Adds regression tests for the legacy stale-tail duplicate in both ContextMemory and the wire reducer.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50552623ef
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // earlier step were interrupted (the invariant guarantees this never | ||
| // happens live, so this is a no-op outside replay). Close them in place | ||
| // before opening the new step so mid-history gaps stay aligned. | ||
| this.closePendingToolResults(); |
There was a problem hiding this comment.
Teach raw wire views about in-place tool repair
When this runs during AgentRecords.restore, logRecord suppresses the synthetic tool.result, so mid-history interrupted calls are repaired only in memory and never appear in wire.jsonl. The /messages reducer was updated, but I checked apps/vis/server/src/lib/context-projector.ts and it still creates tool messages only from actual ev.type === 'tool.result' records, so the visual context endpoint will show an assistant tool call with no interrupted-error result after a resumed damaged session. Please either persist/rewrite the positioned repair record or update the raw wire projector(s) to mirror this same closure.
Useful? React with 👍 / 👎.
Add coverage for the resume/wire-reducer repair paths: multiple open calls in one interrupted step (closed in tool-call order), partial resolution (only the unresolved call synthesized), consecutive interrupted steps each closed at their own boundary, and orphan tool results (no matching call) dropped. Reducer cases also assert foldedLength stays aligned.
问题
ContextMemory.finishResume依赖pendingToolResultIds,而按不变式它只描述 tail 的那个打开的工具调用 exchange。所以补救只在历史末尾生效。当被 replay 的记录流里有一个位于中间的未闭合 tool call 时:
tool.call后pendingToolResultIds一直非空(result 永不到来);appendMessage(用户消息 / skill reminder…)因为hasOpenToolExchange()为真被塞进deferredMessages;step.begin这类 loop event 不走 deferral,直接 push 进_history;deferredMessages;finishResumeresolve 残留 pending 并 flush deferred —— 但 flush 把它们一股脑接到 tail,只有最后一段对齐,中间全错位(即"只在最后一个 turn 有效果")。改动
把"补救"从 tail-only 改成在每个 replay step 边界就地闭合:
closePendingToolResults()(原finishResume循环体);appendLoopEvent的step.begin分支开头调用它:遇到新 assistant step 而上一个 exchange 还开着,就在当前位置补合成 interrupted result 再开新 step;finishResume复用同一方法做流末尾收尾。中间 gap 在下一个 step 开始时即就地闭合 →
pendingToolResultIds立刻清空 → deferral 雪崩不再发生 → 后续消息按记录顺序回到正确位置。对健康历史零影响:不变式保证 live 的 step 边界不会有残留 pending,这行是 no-op。
测试
新增
closes an interrupted tool call mid-history so later turns stay aligned:构造"中间未闭合 tool call + 之后还有用户消息和完整 turn"的记录流,断言 history/messages 角色序列为[user, assistant, tool, user, assistant]、synthetic result 落在正确位置(非 tail)、被 defer 的用户消息归位、且expectResumeMatches()幂等一致。已验证仅回退 src 时该测试会失败(输出[user, assistant, assistant, tool, user])。tsc/oxlint干净;resume / context / compaction / session-init 套件全过。Known limitation(非本 PR 范围,非回归)
若 legacy/损坏会话里
apply_compaction是在某个 exchange 仍打开时录制的,切片可能把持有 pending call 的 assistant step 删掉,留下孤儿 result。该情形当前 tail-only 版本也一样会触发,本 PR 既未修复也未恶化(对健康会话严格 no-op)。正确的修法在 compaction 源头(切片后重新对齐pendingToolResultIds),留作后续跟进。