Skip to content

fix(agent-core): realign mid-history interrupted tool calls on resume#1027

Merged
kermanx merged 6 commits into
mainfrom
fix/mid-history-interrupted-tool-call
Jun 23, 2026
Merged

fix(agent-core): realign mid-history interrupted tool calls on resume#1027
kermanx merged 6 commits into
mainfrom
fix/mid-history-interrupted-tool-call

Conversation

@kermanx

@kermanx kermanx commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

问题

ContextMemory.finishResume 依赖 pendingToolResultIds,而按不变式它只描述 tail 的那个打开的工具调用 exchange。所以补救只在历史末尾生效。

当被 replay 的记录流里有一个位于中间的未闭合 tool call 时:

  1. replay 到该 tool.callpendingToolResultIds 一直非空(result 永不到来);
  2. 之后每条 appendMessage(用户消息 / skill reminder…)因为 hasOpenToolExchange() 为真被塞进 deferredMessages
  3. step.begin 这类 loop event 不走 deferral,直接 push 进 _history
  4. replay 结束后:中间挂着一个无 result 的 assistant tool call,后面跟着别的 assistant turn,本应穿插其间的用户消息全卡在 deferredMessages
  5. finishResume resolve 残留 pending 并 flush deferred —— 但 flush 把它们一股脑接到 tail,只有最后一段对齐,中间全错位(即"只在最后一个 turn 有效果")。

改动

把"补救"从 tail-only 改成在每个 replay step 边界就地闭合

  • 抽出 closePendingToolResults()(原 finishResume 循环体);
  • appendLoopEventstep.begin 分支开头调用它:遇到新 assistant step 而上一个 exchange 还开着,就在当前位置补合成 interrupted result 再开新 step;
  • finishResume 复用同一方法做流末尾收尾。

中间 gap 在下一个 step 开始时即就地闭合 → pendingToolResultIds 立刻清空 → deferral 雪崩不再发生 → 后续消息按记录顺序回到正确位置。

对健康历史零影响:不变式保证 live 的 step 边界不会有残留 pending,这行是 no-op。

测试

新增 closes an interrupted tool call mid-history so later turns stay aligned:构造"中间未闭合 tool call + 之后还有用户消息和完整 turn"的记录流,断言 history/messages 角色序列为 [user, assistant, tool, user, assistant]、synthetic result 落在正确位置(非 tail)、被 defer 的用户消息归位、且 expectResumeMatches() 幂等一致。已验证仅回退 src 时该测试会失败(输出 [user, assistant, assistant, tool, user])。

tsc / oxlint 干净;resume / context / compaction / session-init 套件全过。

Known limitation(非本 PR 范围,非回归)

若 legacy/损坏会话里 apply_compaction 是在某个 exchange 仍打开时录制的,切片可能把持有 pending call 的 assistant step 删掉,留下孤儿 result。该情形当前 tail-only 版本也一样会触发,本 PR 既未修复也未恶化(对健康会话严格 no-op)。正确的修法在 compaction 源头(切片后重新对齐 pendingToolResultIds),留作后续跟进。

ContextMemory.finishResume only closed the trailing open tool exchange
(pendingToolResultIds describes the tail by invariant), so a tool call
left unresolved mid-history would keep hasOpenToolExchange true through
replay, stranding every later message in deferredMessages and aligning
only the final exchange.

Close pending tool calls in place at each replayed step.begin boundary
(extracted as closePendingToolResults, shared with finishResume). The
deferral avalanche never starts, so later messages keep their recorded
order. No-op for healthy histories: the invariant guarantees no pending
at a live step boundary.

Known limitation: an apply_compaction recorded while an exchange is open
(legacy/corrupt sessions) can still slice the owning assistant step away
and leave an orphaned result; this is pre-existing, not a regression, and
is better fixed at the compaction source in a follow-up.
@changeset-bot

changeset-bot Bot commented Jun 23, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: 8ed41c4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@moonshot-ai/kimi-code Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new

pkg-pr-new Bot commented Jun 23, 2026

Copy link
Copy Markdown
pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@8ed41c4
npx https://pkg.pr.new/@moonshot-ai/kimi-code@8ed41c4

commit: 8ed41c4

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5fb8cf6e2c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// earlier step were interrupted (the invariant guarantees this never
// happens live, so this is a no-op outside replay). Close them in place
// before opening the new step so mid-history gaps stay aligned.
this.closePendingToolResults();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mirror mid-history closure in wire transcripts

This adds the new replay-time behavior to close a pending tool call before the next step.begin, but the daemon message reducer in packages/agent-core/src/services/message/transcript.ts still treats step.begin as a plain assistant append and never clears pendingToolResultIds. Because MessageService._getTranscriptEntries resumes first and then merges context.history.slice(transcript.foldedLength), a session matching the new mid-history interruption case will make /messages include the later assistant from the wire, then append the live tail from the wrong folded index, duplicating/reordering the later turn and omitting the synthetic tool error. Please mirror this closePendingToolResults() behavior in reduceWireRecords when reducing a new step.begin.

Useful? React with 👍 / 👎.

kermanx added 2 commits June 23, 2026 18:25
reduceWireRecords folds the wire log the same way ContextMemory folds
history, and MessageService merges context.history.slice(foldedLength)
onto the reduced entries. The replay-time closePendingToolResults added
for mid-history interruptions is NOT re-persisted to the wire log, so the
reducer's step.begin must reconstruct it: synthesize the interrupted tool
result in place and flush the deferred message. Otherwise foldedLength
undercounts and the live-tail merge duplicates/reorders the later turn and
omits the synthetic error.
@kermanx

kermanx commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9c52224635

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +121 to +125
// Mirrors ContextMemory.closePendingToolResults: a tool call left open when a
// new step begins was interrupted, so synthesize its error result in place.
// ContextMemory does this during replay, where the synthetic result is NOT
// re-persisted to the wire log, so the reducer must reconstruct it to keep
// `foldedLength` aligned with the live folded history.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove the service-layer helper narration

packages/agent-core/src/services/AGENTS.md says the service layer should default to no comments and must not add block/paragraph comments on internal helpers; this new five-line helper comment narrates what closePendingToolResults does rather than documenting a hidden constraint. Please reduce it to at most one short WHY line or let the helper/test names carry the explanation so new service code does not keep expanding the over-commented style that guide forbids.

Useful? React with 👍 / 👎.

Per services/AGENTS.md: no block docstrings on internal helpers; one short
WHY line max.
@kermanx

kermanx commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8518dc585

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// earlier step were interrupted (the invariant guarantees this never
// happens live, so this is a no-op outside replay). Close them in place
// before opening the new step so mid-history gaps stay aligned.
this.closePendingToolResults();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip previously persisted interrupted results

When a damaged session has already been resumed by the previous tail-only logic, finishResume will have appended the synthetic interrupted tool.result at the end of wire.jsonl. On the next resume this new step.begin path synthesizes an in-place result for the same toolCallId, but the old tail record is still replayed later because tool.result appends even when the id is no longer pending, leaving a duplicate/orphan tool message in live history (and the mirrored /messages reducer has the same failure mode). This affects the legacy logs this repair is meant to recover, so the replay needs to detect and skip the previously persisted interrupted result for an already-closed id.

Useful? React with 👍 / 👎.

A legacy session repaired by the old tail-only finishResume has a synthetic
interrupted tool.result persisted at the log tail. On the next resume the new
in-place step.begin closure already answers that id, so replaying the tail copy
appended a duplicate/orphan (the tool.result handler pushed unconditionally);
the wire reducer had the same flaw, desyncing foldedLength and duplicating the
later turn in /messages.

Guard both reducers: skip a tool.result whose id is not in pendingToolResultIds.
In a well-formed stream every result's id is pending (added by its tool.call),
so this only drops stale duplicates and orphans (call compacted away / missing),
which would otherwise make an invalid request anyway. With in-place closure
running during replay, finishResume already only writes the trailing step's open
calls, so no new mid-history results are persisted.

Two rendering fixtures appended bare tool.results without a tool.call; made them
well-formed. Adds regression tests for the legacy stale-tail duplicate in both
ContextMemory and the wire reducer.
@kermanx

kermanx commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator Author

@codex

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50552623ef

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

// earlier step were interrupted (the invariant guarantees this never
// happens live, so this is a no-op outside replay). Close them in place
// before opening the new step so mid-history gaps stay aligned.
this.closePendingToolResults();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Teach raw wire views about in-place tool repair

When this runs during AgentRecords.restore, logRecord suppresses the synthetic tool.result, so mid-history interrupted calls are repaired only in memory and never appear in wire.jsonl. The /messages reducer was updated, but I checked apps/vis/server/src/lib/context-projector.ts and it still creates tool messages only from actual ev.type === 'tool.result' records, so the visual context endpoint will show an assistant tool call with no interrupted-error result after a resumed damaged session. Please either persist/rewrite the positioned repair record or update the raw wire projector(s) to mirror this same closure.

Useful? React with 👍 / 👎.

Add coverage for the resume/wire-reducer repair paths: multiple open calls
in one interrupted step (closed in tool-call order), partial resolution
(only the unresolved call synthesized), consecutive interrupted steps each
closed at their own boundary, and orphan tool results (no matching call)
dropped. Reducer cases also assert foldedLength stays aligned.
@kermanx kermanx merged commit c240bfa into main Jun 23, 2026
11 of 12 checks passed
@kermanx kermanx deleted the fix/mid-history-interrupted-tool-call branch June 23, 2026 14:39
@github-actions github-actions Bot mentioned this pull request Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant