
fix(runtime): close turn-lifecycle cancellation contract gaps #189

Merged
yishuiliunian merged 1 commit into main from fix/turn-lifecycle-cancellation
Jun 2, 2026


@yishuiliunian

Summary

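  • A real session exposed the agent "silently hanging": the continuation consistency gate applied to every turn (swallowing user input), cancellation had multiple entry points with no shared invariant, and loop detection looked only at tool input (so polling a fixed path was falsely killed).
  • Model the Turn lifecycle as a state machine with a contract: triggers are routed explicitly, cancellation is funneled into a single function, and loop detection becomes content-aware over input + output.
  • This round also cleans up pre-existing latent seams (double emit, locking down the stringly-typed cause contract, documenting the third bypass exception, rewind by identity).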

Changes

  • Root cause A (continuation gate): turn.rs: TurnTrigger::is_goal_continuation(); goal_consistency.rs: only continuation turns go through the gate, skips are observable via ContinuationSkipped, and rewind is by current_turn_index() identity; event_payload.rs adds ContinuationSkipped.
  • Root cause B (cancellation funnel): new turn_cancel_finalize.rs (finalize + wire mapping); end_turn_record is now pub(crate); ingest.rs / turn_observer_dispatch.rs / turn_telemetry.rs unified; event_payload.rs adds TurnCancelled; a cancelled turn no longer double-emits and no longer feeds governance.
  • Root cause C (loop detection): loop_detector.rs: on_after_tools counts consecutive identical (input, output) pairs, hashing content + is_error + image content_key + metadata; image.rs: content_key(); on_turn_cancelled resets the detector.
  • Exhaustive match arms added: loopal-session / loopal-view-state / loopal-acp.
  • System prompt: chmod-after-Write same-batch warning.
  • Tests: new turn_trigger / cancel_open_batch / continuation_bypass / cancel_finalize_e2e / governance_cancel_e2e / loop_detector_digest, plus a unit test for turn_store current_turn_index; existing loop_detector cases updated to the before→after cycle.
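
The routing described under root cause A can be sketched roughly as follows. This is an illustrative sketch only, not the code in turn.rs or goal_consistency.rs: the trigger variants, `GateDecision`, `consistency_gate`, and the `goal_is_stale` flag are hypothetical names; only `TurnTrigger::is_goal_continuation()`, `ContinuationSkipped`, and `current_turn_index()` come from the PR.

```rust
/// Hypothetical trigger variants; the PR only names `is_goal_continuation()`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TurnTrigger {
    UserInput,
    GoalContinuation,
}

impl TurnTrigger {
    /// True only for turns that the runtime started to continue a goal.
    pub fn is_goal_continuation(self) -> bool {
        matches!(self, TurnTrigger::GoalContinuation)
    }
}

/// Hypothetical gate result. `SkipContinuation` stands in for the observable
/// ContinuationSkipped event and carries the turn identity to rewind.
#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    Proceed,
    SkipContinuation { rewind_turn_index: usize },
}

/// Assumed shape of the gate: non-continuation turns bypass it entirely,
/// so real user input always reaches the LLM.
pub fn consistency_gate(
    trigger: TurnTrigger,
    goal_is_stale: bool,
    current_turn_index: usize,
) -> GateDecision {
    if !trigger.is_goal_continuation() {
        return GateDecision::Proceed;
    }
    if goal_is_stale {
        // Rewind by identity (the turn's own index), not by "last element".
        GateDecision::SkipContinuation { rewind_turn_index: current_turn_index }
    } else {
        GateDecision::Proceed
    }
}
```

In this sketch a user-input turn proceeds even when the goal state is stale, which is the property the fix is after; only a continuation turn can be skipped.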

Test plan

  • CI passes
  • Local: bazel build //... --config=clippy with zero warnings + bazel test //... 94/94 passing

A real session showed the agent silently "hanging": the goal-continuation
consistency gate ran on EVERY turn (not just continuations), swallowing real
user input; turn cancellation had multiple entry points with no shared
invariant; and the loop detector keyed only on tool input, so a tool re-reading
a mutating path (ReadImage on an overwritten screenshot) was falsely aborted.

Root causes addressed:
- A. Continuation gate now keys on TurnTrigger::is_goal_continuation(), so
  non-continuation turns always reach the LLM. Stale continuations are skipped
  observably (ContinuationSkipped) and rewound by identity (current_turn_index),
  not by len-1 position.
- B. All user-level cancellation funnels through finalize_turn_cancellation:
  pairs the open tool batch, resets continuation state, notifies governance
  on_turn_cancelled, emits TurnCancelled, ends the record. end_turn_record is
  pub(crate) so Cancelled construction is single-sourced; the two intentional
  bypasses (compaction host, governance abort) are documented.
- C. LoopDetector counts CONSECUTIVE identical (input, output) in on_after_tools,
  hashing content + is_error + image content_key + metadata; resets on
  on_turn_cancelled / task boundary / compaction.
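
As a rough illustration of item C, a consecutive-digest counter might look like the sketch below. It is written under assumptions and is not the actual loop_detector.rs: the `ToolOutput` struct, its field types, the `threshold` parameter, and the use of `DefaultHasher` are all hypothetical. Only the hook names `on_after_tools` / `on_turn_cancelled` and the hashed components (content, is_error, image content_key, metadata) come from the commit message.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical view of one tool result; field types are assumptions.
pub struct ToolOutput<'a> {
    pub content: &'a str,
    pub is_error: bool,
    pub image_content_key: Option<u64>,
    pub metadata: &'a str,
}

/// Digest over the (input, output) pair, so a changed output changes the digest.
fn digest(input: &str, out: &ToolOutput<'_>) -> u64 {
    let mut h = DefaultHasher::new();
    input.hash(&mut h);
    out.content.hash(&mut h);
    out.is_error.hash(&mut h);
    out.image_content_key.hash(&mut h);
    out.metadata.hash(&mut h);
    h.finish()
}

pub struct LoopDetector {
    threshold: u32,
    last: Option<u64>,
    streak: u32,
}

impl LoopDetector {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, last: None, streak: 0 }
    }

    /// Returns true once the same (input, output) digest has been seen
    /// `threshold` times in a row; any different digest restarts the streak.
    pub fn on_after_tools(&mut self, input: &str, out: &ToolOutput<'_>) -> bool {
        let d = digest(input, out);
        if self.last == Some(d) {
            self.streak += 1;
        } else {
            self.last = Some(d);
            self.streak = 1;
        }
        self.streak >= self.threshold
    }

    /// Cancellation clears the streak (the same reset would apply at a
    /// task boundary or after compaction).
    pub fn on_turn_cancelled(&mut self) {
        self.last = None;
        self.streak = 0;
    }
}
```

With this shape, re-reading the same path while the image content key keeps changing never builds a streak, whereas identical input and identical output repeated back to back does.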

A cancelled turn no longer emits both TurnCompleted and TurnCancelled, and no
longer feeds the degeneration/loop detectors.
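
The single-terminal-event invariant can be pictured with the sketch below. It is a simplified model under assumptions, not the contents of turn_cancel_finalize.rs: the `Turn` struct, its fields, `complete_turn`, and the early return on an already-ended turn are hypothetical; only the function name `finalize_turn_cancellation`, the event names, and the listed steps come from the commit message.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    TurnCompleted,
    TurnCancelled,
}

/// Hypothetical in-memory turn state used only for this illustration.
#[derive(Default)]
pub struct Turn {
    pub open_tool_calls: Vec<String>,
    pub paired_results: Vec<(String, String)>,
    pub continuation_pending: bool,
    pub governance_cancel_notifications: u32,
    pub ended: bool,
    pub events: Vec<Event>,
}

/// Single funnel for user-level cancellation (assumed shape).
pub fn finalize_turn_cancellation(turn: &mut Turn) {
    if turn.ended {
        return; // assumption: an ended turn is never finalized twice
    }
    // Pair every still-open tool call with a synthetic "cancelled" result.
    for id in turn.open_tool_calls.drain(..) {
        turn.paired_results.push((id, "cancelled".to_string()));
    }
    turn.continuation_pending = false; // reset continuation state
    turn.governance_cancel_notifications += 1; // stands in for on_turn_cancelled
    turn.events.push(Event::TurnCancelled);
    turn.ended = true; // end the record
}

/// Normal completion path; a no-op once the turn has already ended.
pub fn complete_turn(turn: &mut Turn) {
    if turn.ended {
        return;
    }
    turn.events.push(Event::TurnCompleted);
    turn.ended = true;
}
```

Because both paths check and set the same `ended` flag, calling `complete_turn` after a cancellation leaves exactly one terminal event in the log.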

Every fix carries unit + e2e regression coverage.
@yishuiliunian yishuiliunian merged commit dc6bdfb into main Jun 2, 2026
4 checks passed
@yishuiliunian yishuiliunian deleted the fix/turn-lifecycle-cancellation branch June 2, 2026 02:16
yishuiliunian added a commit that referenced this pull request Jun 3, 2026
…191)

* fix(tui): show "Compacting" status during compaction instead of Idle

#190 made AgentStatus backend-only and declared compact_banner the sole
signal for compacting, but the unified status line never consumed it.
Manual /compact runs as a control command in the idle phase, so status
stays WaitingForInput and the decision tree fell through to Idle with a
frozen spinner — covering auto-compaction and resume-rehydrate too.

Extract the status-label decision into a pure pick_label() and add a
Compacting tier (after Thinking, before Streaming). Feed it
compact_banner.is_some(), and add the same flag to is_agent_active so the
spinner animates for the whole compaction rather than freezing after the
750ms activity grace.
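
A pure label picker with that tier order could be sketched as below. The field names on `ActivityInputs`, the `Label` enum, and the simplified `is_agent_active` (which ignores the 750ms grace window) are assumptions made for illustration; only `pick_label`, `is_agent_active`, `ActivityInputs`, and the Thinking → Compacting → Streaming ordering are taken from the commit message.

```rust
/// Hypothetical inputs; `compacting` would be fed from `compact_banner.is_some()`.
#[derive(Debug, Clone, Copy, Default)]
pub struct ActivityInputs {
    pub thinking: bool,
    pub compacting: bool,
    pub streaming: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Label {
    Thinking,
    Compacting,
    Streaming,
    Idle,
}

/// Pure decision: no state is read or mutated, so it is trivial to unit-test.
/// Tier order: Thinking, then Compacting, then Streaming, else Idle.
pub fn pick_label(inputs: ActivityInputs) -> Label {
    if inputs.thinking {
        Label::Thinking
    } else if inputs.compacting {
        Label::Compacting
    } else if inputs.streaming {
        Label::Streaming
    } else {
        Label::Idle
    }
}

/// Simplified activity check: compaction counts as activity, so the spinner
/// would keep animating for as long as the banner is present.
pub fn is_agent_active(inputs: ActivityInputs) -> bool {
    inputs.thinking || inputs.compacting || inputs.streaming
}
```

Keeping the decision in a function of plain inputs means the Compacting tier can be tested without constructing any TUI state.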

Presentation-layer only: no AgentStatus mutation, no protocol change, so
it does not reintroduce the #189/#190 turn-lifecycle desync risk.

* fix: address CI failure - rustfmt struct-update layout in tests

rustfmt expands single-line `ActivityInputs { field: x, ..base() }` to
multi-line; apply the canonical formatting.