feat(harness): cooperative stop button for active runs#143
Conversation
Add `run::stop` as a thin wrapper over the existing `router::abort` flag
write plus `approval::sweep_session` and a `turn::step_requested` publish.
Composer send button morphs to a stop button while a run is in flight,
with an honest "stopping..." intermediate state so the user gets immediate
feedback even though the orchestrator can only act at the next state
boundary.
Backend (turn-orchestrator):
- Shared `crate::abort::{is_set, clear, abort_signal_key}` helper.
- `run::stop` returns typed `{accepted, reason, prior_state}`; idempotent
for unknown sessions and already-stopped records.
- Abort flag checked at every transition that precedes a costly operation:
provisioning (clear), awaiting, prepare, execute (per-call).
- `steering::handle` now uses the shared helper instead of its private
copy.
- Integration tests cover payload validation, unknown-session and
already-stopped short-circuits, primitives invoked on the happy path,
and double-stop idempotency.
Frontend (harness/web):
- Composer takes `running` + `onStop` props. While running the textarea
is `readOnly` (preserves focus for screen readers) and the send button
renders as a red stop button with `aria-label`/`title`. After the user
clicks stop, the button shows "stopping..." with `aria-busy` until the
parent flips `running` back to false.
- `submit()` is gated on `running` (not just the form's `onSubmit`) so
Enter inside the textarea is a true no-op mid-run.
- `.composer-send` gets a pinned `min-width` so the send/stop/stopping
labels don't shift layout.
- Vitest cases cover the morph, click routing, no-op submission, and
reset behavior.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
skill-check — worker1 verified, 24 skipped (no docs/).
Three for three. Nicely done. |
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR adds a cooperative stop mechanism for in-flight LLM runs. Users can now click a stop button in the composer UI, triggering a backend ChangesCooperative Run Stop Mechanism
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
harness/web/src/components/Composer.test.tsx (1)
66-80: ⚡ Quick winThis test is a false positive for running-gate behavior.
It never creates a non-empty draft, so
onSendstays uncalled even without therunningguard. Please prefill text before flipping torunningand pressing Enter to validate the intended path.Stronger test pattern
- it("form submit is a no-op while running", () => { - const { onSend } = setup({ running: true }); + it("form submit is a no-op while running", () => { + const { onSend, onStop, rerender } = setup({ running: false }); + const textarea = screen.getByRole("textbox") as HTMLTextAreaElement; + fireEvent.change(textarea, { target: { value: "hello" } }); + rerender( + <Composer + disabled={false} + onSend={onSend} + cwd="" + skillRows={null} + sessionMessages={[]} + running={true} + onStop={onStop} + />, + ); // The textarea is `readOnly` while running but still focusable. Type // something programmatically (we set value via fireEvent for the test) // and try to submit via Enter. - const textarea = screen.getByPlaceholderText( - /run in flight/i, - ) as HTMLTextAreaElement; fireEvent.keyDown(textarea, { key: "Enter", code: "Enter", shiftKey: false, }); expect(onSend).not.toHaveBeenCalled(); });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@harness/web/src/components/Composer.test.tsx` around lines 66 - 80, The test currently never populates the draft so pressing Enter will never call onSend regardless of the running flag; update the test in Composer.test.tsx to first fill the textarea (use setup or fireEvent.change on the element retrieved via screen.getByPlaceholderText) with a non-empty value, then set up or flip to running: true and simulate the Enter key (fireEvent.keyDown) to assert onSend is not called; reference the existing setup({ running: true }) helper and the textarea element retrieval to locate where to insert the prefill step before firing the Enter event.turn-orchestrator/tests/run_stop.rs (2)
112-124: ⚡ Quick winFail fast when state seeding fails in tests.
Ignoring the
state::setresult in Line 112 can mask fixture setup failures and make downstream assertions misleading.Proposed fix
- let _ = iii + iii .trigger(TriggerRequest { @@ }) - .await; + .await + .expect("seed_record: state::set should succeed");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@turn-orchestrator/tests/run_stop.rs` around lines 112 - 124, The test currently ignores the result of iii.trigger(TriggerRequest { function_id: "state::set", ... }). Instead fail fast when seeding state: replace the discarded await with handling that asserts success or unwraps the Result so the test panics on failure. Locate the call to iii.trigger / TriggerRequest in run_stop.rs and change the await to propagate errors (e.g., .await.unwrap() or .await.expect("state::set failed for key ...")) so failures in turn_orchestrator::turn_state_key(session_id) setup are not masked.
71-82: ⚡ Quick winAssert
iii::durable::publishto pin the wake-up contract.The happy-path currently verifies abort+sweep but not the step-request publish. Add this assertion so regressions in subscriber wake-up are caught.
Proposed fix
impl Sink { @@ fn sweep(&self) -> Vec<Value> { self.sweep_calls.lock().unwrap().clone() } + fn step(&self) -> Vec<Value> { + self.step_publishes.lock().unwrap().clone() + } } @@ let sweep_calls = sink.sweep(); assert_eq!(sweep_calls.len(), 1, "approval::sweep_session invoked once"); assert_eq!(sweep_calls[0]["session_id"], json!(session_id)); assert_eq!(sweep_calls[0]["reason"], json!("run_stopped")); + + let step_calls = sink.step(); + assert_eq!(step_calls.len(), 1, "iii::durable::publish invoked once"); }Also applies to: 223-251
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@turn-orchestrator/tests/run_stop.rs` around lines 71 - 82, Add an assertion that the registered handler for RegisterFunctionMessage::with_id("iii::durable::publish") actually received a publish by checking the sink.step_publishes (or step_log) contents after the test run; locate where iii.register_function is called and after the happy-path that verifies abort+sweep, assert that sink.step_publishes contains at least one payload (or the expected payload shape) to pin the wake-up contract (make the same change in the other similar block around the 223-251 region).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@harness/web/src/components/Composer.tsx`:
- Around line 348-352: handleStopClick currently latches stopRequested before
knowing if onStop succeeds and ignores the promise; change handleStopClick to be
async, call await onStop?.() inside a try/catch, set stopRequested(true) while
the operation is pending and if the awaited onStop resolves keep it set, but if
it throws reset stopRequested(false) in the catch so the UI can retry; also
ensure callers of onStop propagate failures (reject) instead of swallowing them
so this rollback runs. Reference: handleStopClick, stopRequested,
setStopRequested, onStop.
In `@turn-orchestrator/src/run_stop.rs`:
- Around line 56-87: The handler currently ignores failures from iii.trigger
calls for "router::abort" and "approval::sweep_session" and always proceeds to
run_start::publish_step, which can make the handler return a false-success;
change the control flow in the run-stop handler so that if iii.trigger(...) for
function_id "router::abort" or "approval::sweep_session" returns Err you
propagate that failure to the caller (or return an error/accepted:false
response) instead of continuing — detect errors from iii.trigger(TriggerRequest
{ function_id: "router::abort" }) and iii.trigger(TriggerRequest { function_id:
"approval::sweep_session" }) and short-circuit (return Err or an explicit
non-accepted response) after logging, only calling run_start::publish_step(&iii,
&session_id).await when both trigger calls succeed.
In `@turn-orchestrator/src/states/provisioning.rs`:
- Around line 28-33: The unconditional call to crate::abort::clear(iii,
&record.session_id) can erase a valid in-flight stop for the current run; update
this to only clear stale pre-run abort state or make abort state run-scoped.
Specifically, change the logic around crate::abort::clear in provisioning.rs to
either (a) use a run-scoped key (include the current turn/run id `iii` in the
abort key or introduce a new API like crate::abort::clear_run(iii,
&record.session_id)) or (b) perform a conditional clear: read the existing abort
entry (e.g., crate::abort::get or peek) for `&record.session_id`, compare its
stored run/turn id or timestamp against the current `iii`/record, and only call
crate::abort::clear when the stored entry is from a previous run; ensure this
preserves cooperative-stop semantics with run::stop and aligns with
run_start::execute expectations.
---
Nitpick comments:
In `@harness/web/src/components/Composer.test.tsx`:
- Around line 66-80: The test currently never populates the draft so pressing
Enter will never call onSend regardless of the running flag; update the test in
Composer.test.tsx to first fill the textarea (use setup or fireEvent.change on
the element retrieved via screen.getByPlaceholderText) with a non-empty value,
then set up or flip to running: true and simulate the Enter key
(fireEvent.keyDown) to assert onSend is not called; reference the existing
setup({ running: true }) helper and the textarea element retrieval to locate
where to insert the prefill step before firing the Enter event.
In `@turn-orchestrator/tests/run_stop.rs`:
- Around line 112-124: The test currently ignores the result of
iii.trigger(TriggerRequest { function_id: "state::set", ... }). Instead fail
fast when seeding state: replace the discarded await with handling that asserts
success or unwraps the Result so the test panics on failure. Locate the call to
iii.trigger / TriggerRequest in run_stop.rs and change the await to propagate
errors (e.g., .await.unwrap() or .await.expect("state::set failed for key ..."))
so failures in turn_orchestrator::turn_state_key(session_id) setup are not
masked.
- Around line 71-82: Add an assertion that the registered handler for
RegisterFunctionMessage::with_id("iii::durable::publish") actually received a
publish by checking the sink.step_publishes (or step_log) contents after the
test run; locate where iii.register_function is called and after the happy-path
that verifies abort+sweep, assert that sink.step_publishes contains at least one
payload (or the expected payload shape) to pin the wake-up contract (make the
same change in the other similar block around the 223-251 region).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b5064924-a72c-4791-a34c-23ad5cfd27fe
📒 Files selected for processing (13)
harness/web/src/App.tsxharness/web/src/components/Composer.test.tsxharness/web/src/components/Composer.tsxharness/web/src/styles.cssturn-orchestrator/src/abort.rsturn-orchestrator/src/lib.rsturn-orchestrator/src/register.rsturn-orchestrator/src/run_stop.rsturn-orchestrator/src/states/assistant.rsturn-orchestrator/src/states/functions.rsturn-orchestrator/src/states/provisioning.rsturn-orchestrator/src/states/steering.rsturn-orchestrator/tests/run_stop.rs
| const handleStopClick = useCallback(() => { | ||
| if (stopRequested) return; | ||
| setStopRequested(true); | ||
| void onStop?.(); | ||
| }, [stopRequested, onStop]); |
There was a problem hiding this comment.
Stop can get stuck in “stopping…” with no retry path on failure.
At Line 350-Line 351, stopRequested is latched before onStop outcome is known, and the promise is ignored (void onStop?.()). A failed stop request leaves the button disabled until teardown, so users can’t retry.
Suggested fix
- const handleStopClick = useCallback(() => {
- if (stopRequested) return;
- setStopRequested(true);
- void onStop?.();
- }, [stopRequested, onStop]);
+ const handleStopClick = useCallback(async () => {
+ if (stopRequested || !onStop) return;
+ setStopRequested(true);
+ try {
+ await onStop();
+ } catch {
+ setStopRequested(false);
+ }
+ }, [stopRequested, onStop]);Also ensure the caller rejects on failure (instead of swallowing) so this rollback path can run.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const handleStopClick = useCallback(() => { | |
| if (stopRequested) return; | |
| setStopRequested(true); | |
| void onStop?.(); | |
| }, [stopRequested, onStop]); | |
| const handleStopClick = useCallback(async () => { | |
| if (stopRequested || !onStop) return; | |
| setStopRequested(true); | |
| try { | |
| await onStop(); | |
| } catch { | |
| setStopRequested(false); | |
| } | |
| }, [stopRequested, onStop]); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@harness/web/src/components/Composer.tsx` around lines 348 - 352,
handleStopClick currently latches stopRequested before knowing if onStop
succeeds and ignores the promise; change handleStopClick to be async, call await
onStop?.() inside a try/catch, set stopRequested(true) while the operation is
pending and if the awaited onStop resolves keep it set, but if it throws reset
stopRequested(false) in the catch so the UI can retry; also ensure callers of
onStop propagate failures (reject) instead of swallowing them so this rollback
runs. Reference: handleStopClick, stopRequested, setStopRequested, onStop.
| if let Err(e) = iii | ||
| .trigger(TriggerRequest { | ||
| function_id: "router::abort".into(), | ||
| payload: json!({ "session_id": session_id }), | ||
| action: None, | ||
| timeout_ms: None, | ||
| }) | ||
| .await | ||
| { | ||
| tracing::warn!(error = %e, %session_id, "run::stop: router::abort failed"); | ||
| } | ||
|
|
||
| // 2) Sweep pending approvals so no queued approval can still dispatch | ||
| // its inner function after the user clicked stop. | ||
| if let Err(e) = iii | ||
| .trigger(TriggerRequest { | ||
| function_id: "approval::sweep_session".into(), | ||
| payload: json!({ | ||
| "session_id": session_id, | ||
| "reason": "run_stopped", | ||
| }), | ||
| action: None, | ||
| timeout_ms: None, | ||
| }) | ||
| .await | ||
| { | ||
| tracing::warn!(error = %e, %session_id, "run::stop: approval::sweep_session failed"); | ||
| } | ||
|
|
||
| // 3) Nudge the durable subscriber so the abort branch fires immediately. | ||
| run_start::publish_step(&iii, &session_id).await; | ||
|
|
There was a problem hiding this comment.
Do not return accepted: true when stop side-effects fail.
If Line 58 (router::abort) fails, the handler still reports success. That can produce a false-positive “stopping succeeded” response while the run continues.
Proposed fix
pub async fn execute(iii: III, payload: Value) -> Result<Value, IIIError> {
@@
- // 1) Flag write via existing primitive.
+ let mut failures: Vec<&str> = Vec::new();
+
+ // 1) Flag write via existing primitive.
if let Err(e) = iii
.trigger(TriggerRequest {
function_id: "router::abort".into(),
payload: json!({ "session_id": session_id }),
@@
{
tracing::warn!(error = %e, %session_id, "run::stop: router::abort failed");
+ failures.push("router_abort_failed");
}
@@
if let Err(e) = iii
.trigger(TriggerRequest {
@@
{
tracing::warn!(error = %e, %session_id, "run::stop: approval::sweep_session failed");
+ failures.push("approval_sweep_failed");
}
@@
run_start::publish_step(&iii, &session_id).await;
+
+ if !failures.is_empty() {
+ return Ok(json!({
+ "session_id": session_id,
+ "accepted": false,
+ "reason": "stop_side_effect_failed",
+ "failures": failures,
+ "prior_state": prior_state.map(|s| s.as_str()),
+ }));
+ }
@@
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@turn-orchestrator/src/run_stop.rs` around lines 56 - 87, The handler
currently ignores failures from iii.trigger calls for "router::abort" and
"approval::sweep_session" and always proceeds to run_start::publish_step, which
can make the handler return a false-success; change the control flow in the
run-stop handler so that if iii.trigger(...) for function_id "router::abort" or
"approval::sweep_session" returns Err you propagate that failure to the caller
(or return an error/accepted:false response) instead of continuing — detect
errors from iii.trigger(TriggerRequest { function_id: "router::abort" }) and
iii.trigger(TriggerRequest { function_id: "approval::sweep_session" }) and
short-circuit (return Err or an explicit non-accepted response) after logging,
only calling run_start::publish_step(&iii, &session_id).await when both trigger
calls succeed.
| // Single source of truth for "fresh turn entry": clear any sticky abort | ||
| // flag carried over from a previous stopped run. Doing this here (not in | ||
| // `run_start::execute`) avoids a race where `run_start` clears the flag | ||
| // while a previous turn's subscriber tick is still in-flight. | ||
| crate::abort::clear(iii, &record.session_id).await; | ||
|
|
There was a problem hiding this comment.
Unconditional abort clear can drop a valid stop request for the current run
At Line 32, clearing by session_id unconditionally introduces a race with run::stop: if stop is set just before/during provisioning, this clear can erase it and the run won’t short-circuit as expected. This violates the cooperative-stop behavior under timing contention.
Consider making abort state run-scoped (e.g., include turn/run id in key) or using a conditional clear that only removes stale pre-run state.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@turn-orchestrator/src/states/provisioning.rs` around lines 28 - 33, The
unconditional call to crate::abort::clear(iii, &record.session_id) can erase a
valid in-flight stop for the current run; update this to only clear stale
pre-run abort state or make abort state run-scoped. Specifically, change the
logic around crate::abort::clear in provisioning.rs to either (a) use a
run-scoped key (include the current turn/run id `iii` in the abort key or
introduce a new API like crate::abort::clear_run(iii, &record.session_id)) or
(b) perform a conditional clear: read the existing abort entry (e.g.,
crate::abort::get or peek) for `&record.session_id`, compare its stored run/turn
id or timestamp against the current `iii`/record, and only call
crate::abort::clear when the stored entry is from a previous run; ensure this
preserves cooperative-stop semantics with run::stop and aligns with
run_start::execute expectations.
…ndlers Adds 6 integration tests targeting the abort flow's load-bearing edges that the original PR's 5 tests left uncovered: - abort::is_set / clear round-trip via state::get/state::set - handle_awaiting short-circuits to SteeringCheck without burning a turn_count when the flag is set - handle_prepare drops pending_function_calls on abort - handle_execute checks the flag between calls so a long batch can be interrupted mid-loop - handle_provisioning clears the sticky flag at the top of every fresh turn entry — the single source of truth for "we are starting a new turn" (decided over run_start::execute to avoid the race identified in the autoplan dual-voice review) - user can continue working on the same session_id after stop: a fresh run::start drives through provisioning which scrubs the flag, so the new turn proceeds normally Drives transitions::step directly rather than going through the durable subscriber — keeps tests fast and decoupled from publish timing.
Summary
Adds a stop button to the harness chat composer so users can interrupt an in-flight agent turn without waiting for the orchestrator to reach
TearingDownon its own.run::stopbus function — a thin wrapper over the existingrouter::abortflag write plusapproval::sweep_sessionand aturn::step_requestedpublish. Returns a typed{accepted, reason, prior_state}envelope; idempotent for unknown sessions and already-stopped records.provisioning(clears the flag at entry),awaiting,prepare, and per-call insideexecute.steering::handleswitches to the sharedcrate::abort::is_sethelper.useAgentStreamreportsstatus === "running". Clicking it shows an honest "stopping…" intermediate state until the run actually tears down, so the user gets feedback even though cooperative stop can't preempt the in-flight provider call.Built on top of existing primitives — no duplicated flag write, no duplicated approval cleanup. See
docs/superpowers/specs/2026-05-15-stop-button-design.mdonce that lands (currently local in the planning skill).Out of scope / followups (intentional)
router::stream_assistantblocks for up to 300s; cooperative stop cannot preempt that. Worst-case user wait is ~300s (typically <30s for healthy responses). A real fix requires threading a cancellation token throughprovider-routerand the provider crates.turn::step_requestedpublished byrun::stopcould in principle overlap with the natural step tick. Outside this PR's blast radius.Test plan
cargo test -p iii-turn-orchestrator— 118 passed including 5 new integration tests intests/run_stop.rscovering payload validation, unknown-session, already-stopped, primitives invoked on the happy path, and double-stop idempotency.npm --prefix harness/web test— 139 passed including 7 new Vitest cases inComposer.test.tsxcovering the send→stop→stopping morph, click routing, no-op Enter while running, and reset onrunningflip.npx tsc --noEmit— clean.harness/scripts/demo.sh build engine start web, send a prompt that triggers a long tool chain, click stop, expect the button to switch to "stopping…" immediately and the transcript to land an aborted assistant message at the next state boundary.Summary by CodeRabbit
New Features
Tests