Skip to content

feat(harness): cooperative stop button for active runs#143

Open
ytallo wants to merge 2 commits into
mainfrom
feat/harness-stop-button
Open

feat(harness): cooperative stop button for active runs#143
ytallo wants to merge 2 commits into
mainfrom
feat/harness-stop-button

Conversation

@ytallo
Copy link
Copy Markdown
Contributor

@ytallo ytallo commented May 15, 2026

Summary

Adds a stop button to the harness chat composer so users can interrupt an in-flight agent turn without waiting for the orchestrator to reach TearingDown on its own.

  • Backend: new run::stop bus function — a thin wrapper over the existing router::abort flag write plus approval::sweep_session and a turn::step_requested publish. Returns a typed {accepted, reason, prior_state} envelope; idempotent for unknown sessions and already-stopped records.
  • Orchestrator hooks: the abort flag is now checked at every transition that precedes a costly operation — provisioning (clears the flag at entry), awaiting, prepare, and per-call inside execute. steering::handle switches to the shared crate::abort::is_set helper.
  • Frontend: the Composer send button morphs to a red stop button while useAgentStream reports status === "running". Clicking it shows an honest "stopping…" intermediate state until the run actually tears down, so the user gets feedback even though cooperative stop can't preempt the in-flight provider call.

Built on top of existing primitives — no duplicated flag write, no duplicated approval cleanup. See docs/superpowers/specs/2026-05-15-stop-button-design.md once that lands (currently local in the planning skill).

Out of scope / followups (intentional)

  • Mid-stream provider cancellation. router::stream_assistant blocks for up to 300s; cooperative stop cannot preempt that. Worst-case user wait is ~300s (typically <30s for healthy responses). A real fix requires threading a cancellation token through provider-router and the provider crates.
  • Subscriber CAS / lease. A duplicate turn::step_requested published by run::stop could in principle overlap with the natural step tick. Outside this PR's blast radius.

Test plan

  • cargo test -p iii-turn-orchestrator — 118 passed including 5 new integration tests in tests/run_stop.rs covering payload validation, unknown-session, already-stopped, primitives invoked on the happy path, and double-stop idempotency.
  • npm --prefix harness/web test — 139 passed including 7 new Vitest cases in Composer.test.tsx covering the send→stop→stopping morph, click routing, no-op Enter while running, and reset on running flip.
  • npx tsc --noEmit — clean.
  • Manual end-to-end: harness/scripts/demo.sh build engine start web, send a prompt that triggers a long tool chain, click stop, expect the button to switch to "stopping…" immediately and the transcript to land an aborted assistant message at the next state boundary.
  • Manual approval case: stop while a pending approval row is up, expect the approval to disappear (sweep) and no function execution to happen.

Summary by CodeRabbit

  • New Features

    • Added ability to stop in-flight runs; when active, the send button converts to a stop button with updated UI. Clicking stops the session and displays stopping status.
  • Tests

    • Added comprehensive test coverage for stop functionality, verifying button behavior, state transitions, and full stop operation flow across frontend and backend.

Review Change Stack

Add `run::stop` as a thin wrapper over the existing `router::abort` flag
write plus `approval::sweep_session` and a `turn::step_requested` publish.
Composer send button morphs to a stop button while a run is in flight,
with an honest "stopping..." intermediate state so the user gets immediate
feedback even though the orchestrator can only act at the next state
boundary.

Backend (turn-orchestrator):
- Shared `crate::abort::{is_set, clear, abort_signal_key}` helper.
- `run::stop` returns typed `{accepted, reason, prior_state}`; idempotent
  for unknown sessions and already-stopped records.
- Abort flag checked at every transition that precedes a costly operation:
  provisioning (clear), awaiting, prepare, execute (per-call).
- `steering::handle` now uses the shared helper instead of its private
  copy.
- Integration tests cover payload validation, unknown-session and
  already-stopped short-circuits, primitives invoked on the happy path,
  and double-stop idempotency.

Frontend (harness/web):
- Composer takes `running` + `onStop` props. While running the textarea
  is `readOnly` (preserves focus for screen readers) and the send button
  renders as a red stop button with `aria-label`/`title`. After the user
  clicks stop, the button shows "stopping..." with `aria-busy` until the
  parent flips `running` back to false.
- `submit()` is gated on `running` (not just the form's `onSubmit`) so
  Enter inside the textarea is a true no-op mid-run.
- `.composer-send` gets a pinned `min-width` so the send/stop/stopping
  labels don't shift layout.
- Vitest cases cover the morph, click routing, no-op submission, and
  reset behavior.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
workers Error Error May 15, 2026 9:39pm

Request Review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 15, 2026

skill-check — worker

1 verified, 24 skipped (no docs/).

Layer Result
structure
vale
ai

Three for three. Nicely done.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Warning

Rate limit exceeded

@ytallo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 41 minutes and 50 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 58b4a1a7-6b15-4368-815b-3ab6b29e5b3d

📥 Commits

Reviewing files that changed from the base of the PR and between f4adb92 and fef73ed.

📒 Files selected for processing (1)
  • turn-orchestrator/tests/run_stop.rs
📝 Walkthrough

Walkthrough

This PR adds a cooperative stop mechanism for in-flight LLM runs. Users can now click a stop button in the composer UI, triggering a backend run::stop call that sets an abort flag, sweeps pending approvals, and publishes a wake-up signal so the state machine checks and exits early.

Changes

Cooperative Run Stop Mechanism

Layer / File(s) Summary
Abort signal state management
turn-orchestrator/src/abort.rs, turn-orchestrator/src/lib.rs
New abort module provides per-session durable state helpers to query and clear the agent/session/<id>/abort_signal flag, centralizing abort state access across handlers and run::stop.
run::stop backend entrypoint and registration
turn-orchestrator/src/run_stop.rs, turn-orchestrator/src/lib.rs, turn-orchestrator/src/register.rs, turn-orchestrator/tests/run_stop.rs
New run_stop module implements idempotent run::stop function that accepts/rejects stop requests, triggers router::abort, approval::sweep_session, and publish-step primitives; integration tests verify idempotency, primitive invocation counts, and rejection paths.
State machine abort integration
turn-orchestrator/src/states/steering.rs, turn-orchestrator/src/states/assistant.rs, turn-orchestrator/src/states/functions.rs, turn-orchestrator/src/states/provisioning.rs
Abort checks integrated throughout state handlers: steering consolidates abort detection to centralized is_set, assistant/function handlers perform early exit to SteeringCheck when abort is set, and provisioning clears abort at init to prevent carryover from prior runs.
Composer UI and App integration
harness/web/src/components/Composer.tsx, harness/web/src/styles.css, harness/web/src/App.tsx
Composer extended with running and onStop props to morph send button into stop button; textarea becomes readonly and submission is gated while running; App bridges stop action to backend run::stop call with error handling; CSS added for stop-mode (red theme) and busy-state (dimmed, progress cursor) button styling.
Composer component tests
harness/web/src/components/Composer.test.tsx
Vitest suite validates button morphing, stop-click callback invocation, Enter-key blocking, aria-busy state transitions, and layout-stability CSS class presence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • sergiofilhowz

Poem

🐇 A button that stops what was started,
From frontend to backend, coordinated.
Flag raised, sweep done, no more turns—
The run halts gracefully while the async churns. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding a cooperative stop button to the harness for active runs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/harness-stop-button

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
harness/web/src/components/Composer.test.tsx (1)

66-80: ⚡ Quick win

This test is a false positive for running-gate behavior.

It never creates a non-empty draft, so onSend stays uncalled even without the running guard. Please prefill text before flipping to running and pressing Enter to validate the intended path.

Stronger test pattern
-  it("form submit is a no-op while running", () => {
-    const { onSend } = setup({ running: true });
+  it("form submit is a no-op while running", () => {
+    const { onSend, onStop, rerender } = setup({ running: false });
+    const textarea = screen.getByRole("textbox") as HTMLTextAreaElement;
+    fireEvent.change(textarea, { target: { value: "hello" } });
+    rerender(
+      <Composer
+        disabled={false}
+        onSend={onSend}
+        cwd=""
+        skillRows={null}
+        sessionMessages={[]}
+        running={true}
+        onStop={onStop}
+      />,
+    );
     // The textarea is `readOnly` while running but still focusable. Type
     // something programmatically (we set value via fireEvent for the test)
     // and try to submit via Enter.
-    const textarea = screen.getByPlaceholderText(
-      /run in flight/i,
-    ) as HTMLTextAreaElement;
     fireEvent.keyDown(textarea, {
       key: "Enter",
       code: "Enter",
       shiftKey: false,
     });
     expect(onSend).not.toHaveBeenCalled();
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@harness/web/src/components/Composer.test.tsx` around lines 66 - 80, The test
currently never populates the draft so pressing Enter will never call onSend
regardless of the running flag; update the test in Composer.test.tsx to first
fill the textarea (use setup or fireEvent.change on the element retrieved via
screen.getByPlaceholderText) with a non-empty value, then set up or flip to
running: true and simulate the Enter key (fireEvent.keyDown) to assert onSend is
not called; reference the existing setup({ running: true }) helper and the
textarea element retrieval to locate where to insert the prefill step before
firing the Enter event.
turn-orchestrator/tests/run_stop.rs (2)

112-124: ⚡ Quick win

Fail fast when state seeding fails in tests.

Ignoring the state::set result in Line 112 can mask fixture setup failures and make downstream assertions misleading.

Proposed fix
-    let _ = iii
+    iii
         .trigger(TriggerRequest {
@@
         })
-        .await;
+        .await
+        .expect("seed_record: state::set should succeed");
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@turn-orchestrator/tests/run_stop.rs` around lines 112 - 124, The test
currently ignores the result of iii.trigger(TriggerRequest { function_id:
"state::set", ... }). Instead fail fast when seeding state: replace the
discarded await with handling that asserts success or unwraps the Result so the
test panics on failure. Locate the call to iii.trigger / TriggerRequest in
run_stop.rs and change the await to propagate errors (e.g., .await.unwrap() or
.await.expect("state::set failed for key ...")) so failures in
turn_orchestrator::turn_state_key(session_id) setup are not masked.

71-82: ⚡ Quick win

Assert iii::durable::publish to pin the wake-up contract.

The happy-path currently verifies abort+sweep but not the step-request publish. Add this assertion so regressions in subscriber wake-up are caught.

Proposed fix
 impl Sink {
@@
     fn sweep(&self) -> Vec<Value> {
         self.sweep_calls.lock().unwrap().clone()
     }
+    fn step(&self) -> Vec<Value> {
+        self.step_publishes.lock().unwrap().clone()
+    }
 }
@@
     let sweep_calls = sink.sweep();
     assert_eq!(sweep_calls.len(), 1, "approval::sweep_session invoked once");
     assert_eq!(sweep_calls[0]["session_id"], json!(session_id));
     assert_eq!(sweep_calls[0]["reason"], json!("run_stopped"));
+
+    let step_calls = sink.step();
+    assert_eq!(step_calls.len(), 1, "iii::durable::publish invoked once");
 }

Also applies to: 223-251

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@turn-orchestrator/tests/run_stop.rs` around lines 71 - 82, Add an assertion
that the registered handler for
RegisterFunctionMessage::with_id("iii::durable::publish") actually received a
publish by checking the sink.step_publishes (or step_log) contents after the
test run; locate where iii.register_function is called and after the happy-path
that verifies abort+sweep, assert that sink.step_publishes contains at least one
payload (or the expected payload shape) to pin the wake-up contract (make the
same change in the other similar block around the 223-251 region).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@harness/web/src/components/Composer.tsx`:
- Around line 348-352: handleStopClick currently latches stopRequested before
knowing if onStop succeeds and ignores the promise; change handleStopClick to be
async, call await onStop?.() inside a try/catch, set stopRequested(true) while
the operation is pending and if the awaited onStop resolves keep it set, but if
it throws reset stopRequested(false) in the catch so the UI can retry; also
ensure callers of onStop propagate failures (reject) instead of swallowing them
so this rollback runs. Reference: handleStopClick, stopRequested,
setStopRequested, onStop.

In `@turn-orchestrator/src/run_stop.rs`:
- Around line 56-87: The handler currently ignores failures from iii.trigger
calls for "router::abort" and "approval::sweep_session" and always proceeds to
run_start::publish_step, which can make the handler return a false-success;
change the control flow in the run-stop handler so that if iii.trigger(...) for
function_id "router::abort" or "approval::sweep_session" returns Err you
propagate that failure to the caller (or return an error/accepted:false
response) instead of continuing — detect errors from iii.trigger(TriggerRequest
{ function_id: "router::abort" }) and iii.trigger(TriggerRequest { function_id:
"approval::sweep_session" }) and short-circuit (return Err or an explicit
non-accepted response) after logging, only calling run_start::publish_step(&iii,
&session_id).await when both trigger calls succeed.

In `@turn-orchestrator/src/states/provisioning.rs`:
- Around line 28-33: The unconditional call to crate::abort::clear(iii,
&record.session_id) can erase a valid in-flight stop for the current run; update
this to only clear stale pre-run abort state or make abort state run-scoped.
Specifically, change the logic around crate::abort::clear in provisioning.rs to
either (a) use a run-scoped key (include the current turn/run id `iii` in the
abort key or introduce a new API like crate::abort::clear_run(iii,
&record.session_id)) or (b) perform a conditional clear: read the existing abort
entry (e.g., crate::abort::get or peek) for `&record.session_id`, compare its
stored run/turn id or timestamp against the current `iii`/record, and only call
crate::abort::clear when the stored entry is from a previous run; ensure this
preserves cooperative-stop semantics with run::stop and aligns with
run_start::execute expectations.

---

Nitpick comments:
In `@harness/web/src/components/Composer.test.tsx`:
- Around line 66-80: The test currently never populates the draft so pressing
Enter will never call onSend regardless of the running flag; update the test in
Composer.test.tsx to first fill the textarea (use setup or fireEvent.change on
the element retrieved via screen.getByPlaceholderText) with a non-empty value,
then set up or flip to running: true and simulate the Enter key
(fireEvent.keyDown) to assert onSend is not called; reference the existing
setup({ running: true }) helper and the textarea element retrieval to locate
where to insert the prefill step before firing the Enter event.

In `@turn-orchestrator/tests/run_stop.rs`:
- Around line 112-124: The test currently ignores the result of
iii.trigger(TriggerRequest { function_id: "state::set", ... }). Instead fail
fast when seeding state: replace the discarded await with handling that asserts
success or unwraps the Result so the test panics on failure. Locate the call to
iii.trigger / TriggerRequest in run_stop.rs and change the await to propagate
errors (e.g., .await.unwrap() or .await.expect("state::set failed for key ..."))
so failures in turn_orchestrator::turn_state_key(session_id) setup are not
masked.
- Around line 71-82: Add an assertion that the registered handler for
RegisterFunctionMessage::with_id("iii::durable::publish") actually received a
publish by checking the sink.step_publishes (or step_log) contents after the
test run; locate where iii.register_function is called and after the happy-path
that verifies abort+sweep, assert that sink.step_publishes contains at least one
payload (or the expected payload shape) to pin the wake-up contract (make the
same change in the other similar block around the 223-251 region).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b5064924-a72c-4791-a34c-23ad5cfd27fe

📥 Commits

Reviewing files that changed from the base of the PR and between 590a39b and f4adb92.

📒 Files selected for processing (13)
  • harness/web/src/App.tsx
  • harness/web/src/components/Composer.test.tsx
  • harness/web/src/components/Composer.tsx
  • harness/web/src/styles.css
  • turn-orchestrator/src/abort.rs
  • turn-orchestrator/src/lib.rs
  • turn-orchestrator/src/register.rs
  • turn-orchestrator/src/run_stop.rs
  • turn-orchestrator/src/states/assistant.rs
  • turn-orchestrator/src/states/functions.rs
  • turn-orchestrator/src/states/provisioning.rs
  • turn-orchestrator/src/states/steering.rs
  • turn-orchestrator/tests/run_stop.rs

Comment on lines +348 to +352
const handleStopClick = useCallback(() => {
if (stopRequested) return;
setStopRequested(true);
void onStop?.();
}, [stopRequested, onStop]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop can get stuck in “stopping…” with no retry path on failure.

At Line 350-Line 351, stopRequested is latched before onStop outcome is known, and the promise is ignored (void onStop?.()). A failed stop request leaves the button disabled until teardown, so users can’t retry.

Suggested fix
-  const handleStopClick = useCallback(() => {
-    if (stopRequested) return;
-    setStopRequested(true);
-    void onStop?.();
-  }, [stopRequested, onStop]);
+  const handleStopClick = useCallback(async () => {
+    if (stopRequested || !onStop) return;
+    setStopRequested(true);
+    try {
+      await onStop();
+    } catch {
+      setStopRequested(false);
+    }
+  }, [stopRequested, onStop]);

Also ensure the caller rejects on failure (instead of swallowing) so this rollback path can run.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const handleStopClick = useCallback(() => {
if (stopRequested) return;
setStopRequested(true);
void onStop?.();
}, [stopRequested, onStop]);
const handleStopClick = useCallback(async () => {
if (stopRequested || !onStop) return;
setStopRequested(true);
try {
await onStop();
} catch {
setStopRequested(false);
}
}, [stopRequested, onStop]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@harness/web/src/components/Composer.tsx` around lines 348 - 352,
handleStopClick currently latches stopRequested before knowing if onStop
succeeds and ignores the promise; change handleStopClick to be async, call await
onStop?.() inside a try/catch, set stopRequested(true) while the operation is
pending and if the awaited onStop resolves keep it set, but if it throws reset
stopRequested(false) in the catch so the UI can retry; also ensure callers of
onStop propagate failures (reject) instead of swallowing them so this rollback
runs. Reference: handleStopClick, stopRequested, setStopRequested, onStop.

Comment on lines +56 to +87
if let Err(e) = iii
.trigger(TriggerRequest {
function_id: "router::abort".into(),
payload: json!({ "session_id": session_id }),
action: None,
timeout_ms: None,
})
.await
{
tracing::warn!(error = %e, %session_id, "run::stop: router::abort failed");
}

// 2) Sweep pending approvals so no queued approval can still dispatch
// its inner function after the user clicked stop.
if let Err(e) = iii
.trigger(TriggerRequest {
function_id: "approval::sweep_session".into(),
payload: json!({
"session_id": session_id,
"reason": "run_stopped",
}),
action: None,
timeout_ms: None,
})
.await
{
tracing::warn!(error = %e, %session_id, "run::stop: approval::sweep_session failed");
}

// 3) Nudge the durable subscriber so the abort branch fires immediately.
run_start::publish_step(&iii, &session_id).await;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not return accepted: true when stop side-effects fail.

If Line 58 (router::abort) fails, the handler still reports success. That can produce a false-positive “stopping succeeded” response while the run continues.

Proposed fix
 pub async fn execute(iii: III, payload: Value) -> Result<Value, IIIError> {
@@
-    // 1) Flag write via existing primitive.
+    let mut failures: Vec<&str> = Vec::new();
+
+    // 1) Flag write via existing primitive.
     if let Err(e) = iii
         .trigger(TriggerRequest {
             function_id: "router::abort".into(),
             payload: json!({ "session_id": session_id }),
@@
     {
         tracing::warn!(error = %e, %session_id, "run::stop: router::abort failed");
+        failures.push("router_abort_failed");
     }
@@
     if let Err(e) = iii
         .trigger(TriggerRequest {
@@
     {
         tracing::warn!(error = %e, %session_id, "run::stop: approval::sweep_session failed");
+        failures.push("approval_sweep_failed");
     }
@@
     run_start::publish_step(&iii, &session_id).await;
+
+    if !failures.is_empty() {
+        return Ok(json!({
+            "session_id": session_id,
+            "accepted": false,
+            "reason": "stop_side_effect_failed",
+            "failures": failures,
+            "prior_state": prior_state.map(|s| s.as_str()),
+        }));
+    }
@@
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@turn-orchestrator/src/run_stop.rs` around lines 56 - 87, The handler
currently ignores failures from iii.trigger calls for "router::abort" and
"approval::sweep_session" and always proceeds to run_start::publish_step, which
can make the handler return a false-success; change the control flow in the
run-stop handler so that if iii.trigger(...) for function_id "router::abort" or
"approval::sweep_session" returns Err you propagate that failure to the caller
(or return an error/accepted:false response) instead of continuing — detect
errors from iii.trigger(TriggerRequest { function_id: "router::abort" }) and
iii.trigger(TriggerRequest { function_id: "approval::sweep_session" }) and
short-circuit (return Err or an explicit non-accepted response) after logging,
only calling run_start::publish_step(&iii, &session_id).await when both trigger
calls succeed.

Comment on lines +28 to +33
// Single source of truth for "fresh turn entry": clear any sticky abort
// flag carried over from a previous stopped run. Doing this here (not in
// `run_start::execute`) avoids a race where `run_start` clears the flag
// while a previous turn's subscriber tick is still in-flight.
crate::abort::clear(iii, &record.session_id).await;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Unconditional abort clear can drop a valid stop request for the current run

At Line 32, clearing by session_id unconditionally introduces a race with run::stop: if stop is set just before/during provisioning, this clear can erase it and the run won’t short-circuit as expected. This violates the cooperative-stop behavior under timing contention.

Consider making abort state run-scoped (e.g., include turn/run id in key) or using a conditional clear that only removes stale pre-run state.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@turn-orchestrator/src/states/provisioning.rs` around lines 28 - 33, The
unconditional call to crate::abort::clear(iii, &record.session_id) can erase a
valid in-flight stop for the current run; update this to only clear stale
pre-run abort state or make abort state run-scoped. Specifically, change the
logic around crate::abort::clear in provisioning.rs to either (a) use a
run-scoped key (include the current turn/run id `iii` in the abort key or
introduce a new API like crate::abort::clear_run(iii, &record.session_id)) or
(b) perform a conditional clear: read the existing abort entry (e.g.,
crate::abort::get or peek) for `&record.session_id`, compare its stored run/turn
id or timestamp against the current `iii`/record, and only call
crate::abort::clear when the stored entry is from a previous run; ensure this
preserves cooperative-stop semantics with run::stop and aligns with
run_start::execute expectations.

…ndlers

Adds 6 integration tests targeting the abort flow's load-bearing edges
that the original PR's 5 tests left uncovered:

- abort::is_set / clear round-trip via state::get/state::set
- handle_awaiting short-circuits to SteeringCheck without burning a
  turn_count when the flag is set
- handle_prepare drops pending_function_calls on abort
- handle_execute checks the flag between calls so a long batch can be
  interrupted mid-loop
- handle_provisioning clears the sticky flag at the top of every fresh
  turn entry — the single source of truth for "we are starting a new
  turn" (decided over run_start::execute to avoid the race identified in
  the autoplan dual-voice review)
- user can continue working on the same session_id after stop: a fresh
  run::start drives through provisioning which scrubs the flag, so the
  new turn proceeds normally

Drives transitions::step directly rather than going through the durable
subscriber — keeps tests fast and decoupled from publish timing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant