feat(runtime): mid-stream cancellation #161

Merged: john-rocky merged 1 commit into main from feat/mid-stream-cancellation on Apr 30, 2026

@john-rocky
Owner

Summary

AsyncStream does not auto-cancel the producer Task when the consumer breaks out of for await. Stopping generation mid-stream (user tap, navigation, deinit) kept burning ANE energy until the loop hit maxDecode.
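The behaviour described above can be reproduced with a minimal sketch. Everything here is illustrative and not taken from the PR's code: `StepCounter`, `naiveGenerate`, and `runNaiveDemo` are hypothetical names, and a 1 ms `Task.sleep` stands in for a real decode step. The producer Task's handle is discarded and nothing observes stream termination, so the loop keeps stepping after the consumer has left.

```swift
// Hypothetical counter, used only to observe how far the producer ran.
actor StepCounter {
    private(set) var steps = 0
    func increment() { steps += 1 }
}

// Naive producer (illustrative): the Task handle is dropped and no
// onTermination handler is installed, so nothing can stop the loop early.
func naiveGenerate(maxDecode: Int, counter: StepCounter) -> AsyncStream<Int> {
    AsyncStream { continuation in
        Task {
            for step in 0..<maxDecode {
                // Simulated decode step; the real work would run on the ANE.
                try? await Task.sleep(nanoseconds: 1_000_000)
                await counter.increment()
                continuation.yield(step)
            }
            continuation.finish()
        }
    }
}

// The consumer breaks after three tokens, then samples the counter twice.
func runNaiveDemo() async -> (atBreak: Int, later: Int) {
    let counter = StepCounter()
    for await token in naiveGenerate(maxDecode: 500, counter: counter) {
        if token == 2 { break }
    }
    let atBreak = await counter.steps
    try? await Task.sleep(nanoseconds: 200_000_000)
    let later = await counter.steps  // the producer has kept stepping
    return (atBreak, later)
}
```

In this sketch the second sample comes out larger than the first, because the unstructured Task runs on until it reaches `maxDecode`.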

This PR wires the missing piece:

  • bind the producer Task to a genTask handle
  • Task.checkCancellation() at chunk boundaries in both prefill paths (hybrid + decode-loop fallback) and at the top of the decode loop, so cancellation lands within one chunk / one token
  • swallow CancellationError silently — it's the expected user path
  • continuation.onTermination → genTask.cancel() so a stream deinit or consumer drop stops generation

No API change. Cancelling a generation now releases the ANE within one decode step (~30 ms) instead of running to maxDecode.
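The four steps listed above could look roughly like the following. This is a sketch under stated assumptions rather than the code merged in this PR: `generate(maxDecode:)` and `decodeStep(_:)` are hypothetical stand-ins, prefill is omitted, and only the `genTask` name comes from the description.

```swift
// Hypothetical stand-in for one decode step (not the project's real API).
// It ignores cancellation on purpose, so the only cancellation point in the
// loop below is the explicit Task.checkCancellation() call.
func decodeStep(_ index: Int) async -> Int {
    try? await Task.sleep(nanoseconds: 1_000_000)
    return index
}

// Illustrative producer with the cancellation wiring from the list above.
func generate(maxDecode: Int) -> AsyncStream<Int> {
    AsyncStream { continuation in
        // 1. Bind the producer Task to a handle.
        let genTask = Task {
            do {
                for step in 0..<maxDecode {
                    // 2. Check at the top of the decode loop, so a cancel
                    //    lands within one step.
                    try Task.checkCancellation()
                    continuation.yield(await decodeStep(step))
                }
            } catch is CancellationError {
                // 3. Expected user path: swallow silently.
            } catch {
                // Not reachable in this sketch; a real loop may report errors.
            }
            continuation.finish()
        }
        // 4. Stream termination (consumer cancelled or stream dropped)
        //    cancels the producer.
        continuation.onTermination = { _ in genTask.cancel() }
    }
}
```

A consumer needs no changes: it iterates with `for await token in generate(maxDecode: 4096)` and either leaves the loop or has its own Task cancelled. Once the stream terminates, `onTermination` cancels `genTask`, and the next `Task.checkCancellation()` ends the loop.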

Extracted from feat/litert-perf-adoptions (commit 8598388, §T4). Other items (S1/S2/T1/T3/T5) will land separately.

Test plan

  • swift build on macos-15
  • Sample app: start a 4K-token generation, navigate away → tok/s ticker stops within ~1 step (was: continued until done)
  • No regression in normal completion path (CancellationError catch only fires on cancel)

AsyncStream does NOT auto-cancel the producer Task when the consumer
breaks out of `for await`. A user who taps stop or navigates away
during a long generation kept burning ANE energy until the loop hit
maxDecode. This wires the missing piece:

- bind the producer Task to a `genTask` handle
- check Task.checkCancellation() at chunk boundaries in both prefill
  paths (hybrid + decode-loop fallback) and at the top of the decode
  loop, so cancellation lands within one chunk / one token
- catch CancellationError silently — it's the expected user path
- continuation.onTermination → genTask.cancel() so a stream deinit or
  the consumer dropping subscription stops generation immediately

Net effect: no API change, no token-output difference, but cancelling
a generation now releases the ANE within one decode step (~30 ms)
instead of seconds-to-minutes. Extracted from feat/litert-perf-adoptions
(commit 8598388 §T4); other items in that branch will land separately.
@john-rocky force-pushed the feat/mid-stream-cancellation branch from 6d54bab to e3bf3b2 on April 30, 2026 02:55
@john-rocky merged commit 68541bf into main on Apr 30, 2026