fix: stream idle timeout and goroutine leak in processStepStream#191
Open
amosbird wants to merge 4 commits intocharmbracelet:mainfrom
Open
fix: stream idle timeout and goroutine leak in processStepStream#191amosbird wants to merge 4 commits intocharmbracelet:mainfrom
amosbird wants to merge 4 commits intocharmbracelet:mainfrom
Conversation
74078ad to
2502026
Compare
- Add StreamIdleTimeout option to AgentStreamCall/AgentCall to cancel streams that stop sending data within a configurable duration - Fix goroutine leak: use sync.Once + defer to ensure close(toolChan) and toolExecutionWg.Wait() run on all return paths in processStepStream
When the consumer breaks out of the range loop (yield returns false) and an idle timeout fires concurrently, withIdleTimeout would call yield again after it already returned false, violating the range-over-function protocol and causing a runtime panic: runtime error: range function continued iteration after function for loop body returned false Track whether the consumer has stopped with a 'stopped' flag and skip the timeout error yield when it's already set.
When streamIdleTimeout fires, it cancels the child streamCtx. The provider then yields a StreamPartTypeError carrying context.Canceled. The retry logic's isAbortError checked the error value (not the parent ctx), so it treated this as a user-initiated cancellation and aborted without retrying. Two fixes: - isAbortError now checks ctx.Err() instead of errors.Is(err, context.Canceled). If the parent context is still alive, the cancellation came from a child (idle timeout), not the user. - withIdleTimeout replaces context.Canceled errors with a descriptive "stream idle timeout exceeded" error when the timer has fired, so even if the error escapes the retry loop it won't be confused with a user cancel.
- Increase default MaxRetries from 2 to 5 for better resilience during long tool-call parameter streaming (e.g. Write tool with large content) - Call OnRetry callback for idle timeout errors too (providerErr may be nil in that case), so callers can clean up partial state on retry - Update idle timeout test to verify retry count with MaxRetries=2
594470f to
803c63d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stream idle timeout: Add
StreamIdleTimeoutoption toAgentStreamCall/AgentCall. When set, if the LLM stream produces no data within the specified duration, the stream is cancelled and an error is returned. This prevents the agent from hanging indefinitely when a provider's SSE stream stalls mid-response. The timeout uses a separate context so tool execution is unaffected.Goroutine leak fix:
processStepStreamhas ~17 early return paths that could skipclose(toolChan)andtoolExecutionWg.Wait(), leaking the tool-coordinator goroutine. Fixed by wrapping cleanup insync.Once+defer.Test plan
TestStreamingAgentIdleTimeout— hanging stream cancelled after idle timeout, returns errorTestStreamingAgentIdleTimeoutResetsOnChunks— slow-but-active stream succeeds (timer resets on each chunk)TestStreamingAgentCallbackErrorCleanup— callback error propagates and goroutines are cleaned up