mitm: make upstream ResponseHeaderTimeout configurable (default 5m) #196
Draft
devniel wants to merge 1 commit into
The proxy's upstream http.Transport hardcoded ResponseHeaderTimeout to 30s. Agents proxy LLM APIs, and reasoning / "thinking" models (Gemini Pro, Claude extended thinking, o-series) routinely withhold HTTP response headers for tens of seconds — sometimes minutes — while they think. A 30s cap aborts those otherwise-healthy streaming completions with a 502 upstream_error. Default raised to 5m and made overridable via AGENT_VAULT_RESPONSE_HEADER_TIMEOUT (Go duration; "0" disables it).
Problem
The proxy's upstream http.Transport hardcodes ResponseHeaderTimeout: 30 * time.Second in internal/mitm/proxy.go. For agents that proxy LLM APIs, reasoning / "thinking" models (Gemini 2.x Pro, Claude extended thinking, the o-series) routinely withhold HTTP response headers for tens of seconds, sometimes minutes, while they think. A 30s cap aborts those otherwise-healthy streaming completions with a 502 upstream_error, which surfaces as a ProxyError in the SDK and silently kills the agent's turn.

Change
Default raised to 5m (matching the think-time these models take in practice) and made overridable via the AGENT_VAULT_RESPONSE_HEADER_TIMEOUT env var (Go duration; "0" disables the timeout entirely).

Test plan
go test ./internal/mitm/...)

Opening as a draft for upstream feedback on the default value and the env-var name before marking ready. Happy to land as e.g. AGENT_VAULT_UPSTREAM_RESPONSE_HEADER_TIMEOUT or any other naming the maintainers prefer.