I'm a product owner with enough technical depth to be dangerous, leaning on Claude to do the deeper digging - so treat the analysis below as "carefully investigated, but I haven't read your Go." I've been running the gateway in front of Atlassian's hosted MCP (some context on that setup here: https://productowner.ro/blog/almost-converted-mcp-json-to-toon/) and hit a reproducible failure that silently loses writes. Reporting it carefully because the failure mode is easy to miss.
TL;DR
After a remote MCP server's streamable-http session goes stale, the gateway keeps proxying calls but they return empty results reported as success - in tens of milliseconds, with no error logged anywhere, even at --verbose. Reads coming back blank is annoying; the real problem is on writes: an updateConfluencePage / editJiraIssue reports success but never actually applies the change - the upstream version doesn't increment, and the edit is lost with no signal. I've lost a Jira sprint write and two Confluence pushes this way before noticing.
/mcp reconnect (which respawns the gateway process) restores it for a while, then it recurs.
This is adjacent to #412 but distinct: #412 is the gateway as a server dropping client connections with a hard error. This is the gateway as a client to a remote upstream, where the failure is a silent empty success - worse, because nothing surfaces it.
Plain-language vs. technical
| What I see | What's actually happening |
| --- | --- |
| 🔴 A write "succeeds" but didn't take. | updateConfluencePage/editJiraIssue return success; upstream version unchanged. Silent data loss, no error. |
| 🟠 Reads suddenly come back empty. | Tool calls "completed successfully" but with empty content. |
| 🟡 It happens after the gateway's been up a little while, under load. | Works fine for ~10-15 min after a fresh start, then stops returning real data - frequently in the middle of a run of write calls. |
| ⚪ Reconnecting "fixes" it temporarily. | /mcp reconnect mints a fresh upstream session → healthy again for a while. |
Environment
- docker mcp plugin v0.42.1 (gateway advertises serverVersion Docker AI MCP Gateway 2.0.1 on the wire)
- Docker Desktop 29.5.2, macOS (Apple Silicon)
- Client: Claude Code (stdio transport to the gateway)
- Remote server: Atlassian's hosted MCP - https://mcp.atlassian.com/v1/mcp, transport=streamable-http, authenticated via the gateway's built-in OAuth (docker mcp oauth authorize)
- Gateway launched with --verbose and --servers <list incl. the remote>
Reproduction
- Add a remote streamable-http MCP server that uses the gateway's OAuth (I used atlassian-remote).
- Run the gateway and exercise the remote's tools normally (reads + writes) over ~10-20 minutes.
- Observe healthy calls completing in ~0.6-2s.
- After a while (for me ~11-15 min from a fresh process, sooner under a burst of writes), every call to that remote starts "completing successfully" in 40-70ms with empty content. Writes report success but don't commit.
- /mcp reconnect → healthy again for another ~10-15 min.
Evidence (from the Claude Code MCP debug log, identifiers scrubbed)
Healthy (fresh process), real upstream round-trips:
11:08:30 Calling MCP tool: <remote>__searchJiraIssuesUsingJql
11:08:31 completed successfully in 800ms
11:09:00 Calling MCP tool: <remote>__editJiraIssue
11:09:01 completed successfully in 1s
Wedged (~6 min later, same process), instant empty "successes":
11:15:38 Calling MCP tool: <remote>__searchJiraIssuesUsingJql
11:15:38 completed successfully in 67ms
11:15:46 ... 62ms
11:15:50 ... 58ms
11:17:30 Calling MCP tool: <remote>__updateConfluencePage
11:17:30 completed successfully in 61ms <-- write "succeeded"; page version did NOT change
11:17:38 Calling MCP tool: <remote>__getConfluencePage
11:17:38 completed successfully in 48ms <-- empty
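Since latency is the only signal that separates the two states in these logs, a small filter over the debug log is enough to spot the wedge after the fact. The sketch below assumes the "completed successfully in <duration>" line shape shown above; the 200ms cutoff is my own assumption (it sits between the 40-70ms wedged calls and the ~0.6-2s healthy ones), and `isSuspect` is a hypothetical helper, not part of any tool.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// completedRe captures the duration token in lines such as
// "11:15:38 completed successfully in 67ms".
var completedRe = regexp.MustCompile(`completed successfully in (\S+)`)

// suspectThreshold is an assumed cutoff: faster than this, a call to a
// remote upstream probably never made a real round-trip.
const suspectThreshold = 200 * time.Millisecond

// isSuspect reports whether a log line records a "success" that completed
// faster than suspectThreshold. Lines without a parseable duration are
// treated as not suspect.
func isSuspect(line string) bool {
	m := completedRe.FindStringSubmatch(line)
	if m == nil {
		return false
	}
	d, err := time.ParseDuration(m[1])
	if err != nil {
		return false
	}
	return d < suspectThreshold
}

func main() {
	lines := []string{
		"11:08:31 completed successfully in 800ms",
		"11:09:01 completed successfully in 1s",
		"11:15:38 completed successfully in 67ms",
		"11:17:30 completed successfully in 61ms",
	}
	for _, l := range lines {
		if isSuspect(l) {
			fmt.Println("suspect:", l)
		}
	}
}
```

This only detects the wedge after the fact; it says nothing about why the responses are empty.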
Two things I think matter:
- --verbose emits nothing per-request. The only stderr I get is the startup banner (config read, image pulls, tool listing, OAuth loop start). During the wedge: zero stderr - no 401, no "session not found", no HTTP status. So from the logs there's no error to act on; the gateway just returns empty.
- OAuth refresh is running, yet it still wedges. The startup banner shows "Starting OAuth notification monitor" and "Started OAuth provider loop for <remote>". So this doesn't look like access-token expiry (refresh is active) - it looks like the upstream streamable-http session itself being recycled/invalidated, with the gateway not re-establishing it and not detecting that responses are empty.
This lines up with the same root class reported elsewhere - server-side streamable-http session invalidation with no client-side re-establish: github/gh-aw#23153, anomalyco/opencode#25137, NousResearch/hermes-agent#13383.
What seems to be the right fix (and why I think it stalled)
The keepalive direction looks correct. A commit referenced from #412 - 71b0a90 "fix(sessions): pong response resets inactivity timer (MCP-spec liveness…)" - is exactly the shape I'd expect (don't let an idle session lapse). But as far as I can tell it never landed: the commit 404s (the fork it lived on is no longer reachable), there's no open PR carrying it, and it's in no release (latest is v0.42.2, whose only session-related change is "reuse containers per session" - local containers, not the remote upstream).
Two complementary asks, in priority order:
- Don't report empty upstream responses as success. When a streamable-http call to a remote returns empty/no-content where data is expected (or the session is gone), surface an error to the client instead of a silent empty success. Silent success on writes is the data-loss vector; even without a full reconnect fix, failing loudly would stop the bleeding.
- Re-establish a recycled upstream session (the keepalive/71b0a90 direction, plus reconnect-on-stale-session), so it doesn't wedge in the first place.
Confidence
High confidence on the symptom and timing, medium on the exact mechanism - the session-recycling diagnosis is inferred from behavior + logs, not source. Happy to provide a fuller --verbose log, test a patch against my setup (I can reproduce on demand), or help however's useful.