Remote streamable-http session goes stale → gateway returns silent "success" with empty results (writes lost), no error, no reconnect

I'm a product owner with enough technical depth to be dangerous, leaning on Claude to do the deeper digging - so treat the analysis below as "carefully investigated, but I haven't read your Go." I've been running the gateway in front of Atlassian's hosted MCP (some context on that setup here: https://productowner.ro/blog/almost-converted-mcp-json-to-toon/) and hit a reproducible failure that silently loses writes. Reporting it carefully because the failure mode is easy to miss.

## TL;DR

After a remote MCP server's streamable-http session goes stale, the gateway keeps proxying calls but they **return empty results reported as success** - in tens of milliseconds, with **no error logged anywhere**, even at `--verbose`. Reads coming back blank is annoying; the real problem is on writes: an `updateConfluencePage` / `editJiraIssue` **reports success but never actually applies the change** - the upstream version doesn't increment, and the edit is lost with no signal. I've lost a Jira sprint write and two Confluence pushes this way before noticing.

`/mcp reconnect` (which respawns the gateway process) restores it for a while, then it recurs.

This is adjacent to #412 but distinct: #412 is the gateway **as a server** dropping client connections with a *hard error*. This is the gateway **as a client to a remote upstream**, where the failure is a *silent empty success* - worse, because nothing surfaces it.

## Plain-language vs. technical

| What I see | What's actually happening |
|---|---|
| 🔴 A write "succeeds" but didn't take. | `updateConfluencePage`/`editJiraIssue` return success; upstream version unchanged. Silent data loss, no error. |
| 🟠 Reads suddenly come back empty. | Tool calls "completed successfully" but with empty content. |
| 🟡 It happens after the gateway's been up a little while, under load. | Works fine for ~10-15 min after a fresh start, then stops returning real data - frequently in the middle of a run of write calls. |
| ⚪ Reconnecting "fixes" it temporarily. | `/mcp reconnect` mints a fresh upstream session → healthy again for a while. |

## Environment

- `docker mcp` plugin **v0.42.1** (gateway advertises serverVersion `Docker AI MCP Gateway 2.0.1` on the wire)
- Docker Desktop **29.5.2**, macOS (Apple Silicon)
- Client: Claude Code (stdio transport to the gateway)
- Remote server: Atlassian's hosted MCP - `https://mcp.atlassian.com/v1/mcp`, `transport=streamable-http`, authenticated via the gateway's built-in OAuth (`docker mcp oauth authorize`)
- Gateway launched with `--verbose` and `--servers <list incl. the remote>`

## Reproduction

1. Add a **remote** streamable-http MCP server that uses the gateway's OAuth (I used `atlassian-remote`).
2. Run the gateway and exercise the remote's tools normally (reads + writes) over ~10-20 minutes.
3. Observe healthy calls completing in ~0.6-2s.
4. After a while (for me ~11-15 min from a fresh process, sooner under a burst of writes), every call to that remote starts "completing successfully" in **40-70ms** with **empty** content. Writes report success but don't commit.
5. `/mcp reconnect` → healthy again for another ~10-15 min.
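
Until this is fixed, the only way I've found to catch a lost write is to compare the upstream version before and after. A minimal sketch of that guard follows; `read_version` and `write` are hypothetical callables standing in for whatever wraps `getConfluencePage` / `updateConfluencePage` in your client, not gateway APIs.

```python
from typing import Callable, Optional


class SilentWriteLoss(RuntimeError):
    """A write reported success but the upstream version did not advance."""


def verified_write(
    read_version: Callable[[], Optional[int]],  # hypothetical: returns None on an empty read
    write: Callable[[], object],                # hypothetical: performs the tool call
) -> int:
    """Run `write`, then confirm the upstream version number increased."""
    before = read_version()
    if before is None:
        # Reads are already coming back empty, so the session is likely wedged.
        # Refuse to attempt the write at all.
        raise SilentWriteLoss("pre-write read returned no version; not writing")

    write()

    after = read_version()
    if after is None or after <= before:
        raise SilentWriteLoss(
            f"write reported success but version is {after!r} (was {before})"
        )
    return after
```

This costs two extra reads per write, and it only detects the loss after the fact; it does not prevent it.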

## Evidence (from the Claude Code MCP debug log, identifiers scrubbed)

Healthy (fresh process), real upstream round-trips:

```
11:08:30  Calling MCP tool: <remote>__searchJiraIssuesUsingJql
11:08:31  completed successfully in 800ms
11:09:00  Calling MCP tool: <remote>__editJiraIssue
11:09:01  completed successfully in 1s
```

Wedged (same process, ~6 min after the healthy calls above), instant empty "successes":

```
11:15:38  Calling MCP tool: <remote>__searchJiraIssuesUsingJql
11:15:38  completed successfully in 67ms
11:15:46  ... 62ms
11:15:50  ... 58ms
11:17:30  Calling MCP tool: <remote>__updateConfluencePage
11:17:30  completed successfully in 61ms      <-- write "succeeded"; page version did NOT change
11:17:38  Calling MCP tool: <remote>__getConfluencePage
11:17:38  completed successfully in 48ms      <-- empty
```

Two things I think matter:

- **`--verbose` emits nothing per-request.** The only stderr I get is the startup banner (config read, image pulls, tool listing, OAuth loop start). During the wedge: **zero** stderr - no 401, no "session not found", no HTTP status. So from the logs there's no error to act on; the gateway just returns empty.
- **OAuth refresh is running, yet it still wedges.** The startup banner shows `Starting OAuth notification monitor` and `Started OAuth provider loop for <remote>`. So this doesn't look like access-token expiry (refresh is active) - it looks like the **upstream streamable-http session itself being recycled/invalidated**, with the gateway not re-establishing it and not detecting that responses are empty.

This matches a class of failure reported elsewhere, where the server invalidates a streamable-http session and the client never re-establishes it: github/gh-aw#23153, anomalyco/opencode#25137, NousResearch/hermes-agent#13383.
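
For reference, the MCP streamable HTTP transport spec describes a server answering a request that carries an expired `Mcp-Session-Id` with HTTP 404, after which the client is expected to start a new session with a fresh `InitializeRequest`. Whether Atlassian's endpoint actually responds that way, or what the gateway does with it, is not something I can confirm from logs. The sketch below only illustrates the client-side behavior I'd expect; `post` is a hypothetical injected transport, and the handshake is simplified (no `notifications/initialized`).

```python
from typing import Callable, Optional, Tuple

# Hypothetical transport: post(body, session_id) -> (http_status, session_id_header, payload)
Post = Callable[[dict, Optional[str]], Tuple[int, Optional[str], Optional[dict]]]


class UpstreamSession:
    """Holds one upstream session and re-initializes it when the server forgets it."""

    def __init__(self, post: Post) -> None:
        self._post = post
        self._session_id: Optional[str] = None

    def _initialize(self) -> None:
        # A new session is requested by sending initialize WITHOUT a session ID.
        status, session_id, _ = self._post({"method": "initialize"}, None)
        if status != 200:
            raise ConnectionError(f"initialize failed with HTTP {status}")
        self._session_id = session_id

    def call(self, body: dict) -> Optional[dict]:
        if self._session_id is None:
            self._initialize()

        status, _, payload = self._post(body, self._session_id)
        if status == 404:
            # Session is gone upstream: mint a new one and retry exactly once.
            self._initialize()
            status, _, payload = self._post(body, self._session_id)

        if status != 200:
            # Anything else is surfaced as an error, never as an empty success.
            raise ConnectionError(f"upstream returned HTTP {status}")
        return payload
```

The single retry assumes a 404 means the request was rejected before being processed, which is what makes retrying a write safe here; that assumption would need checking against the real upstream.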

## What seems to be the right fix (and why I think it stalled)

The keepalive direction looks correct. A commit referenced from [#412](https://github.com/docker/mcp-gateway/issues/412) - [`71b0a90`](https://github.com/docker/mcp-gateway/commit/71b0a907800af1ec88dcb7643e755fc7b3297f2b) *"fix(sessions): pong response resets inactivity timer (MCP-spec liveness…)"* - is exactly the shape I'd expect (don't let an idle session lapse). But as far as I can tell it never landed: the commit 404s (the fork it lived on is no longer reachable), there's no open PR carrying it, and it's in no release (latest is v0.42.2, whose only session-related change is "reuse containers per session" - local containers, not the remote upstream).

Two complementary asks, in priority order:

1. **Don't report empty upstream responses as success.** When a streamable-http call to a remote returns empty/no-content where data is expected (or the session is gone), surface an error to the client instead of a silent empty success. Silent success on writes is the data-loss vector; even without a full reconnect fix, failing loudly would stop the bleeding.
2. **Re-establish a recycled upstream session** (the keepalive/`71b0a90` direction, plus reconnect-on-stale-session), so it doesn't wedge in the first place.
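
To make ask 1 concrete, here is a rough sketch of the kind of check I mean, written against the general shape of an MCP `tools/call` result (a `content` list, an optional `structuredContent`, and an `isError` flag). The function and exception names are mine, not the gateway's, and some tools may legitimately return nothing, so a real fix would presumably need to know which tools are expected to return data.

```python
class EmptyUpstreamResult(RuntimeError):
    """A remote tool call came back 'successful' but carried no content."""


def _has_payload(item: dict) -> bool:
    # Text items count only if they contain non-whitespace text;
    # other item types (image, resource, ...) are assumed to carry data.
    if item.get("type") == "text":
        return bool(item.get("text", "").strip())
    return True


def require_content(tool_name: str, result: dict) -> dict:
    """Return `result` unchanged, or raise if it is an empty non-error result."""
    if result.get("isError"):
        # Already an explicit error; pass it through untouched.
        return result

    content = result.get("content") or []
    structured = result.get("structuredContent")
    if not structured and not any(_has_payload(item) for item in content):
        raise EmptyUpstreamResult(
            f"{tool_name}: upstream reported success with empty content"
        )
    return result
```

Even this alone would have turned my lost `updateConfluencePage` calls into visible failures instead of silent ones.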

## Confidence

High confidence on the symptom and timing, medium on the exact mechanism - the session-recycling diagnosis is inferred from behavior and logs, not from the source. Happy to provide a fuller `--verbose` log, test a patch against my setup (I can reproduce on demand), or help however's useful.

