Skip to content

fix(common,frontend): observe ctx in 2 streaming RPCs (GetMempoolStream, GetSubtreeRoots) + GetTreeState hygiene#561

Open
dingledropper wants to merge 1 commit into
zcash:masterfrom
dingledropper:fix/streaming-rpc-cancel-hygiene
Open

fix(common,frontend): observe ctx in 2 streaming RPCs (GetMempoolStream, GetSubtreeRoots) + GetTreeState hygiene#561
dingledropper wants to merge 1 commit into
zcash:masterfrom
dingledropper:fix/streaming-rpc-cancel-hygiene

Conversation

@dingledropper
Copy link
Copy Markdown
Contributor

Summary

Follow-up to PR #560 (0b155d5). That PR's sibling-handler audit used a coarse rule — "iterates inline and propagates cancellation through resp.Send errors" — and named six handlers clean against it. Re-auditing the same six handlers under a stricter rule ("ctx must be observable on every blocking op, not only on resp.Send") finds that three of them fail it. This PR fixes two of those three. The third (GetMempoolTx, which combines the same cancel-disrespect with a lock-amplified head-of-line blocking pattern) is under separate disclosure.

The re-audit also identified GetTreeState's SkipHash retry loop as ctx-blind in the same way. In practice that loop terminates in one iteration on the active chain because z_gettreestate (zcashd src/rpc/blockchain.cpp:1411) hard-stops the SkipHash walk at the Sapling activation height, so the GetTreeState change in this PR is a hygiene companion — not a DoS fix — but it mirrors the PR #560 pattern across the rest of the streaming-RPC surface.

Re-audit table

Handler PR #560 verdict Re-audit verdict (stricter rule) Failure mode Fixed in this PR
GetTaddressBalanceStream propagates via Send clean n/a (single RPC, no loop)
GetAddressUtxosStream propagates via Send clean n/a (single RPC + in-mem iterate)
GetMempoolStream propagates via Send fails empty-mempool block interval: sendToClient never invoked → cancel signal never observed → handler goroutine survives until next tip yes
GetSubtreeRoots propagates via Send fails per-subtree common.GetBlock does 2 RPCs each on cache-miss; no ctx between iterations yes
GetMempoolTx propagates via Send fails s.mutex held across 1 + N sequential RawRequest calls; concurrent callers serialize behind active refresher under separate disclosure
GetTaddressTransactions propagates via Send fails (soft) inner GetTransaction(ctx, …) silently ignores its ctx parameter; the 30s timeout context constructed at service.go:139 is dead follow-up PR (code-quality)
GetTreeState (not in PR #560 audit list) hygiene-only per-iteration RawRequest("z_gettreestate") lacks <-ctx.Done() check, but the loop is bounded to 1 iteration in practice by zcashd's activation-height guard at blockchain.cpp:1411 yes (companion, no DoS claim)

Fix

  • common/mempool.go: add ctx context.Context as the first parameter of GetMempool(…); replace the inner Time.Sleep(200ms) with select { case <-time.After(200*time.Millisecond): case <-ctx.Done(): return ctx.Err() }; pass ctx.Err() early on the per-iteration top of the loop.
  • frontend/service.go GetMempoolStream: pass resp.Context() to common.GetMempool.
  • frontend/service.go GetSubtreeRoots: insert if err := resp.Context().Err(); err != nil { return status.FromContextError(err).Err() } at the top of the per-subtree loop body (lines 868-889).
  • frontend/service.go GetTreeState: insert the same ctx-check at the top of the SkipHash retry loop body (lines 339-362). Note in code comment that the loop is bounded to one iteration in practice; the check is for symmetry, not for DoS defense.
  • frontend/service.go GetLatestTreeState: delegates to GetTreeState; no code change needed beyond what the GetTreeState change above gives transitively.

Test plan

  • go build ./...
  • go test ./... (existing tests should still pass)
  • New regression test TestGetMempoolStreamCancelOnEmptyMempool: launch a regtest stream, cancel its context after 100ms with an empty mempool, assert runtime.NumGoroutine() returns to baseline within one block interval rather than persisting until tip change.

Out-of-scope follow-ups (intentional)

  • Thread ctx through common.RawRequest (the btcd rpcclient wrapper). That requires the upstream rpcclient.Client to support RawRequestWithContext, which is a larger surgery. The minimum fix per handler — checking ctx between RPCs — is what this PR does.
  • GetSubtreeRoots server-side MaxEntries cap. The MaxEntries proto field documents "0 for all entries" and has no server-side bound; a malicious client can request arbitrarily many subtrees. This is a separate DoS class from cancellation-hygiene and will be filed as its own issue.
  • GetTaddressTransactions dead-letter ctx. GetTransaction takes (ctx context.Context, …) but never reads ctx. Will be filed as a small code-quality PR in the next 1-2 weeks.

Notes

AI disclosure

This finding was identified through a re-audit pass, and this PR description and the underlying patches were drafted, with assistance from Claude (Anthropic). I reviewed each step and am the responsible author.

…, GetSubtreeRoots) + GetTreeState hygiene

Follow-up to PR zcash#560 (0b155d5). That PR's sibling audit used the rule
"iterate inline and propagate cancellation through resp.Send errors" and
named six handlers clean. Re-auditing the same handlers under the
stricter rule "ctx must be observable on every blocking op, not only on
resp.Send" finds that GetMempoolStream and GetSubtreeRoots fail it.

- common/mempool.go: GetMempool now takes ctx and uses a cancel-aware
  select around the 200ms wait. On an empty mempool the original loop
  never invokes sendToClient and the prior Time.Sleep is non-cancellable,
  so a disconnected client's goroutine survived until the next tip
  change (~75s avg on mainnet). With ctx threading, cancel propagates
  within 200ms.
- common/common.go: add Time.After to the testable time wrapper.
- cmd/root.go: wire Time.After to the stdlib in production.
- frontend/service.go GetMempoolStream: pass resp.Context() to
  common.GetMempool.
- frontend/service.go GetSubtreeRoots: ctx.Err check between
  per-subtree common.GetBlock calls (each can be 2 zcashd RPCs on
  cache-miss).
- frontend/service.go GetTreeState: ctx.Err check at the top of the
  SkipHash retry loop. In practice the loop terminates in one iteration
  on the active chain because zcashd's z_gettreestate hard-stops the
  SkipHash walk at the Sapling activation height; this is a hygiene
  companion, not a DoS fix.
- common/common_test.go: add TestMempoolStreamCancelOnEmptyMempool
  regression test demonstrating the cancel-aware sleep returns within
  100ms of cancel on an empty mempool with stable tip. The existing
  TestMempoolStream is updated to pass context.Background() and now
  sets Time.After = afterStub for determinism.

A third sibling, GetMempoolTx, is under separate disclosure (lock-
amplified head-of-line blocking distinct from the synchronous-callback
pattern fixed here). A fourth handler, GetTaddressTransactions, has a
soft dead-letter ctx parameter that will be addressed in a small
follow-up PR.

Re-audit table for the six handlers PR zcash#560 named clean:
| Handler                       | Re-audit verdict     | Fixed here                  |
| GetTaddressBalanceStream      | clean                | -                           |
| GetAddressUtxosStream         | clean                | -                           |
| GetMempoolStream              | fails (empty-pool)   | yes                         |
| GetSubtreeRoots               | fails (per-iter)     | yes                         |
| GetMempoolTx                  | fails (lock-amp)     | under separate disclosure   |
| GetTaddressTransactions       | fails (soft)         | follow-up PR (code-quality) |

Out-of-scope follow-ups noted in the PR description.
@nuttycom nuttycom requested a review from LarryRuane May 22, 2026 20:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant