fix(common,frontend): observe ctx in 2 streaming RPCs (GetMempoolStream, GetSubtreeRoots) + GetTreeState hygiene#561
Open
dingledropper wants to merge 1 commit into
Conversation
…, GetSubtreeRoots) + GetTreeState hygiene Follow-up to PR zcash#560 (0b155d5). That PR's sibling audit used the rule "iterate inline and propagate cancellation through resp.Send errors" and named six handlers clean. Re-auditing the same handlers under the stricter rule "ctx must be observable on every blocking op, not only on resp.Send" finds that GetMempoolStream and GetSubtreeRoots fail it. - common/mempool.go: GetMempool now takes ctx and uses a cancel-aware select around the 200ms wait. On an empty mempool the original loop never invokes sendToClient and the prior Time.Sleep is non-cancellable, so a disconnected client's goroutine survived until the next tip change (~75s avg on mainnet). With ctx threading, cancel propagates within 200ms. - common/common.go: add Time.After to the testable time wrapper. - cmd/root.go: wire Time.After to the stdlib in production. - frontend/service.go GetMempoolStream: pass resp.Context() to common.GetMempool. - frontend/service.go GetSubtreeRoots: ctx.Err check between per-subtree common.GetBlock calls (each can be 2 zcashd RPCs on cache-miss). - frontend/service.go GetTreeState: ctx.Err check at the top of the SkipHash retry loop. In practice the loop terminates in one iteration on the active chain because zcashd's z_gettreestate hard-stops the SkipHash walk at the Sapling activation height; this is a hygiene companion, not a DoS fix. - common/common_test.go: add TestMempoolStreamCancelOnEmptyMempool regression test demonstrating the cancel-aware sleep returns within 100ms of cancel on an empty mempool with stable tip. The existing TestMempoolStream is updated to pass context.Background() and now sets Time.After = afterStub for determinism. A third sibling, GetMempoolTx, is under separate disclosure (lock- amplified head-of-line blocking distinct from the synchronous-callback pattern fixed here). A fourth handler, GetTaddressTransactions, has a soft dead-letter ctx parameter that will be addressed in a small follow-up PR. Re-audit table for the six handlers PR zcash#560 named clean: | Handler | Re-audit verdict | Fixed here | | GetTaddressBalanceStream | clean | - | | GetAddressUtxosStream | clean | - | | GetMempoolStream | fails (empty-pool) | yes | | GetSubtreeRoots | fails (per-iter) | yes | | GetMempoolTx | fails (lock-amp) | under separate disclosure | | GetTaddressTransactions | fails (soft) | follow-up PR (code-quality) | Out-of-scope follow-ups noted in the PR description.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to PR #560 (
0b155d5). That PR's sibling-handler audit used a coarse rule — "iterates inline and propagates cancellation throughresp.Senderrors" — and named six handlers clean against it. Re-auditing the same six handlers under a stricter rule ("ctx must be observable on every blocking op, not only onresp.Send") finds that three of them fail it. This PR fixes two of those three. The third (GetMempoolTx, which combines the same cancel-disrespect with a lock-amplified head-of-line blocking pattern) is under separate disclosure.The re-audit also identified
GetTreeState's SkipHash retry loop as ctx-blind in the same way. In practice that loop terminates in one iteration on the active chain becausez_gettreestate(zcashdsrc/rpc/blockchain.cpp:1411) hard-stops the SkipHash walk at the Sapling activation height, so the GetTreeState change in this PR is a hygiene companion — not a DoS fix — but it mirrors the PR #560 pattern across the rest of the streaming-RPC surface.Re-audit table
GetTaddressBalanceStreamGetAddressUtxosStreamGetMempoolStreamsendToClientnever invoked → cancel signal never observed → handler goroutine survives until next tipGetSubtreeRootscommon.GetBlockdoes 2 RPCs each on cache-miss; no ctx between iterationsGetMempoolTxs.mutexheld across1 + Nsequential RawRequest calls; concurrent callers serialize behind active refresherGetTaddressTransactionsGetTransaction(ctx, …)silently ignores itsctxparameter; the 30s timeout context constructed atservice.go:139is deadGetTreeStateRawRequest("z_gettreestate")lacks<-ctx.Done()check, but the loop is bounded to 1 iteration in practice by zcashd's activation-height guard atblockchain.cpp:1411Fix
common/mempool.go: addctx context.Contextas the first parameter ofGetMempool(…); replace the innerTime.Sleep(200ms)withselect { case <-time.After(200*time.Millisecond): case <-ctx.Done(): return ctx.Err() }; passctx.Err()early on the per-iteration top of the loop.frontend/service.goGetMempoolStream: passresp.Context()tocommon.GetMempool.frontend/service.goGetSubtreeRoots: insertif err := resp.Context().Err(); err != nil { return status.FromContextError(err).Err() }at the top of the per-subtree loop body (lines 868-889).frontend/service.goGetTreeState: insert the same ctx-check at the top of the SkipHash retry loop body (lines 339-362). Note in code comment that the loop is bounded to one iteration in practice; the check is for symmetry, not for DoS defense.frontend/service.goGetLatestTreeState: delegates toGetTreeState; no code change needed beyond what the GetTreeState change above gives transitively.Test plan
go build ./...go test ./...(existing tests should still pass)TestGetMempoolStreamCancelOnEmptyMempool: launch a regtest stream, cancel its context after 100ms with an empty mempool, assertruntime.NumGoroutine()returns to baseline within one block interval rather than persisting until tip change.Out-of-scope follow-ups (intentional)
ctxthroughcommon.RawRequest(the btcdrpcclientwrapper). That requires the upstreamrpcclient.Clientto supportRawRequestWithContext, which is a larger surgery. The minimum fix per handler — checking ctx between RPCs — is what this PR does.GetSubtreeRootsserver-sideMaxEntriescap. TheMaxEntriesproto field documents "0 for all entries" and has no server-side bound; a malicious client can request arbitrarily many subtrees. This is a separate DoS class from cancellation-hygiene and will be filed as its own issue.GetTaddressTransactionsdead-letterctx.GetTransactiontakes(ctx context.Context, …)but never readsctx. Will be filed as a small code-quality PR in the next 1-2 weeks.Notes
GetTaddressBalanceStream,GetAddressUtxosStream); three fail it (two fixed here, one under separate disclosure); one is a soft code-quality issue (deferred PR).zaino-state/src/backends/fetch.rsandstate.rs) — all four equivalent handlers in Zaino usetokio::spawn+timeout+ receiver-closedis_err()checks and do not exhibit any of these classes. Strong corroboration that the lightwalletd patterns are real defects, not intentional design.AI disclosure
This finding was identified through a re-audit pass, and this PR description and the underlying patches were drafted, with assistance from Claude (Anthropic). I reviewed each step and am the responsible author.