Make JSON-RPC request id allocation collision-free and resilient to cancellation#84
Open
PeterPalenik wants to merge 13 commits intohappyleavesaoc:masterfrom
Open
Make JSON-RPC request id allocation collision-free and resilient to cancellation#84PeterPalenik wants to merge 13 commits intohappyleavesaoc:masterfrom
PeterPalenik wants to merge 13 commits intohappyleavesaoc:masterfrom
Conversation
e51f8c7 to
ce48ff9
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Opening this after chasing down a bunch of symptoms downstream in Music
Assistant: occasional
KeyError: 385/KeyError: 518followed byServer disconnected: <id>, also reported inmusic-assistant/support#4435 (where a commenter
independently traced it to this library). The trail led back to this
library's JSON-RPC request handling.
What was broken
Three things compound under concurrent load:
Request ids came from
random.randint(1, 1000). With ~50 in-flightcalls the birthday paradox already gives a >70% collision probability.
When two callers pick the same id, the second one's
_buffer[id] = ...overwrites the first; responses get cross-routed and the original waiter
hangs forever.
When a response arrived for an id no longer in
_buffer(cancellation,late server reply, or the collision case above),
handle_responsedidself._buffer[id]and theKeyErrorpropagated out. That surfaced inthe asyncio transport and disconnected the snapserver session.
Cancelled
request()calls (e.g. viaasyncio.wait_fortimeout) left_buffer[id]behind, so the dict slowly grew and stale responses had achance to land in it later.
Fix
A monotonic counter under
threading.Lockreplacesrandom.randint. Thelock is overkill for asyncio under current CPython, but I'd rather have it
than rely on
itertools.countatomicity, which is implementation-definedunder PEP 703 free-threaded builds.
handle_responseuses.get()and silently drops responses for unknownids, with a
DEBUGlog so future regressions stay diagnosable. Andrequest()runs undertry/finallyso_buffer.pop(id, None)always runs.Testing
I patched the lib into my running Music Assistant container (snapserver
0.35.0 on the other side) and ran 200 concurrent
server.rpc_version()calls:
Same script, same snapserver, only
protocol.pydiffers. The 26 timeoutsmatch the ~19 collision pairs the birthday paradox predicts for
randint(1,1000)at that load.