Skip to content

fix: resolve concurrency bugs in eventbus and webtransport#3518

Open
Sahil-4555 wants to merge 2 commits into
libp2p:masterfrom
Sahil-4555:fix/concurrency-safety
Open

fix: resolve concurrency bugs in eventbus and webtransport#3518
Sahil-4555 wants to merge 2 commits into
libp2p:masterfrom
Sahil-4555:fix/concurrency-safety

Conversation

@Sahil-4555

Copy link
Copy Markdown
Contributor

Issue

observed two concurrency issues:

  1. Eventbus Deadlock: When multiple emitters concurrently broadcast events to a slow consumer (full queue), they can deadlock. Both the wildcard subscriber node and standard subscriber node shared a single slowConsumerTimer pointer. Concurrent emitters raced on Reset(), causing the timer to fire only once. One emitter consumed the tick and blocked on the channel, leaving other emitters permanently blocked on <-timer.C even after the queue was drained. This hang occurred under a read-lock (RLock), blocking any subscription closure (sub.Close()) or node updates from acquiring a write-lock.
  2. WebTransport Cert Manager Data Race: The SerializedCertHashes() getter read the shared m.serializedCertHashes slice without acquiring a lock. This caused a read/write data race with the background cert manager thread which updates the slice during certificate rotation. Returning the raw slice by reference also caused slice aliasing, exposing the caller to backing memory being concurrently mutated/resized.

Fix

  1. Eventbus Timer Isolation: Removed the shared slowConsumerTimer pointer field from node and wildcardNode. The slow consumer warning logic was refactored to use a local time.Timer allocated on the stack within emitAndLogError, eliminating shared timer state across concurrent emitters.
  2. WebTransport Thread-Safety: Added m.mx.RLock() to SerializedCertHashes() and returned a deep copy of the slice rather than a direct reference to avoid data races and slice aliasing.

Tests

  1. Eventbus: Added TestWildcardSlowConsumerDeadlock in p2p/host/eventbus/basic_test.go which simulates a full queue under concurrent emissions to verify that no deadlocks occur and the emitters safely proceed when the queue is drained.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant