perf: event-driven msg_wait wake-up via asyncio.Event#57

Merged
Killea merged 3 commits into Killea:main from bertheto:perf/event-driven-msg-wait on Mar 11, 2026

@bertheto
Contributor

Summary

  • Replace the 1s asyncio.sleep polling loop in handle_msg_wait with an asyncio.Event-based notification mechanism
  • When msg_post inserts a message, it signals the per-thread event, immediately waking all waiters on that thread
  • The 1s timeout fallback is preserved via asyncio.wait_for(event.wait(), timeout=1.0), so waiters still pick up new messages within ~1s in multi-process/stdio mode, where the in-memory event is not shared across processes

Changes

  • src/tools/dispatch.py (+33 lines, -1 line):
    • Add _thread_events: dict[str, asyncio.Event] registry at module level
    • Add _get_thread_event() helper for lazy event creation
    • In _poll(): replace asyncio.sleep(1.0) with event.clear() + asyncio.wait_for(event.wait(), timeout=1.0)
    • In handle_msg_post: signal _thread_events[thread_id].set() after successful message insertion

Design decisions

  • In-memory dict (not DB-backed): Works because SSE mode runs a single uvicorn process. The 1s fallback guarantees correctness if events are missed (e.g., in multi-process stdio mode).
  • reload_excludes: dispatch.py is excluded from hot-reload (reload_excludes=[src/tools/*.py] in main.py), so editing files under src/tools/ does not trigger a reload that would reset the in-memory dict during development.
  • Spurious-wakeup safe: The outer while True loop in _poll() re-checks crud.msg_list after every wake-up, so spurious wakes are harmless.
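To illustrate the two safety properties above (spurious wake-ups are harmless, and the 1s fallback catches writes that never signal the event), here is a small self-contained sketch. The `store` list and the `wait_for_message` function are hypothetical; appending to `store` without calling `event.set()` stands in for a write from another process that the in-memory event cannot see, and the poll interval is shortened to 0.2s only to keep the demo fast.

```python
import asyncio
import time


async def wait_for_message(store: list[str], event: asyncio.Event,
                           poll_interval: float = 1.0) -> tuple[str, int]:
    """Return the first stored message and how many times the event fired."""
    wakeups = 0
    while True:
        event.clear()
        if store:  # the store, not the event, is the source of truth
            return store[0], wakeups
        try:
            await asyncio.wait_for(event.wait(), timeout=poll_interval)
            wakeups += 1  # signalled, but possibly spuriously
        except asyncio.TimeoutError:
            pass  # fallback tick: nobody signalled, re-check anyway


async def demo() -> tuple[str, int]:
    store: list[str] = []
    event = asyncio.Event()
    # Short interval only to keep the demo fast; the PR uses 1.0s.
    waiter = asyncio.create_task(wait_for_message(store, event, poll_interval=0.2))
    await asyncio.sleep(0.05)
    event.set()  # spurious wake-up: nothing has been posted yet
    await asyncio.sleep(0.05)
    store.append("written by another process")  # insert without any signal
    return await waiter


start = time.perf_counter()
message, wakeups = asyncio.run(demo())
elapsed = time.perf_counter() - start
print(message, wakeups, f"{elapsed:.2f}s")
```

The spurious set() wakes the waiter once, it finds nothing and goes back to waiting; the unsignalled insert is then picked up on the next fallback tick, roughly one poll interval later.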

Performance impact

  • Before: ~500ms average added latency per msg_wait call (a new message arrives at a uniformly random point during the 1s sleep)
  • After: ~10ms (event signal + context switch)
  • Per session (8 msg_wait calls in a typical 4-round debate): ~4s saved
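The before/after figures follow from simple arithmetic, assuming a message lands at a uniformly random moment within the 1s sleep and taking the ~10ms event-path estimate above:

```math
\mathbb{E}[T_{\text{poll}}] = \int_0^{1} t\,dt = \frac{1}{2}\ \text{s} = 500\ \text{ms},
\qquad
\Delta_{\text{session}} \approx 8 \times (500 - 10)\ \text{ms} = 3.92\ \text{s} \approx 4\ \text{s}
```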

Test plan

  • Verified single-worker uvicorn configuration (no --workers flag)
  • Verified reload_excludes covers dispatch.py
  • Manual test: run a multi-agent session and confirm reduced latency
  • Verify no regression in stdio mode (fallback 1s polling)

Replace the 1s asyncio.sleep polling loop in handle_msg_wait with
an asyncio.Event-based notification mechanism. When msg_post inserts
a message successfully, it signals the per-thread event, waking up
all waiters immediately instead of waiting for the next poll tick.

- Add _thread_events registry (dict[str, asyncio.Event]) at module level
- Add _get_thread_event() helper for lazy event creation
- In _poll(): replace asyncio.sleep(1.0) with event.clear() +
  asyncio.wait_for(event.wait(), timeout=1.0) for fallback safety
- In handle_msg_post: signal event after successful message insertion

The 1s timeout fallback preserves correctness in multi-process/stdio
mode where in-memory events are not shared across processes.

Estimated improvement: inter-message latency from ~500ms to ~10ms.

Made-with: Cursor

The test_timeout_handling tests were patching asyncio.wait_for globally,
which interfered with the event-driven msg_wait mechanism introduced in
dispatch.py (asyncio.wait_for(event.wait(), timeout=1.0)).

Scope the mock to src.main.asyncio.wait_for so it only intercepts calls
from main.py, leaving dispatch.py's event-based wait unaffected.

Note: test_api_threads_success was already failing on main before this PR
(TypeError: 'coroutine' object is not iterable — caused by the threads_agents_map
refactor). This fix addresses both the pre-existing failure and the new
interference introduced by the event-driven msg_wait.

Made-with: Cursor
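A caveat on the scoping described in this commit, assuming main.py uses a plain `import asyncio` (this page does not show its imports): `unittest.mock.patch` replaces an attribute on whatever object the dotted target resolves to, and `src.main.asyncio` would then be the shared `asyncio` module itself, so the patch still applies globally. The sketch below uses hypothetical stand-in modules (`main_plain`, `main_from`) built with `types.ModuleType` to contrast that with patching a name imported via `from asyncio import wait_for`.

```python
import asyncio
import sys
import types
from unittest import mock

# Hypothetical stand-in for a main.py that does a plain `import asyncio`.
main_plain = types.ModuleType("main_plain")
main_plain.asyncio = asyncio
sys.modules["main_plain"] = main_plain

with mock.patch("main_plain.asyncio.wait_for") as fake:
    # The target resolves to the shared asyncio module, so every caller
    # (including an event wait in another module) now sees the mock.
    leaked_globally = asyncio.wait_for is fake

# Hypothetical stand-in for a module that does `from asyncio import wait_for`.
main_from = types.ModuleType("main_from")
main_from.wait_for = asyncio.wait_for
sys.modules["main_from"] = main_from

with mock.patch("main_from.wait_for") as fake:
    # Only the module-local name is replaced; asyncio itself is untouched.
    scoped_locally = main_from.wait_for is fake and asyncio.wait_for is not fake

print(leaked_globally, scoped_locally)  # True True
```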
The previous approach patched asyncio.wait_for/gather globally (then
src.main-scoped), but api_threads nests wait_for inside gather, making
the mock fragile against code structure changes.

Mock the CRUD layer directly instead:
- patch get_db to return a mock db connection
- patch crud.thread_list, crud.thread_count, crud.threads_agents_map
  as AsyncMocks with controlled return values

This is the correct level of abstraction: tests verify endpoint logic,
not asyncio plumbing. Also fixes the pre-existing failure on main
introduced by the threads_agents_map refactor (missing mock for the new
third await call).

Made-with: Cursor
Killea merged commit 5a71a17 into Killea:main on Mar 11, 2026
1 check passed