Add advanced_scenarios benchmark suite with tenant resolver and artifact accumulation tests #74
Merged
- Fix BLOCKING production_scenarios panic: MultiEventExecutor was emitting invalid Working → Working state transitions. Restructured it to emit a single Working status, then N artifact events, then Completed, conforming to the A2A spec state machine, in which Working can only transition to a terminal or interrupted state.
- Fix push_config benchmark: added NoopPushSender and a server helper with push notification support enabled, resolving PushNotificationNotSupported errors during benchmark runs.
- Add measurement_time to 23+ benchmark groups across 8 files to eliminate criterion's insufficient-measurement-time warnings on CI runners. Groups are classified by per-iteration cost: transport round-trips (8s), multi-turn/concurrent (10s), and slow/wall-clock-bound (15s). Sub-µs store/serde groups are left at the default 5s.
- Optimize InMemoryTaskStore::insert() for the update path: when updating an existing task whose context_id is unchanged (the common case), skip all BTreeSet and context index operations and update only the primary HashMap entry. This eliminates the variance (observed range 1.5µs to 4.2µs) caused by occasional BTreeSet node splits and reduces update cost from ~2.5µs to ~700ns.
- Document the get() 100K cache anomaly: the ~42% faster lookup at 100K than at 1K/10K is a CPU cache-warming artifact of the large populate_store() setup, not a genuine HashMap performance difference. The 1K/10K figure (~450ns) is the representative O(1) lookup time.
- Update MultiEventExecutor event count comments in backpressure.rs to reflect the corrected N+2 formula (Working + N artifacts + Completed) instead of the invalid 2N+1 (N Working + N artifacts + Completed).

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
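A minimal sketch of why the old emission order was rejected and the new one is accepted. It models only the transition rule quoted above (Working may move only to a terminal or interrupted state) and assumes the old loop emitted a Working status before every artifact, since the exact interleaving is not shown in the PR. `TaskState`, `Event`, `buggy_events`, `corrected_events`, and `validate` are hypothetical names, not the SDK's API, and the real A2A state enum has more variants.

```rust
/// Simplified task states (hypothetical subset of the A2A task states).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    InputRequired, // stands in for the "interrupted" states in this sketch
    Completed,
    Failed,
    Canceled,
}

impl TaskState {
    fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Failed | TaskState::Canceled)
    }
}

/// One event emitted by a hypothetical multi-event executor.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Status(TaskState),
    Artifact(usize),
}

/// Transition rule assumed for this sketch: Working -> terminal or interrupted only.
pub fn is_valid_transition(from: TaskState, to: TaskState) -> bool {
    match from {
        TaskState::Submitted | TaskState::InputRequired => {
            to == TaskState::Working || to.is_terminal()
        }
        TaskState::Working => to == TaskState::InputRequired || to.is_terminal(),
        // Terminal states accept no further transitions.
        _ => false,
    }
}

/// Old behaviour (assumed interleaving): Working before every artifact, 2N + 1 events.
pub fn buggy_events(n: usize) -> Vec<Event> {
    let mut events = Vec::with_capacity(2 * n + 1);
    for i in 0..n {
        events.push(Event::Status(TaskState::Working));
        events.push(Event::Artifact(i));
    }
    events.push(Event::Status(TaskState::Completed));
    events
}

/// Fixed behaviour: one Working, N artifacts, one Completed, N + 2 events.
pub fn corrected_events(n: usize) -> Vec<Event> {
    let mut events = Vec::with_capacity(n + 2);
    events.push(Event::Status(TaskState::Working));
    events.extend((0..n).map(Event::Artifact));
    events.push(Event::Status(TaskState::Completed));
    events
}

/// Replays status events from Submitted and reports the first illegal transition.
pub fn validate(events: &[Event]) -> Result<(), (TaskState, TaskState)> {
    let mut current = TaskState::Submitted;
    for event in events {
        if let Event::Status(next) = event {
            if !is_valid_transition(current, *next) {
                return Err((current, *next));
            }
            current = *next;
        }
    }
    Ok(())
}
```

With N = 1 both sequences contain three events and both are valid; from N = 2 onward the old sequence hits Working → Working. If N runs from 1 to 500, the counts match the 3 → 502 versus 3 → 1001 ranges quoted for backpressure.rs.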
New benchmark file exercising 7 previously unbenchmarked SDK capabilities:

1. **Tenant resolver overhead** (5 benchmarks): HeaderTenantResolver, BearerTokenTenantResolver (with and without mapper), PathSegmentTenantResolver, and the missing-header fast-rejection path. Uses CallContext with realistic HTTP headers to measure per-request extraction cost.
2. **Agent card hot-reload** (3 benchmarks): Steady-state read of the current card via RwLock, an atomic swap-and-read cycle, and a complex card swap (100 skills) measuring the cost of production-scale card replacement.
3. **Agent card discovery endpoint** (1 benchmark): /.well-known/agent.json HTTP round-trip latency via a raw hyper client.
4. **Subscribe fan-out** (3 benchmarks): 1/5/10 concurrent subscribers reconnecting to the same task simultaneously, simulating mobile/web client reconnection bursts at scale.
5. **Streaming artifact accumulation** (10 benchmarks): Isolates the per-event task.clone() cost at 0/10/50/100/500 accumulated artifacts, the root cause of the 90µs/event bottleneck at 501+ events. Also measures a full store_save at each depth to capture clone + index + HashMap insert.
6. **Pagination full walk** (4 benchmarks): Complete cursor-based traversal of 100- and 1K-task stores at page_size=25/50, both unfiltered and with a context_id filter. Measures cumulative multi-page latency.
7. **Extended agent card round-trip** (1 benchmark): Full client→server→client round-trip for get_extended_agent_card with the capability flag enabled.

All 24 benchmark test cases pass smoke testing.
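The pagination full walk can be pictured as the loop below: request a page, follow `next_cursor` until it is absent, and count pages along the way. This is a sketch under assumptions: the hypothetical `Store` keeps task ids in a `BTreeMap` (task id → context_id) and uses the last returned id as the cursor. The SDK's real store layout, cursor encoding, and list signature are not shown in this PR.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical store: task_id -> context_id, ordered so a cursor can resume.
#[derive(Default)]
pub struct Store {
    tasks: BTreeMap<String, String>,
}

pub struct Page {
    pub ids: Vec<String>,
    pub next_cursor: Option<String>,
}

impl Store {
    pub fn insert(&mut self, task_id: &str, context_id: &str) {
        self.tasks.insert(task_id.to_string(), context_id.to_string());
    }

    /// Returns up to `page_size` ids after `cursor`, optionally filtered by context_id.
    pub fn list(&self, cursor: Option<&str>, page_size: usize, context_id: Option<&str>) -> Page {
        let start: Bound<String> = match cursor {
            Some(c) => Bound::Excluded(c.to_string()),
            None => Bound::Unbounded,
        };
        // Fetch one extra matching id to learn whether another page exists.
        let mut ids: Vec<String> = self
            .tasks
            .range::<String, _>((start, Bound::Unbounded))
            .filter(|(_, ctx)| context_id.map_or(true, |c| ctx.as_str() == c))
            .map(|(id, _)| id.clone())
            .take(page_size + 1)
            .collect();
        let next_cursor = if ids.len() > page_size {
            ids.truncate(page_size);
            ids.last().cloned()
        } else {
            None
        };
        Page { ids, next_cursor }
    }
}

/// Walks every page and returns (pages fetched, total ids seen).
pub fn full_walk(store: &Store, page_size: usize, context_id: Option<&str>) -> (usize, usize) {
    let mut cursor: Option<String> = None;
    let (mut pages, mut total) = (0, 0);
    loop {
        let page = store.list(cursor.as_deref(), page_size, context_id);
        pages += 1;
        total += page.ids.len();
        match page.next_cursor {
            Some(next) => cursor = Some(next),
            None => return (pages, total),
        }
    }
}
```

Fetching `page_size + 1` items is one common way to decide whether a next cursor exists without an extra empty page; whether the SDK uses the same trick is not stated in the PR.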
…nchmarks

Updates all documentation and CI infrastructure to include the new advanced_scenarios benchmark suite (13th benchmark module):

CI & Scripts:
- .github/workflows/benchmarks.yml: add advanced_scenarios step
- benches/scripts/run_benchmarks.sh: add to BENCHMARKS array
- benches/scripts/generate_book_page.sh: add Advanced Scenarios section with emit_table "advanced_" call

Documentation:
- benches/README.md: add table row and architecture entry, update CI count (12→13), update executor listing (add NoopPushSender), add enterprise/production/advanced dimensions to the "Why it matters" table, and correct backpressure event counts (1001→502)
- CONTRIBUTING.md: add enterprise_scenarios, production_scenarios, and advanced_scenarios to the benchmark table
- CHANGELOG.md: document the advanced_scenarios addition, MultiEventExecutor fix, save() variance optimization, measurement_time fixes, NoopPushSender, and push-enabled server helper
- book/src/reference/changelog.md: add advanced_scenarios, save() insert optimization, MultiEventExecutor fix, and measurement_time fixes; update backpressure event counts
- cargo fmt: fix import ordering and closure formatting in advanced_scenarios.rs and backpressure.rs
- clippy: add backticks to doc comment references (context_id, BTreeSet, HashMap) in InMemoryTaskStore::insert()
- cargo doc: fix unresolved doc link to NoopPushSender in server.rs by using the fully qualified crate::executor::NoopPushSender path
- Use `next_back()` instead of `last()` on a DoubleEndedIterator (`split('.')` returns a reversible iterator)
- Use `usize::div_ceil()` instead of the manual `(n + d - 1) / d`
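The two clippy-driven rewrites can be illustrated with a pair of small functions. `last_segment` and `page_count` are hypothetical names, since the PR does not say which call sites changed.

```rust
/// Text after the last '.' (the whole string if there is none).
/// `Split` is a DoubleEndedIterator, so `next_back()` yields the final piece
/// directly, whereas `Iterator::last()` consumes every piece from the front.
pub fn last_segment(s: &str) -> Option<&str> {
    s.split('.').next_back()
}

/// Pages needed to hold `total` items. `div_ceil` rounds up without the
/// `(total + page_size - 1)` intermediate sum, which can overflow near usize::MAX.
/// Like the manual form, it panics if `page_size` is 0.
pub fn page_count(total: usize, page_size: usize) -> usize {
    total.div_ceil(page_size)
}
```

For example, `page_count(usize::MAX, 2)` returns `usize::MAX / 2 + 1`, while the manual formula would overflow computing `usize::MAX + 1` first.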
Summary
This PR introduces a comprehensive `advanced_scenarios` benchmark suite that exercises previously unbenchmarked SDK capabilities, along with critical bug fixes to the `MultiEventExecutor` and performance optimizations to the in-memory task store.

Key Changes
New Benchmark Suite: `advanced_scenarios`
- Tenant resolver overhead: `HeaderTenantResolver`, `BearerTokenTenantResolver` (with and without mapper), and `PathSegmentTenantResolver`
- Agent card hot-reload: `HotReloadAgentCardHandler` read performance, atomic swap cost, and complex card updates (100 skills)
- Agent card discovery: `/.well-known/agent.json` HTTP fetch latency
- Artifact accumulation: `task.clone()` cost as artifacts accumulate (0–500 depth), demonstrating the 90µs/event bottleneck at 501+ events

Bug Fixes
- `MultiEventExecutor` invalid state transitions: fixed emission of `Working → Working` status events in a loop, which violated the A2A spec state machine. It now correctly emits `Working` once, then N artifact events, then `Completed`.
- `InMemoryTaskStore::insert()` optimization: added a fast path for context-unchanged updates that skips all index operations (`BTreeSet` inserts, string clones) when updating an existing task with the same `context_id`, reducing update cost from ~2.5µs to ~700ns.

Infrastructure Improvements
- Added `NoopPushSender` for benchmarks requiring push notification support without actual HTTP delivery
- Added a `start_jsonrpc_server_with_push()` helper for benchmark servers with push capabilities
- Updated `backpressure.rs` event counts to match the corrected `MultiEventExecutor` behavior (3 → 502 events instead of 3 → 1001)

Notable Implementation Details
- Artifact accumulation benchmarks isolate how `task.clone()` cost scales with accumulated state
- Benchmark groups use `measurement_time()` configuration to ensure stable results across CI environments
- The `InMemoryTaskStore` optimization maintains correctness by skipping index operations only when `context_id` is unchanged; context changes still trigger full index updates
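A sketch of the context-unchanged fast path in `insert()`. The field layout (a primary `HashMap`, an ordered `BTreeSet` of ids, and a `context_id` → ids index) and the `index_ops` counter are assumptions made for illustration; the PR does not show `InMemoryTaskStore`'s actual fields.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Clone, Debug)]
pub struct Task {
    pub id: String,
    pub context_id: String,
    pub version: u64,
}

/// Hypothetical store with a primary map plus two secondary indexes.
#[derive(Default)]
pub struct Store {
    tasks: HashMap<String, Task>,
    ordered_ids: BTreeSet<String>,
    by_context: HashMap<String, BTreeSet<String>>,
    /// Count of secondary-index mutations, kept only to make the fast path visible.
    pub index_ops: usize,
}

impl Store {
    pub fn insert(&mut self, task: Task) {
        // Fast path: existing task with the same context_id, so the indexes are already correct.
        if let Some(existing) = self.tasks.get_mut(&task.id) {
            if existing.context_id == task.context_id {
                *existing = task;
                return;
            }
        }
        // Slow path: new task or changed context_id; drop the stale context entry first.
        if let Some(old) = self.tasks.get(&task.id) {
            if let Some(ids) = self.by_context.get_mut(&old.context_id) {
                ids.remove(&task.id);
                self.index_ops += 1;
            }
        }
        self.ordered_ids.insert(task.id.clone());
        self.by_context
            .entry(task.context_id.clone())
            .or_default()
            .insert(task.id.clone());
        self.index_ops += 2;
        self.tasks.insert(task.id.clone(), task);
    }

    pub fn get(&self, id: &str) -> Option<&Task> {
        self.tasks.get(id)
    }

    pub fn id_count(&self) -> usize {
        self.ordered_ids.len()
    }

    pub fn context_size(&self, context_id: &str) -> usize {
        self.by_context.get(context_id).map_or(0, BTreeSet::len)
    }
}
```

In this sketch the secondary indexes depend only on the task id and `context_id`, so an update that keeps both cannot invalidate them. If a store also indexed other fields, the fast-path condition would need to cover those fields as well.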