
Add advanced_scenarios benchmark suite with tenant resolver and artifact accumulation tests#74

Merged
tomtom215 merged 5 commits into main from claude/analyze-benchmark-results-Ok0Qs on Apr 1, 2026


Conversation

@tomtom215 (Owner)

Summary

This PR introduces a comprehensive advanced_scenarios benchmark suite that exercises previously unbenchmarked SDK capabilities, along with critical bug fixes to the MultiEventExecutor and performance optimizations to the in-memory task store.

Key Changes

New Benchmark Suite: advanced_scenarios

  • Tenant resolver overhead — Measures per-request extraction cost for HeaderTenantResolver, BearerTokenTenantResolver (with and without mapper), and PathSegmentTenantResolver
  • Agent card hot-reload — Benchmarks HotReloadAgentCardHandler read performance, atomic swap cost, and complex card updates (100 skills)
  • Agent card discovery endpoint — Measures /.well-known/agent.json HTTP fetch latency
  • Subscribe fan-out — Tests concurrent subscriber behavior (1–10 subscribers) during reconnection bursts
  • Streaming artifact accumulation — Isolates the task.clone() cost as artifacts accumulate (0–500 depth), demonstrating the 90µs/event bottleneck at 501+ events
  • Pagination full walk — Multi-page cursor traversal of 100–1K tasks with unfiltered and context-filtered variants
  • Extended agent card round-trip — Tests extended card endpoint latency

Bug Fixes

  • MultiEventExecutor invalid state transitions — Fixed emission of Working → Working status events in a loop, which violated the A2A spec state machine. Now correctly emits Working once, then N artifact events, then Completed
  • InMemoryTaskStore::insert() optimization — Added fast path for context-unchanged updates: skips all index operations (BTreeSet inserts, string clones) when updating an existing task with the same context_id, reducing update cost from ~2.5µs to ~700ns
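The fast-path idea behind the `InMemoryTaskStore::insert()` fix can be sketched as follows. This is a minimal model, not the SDK's actual code: the real store's fields, index types, and `Task` shape are not shown in this PR, so everything below is an assumption used for illustration.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical simplified task type, loosely mirroring the PR description.
#[derive(Clone, PartialEq, Debug)]
struct Task {
    id: String,
    context_id: String,
    payload: String,
}

#[derive(Default)]
struct InMemoryTaskStore {
    // Primary storage: task id -> task.
    tasks: HashMap<String, Task>,
    // Secondary index: context_id -> task ids (stand-in for the real indexes).
    by_context: BTreeMap<String, Vec<String>>,
}

impl InMemoryTaskStore {
    fn insert(&mut self, task: Task) {
        if let Some(existing) = self.tasks.get(&task.id) {
            if existing.context_id == task.context_id {
                // Fast path: context unchanged, so index entries are already
                // correct -- only replace the primary map entry, skipping all
                // index maintenance and the string clones it would require.
                self.tasks.insert(task.id.clone(), task);
                return;
            }
            // Context changed: drop the stale index entry before re-indexing.
            if let Some(ids) = self.by_context.get_mut(&existing.context_id) {
                ids.retain(|id| id != &task.id);
            }
        }
        // Slow path: new task or changed context -- update the index too.
        self.by_context
            .entry(task.context_id.clone())
            .or_default()
            .push(task.id.clone());
        self.tasks.insert(task.id.clone(), task);
    }
}
```

The key correctness point matches the PR's note: the shortcut fires only when `context_id` is unchanged; a context change still pays for full index maintenance.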

Infrastructure Improvements

  • Added NoopPushSender for benchmarks requiring push notification support without actual HTTP delivery
  • Added start_jsonrpc_server_with_push() helper for benchmark servers with push capabilities
  • Increased measurement time to 8–10 seconds for multi-threaded benchmarks to reduce CI scheduler variance
  • Updated backpressure.rs event counts to match corrected MultiEventExecutor behavior (3 → 502 events instead of 3 → 1001)
  • Updated benchmark documentation and CI workflow to include the new suite

Notable Implementation Details

  • The artifact accumulation benchmark uses pre-populated tasks with 0, 10, 50, 100, and 500 artifacts to demonstrate how task.clone() cost scales with accumulated state
  • The pagination walk benchmark measures both unfiltered and context-filtered traversals to capture index lookup overhead
  • All multi-threaded benchmarks now use explicit measurement_time() configuration to ensure stable results across CI environments
  • The InMemoryTaskStore optimization maintains correctness by only skipping index operations when context_id is unchanged; context changes still trigger full index updates
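The corrected MultiEventExecutor emission order (Working once, then N artifacts, then Completed — the N+2 formula referenced below in the commit messages) can be sketched as a pure function. The event enum here is illustrative, not the SDK's actual event types:

```rust
// Illustrative event model for the corrected state machine.
#[derive(Debug, PartialEq)]
enum Event {
    Working,
    Artifact(usize),
    Completed,
}

// Corrected behavior: exactly one Working status, N artifact events,
// then Completed -- N + 2 events total. The buggy version emitted
// Working before every artifact (2N + 1 events), producing invalid
// Working -> Working transitions under the A2A spec state machine.
fn emit_events(artifact_count: usize) -> Vec<Event> {
    let mut events = Vec::with_capacity(artifact_count + 2);
    events.push(Event::Working);
    for i in 0..artifact_count {
        events.push(Event::Artifact(i));
    }
    events.push(Event::Completed);
    events
}
```

With N = 500 this yields 502 events, matching the corrected backpressure.rs counts (502 instead of the invalid 1001).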

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR

claude added 5 commits on April 1, 2026
- Fix BLOCKING production_scenarios panic: MultiEventExecutor was emitting
  invalid Working → Working state transitions. Restructured to emit a single
  Working status, then N artifact events, then Completed — conforming to the
  A2A spec state machine where Working can only transition to terminal or
  interrupted states.

- Fix push_config benchmark: added NoopPushSender and server helper with
  push notification support enabled, resolving PushNotificationNotSupported
  errors during benchmark runs.

- Add measurement_time to 23+ benchmark groups across 8 files to eliminate
  criterion warmup warnings on CI runners. Groups classified by per-iteration
  cost: transport round-trips (8s), multi-turn/concurrent (10s), slow/wall-clock
  bound (15s). Sub-µs store/serde groups left at default 5s.

- Optimize InMemoryTaskStore::insert() for update path: when updating an
  existing task with unchanged context_id (the common case), skip all BTreeSet
  and context index operations — only update the primary HashMap entry. This
  eliminates the [1.5µs, 4.2µs] variance from occasional BTreeSet node splits
  and reduces update cost from ~2.5µs to ~700ns.

- Document get() 100K cache anomaly: the ~42% faster lookup at 100K vs 1K/10K
  is a CPU cache warming artifact from the large populate_store() setup, not a
  genuine HashMap performance difference. The 1K/10K number (~450ns) is the
  representative O(1) lookup time.

- Update MultiEventExecutor event count comments in backpressure.rs to reflect
  the corrected N+2 formula (Working + N artifacts + Completed) instead of the
  invalid 2N+1 (N Working + N artifacts + Completed).

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
New benchmark file exercising 7 previously-unbenchmarked SDK capabilities:

1. **Tenant resolver overhead** (5 benchmarks): HeaderTenantResolver,
   BearerTokenTenantResolver (with and without mapper), PathSegmentTenantResolver,
   and missing-header fast-rejection path. Uses CallContext with realistic
   HTTP headers to measure per-request extraction cost.

2. **Agent card hot-reload** (3 benchmarks): Steady-state read of current card
   via RwLock, atomic swap-and-read cycle, and complex card swap (100 skills)
   measuring the cost of production-scale card replacement.

3. **Agent card discovery endpoint** (1 benchmark): /.well-known/agent.json
   HTTP round-trip latency via raw hyper client.

4. **Subscribe fan-out** (3 benchmarks): 1/5/10 concurrent subscribers
   reconnecting to the same task simultaneously — simulates mobile/web
   client reconnection bursts at scale.

5. **Streaming artifact accumulation** (10 benchmarks): Isolates the per-event
   task.clone() cost at 0/10/50/100/500 accumulated artifacts — the root cause
   of the 90µs/event bottleneck at 501+ events. Also measures full store_save
   at each depth to capture clone + index + HashMap insert.

6. **Pagination full walk** (4 benchmarks): Complete cursor-based traversal of
   100 and 1K task stores at page_size=25/50, both unfiltered and with
   context_id filter. Measures cumulative multi-page latency.

7. **Extended agent card round-trip** (1 benchmark): Full client→server→client
   round-trip for get_extended_agent_card with capability flag enabled.

All 24 benchmark test cases pass smoke testing.
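The pagination full-walk pattern in item 6 can be modeled as a cursor loop over a task list. This is a simplified offset-based sketch; the SDK's actual cursor type and list API are not shown in this PR, so the function below is an assumption for illustration only:

```rust
// Simplified cursor-based full walk: fetch every page until the data is
// exhausted, returning (pages_fetched, items_seen). The benchmark's
// cumulative multi-page latency corresponds to the total cost of this loop.
fn walk_pages(task_ids: &[u32], page_size: usize) -> (usize, usize) {
    let mut cursor = 0usize; // offset acting as a stand-in for an opaque cursor
    let mut pages = 0;
    let mut items = 0;
    loop {
        let end = (cursor + page_size).min(task_ids.len());
        let page = &task_ids[cursor..end];
        if page.is_empty() {
            break; // empty page means the cursor is exhausted
        }
        pages += 1;
        items += page.len();
        cursor = end;
    }
    (pages, items)
}
```

For the benchmarked configurations this gives 4 pages for 100 tasks at page_size=25 and 20 pages for 1K tasks at page_size=50, so per-walk latency scales with page count, not just store size.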

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
…nchmarks

Updates all documentation and CI infrastructure to include the new
advanced_scenarios benchmark suite (13th benchmark module):

CI & Scripts:
- .github/workflows/benchmarks.yml: add advanced_scenarios step
- benches/scripts/run_benchmarks.sh: add to BENCHMARKS array
- benches/scripts/generate_book_page.sh: add Advanced Scenarios section
  with emit_table "advanced_" call

Documentation:
- benches/README.md: add table row, architecture entry, update CI count
  (12→13), update executor listing (add NoopPushSender), add enterprise/
  production/advanced dimensions to "Why it matters" table, correct
  backpressure event counts (1001→502)
- CONTRIBUTING.md: add enterprise_scenarios, production_scenarios, and
  advanced_scenarios to benchmark table
- CHANGELOG.md: document advanced_scenarios addition, MultiEventExecutor
  fix, save() variance optimization, measurement_time fixes, NoopPushSender,
  and push-enabled server helper
- book/src/reference/changelog.md: add advanced_scenarios, save() insert
  optimization, MultiEventExecutor fix, measurement_time fixes, update
  backpressure event counts

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
- cargo fmt: fix import ordering and closure formatting in
  advanced_scenarios.rs and backpressure.rs
- clippy: add backticks to doc comment references (context_id,
  BTreeSet, HashMap) in InMemoryTaskStore::insert()
- cargo doc: fix unresolved doc link to NoopPushSender in server.rs
  by using fully qualified crate::executor::NoopPushSender path

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
- Use `next_back()` instead of `last()` on DoubleEndedIterator
  (split('.') returns a reversible iterator)
- Use `usize::div_ceil()` instead of manual `(n + d - 1) / d`
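Both clippy suggestions fit in a few lines. The exact call sites in the PR are not visible here, so these are generic illustrations of the two idioms:

```rust
// `split('.')` returns a DoubleEndedIterator, so clippy prefers
// `next_back()` (steps once from the end) over `last()` (walks the
// whole iterator from the front).
fn last_segment(name: &str) -> Option<&str> {
    name.split('.').next_back()
}

// `usize::div_ceil()` (stable since Rust 1.73) replaces the manual
// `(n + d - 1) / d` ceiling-division idiom.
fn page_count(total: usize, page_size: usize) -> usize {
    total.div_ceil(page_size)
}
```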

https://claude.ai/code/session_015mMFNY6W6zUUZmVcpMDGrR
@tomtom215 tomtom215 merged commit b82cd0d into main Apr 1, 2026
40 checks passed
@tomtom215 tomtom215 deleted the claude/analyze-benchmark-results-Ok0Qs branch April 6, 2026 14:45
