Document benchmark limitations and fix CI timeout issues by tomtom215 · Pull Request #76 · tomtom215/a2a-rust

tomtom215 · 2026-04-01T20:23:41Z

Summary

This PR adds comprehensive documentation for known benchmark measurement limitations and resolves Criterion.rs timeout warnings by increasing the measurement time budgets of 5 benchmark groups that were exceeding their limits on CI runners (some only marginally, concurrent/sends by as much as 10.8s).

Key Changes

Documentation Improvements:

- Added 8 new "Known Measurement Limitations" entries to benches/README.md and the auto-generated benchmark results page, including:
  - data_volume/save: wide confidence intervals caused by BTreeSet rebalancing
  - Dispatch routing: apparently inverted results (direct handler invocation vs. HTTP round-trip)
  - Cold start vs. steady state: complementary measurements, not directly comparable
  - Subscribe fan-out: O(1) scaling behavior (up to 5 subscribers)
  - Agent burst: sub-linear scaling with Tokio work-stealing (714 → 310µs per agent)
  - Tenant resolver: negligible overhead (88–173ns)
  - Pagination context index: 2× speedup via BTreeSet filtering
- Updated CHANGELOG.md with performance highlights from the cleanest benchmark run in project history (237 benchmarks, zero panics/errors)
- Added a benchmark overview section to book/src/deployment/testing.md documenting all 13 suites and 237 benchmarks
- Updated the CI/CD documentation in book/src/deployment/cicd.md to describe the benchmarks workflow
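The pagination entry credits a BTreeSet-based context index with a roughly 2× speedup. As an illustration only (the `ContextIndex` type, the `(context_id, task_id)` key layout, and the cursor semantics are hypothetical, not the crate's actual task store), an ordered set lets a context-scoped page be read with a range query instead of filtering every stored task:

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Hypothetical index: (context_id, task_id) pairs kept in sorted order,
/// so all tasks of one context sit in a contiguous key range.
struct ContextIndex {
    entries: BTreeSet<(String, String)>,
}

impl ContextIndex {
    fn new() -> Self {
        Self { entries: BTreeSet::new() }
    }

    fn insert(&mut self, context_id: &str, task_id: &str) {
        self.entries
            .insert((context_id.to_string(), task_id.to_string()));
    }

    /// Returns up to `limit` task IDs in `context_id` that sort strictly
    /// after `cursor` (or from the first one when `cursor` is None).
    /// Only keys from the start of the matching range are visited.
    fn page(&self, context_id: &str, cursor: Option<&str>, limit: usize) -> Vec<String> {
        let start = match cursor {
            Some(c) => Bound::Excluded((context_id.to_string(), c.to_string())),
            // The empty string is the smallest String, i.e. the first key of the context.
            None => Bound::Included((context_id.to_string(), String::new())),
        };
        self.entries
            .range((start, Bound::Unbounded))
            .take_while(|(ctx, _)| ctx == context_id) // stop at the next context
            .take(limit)
            .map(|(_, task)| task.clone())
            .collect()
    }
}
```

A linear scan would touch every entry to find one context's tasks; the range query starts at the first matching key and stops at the next context, which is the kind of saving the entry describes.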

Benchmark Timeout Fixes:

- transport/payload_scaling: 8s → 10s (4KB and 16KB payloads needed 8.4–9.5s)
- concurrent/sends: 18s → 30s (4-concurrent and 16-concurrent cases needed 21.8–28.8s)
- realistic/payload_complexity: 10s → 15s (mixed_parts and nested_metadata exceeded budget)
- realistic/connection: 10s → 15s (new_client_per_request TCP setup cost)
- enterprise/client_interceptors: 8s → 10s (5 and 10 interceptor chains exceeded budget)
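These budgets follow from how Criterion.rs schedules samples. Assuming the default sample size of 100 and linear sampling with the smallest per-sample multiplier (sample k runs k iterations), one measurement pass runs 1 + 2 + … + 100 = 5050 iterations, so the minimum budget is roughly the mean iteration time × 5050. The sketch below reproduces the concurrent/sends figure from the commit notes (5.68ms × 5050 ≈ 28.7s, consistent with the ~28.8s reported); the helper names are illustrative, not Criterion API:

```rust
/// Iterations in one linear-sampling pass with multiplier d = 1:
/// sample k runs k iterations, so the total is n(n + 1) / 2.
fn linear_total_iters(sample_size: u64) -> u64 {
    sample_size * (sample_size + 1) / 2
}

/// Approximate measurement time (seconds) needed to finish all samples,
/// ignoring warm-up and analysis overhead.
fn required_secs(mean_iter_secs: f64, sample_size: u64) -> f64 {
    mean_iter_secs * linear_total_iters(sample_size) as f64
}

/// True when `budget_secs` covers the estimated requirement.
fn budget_fits(budget_secs: f64, mean_iter_secs: f64, sample_size: u64) -> bool {
    required_secs(mean_iter_secs, sample_size) <= budget_secs
}
```

Under these assumptions the old 18s budget falls short by more than 10s while 30s fits, which matches this group receiving the largest increase.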

Benchmark Code Improvements:

- production_scenarios.rs: Pre-allocate direct_params outside the measurement loop in bench_dispatch_routing to isolate handler dispatch cost from fixture allocation, producing a fairer comparison against the HTTP round-trip path
- Added detailed measurement notes to data_volume.rs explaining BTreeSet rebalancing variance in the after_prefill/10000 case
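In Criterion.rs only the closure passed to `Bencher::iter` is timed, so building a fixture before that call keeps its cost out of the measurement (`iter_batched` is the usual choice when each iteration needs fresh input). The stdlib-only sketch below shows the same effect with `std::time::Instant`; `Params`, `make_params`, and `dispatch` are hypothetical stand-ins, not the crate's real types:

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical request parameters; the real fixture type is not shown in the PR.
struct Params {
    method: String,
    payload: Vec<u8>,
}

/// Hypothetical fixture builder: allocates a String and a 1 KiB buffer.
fn make_params() -> Params {
    Params {
        method: "example/dispatch".to_string(),
        payload: vec![0u8; 1024],
    }
}

/// Hypothetical stand-in for direct handler dispatch.
fn dispatch(p: &Params) -> usize {
    p.method.len() + p.payload.len()
}

/// Fixture built inside the timed loop: allocation cost is measured too.
fn time_alloc_inside(iters: u32) -> (Duration, usize) {
    let start = Instant::now();
    let mut acc = 0;
    for _ in 0..iters {
        let p = make_params();
        acc += dispatch(black_box(&p));
    }
    (start.elapsed(), acc)
}

/// Fixture pre-allocated once: only dispatch runs inside the timed loop.
fn time_prealloc(iters: u32) -> (Duration, usize) {
    let p = make_params();
    let start = Instant::now();
    let mut acc = 0;
    for _ in 0..iters {
        acc += dispatch(black_box(&p));
    }
    (start.elapsed(), acc)
}
```

Both functions do the same dispatch work; the first also times two heap allocations per iteration, which is the overhead the PR removes from the direct-handler side of the comparison.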

Documentation Updates:

- Updated README.md to reference all 237 benchmarks across 13 suites
- Updated .github/workflows/README.md to document the benchmarks workflow

Notable Details

- All timeout increases are based on actual CI analysis showing which benchmarks were exceeding their budgets
- The dispatch routing benchmark fix isolates the true handler cost by moving fixture allocation outside the timed loop
- Documentation entries explain why seemingly anomalous results (e.g., the direct handler measuring slower than HTTP) are actually expected and help validate the measurement methodology

https://claude.ai/code/session_01YNLEZUUKzzs4qjkj1TPgXz

…sis findings

Address all findings from the 237-benchmark criterion analysis:

Timeout fixes (10 warnings → 0):
- transport/payload_scaling: 8s → 10s (4KB/16KB needed 8.4–9.5s)
- concurrent/sends: 18s → 30s (/16 needed 28.8s at 5.68ms × 5050 iters)
- realistic/payload_complexity: 10s → 15s (mixed_parts marginally over)
- realistic/connection: 10s → 15s (new_client_per_request TCP setup)
- enterprise/client_interceptors: 8s → 10s (/5 and /10 chains over)

Benchmark fixes:
- dispatch_routing: pre-allocate params outside measurement loop for fair comparison between direct handler invoke and HTTP round-trip
- data_volume/save: document wide CIs from BTreeSet rebalancing spikes

Documentation (8 new Known Measurement Limitations):
- data_volume/save wide confidence intervals
- dispatch routing direct vs HTTP (near-parity validates keep-alive)
- cold start vs steady state (complementary, not comparable)
- subscribe fan-out O(1) scaling up to 5 subscribers
- agent burst sub-linear scaling (714→310µs/agent)
- tenant resolver negligible overhead (88–173ns)
- pagination context index 2× speedup
- Updated generate_book_page.sh footer, benches/README.md, CHANGELOG.md, root README.md, CI/CD book page, testing book page, workflows README

https://claude.ai/code/session_01YNLEZUUKzzs4qjkj1TPgXz
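The data_volume/save note attributes wide confidence intervals to occasional BTreeSet rebalancing spikes. A simplified illustration, using a normal-approximation 95% interval (not Criterion's bootstrap) and synthetic sample values (not measurements from this PR), shows how a few slow outliers widen the interval even when most samples are identical:

```rust
/// Mean and sample standard deviation of `xs` (requires xs.len() >= 2).
fn mean_sd(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}

/// Width of a normal-approximation 95% confidence interval for the mean.
fn ci95_width(xs: &[f64]) -> f64 {
    let (_, sd) = mean_sd(xs);
    2.0 * 1.96 * sd / (xs.len() as f64).sqrt()
}

/// Synthetic timings: `n` steady samples at `base`, with the first `spikes`
/// of them replaced by `spike` to mimic occasional slow inserts.
fn synthetic_samples(n: usize, base: f64, spikes: usize, spike: f64) -> Vec<f64> {
    (0..n).map(|i| if i < spikes { spike } else { base }).collect()
}
```

With 100 samples at 10µs the interval has zero width; replacing just 3 of them with 50µs spikes widens it to roughly 2.7µs, so a wide interval here reflects rare slow operations rather than uniformly noisy measurements.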

tomtom215 merged commit 5414bb3 into main Apr 1, 2026
67 of 74 checks passed

tomtom215 deleted the claude/analyze-benchmark-results-U0EYE branch April 6, 2026 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly
