Document benchmark limitations and fix CI timeout issues #76

Merged
tomtom215 merged 1 commit into main from claude/analyze-benchmark-results-U0EYE on Apr 1, 2026

@tomtom215
Owner

Summary

This PR documents known benchmark measurement limitations and resolves Criterion.rs timeout warnings by raising the measurement time budget for 5 benchmark groups that marginally exceeded their limits on CI runners.

Key Changes

Documentation Improvements:

  • Added 8 new "Known Measurement Limitations" entries to benches/README.md and the auto-generated benchmark results page, including:
    • Data volume/save wide confidence intervals from BTreeSet rebalancing
    • Dispatch routing inverted results (direct handler vs HTTP round-trip)
    • Cold start vs steady state measurement differences
    • Subscribe fan-out O(1) scaling behavior
    • Agent burst sub-linear scaling with Tokio work-stealing
    • Tenant resolver negligible overhead (88–173ns)
    • Pagination context index 2× speedup via BTreeSet filtering
  • Updated CHANGELOG.md with performance highlights from the cleanest benchmark run in project history (237 benchmarks, zero panics/errors)
  • Added benchmark overview section to book/src/deployment/testing.md documenting all 13 suites and 237 benchmarks
  • Updated CI/CD documentation in book/src/deployment/cicd.md describing the benchmarks workflow

Benchmark Timeout Fixes:

  • transport/payload_scaling: 8s → 10s (4KB and 16KB payloads needed 8.4–9.5s)
  • concurrent/sends: 18s → 30s (4-concurrent and 16-concurrent cases needed 21.8–28.8s)
  • realistic/payload_complexity: 10s → 15s (mixed_parts and nested_metadata exceeded budget)
  • realistic/connection: 10s → 15s (new_client_per_request TCP setup cost)
  • enterprise/client_interceptors: 8s → 10s (5 and 10 interceptor chains exceeded budget)
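
The budget arithmetic behind these increases can be sketched as follows. This is an illustrative, std-only sketch rather than code from the PR: `required_time` and `fits_budget` are hypothetical helper names, and the inputs are the rounded figures the commit message reports for the 16-concurrent case of concurrent/sends (5.68 ms per iteration over 5050 iterations).

```rust
use std::time::Duration;

/// Estimated wall-clock time to run `iters` iterations when one
/// iteration takes `per_iter` on average. (Hypothetical helper.)
fn required_time(per_iter: Duration, iters: u32) -> Duration {
    per_iter * iters
}

/// True when the estimate fits inside the configured measurement budget.
/// (Hypothetical helper.)
fn fits_budget(per_iter: Duration, iters: u32, budget: Duration) -> bool {
    required_time(per_iter, iters) <= budget
}
```

With those rounded inputs, `required_time(Duration::from_micros(5680), 5050)` comes to roughly 28.7 s, in line with the reported 28.8 s. `fits_budget` is therefore false for the old 18 s budget and true for the new 30 s one.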

Benchmark Code Improvements:

  • production_scenarios.rs: Pre-allocate direct_params outside the measurement loop in bench_dispatch_routing to isolate handler dispatch cost from fixture allocation, producing a fairer comparison against the HTTP round-trip path
  • Added detailed measurement notes to data_volume.rs explaining BTreeSet rebalancing variance in the after_prefill/10000 case
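
The pre-allocation change can be illustrated with a minimal std-only sketch. It is not the code in production_scenarios.rs: `build_params` and `handle` are hypothetical stand-ins for the fixture and the handler, and plain `Instant` timing stands in for the Criterion harness. The point is only where the allocation sits relative to the timed region.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the per-request fixture (assumed 4 KiB buffer).
fn build_params() -> Vec<u8> {
    vec![1u8; 4096]
}

/// Hypothetical stand-in for the handler under test.
fn handle(params: &[u8]) -> usize {
    params.iter().map(|&b| b as usize).sum()
}

/// Before: the fixture is allocated inside the timed loop, so the
/// measurement includes allocation cost as well as handler cost.
fn time_with_alloc(iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        let params = build_params();
        black_box(handle(black_box(&params)));
    }
    start.elapsed()
}

/// After: the fixture is allocated once, before timing starts, so the
/// loop measures only the handler call.
fn time_preallocated(iters: u32) -> Duration {
    let params = build_params();
    let start = Instant::now();
    for _ in 0..iters {
        black_box(handle(black_box(&params)));
    }
    start.elapsed()
}
```

Comparing `time_with_alloc(n)` with `time_preallocated(n)` for the same `n` gives a rough sense of how much of the original measurement was fixture setup. The actual gap depends on the allocator and the machine.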

Documentation Updates:

  • Updated README.md to reference all 237 benchmarks across 13 suites
  • Updated .github/workflows/README.md to document the benchmarks workflow

Notable Details

  • All timeout increases are based on actual CI analysis showing which benchmarks were marginally exceeding their budgets
  • The dispatch routing benchmark fix isolates the true handler cost by moving fixture allocation outside the timed loop
  • Documentation entries explain why seemingly anomalous results (e.g., direct handler slower than HTTP) are actually expected and validate the measurement methodology

https://claude.ai/code/session_01YNLEZUUKzzs4qjkj1TPgXz

…sis findings

Address all findings from the 237-benchmark criterion analysis:

Timeout fixes (10 warnings → 0):
- transport/payload_scaling: 8s → 10s (4KB/16KB needed 8.4–9.5s)
- concurrent/sends: 18s → 30s (/16 needed 28.8s at 5.68ms × 5050 iters)
- realistic/payload_complexity: 10s → 15s (mixed_parts marginally over)
- realistic/connection: 10s → 15s (new_client_per_request TCP setup)
- enterprise/client_interceptors: 8s → 10s (/5 and /10 chains over)

Benchmark fixes:
- dispatch_routing: pre-allocate params outside measurement loop for fair
  comparison between direct handler invoke and HTTP round-trip
- data_volume/save: document wide CIs from BTreeSet rebalancing spikes

Documentation (8 new Known Measurement Limitations):
- data_volume/save wide confidence intervals
- dispatch routing direct vs HTTP (near-parity validates keep-alive)
- cold start vs steady state (complementary, not comparable)
- subscribe fan-out O(1) scaling up to 5 subscribers
- agent burst sub-linear scaling (714→310µs/agent)
- tenant resolver negligible overhead (88–173ns)
- pagination context index 2× speedup
- Updated generate_book_page.sh footer, benches/README.md, CHANGELOG.md,
  root README.md, CI/CD book page, testing book page, workflows README

https://claude.ai/code/session_01YNLEZUUKzzs4qjkj1TPgXz
@tomtom215 merged commit 5414bb3 into main on Apr 1, 2026
67 of 74 checks passed
@tomtom215 deleted the claude/analyze-benchmark-results-U0EYE branch on April 6, 2026 14:45