Add interactive benchmark dashboard and optimize serialization#78
Merged
Conversation
…documentation Addresses every finding from the v0.4.1 benchmark analysis (237 benchmarks across 13 suites) — 2 HIGH, 4 MEDIUM, and 4 LOW severity items. HIGH severity fixes: - Add serde_helpers module with SerBuffer (thread-local reusable serialization buffer) to eliminate 2.3x small-payload allocation overhead - Add deser_from_str/deser_from_slice helpers enabling serde_json's borrowed-data path for ~15-25% fewer deserialization allocations - Document optimization paths for history depth scaling (~494 allocs/turn) MEDIUM severity fixes: - Increase broadcast channel DEFAULT_QUEUE_CAPACITY from 64 to 256, pushing the 12x per-event cost inflection from ~52 to ~252 events - Use thread-local reusable buffer in SSE frame building for 0-amortized per-event allocations (was 1 allocation per event) - Extend transport payload benchmarks to 1MB (was 16KB) for regression detection at payload-dominant scales - Add protocol/payload_scaling isolation benchmarks comparing to_vec vs SerBuffer and from_slice vs from_str across 64B-1MB - Document artifact accumulation clone cost and optimization path LOW severity fixes: - Add 4MB cache-busting step for data_volume/get at 100K to eliminate CPU cache warming artifact producing anomalous fast lookups - Document connection reuse best practices (9% savings on loopback, 10-50ms with TLS) in benchmark comments and deployment guide - Document cold start vs steady state as complementary measurements - Document concurrent store 1→4 thread anomaly as runtime artifact Documentation updates: - Update CHANGELOG, book changelog, benches README, and all relevant GH Book pages (streaming, configuration, pitfalls, production, testing, CI/CD, API reference) - Update benchmark CI script to capture new payload_scaling benchmarks - Update Known Measurement Limitations in book page generator CI verification: cargo fmt, clippy (0 warnings), all tests pass, cargo doc clean. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
BREAKING CHANGE: `TaskStore::save()` and `TaskStore::insert_if_absent()` now accept `&Task` instead of owned `Task`. This eliminates forced `.clone()` at every save call site. Migration: // Before (0.4.x): store.save(task.clone()).await?; // After (0.5.0): store.save(&task).await?; Custom TaskStore implementations must update method signatures: // Before: fn save<'a>(&'a self, task: Task) -> Pin<Box<...>>; // After: fn save<'a>(&'a self, task: &'a Task) -> Pin<Box<...>>; Impact: - InMemoryTaskStore clones internally (same total cost, cleaner API) - SqliteTaskStore/PostgresTaskStore borrow fields directly (zero clones) - Background state machine: 6 fewer .clone() calls per event cycle - All 8 TaskStore implementations updated - All ~100 call sites migrated across production, test, and benchmark code - Version bumped 0.4.1 → 0.5.0 across all 4 crates https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
Adds a dynamic, Chart.js-powered benchmark visualization dashboard that is auto-generated from criterion results by CI. New files: - benches/dashboard/template.html — standalone HTML dashboard template with 7 tabs (Overview, Transport, Serde, Data Volume, Enterprise, Production, Memory), Chart.js visualizations, dark theme, ARIA accessibility, responsive layout, and graceful missing-data handling - benches/scripts/extract_benchmark_json.py — extracts criterion estimates.json into structured JSON consumed by the dashboard - benches/scripts/generate_dashboard.sh — injects JSON into template - book/src/reference/dashboard.md — mdbook wrapper page linking to the interactive dashboard - book/src/reference/benchmark-dashboard.html — generated output CI integration: - benchmarks.yml now runs generate_dashboard.sh after benchmarks - Commits both benchmarks.md and benchmark-dashboard.html to book The dashboard complements the existing tabular benchmarks.md page with interactive charts, computed metrics, and drill-down analysis across all 13 benchmark suites. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
…pings The agent-written extractor adds: - _find_prefix() for parameterized benchmarks with variable byte lengths - Exhaustive criterion path mappings verified against Rust source files - Type annotations (Dict, List, Optional, Tuple) for all functions - --compact flag for minified JSON output - Better error handling with zero-value fallbacks Regenerated dashboard HTML with the updated extractor. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
Python bytecode cache from benchmark data extractor. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
…warnings `serde_json::to_value(&task)` and `serde_json::to_string(&task)` where `task` is already `&Task` produce a double-reference that clippy flags as `needless_borrows_for_generic_args`. Changed to `to_value(task)` and `to_string(task)` — the generic `impl Serialize` parameter accepts `&Task` directly without the extra `&`. Fixes CI clippy failure on postgres and sqlite feature gates. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR introduces an interactive benchmark dashboard for visualizing performance metrics, adds serialization optimization helpers, and refactors the
TaskStoreAPI to reduce unnecessary cloning.Key Changes
Benchmark Dashboard
book/src/reference/benchmark-dashboard.html): Auto-generated HTML page with Chart.js visualizations, tabbed navigation, and drill-down analysis of Criterion.rs benchmark resultsbenches/scripts/extract_benchmark_json.py): Walkstarget/criterion/to extract median estimates and confidence intervals from Criterion JSON, producing structured data for the dashboardbenches/scripts/generate_dashboard.sh): Injects benchmark data into HTML template at build timebook/src/reference/dashboard.md): Guide to accessing and interpreting the interactive dashboard.github/workflows/benchmarks.yml): Added dashboard generation step to benchmark workflowSerialization Optimization
serde_helpersmodule (crates/a2a-types/src/serde_helpers.rs):SerBuffer: Thread-local reusableVec<u8>forserde_json::to_writer, eliminating per-call allocation overhead (2.3× improvement on 64B payloads)deser_from_str: Wrapper enabling serde'svisit_borrowed_strpath for 15-25% allocation reduction on deserializationcrates/a2a-server/src/streaming/sse.rs): Uses thread-local buffer to avoid intermediate string allocationsTaskStore API Refactoring
TaskStore::save()andTaskStore::insert_if_absent()now accept&Taskinstead of ownedTask.clone()at every call siteInMemoryTaskStore) clone internallyInMemoryTaskStore,SqliteTaskStore,PostgresTaskStore,TenantAwareInMemoryTaskStore,TenantAwareSqliteTaskStore,TenantAwarePostgresTaskStoreDocumentation & Metadata
a2a-protocol-types,a2a-protocol-server,a2a-protocol-client,a2a-protocol-sdk)Implementation Details
thread_local!for zero-cost thread safety without lockshttps://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP