Add interactive benchmark dashboard and optimize serialization by tomtom215 · Pull Request #78 · tomtom215/a2a-rust

tomtom215 · 2026-04-02T07:24:01Z

Summary

This PR introduces an interactive benchmark dashboard for visualizing performance metrics, adds serialization optimization helpers, and refactors the TaskStore API to reduce unnecessary cloning.

Key Changes

Benchmark Dashboard

New interactive dashboard (book/src/reference/benchmark-dashboard.html): Auto-generated HTML page with Chart.js visualizations, tabbed navigation, and drill-down analysis of Criterion.rs benchmark results
Python extraction script (benches/scripts/extract_benchmark_json.py): Walks target/criterion/ to extract median estimates and confidence intervals from Criterion JSON, producing structured data for the dashboard
Generation script (benches/scripts/generate_dashboard.sh): Injects benchmark data into HTML template at build time
Dashboard documentation (book/src/reference/dashboard.md): Guide to accessing and interpreting the interactive dashboard
CI integration (.github/workflows/benchmarks.yml): Added dashboard generation step to benchmark workflow

Serialization Optimization

New serde_helpers module (crates/a2a-types/src/serde_helpers.rs):
- SerBuffer: Thread-local reusable Vec<u8> for serde_json::to_writer, eliminating per-call allocation overhead (2.3× improvement on 64B payloads)
- deser_from_str: Wrapper enabling serde's visit_borrowed_str path for 15-25% allocation reduction on deserialization
SSE frame building optimization (crates/a2a-server/src/streaming/sse.rs): Uses thread-local buffer to avoid intermediate string allocations
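
The PR does not show the `SerBuffer` API itself, so the following is only a minimal sketch of the underlying technique: a thread-local scratch `Vec<u8>` that is cleared and reused on each call so its capacity survives between calls. The names `SCRATCH`, `with_scratch`, and `encode_id` are hypothetical, and a hand-written `write!` stands in for `serde_json::to_writer` to keep the example dependency-free.

```rust
use std::cell::RefCell;
use std::io::Write;

thread_local! {
    // One scratch buffer per thread; its capacity is kept between calls.
    static SCRATCH: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(256));
}

/// Hypothetical helper: clears the thread-local buffer, lets `fill` write
/// into it, then hands the written bytes to `consume`.
/// Note: calling `with_scratch` again from inside `fill` or `consume`
/// would panic, because the RefCell is already mutably borrowed.
fn with_scratch<R>(
    fill: impl FnOnce(&mut Vec<u8>) -> std::io::Result<()>,
    consume: impl FnOnce(&[u8]) -> R,
) -> std::io::Result<R> {
    SCRATCH.with(|cell| {
        let mut buf = cell.borrow_mut();
        buf.clear(); // drops old contents, keeps the allocation
        fill(&mut *buf)?;
        Ok(consume(buf.as_slice()))
    })
}

/// Stand-in for a serializer call such as `serde_json::to_writer`.
/// The final copy into a String exists only so the demo can return a value.
fn encode_id(id: &str) -> std::io::Result<String> {
    with_scratch(
        |buf| write!(buf, "{{\"id\":\"{}\"}}", id),
        |bytes| String::from_utf8_lossy(bytes).into_owned(),
    )
}

fn main() -> std::io::Result<()> {
    println!("{}", encode_id("task-1")?);
    println!("{}", encode_id("task-2")?);
    Ok(())
}
```

In real use the `consume` closure would write the bytes straight to a socket or response body instead of copying them, which is where the saved allocation comes from.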

TaskStore API Refactoring

Breaking change: TaskStore::save() and TaskStore::insert_if_absent() now accept &Task instead of owned Task
- Eliminates forced .clone() at every call site
- Store implementations that need ownership (e.g., InMemoryTaskStore) clone internally
- Updated all implementations: InMemoryTaskStore, SqliteTaskStore, PostgresTaskStore, TenantAwareInMemoryTaskStore, TenantAwareSqliteTaskStore, TenantAwarePostgresTaskStore
- Updated all call sites across handler, tests, and benchmarks
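
To illustrate the borrowing pattern described above, here is a simplified, synchronous sketch. The real trait is async and its exact signatures and return types are not shown in this description, so `Task`, `MemoryStore`, the `bool` return of `insert_if_absent`, and the `get` helper are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical, reduced stand-in for the crate's Task type.
#[derive(Clone, Debug, PartialEq)]
struct Task {
    id: String,
    status: String,
}

// Simplified synchronous version of the borrowed-argument API.
trait TaskStore {
    fn save(&self, task: &Task);
    // Assumed to report whether the task was inserted.
    fn insert_if_absent(&self, task: &Task) -> bool;
}

#[derive(Default)]
struct MemoryStore {
    tasks: Mutex<HashMap<String, Task>>,
}

impl MemoryStore {
    fn get(&self, id: &str) -> Option<Task> {
        self.tasks.lock().unwrap().get(id).cloned()
    }
}

impl TaskStore for MemoryStore {
    fn save(&self, task: &Task) {
        // The store needs ownership, so the clone happens here, once.
        self.tasks
            .lock()
            .unwrap()
            .insert(task.id.clone(), task.clone());
    }

    fn insert_if_absent(&self, task: &Task) -> bool {
        let mut tasks = self.tasks.lock().unwrap();
        if tasks.contains_key(&task.id) {
            return false; // nothing cloned when the task already exists
        }
        tasks.insert(task.id.clone(), task.clone());
        true
    }
}

fn main() {
    let store = MemoryStore::default();
    let task = Task { id: "t1".to_string(), status: "working".to_string() };
    store.save(&task);
    // The caller still owns `task` and can keep using it without a clone.
    println!("{} {:?}", task.status, store.insert_if_absent(&task));
}
```

A store that only reads fields (for example, to bind SQL parameters) would not need the internal clone at all.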

Documentation & Metadata

CHANGELOG.md: Added v0.5.0 section documenting breaking changes and new features
Version bumps: Updated all crate versions to 0.5.0 (a2a-protocol-types, a2a-protocol-server, a2a-protocol-client, a2a-protocol-sdk)
Benchmark documentation: Added payload scaling isolation benchmarks and updated references to dashboard
Gitignore: Added Python bytecode cache patterns

Implementation Details

Dashboard uses dark theme with cyan/amber accent colors, responsive grid layout, and Chart.js for visualization
Benchmark extraction handles missing/malformed Criterion JSON gracefully (returns zeros)
SerBuffer uses thread_local! for zero-cost thread safety without locks
All TaskStore changes preserve behavior: only the method signatures changed. This is still a source-breaking change for callers and custom implementations, which must switch to passing &Task
Benchmark suite expanded with isolated serde performance measurements to detect regressions below HTTP round-trip noise floor

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP

…documentation

Addresses every finding from the v0.4.1 benchmark analysis (237 benchmarks across 13 suites): 2 HIGH, 4 MEDIUM, and 4 LOW severity items.

HIGH severity fixes:
- Add serde_helpers module with SerBuffer (thread-local reusable serialization buffer) to eliminate 2.3x small-payload allocation overhead
- Add deser_from_str/deser_from_slice helpers enabling serde_json's borrowed-data path for ~15-25% fewer deserialization allocations
- Document optimization paths for history depth scaling (~494 allocs/turn)

MEDIUM severity fixes:
- Increase broadcast channel DEFAULT_QUEUE_CAPACITY from 64 to 256, pushing the 12x per-event cost inflection from ~52 to ~252 events
- Use thread-local reusable buffer in SSE frame building for 0-amortized per-event allocations (was 1 allocation per event)
- Extend transport payload benchmarks to 1MB (was 16KB) for regression detection at payload-dominant scales
- Add protocol/payload_scaling isolation benchmarks comparing to_vec vs SerBuffer and from_slice vs from_str across 64B-1MB
- Document artifact accumulation clone cost and optimization path

LOW severity fixes:
- Add 4MB cache-busting step for data_volume/get at 100K to eliminate CPU cache warming artifact producing anomalous fast lookups
- Document connection reuse best practices (9% savings on loopback, 10-50ms with TLS) in benchmark comments and deployment guide
- Document cold start vs steady state as complementary measurements
- Document concurrent store 1→4 thread anomaly as runtime artifact

Documentation updates:
- Update CHANGELOG, book changelog, benches README, and all relevant GH Book pages (streaming, configuration, pitfalls, production, testing, CI/CD, API reference)
- Update benchmark CI script to capture new payload_scaling benchmarks
- Update Known Measurement Limitations in book page generator

CI verification: cargo fmt, clippy (0 warnings), all tests pass, cargo doc clean.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
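
The SSE change above can be pictured with a small sketch. The actual code in sse.rs is not shown here, so this is only an assumed shape: a thread-local `String` is cleared and refilled for each event, and the finished frame is lent to a callback. `FRAME` and `with_sse_frame` are hypothetical names; the frame layout follows the standard Server-Sent Events format (one `data:` field per payload line, blank line terminator).

```rust
use std::cell::RefCell;
use std::fmt::Write;

thread_local! {
    // Reused across events on the same thread, so steady-state frame
    // building does not allocate once the buffer has grown large enough.
    static FRAME: RefCell<String> = RefCell::new(String::new());
}

/// Builds one SSE frame in the thread-local buffer and lends it to `send`.
fn with_sse_frame<R>(event: &str, data: &str, send: impl FnOnce(&str) -> R) -> R {
    FRAME.with(|cell| {
        let mut frame = cell.borrow_mut();
        frame.clear(); // keep capacity, drop the previous frame
        // Writing into a String cannot fail, so the results are ignored.
        let _ = writeln!(frame, "event: {}", event);
        for line in data.lines() {
            let _ = writeln!(frame, "data: {}", line);
        }
        frame.push('\n'); // a blank line terminates the event
        send(frame.as_str())
    })
}

fn main() {
    let sent = with_sse_frame("status", "{\"state\":\"working\"}", |frame| {
        print!("{}", frame);
        frame.len()
    });
    println!("bytes: {}", sent);
}
```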

BREAKING CHANGE: `TaskStore::save()` and `TaskStore::insert_if_absent()` now accept `&Task` instead of owned `Task`. This eliminates forced `.clone()` at every save call site.

Migration:

    // Before (0.4.x):
    store.save(task.clone()).await?;
    // After (0.5.0):
    store.save(&task).await?;

Custom TaskStore implementations must update method signatures:

    // Before:
    fn save<'a>(&'a self, task: Task) -> Pin<Box<...>>;
    // After:
    fn save<'a>(&'a self, task: &'a Task) -> Pin<Box<...>>;

Impact:
- InMemoryTaskStore clones internally (same total cost, cleaner API)
- SqliteTaskStore/PostgresTaskStore borrow fields directly (zero clones)
- Background state machine: 6 fewer .clone() calls per event cycle
- All 8 TaskStore implementations updated
- All ~100 call sites migrated across production, test, and benchmark code
- Version bumped 0.4.1 → 0.5.0 across all 4 crates

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
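
For custom implementations, the sketch below shows one way the new borrowed signature can look once the elided `Pin<Box<...>>` is spelled out. The concrete future type, the `String` error type, and the `Task`, `MapStore`, and `poll_once` names are assumptions for illustration, not the crate's actual definitions. The key point is that `task` shares the lifetime `'a` with `&self`, so the returned future may borrow both.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical, reduced stand-in for the crate's Task type.
#[derive(Clone, Debug, PartialEq)]
struct Task {
    id: String,
    state: String,
}

// Assumed expansion of the elided `Pin<Box<...>>` return type.
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

trait TaskStore {
    fn save<'a>(&'a self, task: &'a Task) -> BoxFuture<'a, Result<(), String>>;
}

#[derive(Default)]
struct MapStore {
    tasks: Mutex<HashMap<String, Task>>,
}

impl TaskStore for MapStore {
    fn save<'a>(&'a self, task: &'a Task) -> BoxFuture<'a, Result<(), String>> {
        Box::pin(async move {
            let result: Result<(), String> = match self.tasks.lock() {
                Ok(mut tasks) => {
                    // Clone only inside the store that needs ownership.
                    tasks.insert(task.id.clone(), task.clone());
                    Ok(())
                }
                Err(poisoned) => Err(poisoned.to_string()),
            };
            result
        })
    }
}

// Minimal no-op waker so the demo runs without an async runtime.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future once; enough here because `save` never actually awaits.
fn poll_once<T>(mut fut: BoxFuture<'_, T>) -> Option<T> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

fn main() {
    let store = MapStore::default();
    let task = Task { id: "t1".to_string(), state: "working".to_string() };
    println!("{:?}", poll_once(store.save(&task)));
    println!("caller still owns {}", task.id);
}
```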

Adds a dynamic, Chart.js-powered benchmark visualization dashboard that is auto-generated from criterion results by CI.

New files:
- benches/dashboard/template.html: standalone HTML dashboard template with 7 tabs (Overview, Transport, Serde, Data Volume, Enterprise, Production, Memory), Chart.js visualizations, dark theme, ARIA accessibility, responsive layout, and graceful missing-data handling
- benches/scripts/extract_benchmark_json.py: extracts criterion estimates.json into structured JSON consumed by the dashboard
- benches/scripts/generate_dashboard.sh: injects JSON into template
- book/src/reference/dashboard.md: mdbook wrapper page linking to the interactive dashboard
- book/src/reference/benchmark-dashboard.html: generated output

CI integration:
- benchmarks.yml now runs generate_dashboard.sh after benchmarks
- Commits both benchmarks.md and benchmark-dashboard.html to book

The dashboard complements the existing tabular benchmarks.md page with interactive charts, computed metrics, and drill-down analysis across all 13 benchmark suites.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP

…pings

The agent-written extractor adds:
- _find_prefix() for parameterized benchmarks with variable byte lengths
- Exhaustive criterion path mappings verified against Rust source files
- Type annotations (Dict, List, Optional, Tuple) for all functions
- --compact flag for minified JSON output
- Better error handling with zero-value fallbacks

Regenerated dashboard HTML with the updated extractor.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP

Python bytecode cache from benchmark data extractor. https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP

…warnings

`serde_json::to_value(&task)` and `serde_json::to_string(&task)`, where `task` is already `&Task`, pass a double reference (`&&Task`) that clippy flags as `needless_borrows_for_generic_args`. Changed to `to_value(task)` and `to_string(task)`: the generic `impl Serialize` parameter accepts `&Task` directly without the extra `&`.

Fixes CI clippy failure on postgres and sqlite feature gates.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
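
As a dependency-free illustration of why both spellings compile, the sketch below uses `std::fmt::Display` as a stand-in for `Serialize` and a hypothetical `render` function with the same by-reference generic shape as `serde_json::to_string`. It is not the code touched by this commit, and it makes no claim about which lint clippy reports on it.

```rust
use std::fmt::{self, Display};

// Hypothetical, reduced stand-in for the crate's Task type.
struct Task {
    id: String,
}

impl Display for Task {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "task:{}", self.id)
    }
}

// Same shape as a `value: &T where T: ?Sized + Trait` parameter.
fn render<T: ?Sized + Display>(value: &T) -> String {
    value.to_string()
}

fn render_both(task: &Task) -> (String, String) {
    // `task` is already a reference, so it can be passed as-is: T = Task.
    let direct = render(task);
    // The extra `&` makes T = &Task, which still works because references
    // to Display types are themselves Display; the borrow adds nothing.
    let double = render(&task);
    (direct, double)
}

fn main() {
    let task = Task { id: "t1".to_string() };
    let (direct, double) = render_both(&task);
    println!("{} {}", direct, double);
}
```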

claude added 6 commits April 1, 2026 23:37

chore: add __pycache__ to .gitignore

7d52085

tomtom215 merged commit b9e384e into main Apr 2, 2026
40 checks passed

tomtom215 deleted the claude/analyze-benchmark-results-83Mvv branch April 6, 2026 14:45

tomtom215 merged 6 commits into main from claude/analyze-benchmark-results-83Mvv
