
Add interactive benchmark dashboard and optimize serialization#78

Merged
tomtom215 merged 6 commits into main from claude/analyze-benchmark-results-83Mvv on Apr 2, 2026

@tomtom215 (Owner)

Summary

This PR introduces an interactive benchmark dashboard for visualizing performance metrics, adds serialization optimization helpers, and refactors the TaskStore API to reduce unnecessary cloning.

Key Changes

Benchmark Dashboard

  • New interactive dashboard (book/src/reference/benchmark-dashboard.html): Auto-generated HTML page with Chart.js visualizations, tabbed navigation, and drill-down analysis of Criterion.rs benchmark results
  • Python extraction script (benches/scripts/extract_benchmark_json.py): Walks target/criterion/ to extract median estimates and confidence intervals from Criterion JSON, producing structured data for the dashboard
  • Generation script (benches/scripts/generate_dashboard.sh): Injects the extracted benchmark JSON into the HTML template (benches/dashboard/template.html); CI runs it after the benchmarks complete
  • Dashboard documentation (book/src/reference/dashboard.md): Guide to accessing and interpreting the interactive dashboard
  • CI integration (.github/workflows/benchmarks.yml): Added dashboard generation step to benchmark workflow
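The extraction step can be sketched as follows. The PR's extractor is Python (extract_benchmark_json.py) and also reads the median and confidence interval out of each file; this Rust sketch covers only the directory walk, assuming Criterion's default layout where each benchmark's latest results sit at `<id>/new/estimates.json` under target/criterion. The function names `find_estimates` and `bench_id` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collects every `new/estimates.json`
/// under `root` (e.g. `target/criterion`). `base/` and `change/` copies
/// are skipped because only the latest run is wanted.
pub fn find_estimates(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            } else if path.ends_with("new/estimates.json") {
                found.push(path);
            }
        }
    }
    found.sort(); // stable ordering for the generated JSON
    Ok(found)
}

/// Hypothetical helper: `<root>/transport/payload/1024/new/estimates.json`
/// becomes `transport/payload/1024`. Returns None for paths outside `root`.
pub fn bench_id(root: &Path, estimates: &Path) -> Option<String> {
    let rel = estimates.strip_prefix(root).ok()?;
    let id_path = rel.parent()?.parent()?; // drop "new/estimates.json"
    let parts: Vec<String> = id_path
        .components()
        .map(|c| c.as_os_str().to_string_lossy().into_owned())
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("/"))
    }
}
```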

Serialization Optimization

  • New serde_helpers module (crates/a2a-types/src/serde_helpers.rs):
    • SerBuffer: Thread-local reusable Vec<u8> for serde_json::to_writer, eliminating per-call allocation overhead (2.3× improvement on 64B payloads)
    • deser_from_str / deser_from_slice: Wrappers enabling serde_json's borrowed-data (visit_borrowed_str) path for ~15-25% fewer deserialization allocations
  • SSE frame building optimization (crates/a2a-server/src/streaming/sse.rs): Uses thread-local buffer to avoid intermediate string allocations
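A minimal sketch of the thread-local buffer pattern behind SerBuffer, assuming nothing about its real API: `SER_BUF`, `with_ser_buffer`, and `encode_id` are hypothetical names, and std's `write!` stands in for `serde_json::to_writer` so the example has no dependencies.

```rust
use std::cell::RefCell;
use std::io::{self, Write};

thread_local! {
    // One growable buffer per thread; its capacity survives between calls,
    // so steady-state serialization does not allocate a fresh Vec each time.
    static SER_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(1024));
}

/// Hypothetical helper: runs `write` against this thread's reusable buffer,
/// then passes the bytes to `consume`. Calling it again from inside
/// `consume` would panic, because the RefCell is already mutably borrowed.
pub fn with_ser_buffer<R>(
    write: impl FnOnce(&mut Vec<u8>) -> io::Result<()>,
    consume: impl FnOnce(&[u8]) -> R,
) -> io::Result<R> {
    SER_BUF.with(|cell| -> io::Result<R> {
        let mut buf = cell.borrow_mut();
        buf.clear(); // drop old contents, keep the allocation
        write(&mut *buf)?;
        Ok(consume(buf.as_slice()))
    })
}

/// Example caller; with serde, `serde_json::to_writer(buf, &value)`
/// would take the place of the `write!` call.
pub fn encode_id(id: u64) -> io::Result<String> {
    with_ser_buffer(
        |buf| write!(buf, "{{\"id\":{}}}", id),
        |bytes| String::from_utf8_lossy(bytes).into_owned(),
    )
}
```

One trade-off of this design: the buffer keeps whatever capacity its largest payload needed, so a thread that once serialized a 1MB payload holds on to that memory. An implementation might shrink or replace the buffer above some size threshold.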

TaskStore API Refactoring

  • Breaking change: TaskStore::save() and TaskStore::insert_if_absent() now accept &Task instead of owned Task
    • Eliminates forced .clone() at every call site
    • Store implementations that need ownership (e.g., InMemoryTaskStore) clone internally
    • Updated all implementations, including InMemoryTaskStore, SqliteTaskStore, PostgresTaskStore, TenantAwareInMemoryTaskStore, TenantAwareSqliteTaskStore, and TenantAwarePostgresTaskStore
    • Updated all call sites across handler, tests, and benchmarks

Documentation & Metadata

  • CHANGELOG.md: Added v0.5.0 section documenting breaking changes and new features
  • Version bumps: Updated all crate versions to 0.5.0 (a2a-protocol-types, a2a-protocol-server, a2a-protocol-client, a2a-protocol-sdk)
  • Benchmark documentation: Added payload scaling isolation benchmarks and updated references to the dashboard
  • Gitignore: Added Python bytecode cache patterns

Implementation Details

  • Dashboard uses dark theme with cyan/amber accent colors, responsive grid layout, and Chart.js for visualization
  • Benchmark extraction handles missing/malformed Criterion JSON gracefully (returns zeros)
  • SerBuffer uses thread_local! so each thread reuses its own buffer without locks or other synchronization
  • TaskStore method semantics are unchanged (only the signatures changed), but the move to &Task parameters is a breaking change for callers and custom implementations
  • Benchmark suite expanded with isolated serde performance measurements to detect regressions below HTTP round-trip noise floor

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP

claude added 6 commits April 1, 2026 23:37
…documentation

Addresses every finding from the v0.4.1 benchmark analysis (237 benchmarks
across 13 suites) — 2 HIGH, 4 MEDIUM, and 4 LOW severity items.

HIGH severity fixes:
- Add serde_helpers module with SerBuffer (thread-local reusable serialization
  buffer) to eliminate 2.3x small-payload allocation overhead
- Add deser_from_str/deser_from_slice helpers enabling serde_json's
  borrowed-data path for ~15-25% fewer deserialization allocations
- Document optimization paths for history depth scaling (~494 allocs/turn)
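Why a borrowed-data path saves allocations can be illustrated with a std-only analogue. serde_json can hand out a slice of the input (visit_borrowed_str) only when a string contains no escape sequences; otherwise it must build an owned string. The hypothetical `decode_json_str` below mirrors that decision with `Cow` (`\uXXXX` escapes are omitted for brevity). With real serde, the target type must opt into borrowing, e.g. via `&'de str` fields or `#[serde(borrow)] Cow<'de, str>`, and the input must outlive the value, which rules out readers such as `serde_json::from_reader`.

```rust
use std::borrow::Cow;

/// Hypothetical: decodes the body of a JSON string literal (quotes removed).
/// No escapes -> borrow a slice of the input (zero allocations).
/// Escapes present -> allocate and build an owned String.
/// Returns None for escapes this sketch does not handle.
pub fn decode_json_str(raw: &str) -> Option<Cow<'_, str>> {
    if !raw.contains('\\') {
        return Some(Cow::Borrowed(raw));
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            '/' => out.push('/'),
            'n' => out.push('\n'),
            't' => out.push('\t'),
            'r' => out.push('\r'),
            'b' => out.push('\u{8}'),
            'f' => out.push('\u{c}'),
            _ => return None, // \uXXXX and invalid escapes not handled here
        }
    }
    Some(Cow::Owned(out))
}
```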

MEDIUM severity fixes:
- Increase broadcast channel DEFAULT_QUEUE_CAPACITY from 64 to 256, pushing
  the 12x per-event cost inflection from ~52 to ~252 events
- Use a thread-local reusable buffer in SSE frame building so per-event
  allocations amortize to zero (previously 1 allocation per event)
- Extend transport payload benchmarks to 1MB (was 16KB) for regression
  detection at payload-dominant scales
- Add protocol/payload_scaling isolation benchmarks comparing to_vec vs
  SerBuffer and from_slice vs from_str across 64B-1MB
- Document artifact accumulation clone cost and optimization path
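The SSE frame building can be sketched like this, assuming the standard event-stream framing (an optional `event:` line, one `data:` line per payload line, and a blank line ending the event). The PR keeps the buffer in a thread-local; to stay short, this sketch reuses one caller-owned String per stream instead. `write_sse_frame` and `stream_frames` are hypothetical names.

```rust
/// Hypothetical helper: appends one SSE event to `out`. Once `out` has
/// grown to fit a typical frame, this performs no further allocation.
/// Assumes the payload contains no '\r' (compact serde_json output escapes
/// control characters, so it is a single line).
pub fn write_sse_frame(out: &mut String, event: Option<&str>, payload: &str) {
    if let Some(name) = event {
        out.push_str("event: ");
        out.push_str(name);
        out.push('\n');
    }
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // blank line terminates the event
}

/// Hypothetical streaming loop: one buffer reused for every event.
pub fn stream_frames(payloads: &[&str], mut sink: impl FnMut(&str)) {
    let mut buf = String::with_capacity(256);
    for &payload in payloads {
        buf.clear(); // keep capacity, discard the previous frame
        write_sse_frame(&mut buf, None, payload);
        sink(buf.as_str()); // stands in for writing to the response body
    }
}
```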

LOW severity fixes:
- Add 4MB cache-busting step for data_volume/get at 100K to eliminate
  CPU cache warming artifact producing anomalous fast lookups
- Document connection reuse best practices (9% savings on loopback,
  10-50ms with TLS) in benchmark comments and deployment guide
- Document cold start vs steady state as complementary measurements
- Document concurrent store 1→4 thread anomaly as runtime artifact
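A sketch of a cache-busting step like the one described for data_volume/get, assuming a 64-byte cache line; `bust_cache` and `CACHE_BUST_BYTES` are hypothetical names. Touching 4MB evicts a previous iteration's data from the per-core L1/L2 caches; whether it also clears a large shared L3 depends on the CPU. One option is to call it from the setup closure of Criterion's `iter_batched`, which is not included in the timing.

```rust
use std::hint::black_box;

/// Hypothetical size matching the 4MB step described above.
pub const CACHE_BUST_BYTES: usize = 4 * 1024 * 1024;

/// Writes, then reads, one byte per (assumed 64-byte) cache line of a fresh
/// buffer so earlier benchmark data is pushed out of the smaller caches.
/// Returns a checksum and routes it through black_box so the compiler
/// cannot optimize the loops away.
pub fn bust_cache(size_bytes: usize) -> u64 {
    const LINE: usize = 64; // assumed cache-line size
    let mut buf = vec![0u8; size_bytes];
    for i in (0..size_bytes).step_by(LINE) {
        buf[i] = i as u8;
    }
    let buf = black_box(buf);
    let mut sum = 0u64;
    for i in (0..size_bytes).step_by(LINE) {
        sum = sum.wrapping_add(buf[i] as u64);
    }
    black_box(sum)
}
```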

Documentation updates:
- Update CHANGELOG, book changelog, benches README, and all relevant
  GH Book pages (streaming, configuration, pitfalls, production,
  testing, CI/CD, API reference)
- Update benchmark CI script to capture new payload_scaling benchmarks
- Update Known Measurement Limitations in book page generator

CI verification: cargo fmt, clippy (0 warnings), all tests pass,
cargo doc clean.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
BREAKING CHANGE: `TaskStore::save()` and `TaskStore::insert_if_absent()`
now accept `&Task` instead of owned `Task`. This eliminates forced
`.clone()` at every save call site.

Migration:
  // Before (0.4.x):
  store.save(task.clone()).await?;
  // After (0.5.0):
  store.save(&task).await?;

Custom TaskStore implementations must update method signatures:
  // Before:
  fn save<'a>(&'a self, task: Task) -> Pin<Box<...>>;
  // After:
  fn save<'a>(&'a self, task: &'a Task) -> Pin<Box<...>>;
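A self-contained sketch of a custom store written against the new signature. The error type, the `SaveFuture` alias, and `VecStore` are hypothetical (the real return type is elided above as `Pin<Box<...>>`), and `block_on` is a minimal test-only executor so the example needs no async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Mutex;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Stand-ins for the SDK's Task and error types (hypothetical fields).
#[derive(Clone, Debug)]
pub struct Task {
    pub id: String,
}

#[derive(Debug)]
pub struct StoreError;

pub type SaveFuture<'a> = Pin<Box<dyn Future<Output = Result<(), StoreError>> + Send + 'a>>;

pub trait TaskStore: Send + Sync {
    /// The returned future may borrow both `self` and `task` for `'a`.
    fn save<'a>(&'a self, task: &'a Task) -> SaveFuture<'a>;
}

#[derive(Default)]
pub struct VecStore {
    pub saved: Mutex<Vec<Task>>,
}

impl TaskStore for VecStore {
    fn save<'a>(&'a self, task: &'a Task) -> SaveFuture<'a> {
        Box::pin(async move {
            // This store needs ownership, so the clone happens here
            // instead of at every call site.
            self.saved.lock().map_err(|_| StoreError)?.push(task.clone());
            Ok::<(), StoreError>(())
        })
    }
}

fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Test-only executor: polls until ready (fine for futures that never pend).
pub fn block_on<F: Future>(fut: F) -> F::Output {
    // SAFETY: every vtable function ignores the (null) data pointer.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```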

Impact:
- InMemoryTaskStore clones internally (same total cost, cleaner API)
- SqliteTaskStore/PostgresTaskStore borrow fields directly (zero clones)
- Background state machine: 6 fewer .clone() calls per event cycle
- All 8 TaskStore implementations updated
- All ~100 call sites migrated across production, test, and benchmark code
- Version bumped 0.4.1 → 0.5.0 across all 4 crates

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
Adds a dynamic, Chart.js-powered benchmark visualization dashboard that
is auto-generated from criterion results by CI.

New files:
- benches/dashboard/template.html — standalone HTML dashboard template
  with 7 tabs (Overview, Transport, Serde, Data Volume, Enterprise,
  Production, Memory), Chart.js visualizations, dark theme, ARIA
  accessibility, responsive layout, and graceful missing-data handling
- benches/scripts/extract_benchmark_json.py — extracts criterion
  estimates.json into structured JSON consumed by the dashboard
- benches/scripts/generate_dashboard.sh — injects JSON into template
- book/src/reference/dashboard.md — mdbook wrapper page linking to
  the interactive dashboard
- book/src/reference/benchmark-dashboard.html — generated output
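The injection step in generate_dashboard.sh can be sketched in Rust as a placeholder substitution. The marker string is hypothetical, since the PR does not show the template's actual placeholder. Escaping `</` as `<\/` keeps a value such as "</script>" in the data from closing the surrounding script element early, and `\/` is still a valid JSON string escape.

```rust
/// Hypothetical marker; the real template's placeholder is not shown in the PR.
pub const PLACEHOLDER: &str = "/*__BENCHMARK_DATA__*/";

/// Replaces the first PLACEHOLDER in `template` with `json`, escaping `</`
/// so the data cannot terminate an enclosing <script> element.
/// Fails loudly if the template has no marker instead of emitting stale HTML.
pub fn inject(template: &str, json: &str) -> Result<String, String> {
    if !template.contains(PLACEHOLDER) {
        return Err(format!("template has no {} marker", PLACEHOLDER));
    }
    let safe = json.replace("</", "<\\/");
    Ok(template.replacen(PLACEHOLDER, &safe, 1))
}
```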

CI integration:
- benchmarks.yml now runs generate_dashboard.sh after benchmarks
- Commits both benchmarks.md and benchmark-dashboard.html to book

The dashboard complements the existing tabular benchmarks.md page with
interactive charts, computed metrics, and drill-down analysis across
all 13 benchmark suites.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
…pings

The agent-written extractor adds:
- _find_prefix() for parameterized benchmarks with variable byte lengths
- Exhaustive criterion path mappings verified against Rust source files
- Type annotations (Dict, List, Optional, Tuple) for all functions
- --compact flag for minified JSON output
- Better error handling with zero-value fallbacks

Regenerated dashboard HTML with the updated extractor.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
…warnings

`serde_json::to_value(&task)` and `serde_json::to_string(&task)` where
`task` is already `&Task` produce a double-reference that clippy flags as
`needless_borrows_for_generic_args`. Changed to `to_value(task)` and
`to_string(task)` — the generic `impl Serialize` parameter accepts `&Task`
directly without the extra `&`.

Fixes CI clippy failure on postgres and sqlite feature gates.

https://claude.ai/code/session_019BGJMBYuv8Bcrk7cxBqjUP
tomtom215 merged commit b9e384e into main on Apr 2, 2026
40 checks passed
tomtom215 deleted the claude/analyze-benchmark-results-83Mvv branch on April 6, 2026 14:45