Refactor dashboard UI and expand benchmark coverage to v0.5.0 by tomtom215 · Pull Request #80 · tomtom215/a2a-rust

tomtom215 · 2026-04-02T12:43:41Z

Summary

This PR significantly refactors the benchmark dashboard UI for improved responsiveness and clarity, expands benchmark coverage with new test categories, and bumps the SDK version to 0.5.0.

Key Changes

Dashboard UI Refactoring

Responsive grid system: Replaced fixed aspect-ratio charts with explicit height values (220px, 180px, 280px) and added size variants (chart-wrap-sm, chart-wrap-lg)
Improved spacing: Reduced padding and margins throughout (e.g., header padding 1.5rem → 1.25rem, card padding 1rem → 0.75rem)
Better typography: Adjusted font sizes for improved readability on smaller screens
Enhanced scrolling: Added scrollbar-width: none and webkit scrollbar hiding for tabs; added table scroll container with max-height
New layout classes: Added .grid-full (single column), .grid-wide (2-column), and .card-full for flexible layouts
Table styling: Introduced .bench-table with sticky headers, hover effects, and group headers for structured data display
Search & filtering: Added .search-box and .result-count styles for data exploration

Chart & Visualization Improvements

Color palette expansion: Added slate and teal colors to the color list (8 colors total)
New chart type: Implemented hbarChart() for horizontal bar charts
Logarithmic scale support: Added optional logarithmic y-axis with custom tick formatting for lineChart()
Improved formatting: Enhanced fmt() function to handle nanosecond-to-microsecond/millisecond conversion intelligently
Chart.js defaults: Reduced font sizes and adjusted legend/tick styling for better density

Tab Organization

Split combined tabs: Separated "Transport & Concurrency" into "Transport" and "Concurrency" tabs
Separated serde tab: "Serde & Protocol" → "Serde"
New tabs: Added "Backpressure" and "All Results" tabs for comprehensive benchmark exploration
Improved navigation: Enhanced keyboard navigation with proper focus management

Benchmark Data Structure

Reorganized memory metrics: Moved alloc_counts to alloc_timing sub-object for better organization
Updated highlights: Changed serde floor benchmark key from protocol_type_serde/agent_card/serialize to protocol_type_serde/agent_card_serialize
New metrics: Added store operation benchmarks (store_save_ns, store_get_ns) and event queue metrics (queue_write_read)

Code Quality

Minification improvements: Shortened variable names (COLORS → C, COLOR_LIST → CL, fallback → fb) while maintaining readability
ES5 compatibility: Converted arrow functions and template literals to traditional function syntax for broader browser support
Removed comments: Cleaned up section comments for more compact code

Documentation & Version Updates

Updated SDK version from 0.4 to 0.5 across all documentation and examples
Updated benchmark script version to 0.5.0
Updated CITATION.cff version to 0.5.0
Enhanced deployment and testing documentation with updated feature flags

Notable Implementation Details

Chart sizing now uses explicit pixel heights instead of aspect ratios, enabling better control over responsive behavior
The fmt() function now intelligently converts nanoseconds to appropriate units (ns/µs/ms) based on magnitude
Table implementation includes sticky headers and scrollable containers for large datasets
Mobile breakpoints optimized for screens ≤640px and tablet view (641-1024px)

https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b

- Fix serde_floor_ns highlight key: use underscore separator matching criterion's actual directory naming (agent_card_serialize, not agent_card/serialize) - this was causing the Overview serde floor metric to show as 0/missing on the dashboard - Rename memory alloc_counts to alloc_timing with _ns suffix to accurately reflect that these values are wall-clock timing under the counting allocator, not raw allocation counts - Add concurrent_mixed/send_then_get benchmark to dashboard data extraction (was collected by criterion but not surfaced) https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b

…and mobile support - Fix serde_floor_ns highlight showing 0.0 (key mapping corrected) - Fix memory section key mismatch (alloc_timing keys now consistent between extractor and template) - Rebuild dashboard template with 10 tabs (added Concurrency, Backpressure, All Results) covering all 267 benchmarks - Add searchable/filterable All Results table with all raw measurements - Reduce chart heights from 16:10 aspect to fixed 200px (160px small) to eliminate excessive scrolling - Add horizontal bar charts for tenant resolvers and pagination walk - Add tablet breakpoint (641-1024px) for better responsive scaling - Display enterprise subsections: eviction, large history, hot reload, rate limiting, handler limits, cancel task - Display production subsections: push config CRUD, dispatch routing, cancel/subscribe race, cross-language baselines - Display backpressure details: slow consumer, timer calibration, concurrent streams - Update benchmark count from 237 to 267 across all docs (README, CHANGELOG, CONTRIBUTING, CI/CD book page, testing book page, workflows README) - Update CI/CD docs to reference dashboard generation step - Update benches/README.md architecture tree with dashboard/ and new scripts https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

The cross-language serialize benchmark (590 ns) was invisible in the Cross-Language Baselines bar chart because the other values are 1-6 ms (1000x larger). Moved serialize to a metric card ("590 ns") and kept only ms-scale values (Echo RT, Stream, Concurrent 50, Minimal) in the chart so all bars are visible and proportional. Verified via Playwright screenshots at desktop (1280x900), tablet (768x1024), and mobile (375x812) viewports. https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Charts: - Single charts now use grid-full (1fr) instead of grid-wide (2-column) so they span the full viewport width on Overview, Transport, Serde, and Concurrency tabs - Paired charts remain in 2-column grid-wide layout (Enterprise, Backpressure, Memory, Production) - Serde type-level horizontal bar chart uses chart-wrap-lg (280px) for better label legibility with 14 bars - Increased default chart height from 200px to 220px All Results table: - Grouped by benchmark category with cyan group headers - Short names (group prefix stripped, shown indented under header) - Scrollable container (max-height: 600px) with sticky column headers - Search box with live result counter ("Showing N of 267 benchmarks") - Larger font size (.82rem body, .78rem cells) for readability - Group headers hide/show with filter to avoid orphaned sections Verified via Playwright at desktop (1280x900), tablet (768x1024), and mobile (375x812). https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Five issues identified and fixed via systematic data audit: 1. Concurrency: Split into two charts (Transport ms vs Store us) because store values (30-180 us) were 40-240x smaller than sends/streams (1.5-7.2 ms), rendering store bars invisible on the shared axis. 2. Multi-Tenant Isolation: Convert Y-axis from raw nanoseconds (30,000-145,000) to microseconds (30-145 us) for readability. 3. Memory "Serialized Bytes per Payload Size": Renamed to "Serialize Timing by Payload Size" -- values are wall-clock nanoseconds under counting allocator, not byte counts. Previous title was factually incorrect. 4. Memory "History Allocation Scaling": Renamed to "History Serde Timing by Depth" -- values are timing in ns, not allocation counts. Previous title was misleading. 5. Serde Payload Scaling: Switched to logarithmic Y-axis because data spans 130 ns to 494,000 ns (3,800x range). Linear axis compressed small-payload values to zero. Log scale with K/M tick formatters now shows all four series clearly. https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Update dependency version examples in all documentation, book pages, scripts, and metadata files from "0.4" to "0.5" to reflect the v0.5.0 release that includes the breaking TaskStore::save(&Task) change. Files updated: - README.md, crates/README.md: Quick start examples - CITATION.cff: Software version metadata (0.3.0 -> 0.5.0) - book/src/getting-started/installation.md: All dependency examples - book/src/getting-started/first-agent.md: SDK dependency - book/src/concepts/transport-layers.md: WebSocket/gRPC examples - book/src/building-agents/dispatchers.md: Transport feature examples - book/src/building-agents/stores.md: SQLite feature example - book/src/client/builder.md: gRPC client example - book/src/deployment/production.md: Tracing feature example - benches/scripts/run_benchmarks.sh: SDK version in summary JSON - benches/scripts/generate_book_page.sh: Version refs in generated text https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

- stores.md: Update TaskStore::save() and insert_if_absent() signatures from `task: Task` (owned) to `task: &'a Task` (borrowed) to match the v0.5.0 breaking change - handler.md: Update event_queue_capacity default from 64 to 256 to match DEFAULT_QUEUE_CAPACITY in source code (changed in v0.5.0) https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

…-sljPM Redesign benchmark dashboard UI and expand benchmark coverage

claude and others added 8 commits April 2, 2026 10:38

Merge pull request #79 from tomtom215/claude/review-benchmark-results…

d5d2811

…-sljPM Redesign benchmark dashboard UI and expand benchmark coverage

tomtom215 merged commit 949d3d2 into main Apr 2, 2026
40 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor dashboard UI and expand benchmark coverage to v0.5.0#80

Refactor dashboard UI and expand benchmark coverage to v0.5.0#80
tomtom215 merged 8 commits into
mainfrom
claude/review-benchmark-results-FXuYv

tomtom215 commented Apr 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tomtom215 commented Apr 2, 2026

Summary

Key Changes

Dashboard UI Refactoring

Chart & Visualization Improvements

Tab Organization

Benchmark Data Structure

Code Quality

Documentation & Version Updates

Notable Implementation Details

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants