Refactor dashboard UI and expand benchmark coverage to v0.5.0 #80
Merged
- Fix serde_floor_ns highlight key: use underscore separator matching
  criterion's actual directory naming (agent_card_serialize, not
  agent_card/serialize). This was causing the Overview serde floor
  metric to show as 0/missing on the dashboard
- Rename memory alloc_counts to alloc_timing with _ns suffix to
  accurately reflect that these values are wall-clock timing under the
  counting allocator, not raw allocation counts
- Add concurrent_mixed/send_then_get benchmark to dashboard data
  extraction (was collected by criterion but not surfaced)
https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b
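The key fix in the first bullet above can be sketched roughly as follows. This is a minimal illustration, not the extractor's actual code: `dashboard_key` and `lookup_ns` are hypothetical names, and the group/function split is assumed from the `protocol_type_serde/agent_card_serialize` key mentioned elsewhere in this PR.

```rust
use std::collections::HashMap;

/// Build the lookup key the way the PR describes the on-disk naming:
/// slashes inside the function id become underscores.
/// (Hypothetical helper; the real extractor may differ.)
pub fn dashboard_key(group: &str, function_id: &str) -> String {
    format!("{}/{}", group, function_id.replace('/', "_"))
}

/// Return the measurement if present. Returning `None` makes a key
/// mismatch visible instead of letting it render silently as 0.
pub fn lookup_ns(
    results: &HashMap<String, f64>,
    group: &str,
    function_id: &str,
) -> Option<f64> {
    results.get(&dashboard_key(group, function_id)).copied()
}
```

Returning an `Option` rather than defaulting to zero is one way to keep a future key mismatch from showing up as a plausible-looking `0.0` on the Overview tab.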
…and mobile support
- Fix serde_floor_ns highlight showing 0.0 (key mapping corrected)
- Fix memory section key mismatch (alloc_timing keys now consistent
  between extractor and template)
- Rebuild dashboard template with 10 tabs (added Concurrency,
  Backpressure, All Results) covering all 267 benchmarks
- Add searchable/filterable All Results table with all raw measurements
- Reduce chart heights from 16:10 aspect to fixed 200px (160px small)
  to eliminate excessive scrolling
- Add horizontal bar charts for tenant resolvers and pagination walk
- Add tablet breakpoint (641-1024px) for better responsive scaling
- Display enterprise subsections: eviction, large history, hot reload,
  rate limiting, handler limits, cancel task
- Display production subsections: push config CRUD, dispatch routing,
  cancel/subscribe race, cross-language baselines
- Display backpressure details: slow consumer, timer calibration,
  concurrent streams
- Update benchmark count from 237 to 267 across all docs (README,
  CHANGELOG, CONTRIBUTING, CI/CD book page, testing book page,
  workflows README)
- Update CI/CD docs to reference dashboard generation step
- Update benches/README.md architecture tree with dashboard/ and new
  scripts
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
The cross-language serialize benchmark (590 ns) was invisible in the
Cross-Language Baselines bar chart because the other values are 1-6 ms
(1000x larger). Moved serialize to a metric card ("590 ns") and kept
only ms-scale values (Echo RT, Stream, Concurrent 50, Minimal) in the
chart so all bars are visible and proportional.
Verified via Playwright screenshots at desktop (1280x900), tablet
(768x1024), and mobile (375x812) viewports.
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
Charts:
- Single charts now use grid-full (1fr) instead of grid-wide (2-column)
so they span the full viewport width on Overview, Transport, Serde,
and Concurrency tabs
- Paired charts remain in 2-column grid-wide layout (Enterprise,
Backpressure, Memory, Production)
- Serde type-level horizontal bar chart uses chart-wrap-lg (280px) for
better label legibility with 14 bars
- Increased default chart height from 200px to 220px
All Results table:
- Grouped by benchmark category with cyan group headers
- Short names (group prefix stripped, shown indented under header)
- Scrollable container (max-height: 600px) with sticky column headers
- Search box with live result counter ("Showing N of 267 benchmarks")
- Larger font size (.82rem body, .78rem cells) for readability
- Group headers hide/show with filter to avoid orphaned sections
Verified via Playwright at desktop (1280x900), tablet (768x1024),
and mobile (375x812).
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
Five issues identified and fixed via systematic data audit:
1. Concurrency: Split into two charts (Transport ms vs Store us) because
   store values (30-180 us) were 40-240x smaller than sends/streams
   (1.5-7.2 ms), rendering store bars invisible on the shared axis.
2. Multi-Tenant Isolation: Convert Y-axis from raw nanoseconds
   (30,000-145,000) to microseconds (30-145 us) for readability.
3. Memory "Serialized Bytes per Payload Size": Renamed to "Serialize
   Timing by Payload Size" -- values are wall-clock nanoseconds under
   counting allocator, not byte counts. Previous title was factually
   incorrect.
4. Memory "History Allocation Scaling": Renamed to "History Serde Timing
   by Depth" -- values are timing in ns, not allocation counts. Previous
   title was misleading.
5. Serde Payload Scaling: Switched to logarithmic Y-axis because data
   spans 130 ns to 494,000 ns (3,800x range). Linear axis compressed
   small-payload values to zero. Log scale with K/M tick formatters now
   shows all four series clearly.
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
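The reasoning behind items 2 and 5 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dashboard itself is a browser template, and `dynamic_range`, `choose_axis`, `ns_to_us`, and the `max_linear_ratio` threshold are hypothetical names and parameters, not part of this PR.

```rust
/// Ratio of the largest to the smallest positive value;
/// `None` if the slice contains no positive values.
pub fn dynamic_range(values: &[f64]) -> Option<f64> {
    let mut min = f64::INFINITY;
    let mut max = 0.0_f64;
    for &v in values.iter().filter(|v| **v > 0.0) {
        min = min.min(v);
        max = max.max(v);
    }
    if max > 0.0 { Some(max / min) } else { None }
}

#[derive(Debug, PartialEq)]
pub enum Axis {
    Linear,
    Log,
}

/// Pick a log axis once the spread exceeds an (assumed, tunable) ratio.
pub fn choose_axis(values: &[f64], max_linear_ratio: f64) -> Axis {
    match dynamic_range(values) {
        Some(ratio) if ratio > max_linear_ratio => Axis::Log,
        _ => Axis::Linear,
    }
}

/// Convert raw nanoseconds to microseconds for axis labels (item 2).
pub fn ns_to_us(ns: f64) -> f64 {
    ns / 1_000.0
}
```

With the figures quoted in item 5, `dynamic_range(&[130.0, 494_000.0])` gives the 3,800x spread that makes small-payload bars vanish on a linear axis.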
Update dependency version examples in all documentation, book pages,
scripts, and metadata files from "0.4" to "0.5" to reflect the v0.5.0
release that includes the breaking TaskStore::save(&Task) change.
Files updated:
- README.md, crates/README.md: Quick start examples
- CITATION.cff: Software version metadata (0.3.0 -> 0.5.0)
- book/src/getting-started/installation.md: All dependency examples
- book/src/getting-started/first-agent.md: SDK dependency
- book/src/concepts/transport-layers.md: WebSocket/gRPC examples
- book/src/building-agents/dispatchers.md: Transport feature examples
- book/src/building-agents/stores.md: SQLite feature example
- book/src/client/builder.md: gRPC client example
- book/src/deployment/production.md: Tracing feature example
- benches/scripts/run_benchmarks.sh: SDK version in summary JSON
- benches/scripts/generate_book_page.sh: Version refs in generated text
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
- stores.md: Update TaskStore::save() and insert_if_absent() signatures
  from `task: Task` (owned) to `task: &'a Task` (borrowed) to match the
  v0.5.0 breaking change
- handler.md: Update event_queue_capacity default from 64 to 256 to
  match DEFAULT_QUEUE_CAPACITY in source code (changed in v0.5.0)
https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf
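For readers updating their own stores, the borrowed-parameter shape might look roughly like the sketch below. Only the `task: &'a Task` parameter is taken from the commit message; the `Task` fields, the boxed-future return type, the `String` error type, `InMemoryStore`, `block_on_ready`, and `save_twice` are all assumptions made so the example is self-contained, and the SDK's real trait may differ.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for the SDK's task type.
#[derive(Clone, Debug, PartialEq)]
pub struct Task {
    pub id: String,
    pub state: String,
}

type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

// Assumed trait shape: the store borrows the task instead of taking ownership.
pub trait TaskStore {
    fn save<'a>(&'a self, task: &'a Task) -> BoxFuture<'a, Result<(), String>>;
}

#[derive(Default)]
pub struct InMemoryStore {
    tasks: Mutex<HashMap<String, Task>>,
}

impl TaskStore for InMemoryStore {
    fn save<'a>(&'a self, task: &'a Task) -> BoxFuture<'a, Result<(), String>> {
        Box::pin(async move {
            let mut tasks = self.tasks.lock().map_err(|e| e.to_string())?;
            // The store clones only at the point where it needs an owned copy.
            tasks.insert(task.id.clone(), task.clone());
            Ok(())
        })
    }
}

// Waker that does nothing; enough to poll a future that never suspends.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a future once; this sketch only supports futures that finish immediately.
pub fn block_on_ready<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("sketch only handles immediately-ready futures"),
    }
}

/// Save the same task twice without cloning at the call site.
pub fn save_twice(store: &InMemoryStore, task: &Task) -> usize {
    block_on_ready(store.save(task)).unwrap();
    block_on_ready(store.save(task)).unwrap();
    store.tasks.lock().unwrap().len()
}
```

The practical effect for callers is that a task can be saved repeatedly and still used afterwards, since the call site no longer has to give up or clone the value.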
…-sljPM Redesign benchmark dashboard UI and expand benchmark coverage
Summary
This PR significantly refactors the benchmark dashboard UI for improved responsiveness and clarity, expands benchmark coverage with new test categories, and bumps the SDK version to 0.5.0.
Key Changes
Dashboard UI Refactoring
- … 220px, 180px, 280px) and added size variants (`chart-wrap-sm`, `chart-wrap-lg`)
- … 1.5rem → 1.25rem, card padding 1rem → 0.75rem)
- … `scrollbar-width: none` and webkit scrollbar hiding for tabs; added table scroll container with max-height
- … `.grid-full` (single column), `.grid-wide` (2-column), and `.card-full` for flexible layouts
- … `.bench-table` with sticky headers, hover effects, and group headers for structured data display
- … `.search-box` and `.result-count` styles for data exploration
Chart & Visualization Improvements
- … `hbarChart()` for horizontal bar charts
- … `lineChart()` …
- … `fmt()` function to handle nanosecond-to-microsecond/millisecond conversion intelligently
Tab Organization
Benchmark Data Structure
- … `alloc_counts` to `alloc_timing` sub-object for better organization
- … `protocol_type_serde/agent_card/serialize` to `protocol_type_serde/agent_card_serialize`
- … `store_save_ns`, `store_get_ns`) and event queue metrics (`queue_write_read`)
Code Quality
- … `COLORS → C`, `COLOR_LIST → CL`, `fallback → fb`) while maintaining readability
Documentation & Version Updates
- … `0.4` to `0.5` across all documentation and examples
- … `0.5.0`
- … `0.5.0`
Notable Implementation Details
- … `fmt()` function now intelligently converts nanoseconds to appropriate units (ns/µs/ms) based on magnitude
https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b
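The magnitude-based conversion attributed to `fmt()` can be sketched as below. The dashboard's own helper lives in the template and is not shown in this PR, so `fmt_ns`, the 1,000 and 1,000,000 cut-offs, and the decimal precision are assumptions chosen for illustration.

```rust
/// Format a nanosecond measurement with a unit chosen by magnitude.
/// Thresholds and precision are illustrative assumptions.
pub fn fmt_ns(ns: f64) -> String {
    if ns >= 1_000_000.0 {
        // One million nanoseconds or more: show milliseconds.
        format!("{:.2} ms", ns / 1_000_000.0)
    } else if ns >= 1_000.0 {
        // One thousand nanoseconds or more: show microseconds.
        format!("{:.1} µs", ns / 1_000.0)
    } else {
        // Small values stay in nanoseconds with no decimals.
        format!("{:.0} ns", ns)
    }
}
```

Under these assumptions the cross-language serialize value renders as `590 ns`, matching the metric card text quoted in the commit message, while the ms-scale chart values keep their millisecond unit.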