Refactor dashboard UI and expand benchmark coverage to v0.5.0 #80

Merged: tomtom215 merged 8 commits into main from claude/review-benchmark-results-FXuYv on Apr 2, 2026

@tomtom215 (Owner)

Summary

This PR significantly refactors the benchmark dashboard UI for improved responsiveness and clarity, expands benchmark coverage with new test categories, and bumps the SDK version to 0.5.0.

Key Changes

Dashboard UI Refactoring

  • Responsive grid system: Replaced fixed aspect-ratio charts with explicit height values (220px, 180px, 280px) and added size variants (chart-wrap-sm, chart-wrap-lg)
  • Improved spacing: Reduced padding and margins throughout (e.g., header padding 1.5rem → 1.25rem, card padding 1rem → 0.75rem)
  • Better typography: Adjusted font sizes for improved readability on smaller screens
  • Enhanced scrolling: Added scrollbar-width: none and webkit scrollbar hiding for tabs; added table scroll container with max-height
  • New layout classes: Added .grid-full (single column), .grid-wide (2-column), and .card-full for flexible layouts
  • Table styling: Introduced .bench-table with sticky headers, hover effects, and group headers for structured data display
  • Search & filtering: Added .search-box and .result-count styles for data exploration

Chart & Visualization Improvements

  • Color palette expansion: Added slate and teal colors to the color list (8 colors total)
  • New chart type: Implemented hbarChart() for horizontal bar charts
  • Logarithmic scale support: Added optional logarithmic y-axis with custom tick formatting for lineChart()
  • Improved formatting: Enhanced fmt() to convert nanosecond values to microseconds or milliseconds based on magnitude
  • Chart.js defaults: Reduced font sizes and adjusted legend/tick styling for better density

Tab Organization

  • Split combined tabs: Separated "Transport & Concurrency" into "Transport" and "Concurrency" tabs
  • Separated serde tab: "Serde & Protocol" → "Serde"
  • New tabs: Added "Backpressure" and "All Results" tabs for comprehensive benchmark exploration
  • Improved navigation: Enhanced keyboard navigation with proper focus management

Benchmark Data Structure

  • Renamed memory metrics: alloc_counts is now alloc_timing (keys carry an _ns suffix), because the values are wall-clock timings under the counting allocator, not raw allocation counts
  • Updated highlights: Changed serde floor benchmark key from protocol_type_serde/agent_card/serialize to protocol_type_serde/agent_card_serialize
  • New metrics: Added store operation benchmarks (store_save_ns, store_get_ns) and event queue metrics (queue_write_read)

Code Quality

  • Minification improvements: Shortened variable names (COLORS → C, COLOR_LIST → CL, fallback → fb) while maintaining readability
  • ES5 compatibility: Converted arrow functions to traditional function expressions and template literals to string concatenation for broader browser support
  • Removed comments: Cleaned up section comments for more compact code

Documentation & Version Updates

  • Updated SDK version from 0.4 to 0.5 across all documentation and examples
  • Updated benchmark script version to 0.5.0
  • Updated CITATION.cff version from 0.3.0 to 0.5.0
  • Enhanced deployment and testing documentation with updated feature flags

Notable Implementation Details

  • Chart sizing now uses explicit pixel heights instead of aspect ratios, enabling better control over responsive behavior
  • The fmt() function now intelligently converts nanoseconds to appropriate units (ns/µs/ms) based on magnitude
  • Table implementation includes sticky headers and scrollable containers for large datasets
  • Responsive breakpoints tuned for mobile (≤640px) and tablet (641-1024px) screens
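
The magnitude-based unit conversion described for fmt() can be sketched as below. This is an illustration only: the name fmtNs, the thresholds, and the decimal precision are assumptions, not the dashboard's actual implementation. It is written in ES5 style to match the compatibility note above.

```javascript
// Hypothetical sketch of a magnitude-aware formatter.
// fmtNs, the 1e3/1e6 thresholds, and the precision are assumptions.
function fmtNs(ns) {
  // Guard against missing or non-numeric measurements.
  if (typeof ns !== 'number' || !isFinite(ns)) {
    return 'n/a';
  }
  if (ns >= 1e6) {
    return (ns / 1e6).toFixed(2) + ' ms';       // milliseconds
  }
  if (ns >= 1e3) {
    return (ns / 1e3).toFixed(1) + ' \u00b5s';  // microseconds
  }
  return ns.toFixed(0) + ' ns';                 // leave small values in ns
}
```

With these assumed thresholds, a 590 ns serialize measurement stays in nanoseconds, while a 7,200,000 ns round trip is shown in milliseconds.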

https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b

claude and others added 8 commits April 2, 2026 10:38
- Fix serde_floor_ns highlight key: use underscore separator matching
  criterion's actual directory naming (agent_card_serialize, not
  agent_card/serialize) - this was causing the Overview serde floor
  metric to show as 0/missing on the dashboard
- Rename memory alloc_counts to alloc_timing with _ns suffix to
  accurately reflect that these values are wall-clock timing under the
  counting allocator, not raw allocation counts
- Add concurrent_mixed/send_then_get benchmark to dashboard data
  extraction (was collected by criterion but not surfaced)

https://claude.ai/code/session_01NDPYAkSiGN9n17Cx6hVA8b

…and mobile support

- Fix serde_floor_ns highlight showing 0.0 (key mapping corrected)
- Fix memory section key mismatch (alloc_timing keys now consistent
  between extractor and template)
- Rebuild dashboard template with 10 tabs (added Concurrency,
  Backpressure, All Results) covering all 267 benchmarks
- Add searchable/filterable All Results table with all raw measurements
- Reduce chart heights from 16:10 aspect to fixed 200px (160px small)
  to eliminate excessive scrolling
- Add horizontal bar charts for tenant resolvers and pagination walk
- Add tablet breakpoint (641-1024px) for better responsive scaling
- Display enterprise subsections: eviction, large history, hot reload,
  rate limiting, handler limits, cancel task
- Display production subsections: push config CRUD, dispatch routing,
  cancel/subscribe race, cross-language baselines
- Display backpressure details: slow consumer, timer calibration,
  concurrent streams
- Update benchmark count from 237 to 267 across all docs (README,
  CHANGELOG, CONTRIBUTING, CI/CD book page, testing book page,
  workflows README)
- Update CI/CD docs to reference dashboard generation step
- Update benches/README.md architecture tree with dashboard/ and
  new scripts

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

The cross-language serialize benchmark (590 ns) was invisible in the
Cross-Language Baselines bar chart because the other values are 1-6 ms
(more than 1,000x larger). Moved serialize to a metric card ("590 ns") and kept
only ms-scale values (Echo RT, Stream, Concurrent 50, Minimal) in the
chart so all bars are visible and proportional.

Verified via Playwright screenshots at desktop (1280x900), tablet
(768x1024), and mobile (375x812) viewports.

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Charts:
- Single charts now use grid-full (1fr) instead of grid-wide (2-column)
  so they span the full viewport width on Overview, Transport, Serde,
  and Concurrency tabs
- Paired charts remain in 2-column grid-wide layout (Enterprise,
  Backpressure, Memory, Production)
- Serde type-level horizontal bar chart uses chart-wrap-lg (280px) for
  better label legibility with 14 bars
- Increased default chart height from 200px to 220px

All Results table:
- Grouped by benchmark category with cyan group headers
- Short names (group prefix stripped, shown indented under header)
- Scrollable container (max-height: 600px) with sticky column headers
- Search box with live result counter ("Showing N of 267 benchmarks")
- Larger font size (.82rem body, .78rem cells) for readability
- Group headers hide/show with filter to avoid orphaned sections
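
The grouping, search, and result-counter behavior listed above could be modeled roughly as follows. This is a sketch under assumptions: filterRows, the row shape ({ group, name }), and the matching rule (case-insensitive substring on "group/name") are hypothetical and not taken from the dashboard source. It works on plain data rather than the DOM so the logic is easy to check in isolation.

```javascript
// Hypothetical model of the All Results filter.
// rows: array of { group: string, name: string } (assumed shape).
function filterRows(rows, query) {
  var q = String(query || '').toLowerCase();
  var groups = Object.create(null); // group -> matching short names
  var order = [];                   // groups in first-seen order
  var shown = 0;
  for (var i = 0; i < rows.length; i++) {
    var r = rows[i];
    var full = (r.group + '/' + r.name).toLowerCase();
    if (q && full.indexOf(q) === -1) {
      continue; // row does not match the search text
    }
    if (!groups[r.group]) {
      // A group header appears only once it has a visible row,
      // which avoids orphaned headers.
      groups[r.group] = [];
      order.push(r.group);
    }
    groups[r.group].push(r.name);
    shown++;
  }
  return {
    order: order,
    groups: groups,
    label: 'Showing ' + shown + ' of ' + rows.length + ' benchmarks'
  };
}
```

Because a group is added only when one of its rows matches, hiding empty group headers falls out of the data structure instead of needing a second pass.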

Verified via Playwright at desktop (1280x900), tablet (768x1024),
and mobile (375x812).

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Five issues identified and fixed via systematic data audit:

1. Concurrency: Split into two charts (Transport ms vs Store us)
   because store values (30-180 us) were 40-240x smaller than
   sends/streams (1.5-7.2 ms), rendering store bars invisible on
   the shared axis.

2. Multi-Tenant Isolation: Convert Y-axis from raw nanoseconds
   (30,000-145,000) to microseconds (30-145 us) for readability.

3. Memory "Serialized Bytes per Payload Size": Renamed to
   "Serialize Timing by Payload Size" -- values are wall-clock
   nanoseconds under counting allocator, not byte counts. Previous
   title was factually incorrect.

4. Memory "History Allocation Scaling": Renamed to "History Serde
   Timing by Depth" -- values are timing in ns, not allocation
   counts. Previous title was misleading.

5. Serde Payload Scaling: Switched to logarithmic Y-axis because
   data spans 130 ns to 494,000 ns (3,800x range). Linear axis
   compressed small-payload values to zero. Log scale with K/M
   tick formatters now shows all four series clearly.
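
The log-scale change in item 5 might look roughly like the sketch below. The names fmtTick and logYAxis are hypothetical, and the Chart.js version used by the dashboard is not stated in this PR. The returned object follows the per-axis scale options of Chart.js 3 and later (a `type` of 'logarithmic' plus a `ticks.callback`); older versions nest the same options under `scales.yAxes`.

```javascript
// Hypothetical K/M tick formatter for nanosecond values.
function fmtTick(v) {
  if (v >= 1e6) { return (v / 1e6) + 'M'; }
  if (v >= 1e3) { return (v / 1e3) + 'K'; }
  return String(v);
}

// Hypothetical helper that builds y-axis options for a log scale.
function logYAxis() {
  return {
    type: 'logarithmic',
    ticks: {
      callback: function (value) {
        // Label only powers of ten to keep the axis uncluttered.
        var exp = Math.log(value) / Math.LN10;
        var isPowerOfTen = Math.abs(exp - Math.round(exp)) < 1e-9;
        return isPowerOfTen ? fmtTick(value) : '';
      }
    }
  };
}
```

On a linear axis topping out near 494,000 ns, a 130 ns point sits at about 0.03% of the axis height, which is why the small-payload series appeared flat at zero.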

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

Update dependency version examples in all documentation, book pages,
scripts, and metadata files from "0.4" to "0.5" to reflect the v0.5.0
release that includes the breaking TaskStore::save(&Task) change.

Files updated:
- README.md, crates/README.md: Quick start examples
- CITATION.cff: Software version metadata (0.3.0 -> 0.5.0)
- book/src/getting-started/installation.md: All dependency examples
- book/src/getting-started/first-agent.md: SDK dependency
- book/src/concepts/transport-layers.md: WebSocket/gRPC examples
- book/src/building-agents/dispatchers.md: Transport feature examples
- book/src/building-agents/stores.md: SQLite feature example
- book/src/client/builder.md: gRPC client example
- book/src/deployment/production.md: Tracing feature example
- benches/scripts/run_benchmarks.sh: SDK version in summary JSON
- benches/scripts/generate_book_page.sh: Version refs in generated text

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

- stores.md: Update TaskStore::save() and insert_if_absent() signatures
  from `task: Task` (owned) to `task: &'a Task` (borrowed) to match the
  v0.5.0 breaking change
- handler.md: Update event_queue_capacity default from 64 to 256 to
  match DEFAULT_QUEUE_CAPACITY in source code (changed in v0.5.0)

https://claude.ai/code/session_01JpJQZqzu84H7UdVRdNXmyf

…-sljPM

Redesign benchmark dashboard UI and expand benchmark coverage
@tomtom215 merged commit 949d3d2 into main on Apr 2, 2026
40 checks passed