fix(unofficial-run): auto-switch model when URL loads run for unselected model #243
Open
Oseltamivir wants to merge 10 commits into master
Accept `?unofficialrun=123,456,789` on the dashboard URL to merge benchmark and evaluation data from multiple GitHub Actions runs into a single view. Each run's benchmarks are tagged with their originating run_url for per-point traceability, and eval config ids are offset per-run to avoid collisions in the merged set. A NON-OFFICIAL banner is rendered per run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
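The merge described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the row shapes, the function names `parseUnofficialRunIds` and `mergeLoadedRuns`, and the `CONFIG_ID_STRIDE` offset are all assumptions made for the example.

```typescript
// Hypothetical row shapes; the dashboard's real types may differ.
interface OverlayBenchmark { hw: string; value: number; run_url?: string }
interface OverlayEval { configId: number; score: number }
interface LoadedRun {
  runUrl: string;
  benchmarks: OverlayBenchmark[];
  evals: OverlayEval[];
}

// Assumed stride: must exceed the largest config id any single run can emit.
const CONFIG_ID_STRIDE = 1000000;

// Parse "?unofficialrun=123,456,789" into de-duplicated run ids, in URL order.
function parseUnofficialRunIds(search: string): string[] {
  const raw = new URLSearchParams(search).get("unofficialrun") ?? "";
  const ids: string[] = [];
  for (const part of raw.split(",")) {
    const id = part.trim();
    // Keep only purely numeric ids and skip repeats.
    if (/^\d+$/.test(id) && ids.indexOf(id) === -1) ids.push(id);
  }
  return ids;
}

// Merge runs: tag each benchmark with its run URL and offset eval ids per run.
function mergeLoadedRuns(runs: LoadedRun[]) {
  const benchmarks: OverlayBenchmark[] = [];
  const evals: OverlayEval[] = [];
  runs.forEach((run, runIndex) => {
    for (const b of run.benchmarks) {
      benchmarks.push({ ...b, run_url: run.runUrl });
    }
    for (const e of run.evals) {
      evals.push({ ...e, configId: e.configId + runIndex * CONFIG_ID_STRIDE });
    }
  });
  return { benchmarks, evals };
}
```

With a fixed stride, two runs that both emit config id 1 end up as 1 and 1000001 in the merged set, so the ids stay unique as long as no single run exceeds the stride.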
When multiple unofficial runs are loaded, overlay points/rooflines for the same GPU were rendered in identical colors, making it impossible to tell runs apart. Derive a per-run hue rotation from the run's position in the loaded set and apply it via CSS filter — run 0 unchanged, each subsequent run shifted by 55°. Roofline grouping now includes runIndex so each run gets its own Pareto front. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BarChartD3's X-mark overlay points and their error-bar groups now use the same per-run hue rotation as the inference scatter overlay, so runs loaded via a comma-separated unofficialrun= list are visually separable on the evaluation tab too. Extracts the shared filter and runIndex helpers into lib/overlay-run-style.ts to avoid duplication. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark artifacts for DeepSeek-V4-Pro runs (e.g. run 24884703163) emit `infmax_model_prefix: "dsv4pro"` while the canonical DB key is `dsv4`. Without an alias the prefix resolver fell through all three strategies (direct match, alias table, precision-suffix strip) and every row was dropped as `unmappedModel`, so unofficial-run queries for these runs returned an empty benchmark set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
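For illustration, the three-strategy lookup could look like the sketch below. Only the `dsv4pro` to `dsv4` alias comes from the commit message; the function name `resolveModelPrefix`, the extra canonical key, and the list of precision suffixes are assumptions for the example.

```typescript
// Assumed canonical keys; only "dsv4" is named in the commit message.
const CANONICAL_MODEL_KEYS = new Set<string>(["dsv4", "dsr1"]);

// Alias table; the dsv4pro -> dsv4 entry is the one this commit adds.
const MODEL_PREFIX_ALIASES = new Map<string, string>([["dsv4pro", "dsv4"]]);

// Assumed precision suffixes, optionally preceded by "-" or "_".
const PRECISION_SUFFIX = /[-_]?(fp4|fp8|bf16)$/;

// Returns the canonical DB key, or null when the row should count as unmappedModel.
function resolveModelPrefix(prefix: string): string | null {
  const p = prefix.toLowerCase();

  // Strategy 1: direct match against the canonical keys.
  if (CANONICAL_MODEL_KEYS.has(p)) return p;

  // Strategy 2: alias table.
  const alias = MODEL_PREFIX_ALIASES.get(p);
  if (alias !== undefined) return alias;

  // Strategy 3: strip a trailing precision suffix and retry the direct match.
  const stripped = p.replace(PRECISION_SUFFIX, "");
  if (stripped !== p && CANONICAL_MODEL_KEYS.has(stripped)) return stripped;

  return null;
}
```

Without the alias entry, `dsv4pro` fails all three strategies and resolves to `null`, which matches the empty benchmark set described above.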
Three stacked fixes so multiple unofficial runs don't all look the same:

1. Include overlay hw keys in the vendor-color active set so overlay strokes get a real hue instead of the muted-foreground fallback. Hue-rotate on gray is a no-op, which was the main reason runs appeared identical.
2. Strengthen the per-run CSS filter: saturate(2.2) hue-rotate brightness(1.1), and widen the hue step from 55° to 80° for more separation.
3. Use a different stroke-dasharray per run index on overlay rooflines so runs stay distinguishable even when the filter can't produce a shift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
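A rough sketch of fixes 2 and 3, assuming helpers named `overlayRunFilter` and `overlayRunDash` (hypothetical names). The 80° step and the filter functions come from the commit message; leaving run 0 unfiltered follows the earlier commit, and the dash patterns are invented for the example.

```typescript
// Hue step per run, widened from 55 to 80 degrees in this commit.
const HUE_STEP_DEG = 80;

// Assumed dash patterns; index 0 renders a solid line.
const RUN_DASH_PATTERNS = ["none", "6 3", "2 3", "8 3 2 3"];

// CSS filter string for a run's overlay strokes. Run 0 is left untouched
// (an assumption carried over from the earlier hue-rotation commit).
function overlayRunFilter(runIndex: number): string {
  if (runIndex <= 0) return "none";
  const hue = (runIndex * HUE_STEP_DEG) % 360;
  return `saturate(2.2) hue-rotate(${hue}deg) brightness(1.1)`;
}

// stroke-dasharray value for a run's roofline; patterns repeat after the list ends.
function overlayRunDash(runIndex: number): string {
  const i = Math.max(0, runIndex) % RUN_DASH_PATTERNS.length;
  return RUN_DASH_PATTERNS[i];
}
```

The dash pattern does not depend on color at all, which is why it still separates runs when the base stroke is gray and the hue rotation has no visible effect.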
The CSS-filter approach made the legend and chart diverge: the legend rendered each overlay hwKey's vendor color (red for MI355X), while the chart stroke got the same base color *plus* a hue-rotate filter that shifted it to an unrelated hue. Since the legend's colored dot is a direct backgroundColor style, there was no clean way to apply the same filter to it. Switch to an explicit OKLch palette indexed by run order — both the overlay stroke and the legend swatch read from the same palette, so they match exactly. Restructure the overlay legend section to show one entry per loaded run (branch name) rather than per-hardware, since N runs × M hardware keys can't collapse to a single color per hw. Hardware identity for overlay points is still visible in the point label and tooltip; the X-mark shape and legend branch labels carry the run identity. Roofline dash-pattern per run is kept as a secondary (colorblind-friendly) encoding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
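The single-source-of-truth idea can be illustrated as below. The palette values and the names `overlayRunColor` and `overlayLegendEntries` are hypothetical; the point is only that the chart stroke and the legend swatch call the same lookup.

```typescript
// Hypothetical palette: equal lightness and chroma, hues spread around the wheel.
const OVERLAY_RUN_PALETTE = [
  "oklch(0.72 0.17 30)",
  "oklch(0.72 0.17 150)",
  "oklch(0.72 0.17 270)",
  "oklch(0.72 0.17 90)",
];

// Color for a run, indexed by its position in the loaded set (wraps around).
function overlayRunColor(runIndex: number): string {
  const n = OVERLAY_RUN_PALETTE.length;
  return OVERLAY_RUN_PALETTE[((runIndex % n) + n) % n];
}

interface OverlayLegendEntry { label: string; color: string }

// One legend entry per loaded run, labeled by branch name.
function overlayLegendEntries(branches: string[]): OverlayLegendEntry[] {
  return branches.map((branch, runIndex) => ({
    label: branch,
    color: overlayRunColor(runIndex),
  }));
}
```

Because the legend never derives its color from a filter, the swatch for run `i` is by construction the same string the chart passes as the stroke for run `i`.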
…selected model

Navigating to ?unofficialrun=<id> when `g_model` isn't set in the URL used to silently leave the dashboard on the default DeepSeek-R1 model. If the run only contained data for a different model (e.g. the DeepSeek-V4-Pro run 24889121634 on MI355X), the user saw no overlay and had to know to manually switch the model dropdown. Now, when an unofficial run is loaded and `g_model` wasn't provided, auto-switch to the first model the run contributes data for. The switch happens once, so subsequent manual selections stick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous auto-switch used a one-shot ref, so navigating from one unofficial run to another in the same session (e.g. swapping the runId in the URL) wouldn't re-evaluate which model to land on. If a user had been viewing run A on DeepSeek-V4-Pro and then navigated to run B that also has DeepSeek-V4-Pro data, that's fine — but if run B has data for a different model and the user happens to currently sit on a model that B doesn't cover, they'd see an empty chart with no overlay. Switch the guard to a stringified key of the (model, sequence) set from the current unofficial run, so each new run set re-evaluates the switch. Manual model changes while the same run is loaded still stick because the key doesn't change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
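The keyed guard can be sketched as a pure state transition, independent of React. Everything here is an assumption for illustration: the names `runSetKey` and `applyAutoSwitch`, the state shape, and in particular the choice to keep the current model when the new run already covers it.

```typescript
interface RunAvailability { model: string; sequence: string }
interface ModelSwitchState { lastRunKey: string | null; selectedModel: string }

// Stable key for the set of (model, sequence) pairs in the loaded run set.
function runSetKey(rows: RunAvailability[]): string {
  const pairs = rows.map((r) => `${r.model}|${r.sequence}`);
  const unique = pairs.filter((p, i) => pairs.indexOf(p) === i).sort();
  return JSON.stringify(unique);
}

// Re-evaluate the model switch only when the run set's key changes.
function applyAutoSwitch(
  state: ModelSwitchState,
  rows: RunAvailability[],
  modelInUrl: boolean,
): ModelSwitchState {
  // An explicit g_model in the URL, or an empty run, leaves the selection alone.
  if (modelInUrl || rows.length === 0) return state;

  const key = runSetKey(rows);
  // Same run set as last time: manual selections stick.
  if (key === state.lastRunKey) return state;

  // Assumption: keep the current model if the new run covers it,
  // otherwise jump to the first model the run contributes data for.
  const covered = rows.some((r) => r.model === state.selectedModel);
  return {
    lastRunKey: key,
    selectedModel: covered ? state.selectedModel : rows[0].model,
  };
}
```

In a component, `lastRunKey` would live in a ref and the transition would run inside an effect keyed on the run data; the pure form above just makes the guard logic easy to test.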
Run 24936260529 uses hw: "gb300-cw" which wasn't recognized. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al-runs

# Conflicts:
# packages/app/src/components/evaluation/ui/BarChartD3.tsx
# packages/app/src/components/unofficial-run-provider.tsx
# packages/app/src/lib/overlay-run-style.ts
# packages/db/src/etl/normalizers.test.ts
# packages/db/src/etl/normalizers.ts
Summary
Follow-up to #236. Navigating to `?unofficialrun=` when `g_model` isn't in the URL used to silently leave the dashboard on the default `DeepSeek-R1` model. If the run only contained data for a different model (e.g. run 24889121634 which only has DeepSeek-V4-Pro on MI355X), the user saw no overlay and had to manually switch the model dropdown.
Adds a one-shot `useEffect` in `GlobalFilterContext` that switches `selectedModel` to the first model the unofficial run contributes data for, when an unofficial run is loaded and `g_model` was not provided in the URL.
A ref guards against re-running on subsequent state changes, so manual model selections stick.
Test plan
🤖 Generated with Claude Code