fix(unofficial-run): auto-switch model when URL loads run for unselected model #243

Open
Oseltamivir wants to merge 10 commits into master from feat/multi-unofficial-runs

@Oseltamivir (Contributor)

Summary

Follow-up to #236. Navigating to `?unofficialrun=` when `g_model` isn't in the URL used to silently leave the dashboard on the default `DeepSeek-R1` model. If the run only contained data for a different model (e.g. run 24889121634, which only has DeepSeek-V4-Pro on MI355X), the user saw no overlay and had to switch the model dropdown manually.

Adds a one-shot `useEffect` in `GlobalFilterContext` that switches `selectedModel` to the first model the unofficial run contributes data for, when:

  • An unofficial run is loaded (`unofficialAvailable.length > 0`), AND
  • `g_model` wasn't provided in the URL (respect explicit user intent), AND
  • The current model isn't already covered by the overlay.

A ref guards against re-running on subsequent state changes, so manual model selections stick.
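
The three conditions above, plus the one-shot guard, can be sketched as a pure decision function. This is a minimal illustration, not the code in `GlobalFilterContext`: the `AutoSwitchInput` shape, the `pickAutoSwitchModel` name, and the `alreadySwitched` flag (standing in for the ref) are assumptions made for the example.

```typescript
// Hypothetical input shape; the real context state is not shown in this PR.
interface AutoSwitchInput {
  unofficialAvailable: { model: string }[]; // rows the loaded run contributes
  urlHasModel: boolean; // true when g_model was present in the URL
  selectedModel: string; // model currently shown on the dashboard
  alreadySwitched: boolean; // stands in for the one-shot ref
}

// Returns the model to switch to, or null when no switch should happen.
function pickAutoSwitchModel(input: AutoSwitchInput): string | null {
  const { unofficialAvailable, urlHasModel, selectedModel, alreadySwitched } = input;
  if (alreadySwitched) return null; // manual selections must stick
  if (unofficialAvailable.length === 0) return null; // no unofficial run loaded
  if (urlHasModel) return null; // respect explicit user intent
  const covered = unofficialAvailable.some((row) => row.model === selectedModel);
  if (covered) return null; // overlay already visible on the current model
  const first = unofficialAvailable[0];
  return first ? first.model : null;
}
```

Inside a `useEffect`, a non-null result would be passed to the model setter and the guard flipped afterwards, so later state changes do not trigger another switch.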

Test plan

  • `pnpm typecheck` — clean
  • `pnpm lint` / `pnpm fmt` — clean
  • `pnpm test:unit` — 1682 passed
  • Manual: `?unofficialrun=24889121634` auto-selects DeepSeek-V4-Pro; 1k/1k and 8k/1k both render overlay points

🤖 Generated with Claude Code

Oseltamivir and others added 7 commits April 24, 2026 05:48
Accept `?unofficialrun=123,456,789` on the dashboard URL to merge
benchmark and evaluation data from multiple GitHub Actions runs into
a single view. Each run's benchmarks are tagged with their originating
run_url for per-point traceability, and eval config ids are offset
per-run to avoid collisions in the merged set. A NON-OFFICIAL banner
is rendered per run.
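
A rough sketch of the parsing and merge described above. The `RunPayload` shape, the function names, and the `EVAL_ID_STRIDE` value are assumptions for illustration; only the comma-separated parameter format, the `run_url` tag, and the per-run id offset come from this commit message.

```typescript
// Hypothetical payload shape for one GitHub Actions run.
interface RunPayload {
  runUrl: string;
  benchmarks: { name: string }[];
  evals: { configId: number }[];
}

// Assumed stride; it only needs to exceed the largest config id within a run.
const EVAL_ID_STRIDE = 1000000;

// Splits the raw ?unofficialrun= value into numeric run ids, dropping junk.
function parseRunIds(raw: string | null): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((part) => part.trim())
    .filter((part) => /^\d+$/.test(part));
}

// Merges runs: benchmarks keep their originating run_url, and eval config
// ids are shifted by runIndex * stride so ids from different runs never collide.
function mergeRuns(runs: RunPayload[]) {
  const benchmarks: { name: string; run_url: string }[] = [];
  const evals: { configId: number }[] = [];
  runs.forEach((run, runIndex) => {
    for (const b of run.benchmarks) benchmarks.push({ ...b, run_url: run.runUrl });
    for (const e of run.evals) {
      evals.push({ ...e, configId: e.configId + runIndex * EVAL_ID_STRIDE });
    }
  });
  return { benchmarks, evals };
}
```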

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When multiple unofficial runs are loaded, overlay points/rooflines for
the same GPU were rendered in identical colors, making it impossible to
tell runs apart. Derive a per-run hue rotation from the run's position
in the loaded set and apply it via CSS filter — run 0 unchanged, each
subsequent run shifted by 55°. Roofline grouping now includes runIndex
so each run gets its own Pareto front.
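
As an illustration of the hue rotation and grouping described above, assuming hypothetical helper names and a hypothetical key format (the 55° step and the "run 0 unchanged" rule are from this commit message):

```typescript
// Hue step per run, as stated in the commit message.
const HUE_STEP_DEG = 55;

// CSS filter for a run's overlay marks. Returning "none" for run 0 is an
// assumption; hue-rotate(0deg) would leave the color unchanged as well.
function runHueFilter(runIndex: number): string {
  if (runIndex === 0) return "none";
  return `hue-rotate(${(runIndex * HUE_STEP_DEG) % 360}deg)`;
}

// Grouping key for rooflines. Including runIndex keeps points from different
// runs on the same GPU in separate groups, so each gets its own Pareto front.
function rooflineGroupKey(hwKey: string, runIndex: number): string {
  return `${hwKey}::run${runIndex}`;
}
```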

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BarChartD3's X-mark overlay points and their error-bar groups now use
the same per-run hue rotation as the inference scatter overlay, so runs
loaded via a comma-separated unofficialrun= list are visually separable
on the evaluation tab too. Extracts the shared filter and runIndex
helpers into lib/overlay-run-style.ts to avoid duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmark artifacts for DeepSeek-V4-Pro runs (e.g. run 24884703163)
emit `infmax_model_prefix: "dsv4pro"` while the canonical DB key is
`dsv4`. Without an alias the prefix resolver fell through all three
strategies (direct match, alias table, precision-suffix strip) and
every row was dropped as `unmappedModel`, so unofficial-run queries
for these runs returned an empty benchmark set.
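
The three-strategy lookup can be pictured roughly as follows. Only the `dsv4pro` to `dsv4` alias comes from this commit; the other canonical key, the suffix pattern, and the function name are placeholders, and the real resolver in the ETL package may differ.

```typescript
// Hypothetical canonical keys; "dsv4" is the one named in the commit message.
const CANONICAL_KEYS = new Set<string>(["dsv4", "dsr1"]);

// Alias table: the entry added by this fix.
const PREFIX_ALIASES = new Map<string, string>([["dsv4pro", "dsv4"]]);

// Assumed precision suffixes; the real list is not shown in this PR.
const PRECISION_SUFFIX = /[-_](fp4|fp8|bf16)$/;

// Tries direct match, then alias table, then precision-suffix strip.
// Returns null when the prefix cannot be mapped (an unmappedModel row).
function resolveModelKey(prefix: string): string | null {
  if (CANONICAL_KEYS.has(prefix)) return prefix;
  const alias = PREFIX_ALIASES.get(prefix);
  if (alias !== undefined) return alias;
  const stripped = prefix.replace(PRECISION_SUFFIX, "");
  if (stripped !== prefix && CANONICAL_KEYS.has(stripped)) return stripped;
  return null;
}
```

Without the `dsv4pro` entry, that prefix falls through all three branches and returns `null`, which matches the dropped-row behavior described above.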

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three stacked fixes so multiple unofficial runs don't all look the same:

1. Include overlay hw keys in the vendor-color active set so overlay
   strokes get a real hue instead of the muted-foreground fallback —
   hue-rotate on gray is a no-op, which was the main reason runs
   appeared identical.
2. Strengthen the per-run CSS filter: saturate(2.2) hue-rotate brightness(1.1),
   and widen the hue step from 55° to 80° for more separation.
3. Use a different stroke-dasharray per run index on overlay rooflines so
   runs stay distinguishable even when the filter can't produce a shift.
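
Fixes 2 and 3 could look something like the sketch below. The filter components and the 80° step are taken from the list above; the dash patterns, the helper names, and the choice to apply the filter to every run (including run 0) are assumptions.

```typescript
// Hue step per run after widening from 55 to 80 degrees.
const WIDE_HUE_STEP_DEG = 80;

// Strengthened per-run CSS filter string.
function overlayRunFilter(runIndex: number): string {
  const hue = (runIndex * WIDE_HUE_STEP_DEG) % 360;
  return `saturate(2.2) hue-rotate(${hue}deg) brightness(1.1)`;
}

// Hypothetical stroke-dasharray values, cycled by run index. They give a
// second visual channel when the filter cannot shift the color.
const DASH_PATTERNS = ["none", "6 3", "2 3", "8 3 2 3"];

function overlayRunDash(runIndex: number): string {
  return DASH_PATTERNS[runIndex % DASH_PATTERNS.length] ?? "none";
}
```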

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CSS-filter approach made the legend and chart diverge: the legend
rendered each overlay hwKey's vendor color (red for MI355X), while the
chart stroke got the same base color *plus* a hue-rotate filter that
shifted it to an unrelated hue. Since the legend's colored dot is a
direct backgroundColor style, there was no clean way to apply the same
filter to it.

Switch to an explicit OKLch palette indexed by run order — both the
overlay stroke and the legend swatch read from the same palette, so
they match exactly. Restructure the overlay legend section to show one
entry per loaded run (branch name) rather than per-hardware, since N
runs × M hardware keys can't collapse to a single color per hw.

Hardware identity for overlay points is still visible in the point
label and tooltip; the X-mark shape and legend branch labels carry the
run identity. Roofline dash-pattern per run is kept as a secondary
(colorblind-friendly) encoding.
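
A minimal sketch of the shared-palette idea, with made-up color values and helper names. The point is only that the chart stroke and the legend swatch read the same string, so they cannot drift apart the way the filtered stroke and the unfiltered legend dot did.

```typescript
// Hypothetical OKLch palette indexed by run order; the values are illustrative.
const RUN_PALETTE = [
  "oklch(0.65 0.2 30)",
  "oklch(0.65 0.2 150)",
  "oklch(0.65 0.2 270)",
  "oklch(0.65 0.2 90)",
];

// Single source of truth for a run's color, cycling if runs outnumber entries.
function runColor(runIndex: number): string {
  return RUN_PALETTE[runIndex % RUN_PALETTE.length] ?? "oklch(0.65 0 0)";
}

// One legend entry per loaded run, labeled by branch name.
function legendEntries(branches: string[]): { label: string; color: string }[] {
  return branches.map((label, runIndex) => ({ label, color: runColor(runIndex) }));
}
```

An overlay stroke would call `runColor(runIndex)` for its `stroke` attribute and the legend would use the same value as `backgroundColor`.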

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…selected model

Navigating to ?unofficialrun=<id> when `g_model` isn't set in the URL
used to silently leave the dashboard on the default DeepSeek-R1 model.
If the run only contained data for a different model (e.g. the
DeepSeek-V4-Pro run 24889121634 on MI355X), the user saw no overlay
and had to know to manually switch the model dropdown.

Now, when an unofficial run is loaded and `g_model` wasn't provided,
auto-switch to the first model the run contributes data for — once,
so subsequent manual selections stick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| inferencemax-app | Ready | Preview, Comment | Apr 25, 2026 11:10pm |


The previous auto-switch used a one-shot ref, so navigating from one
unofficial run to another in the same session (e.g. swapping the runId
in the URL) didn't re-evaluate which model to land on. That is harmless
when run B also has data for the currently selected model, but if run B
only covers other models, the user sees an empty chart with no overlay.

Switch the guard to a stringified key of the (model, sequence) set
from the current unofficial run, so each new run set re-evaluates the
switch. Manual model changes while the same run is loaded still stick
because the key doesn't change.
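
The key-based guard might be sketched like this. The `RunRow` shape, the key format, and the function names are assumptions; the commit only states that the key is a stringified form of the run's (model, sequence) set.

```typescript
// Hypothetical row shape for data contributed by the loaded unofficial run(s).
interface RunRow {
  model: string;
  sequence: string;
}

// Order-independent key for the set of (model, sequence) pairs.
function runSetKey(rows: RunRow[]): string {
  const pairs = rows.map((row) => `${row.model}|${row.sequence}`);
  return Array.from(new Set(pairs)).sort().join(",");
}

// Re-evaluate the auto-switch only when the key differs from the stored one.
// The caller stores the returned key (e.g. in a ref) for the next comparison.
function shouldReevaluate(lastKey: string | null, rows: RunRow[]) {
  const key = runSetKey(rows);
  return { reevaluate: key !== lastKey, key };
}
```

A manual model change does not alter the rows the run contributes, so the key stays the same and the switch is not repeated.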

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run 24936260529 uses hw: "gb300-cw", which wasn't recognized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al-runs

# Conflicts:
#	packages/app/src/components/evaluation/ui/BarChartD3.tsx
#	packages/app/src/components/unofficial-run-provider.tsx
#	packages/app/src/lib/overlay-run-style.ts
#	packages/db/src/etl/normalizers.test.ts
#	packages/db/src/etl/normalizers.ts
