fix: stop double-counting partial offload in ranking (#108)

Merged: Andyyyy64 merged 1 commit into main from fix/partial-offload-ranking on Jun 11, 2026

@Andyyyy64

Fixes #105.
Follow-up from #104.

What changed

This fixes a ranking bug where strong models that need partial offload could be buried below weaker full-GPU models.

The old score path counted VRAM fit more than once:

  • quality_score already applied the partial-offload quality multiplier
  • quality_score also included the speed adjustment
  • the final family sort key then added another +15 bonus for full_gpu

That made the displayed score and the actual rank disagree. On an RTX 3060 12GB-style setup, Qwen/Qwen3.6-27B could show a higher score than several models above it, but still land near the bottom because the sort key applied the extra full-GPU bonus.

This PR keeps final sorting closer to the displayed score. Full-GPU candidates are still favored through the runtime-fit and speed terms, but the sort key no longer adds a second full-GPU bonus.
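
The mismatch between displayed score and rank can be illustrated with a minimal sketch. The `Candidate` dataclass, its field names, the model names, and the scores below are hypothetical and not taken from the repository; only the +15 `full_gpu` bonus comes from the description above. The real sort key may include other terms (the PR only says sorting is "closer to" the displayed score), so treat this as an illustration of the idea rather than the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical ranking candidate; field names are illustrative only."""
    name: str
    quality_score: float  # displayed score; assumed to already include offload and speed terms
    full_gpu: bool


FULL_GPU_BONUS = 15.0  # the extra bonus the old sort key added


def old_sort_key(c: Candidate) -> float:
    # Old behaviour: VRAM fit is counted again on top of quality_score.
    return c.quality_score + (FULL_GPU_BONUS if c.full_gpu else 0.0)


def new_sort_key(c: Candidate) -> float:
    # New behaviour (simplified): sort by the same score that is displayed.
    return c.quality_score


def rank(cands, key):
    """Return candidate names ordered best-first under the given key."""
    return [c.name for c in sorted(cands, key=key, reverse=True)]


# Made-up scores: the strong model shows the highest displayed score.
candidates = [
    Candidate("strong-partial-offload", 72.0, full_gpu=False),
    Candidate("weaker-full-gpu-a", 65.0, full_gpu=True),
    Candidate("weaker-full-gpu-b", 60.0, full_gpu=True),
]

old_order = rank(candidates, old_sort_key)
new_order = rank(candidates, new_sort_key)
```

With these made-up numbers, the old key ranks the partial-offload model last even though it shows the highest score, while the new key puts it first.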

I also made the partial-offload multiplier more granular:

  • light spill is penalized less than heavy spill
  • heavy dense partial offload is still penalized strongly
  • MoE models get a milder penalty when the active parameter working set can plausibly stay on GPU while inactive experts spill
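
The three rules above could be sketched roughly as follows. The function name, its parameters, and every threshold and multiplier value are placeholders invented for illustration; the PR does not state the actual numbers, and the real code may be structured differently.

```python
def partial_offload_multiplier(
    offload_ratio: float,
    is_moe: bool = False,
    active_set_fits_gpu: bool = False,
) -> float:
    """Return a quality multiplier in (0, 1].

    offload_ratio is the assumed fraction of the model spilled off the GPU.
    All thresholds and values here are made-up placeholders.
    """
    if not 0.0 <= offload_ratio <= 1.0:
        raise ValueError("offload_ratio must be between 0 and 1")
    if offload_ratio == 0.0:
        return 1.0  # full GPU fit: no penalty
    if is_moe and active_set_fits_gpu:
        # Active parameters stay on GPU; only inactive experts spill,
        # so the penalty grows slowly with the spill ratio.
        return 1.0 - 0.2 * offload_ratio
    if offload_ratio <= 0.25:
        return 0.9  # light spill: mild penalty
    if offload_ratio <= 0.5:
        return 0.75  # moderate spill
    return 0.5  # heavy dense spill: strong penalty
```

A stepped function keeps the behaviour easy to reason about in tests; a smooth curve over `offload_ratio` would be an equally plausible design.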

Local QA

uv run pytest
# 331 passed

uv run python -m compileall -q src tests
# OK

Manual checks:

  • RTX 3060 12GB: Qwen/Qwen3.6-27B moves from buried near the bottom to the top group with a partial-offload warning.
  • A3000 6GB: heavy partial-offload models do not take over the top rank.
  • RTX 4060 8GB: Qwen/Qwen3.6-27B stays lower when the offload ratio and speed are not good enough.
  • RTX 4090 24GB and M2 16GB rankings still look sane.

@Andyyyy64 force-pushed the fix/partial-offload-ranking branch from 143ea1b to b3f3e98 on June 11, 2026 07:32
@Andyyyy64 merged commit 31c0334 into main on Jun 11, 2026
4 checks passed
@Andyyyy64 deleted the fix/partial-offload-ranking branch on June 11, 2026 07:46
@Andyyyy64 mentioned this pull request on Jun 11, 2026

Development

Successfully merging this pull request may close these issues.

Strong models that need partial offload rank below weaker ones that fit
