fix: stop double-counting partial offload in ranking by Andyyyy64 · Pull Request #108 · Andyyyy64/whichllm

Andyyyy64 · 2026-06-11T07:27:57Z

Fixes #105.
Follow-up from #104.

What changed

This fixes a ranking bug where strong models that need partial offload could be buried below weaker full-GPU models.

The old score path counted VRAM fit more than once:

quality_score already applied the partial-offload quality multiplier
quality_score also included the speed adjustment
the final family sort key then added another +15 bonus for full_gpu

That made the displayed score and the actual rank disagree. On an RTX 3060 12GB-style setup, Qwen/Qwen3.6-27B could show a higher score than several models above it, but still land near the bottom because the sort key applied the extra full-GPU bonus.

This PR keeps final sorting closer to the displayed score. Full-GPU candidates are still favored through the runtime-fit and speed terms, but the sort key no longer adds a second full-GPU bonus.
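As a rough illustration of the ranking mismatch described above, here is a minimal sketch. It is not the actual whichllm code: the `Candidate` class, the function names, and the example scores are all hypothetical. Only the `+15` bonus and the `full_gpu` label come from the description. The new key here is simply the displayed score; the real sort key may include other terms.

```python
from dataclasses import dataclass

FULL_GPU_BONUS = 15  # the extra bonus mentioned in the description


@dataclass
class Candidate:
    name: str             # hypothetical model label
    quality_score: float  # displayed score; assumed to already include offload and speed terms
    fit: str              # "full_gpu" or "partial_offload"


def old_sort_key(c: Candidate) -> float:
    # Old behavior (sketch): VRAM fit is rewarded again on top of the displayed score.
    bonus = FULL_GPU_BONUS if c.fit == "full_gpu" else 0
    return c.quality_score + bonus


def new_sort_key(c: Candidate) -> float:
    # New behavior (sketch): rank by the score the user actually sees.
    return c.quality_score


# Made-up scores, chosen only to show the effect.
candidates = [
    Candidate("strong-partial", 70.0, "partial_offload"),
    Candidate("weak-full-a", 62.0, "full_gpu"),
    Candidate("weak-full-b", 58.0, "full_gpu"),
]

old_order = [c.name for c in sorted(candidates, key=old_sort_key, reverse=True)]
new_order = [c.name for c in sorted(candidates, key=new_sort_key, reverse=True)]
```

With these made-up numbers, the old key puts the partial-offload model last even though its displayed score is the highest, while the new key ranks it first.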

I also made the partial-offload multiplier more granular:

light spill is penalized less than heavy spill
heavy dense partial offload is still penalized strongly
MoE models get a milder penalty when the active parameter working set can plausibly stay on GPU while inactive experts spill
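The three rules above could be sketched roughly like this. The function name, the parameters, the spill thresholds, and the penalty values are all assumptions picked for illustration; the multiplier in this PR uses its own inputs and numbers.

```python
def partial_offload_multiplier(
    offload_ratio: float,
    is_moe: bool = False,
    active_fits_on_gpu: bool = False,
) -> float:
    """Return a quality multiplier in (0, 1] for a given spill ratio.

    offload_ratio is the assumed fraction of the model that does not fit
    in VRAM (0.0 means full GPU). All thresholds below are illustrative.
    """
    if offload_ratio <= 0.0:
        return 1.0  # nothing spills, no penalty

    if offload_ratio < 0.25:
        penalty = 0.10  # light spill: small penalty
    elif offload_ratio < 0.50:
        penalty = 0.25  # moderate spill
    else:
        penalty = 0.50  # heavy spill: strong penalty

    # MoE case: if the active-parameter working set can plausibly stay on
    # the GPU while inactive experts spill, soften the penalty.
    if is_moe and active_fits_on_gpu:
        penalty *= 0.5

    return 1.0 - penalty


light_dense = partial_offload_multiplier(0.10)
heavy_dense = partial_offload_multiplier(0.60)
heavy_moe = partial_offload_multiplier(0.60, is_moe=True, active_fits_on_gpu=True)
```

In this sketch a light spill keeps most of the score, a heavy dense spill loses the most, and a heavy MoE spill lands in between when the active experts fit on the GPU.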

Local QA

uv run pytest
# 331 passed

uv run python -m compileall -q src tests
# OK

Manual checks:

RTX 3060 12GB: Qwen/Qwen3.6-27B moves from near the bottom into the top group, with a partial-offload warning.
A3000 6GB: heavy partial-offload models do not take over the top rank.
RTX 4060 8GB: Qwen/Qwen3.6-27B stays lower when the offload ratio and speed are not good enough.
RTX 4090 24GB and M2 16GB rankings still look sane.

Andyyyy64 mentioned this pull request Jun 11, 2026

Results not good for my GPU #104

Open

fix: stop double-counting partial offload in ranking

b3f3e98

Andyyyy64 force-pushed the fix/partial-offload-ranking branch from 143ea1b to b3f3e98 Compare June 11, 2026 07:32

Andyyyy64 merged commit 31c0334 into main Jun 11, 2026
4 checks passed

Andyyyy64 deleted the fix/partial-offload-ranking branch June 11, 2026 07:46

Andyyyy64 mentioned this pull request Jun 11, 2026

release: v0.5.10 #109

Merged
