Unslop the codebase: pay down AI slop in oversized core modules

## Problem

The codebase works, but several core modules have each accumulated too much responsibility in a single file. This is intentionally blunt: parts of the implementation now read like AI slop, where CLI orchestration, external I/O, domain rules, scoring heuristics, rendering, and curated data all sit together.

That makes future ranking changes riskier than they need to be. Small edits to benchmark logic, model fetching, output formatting, or CLI behavior can have a wide blast radius because the boundaries are not clean.

## Current hotspots

Measured with `wc -l`:

```text
1087  src/whichllm/cli.py
 880  src/whichllm/models/fetcher.py
 835  src/whichllm/output/display.py
 803  src/whichllm/engine/ranker.py
 751  src/whichllm/models/benchmark.py
 404  src/whichllm/constants.py
```

Test files under similar size pressure:

```text
574  tests/test_cli.py
553  tests/test_ranker.py
500  tests/test_p1_p3_regressions.py
459  tests/test_r3_regressions.py
```

## Suggested direction

Refactor toward clearer responsibility boundaries:

- Keep `cli.py` as a thin Typer layer: option parsing, validation, and command dispatch only.
- Add application/use-case modules for command flows, e.g. `recommend`, `plan`, `upgrade`, `run`, `snippet`, and `hardware`.
- Split `ranker.py` into candidate generation, filtering, scoring, family deduplication, and ranking orchestration.
- Split `benchmark.py` into benchmark fetching, source normalization, score lookup, and evidence resolution.
- Split `fetcher.py` into Hugging Face client calls, model parsing, GGUF extraction, MoE/param overrides, and serialization.
- Split `display.py` by output surface: ranking, plan, upgrade, JSON, and shared formatting helpers.
- Move large curated registries out of `constants.py` where practical, or at least separate GPU, quantization, lineage, and model override data.
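
To make the first two bullets concrete, here is a minimal sketch of what a use-case module could look like. Every name in it (`RecommendRequest`, `Candidate`, `recommend`, and the three injected callables) is hypothetical and does not reflect the current whichllm API. The idea is that the Typer command only builds the request and prints the result, while loading, ranking, and rendering are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for one ranked model entry."""
    name: str
    score: float


@dataclass(frozen=True)
class RecommendRequest:
    """Validated inputs handed over by the CLI layer (hypothetical fields)."""
    vram_gb: float
    top_n: int = 3


def recommend(
    request: RecommendRequest,
    load_models: Callable[[], Sequence[str]],
    rank: Callable[[Sequence[str], float], Sequence[Candidate]],
    render: Callable[[Sequence[Candidate]], str],
) -> str:
    """Use-case flow: load, rank, render. No option parsing happens here."""
    models = load_models()
    ranked = rank(models, request.vram_gb)
    return render(ranked[: request.top_n])


# Fakes standing in for the fetcher, ranker, and display modules.
def fake_load() -> Sequence[str]:
    return ["a-7b", "b-13b", "c-70b"]


def fake_rank(models: Sequence[str], vram_gb: float) -> Sequence[Candidate]:
    # Keeps input order and assigns descending scores; ignores vram_gb.
    return [Candidate(m, float(len(models) - i)) for i, m in enumerate(models)]


def fake_render(candidates: Sequence[Candidate]) -> str:
    return ", ".join(c.name for c in candidates)


result = recommend(
    RecommendRequest(vram_gb=24.0, top_n=2), fake_load, fake_rank, fake_render
)
```

With this shape, a command flow can be tested without Typer, network access, or Rich, and swapping the renderer does not touch the command code.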

## Acceptance criteria

- Core runtime modules are below 400 lines where practical, excluding intentionally data-only files.
- CLI command functions do not contain model loading, benchmark fetching, ranking, or rendering logic inline.
- Ranking behavior is preserved by existing tests.
- The current regression suite still passes.
- New module boundaries make it possible to modify benchmark evidence, scoring, or Rich output without touching unrelated command code.
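
The line-count criterion could be checked mechanically. The following is a rough sketch, not an existing script in the repository: `over_budget`, `count_lines`, and the `exclude` parameter are hypothetical names, and which files count as "intentionally data-only" is left to the caller. Note that it counts a final line without a trailing newline, so totals can differ from `wc -l` by one.

```python
from __future__ import annotations

from pathlib import Path

LINE_BUDGET = 400  # threshold taken from the acceptance criteria


def count_lines(path: Path) -> int:
    """Count lines in a file, including a last line with no trailing newline."""
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def over_budget(
    root: Path,
    budget: int = LINE_BUDGET,
    exclude: frozenset[str] = frozenset(),
) -> list[tuple[int, str]]:
    """Return (line_count, relative_path) for .py files at or above the budget.

    `exclude` holds POSIX-style paths relative to `root`, e.g. data-only files.
    """
    offenders = []
    for path in root.rglob("*.py"):
        rel = path.relative_to(root).as_posix()
        if rel in exclude:
            continue
        lines = count_lines(path)
        if lines >= budget:  # the criterion says "below 400"
            offenders.append((lines, rel))
    return sorted(offenders, reverse=True)
```

Something like `over_budget(Path("src/whichllm"), exclude=frozenset({"constants.py"}))` would then list the remaining hotspots, largest first, assuming `constants.py` ends up as a data-only file.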

## Non-goals

- Do not change ranking behavior as part of the first cleanup unless required by extraction.
- Do not reformat the entire codebase mechanically.
- Do not remove existing regression tests; split them only if it improves maintainability.
