Fix models not appearing: stop /api/models from re-downloading the catalog every call #776

Merged
michaelneale merged 3 commits into main from micn/model-load-last-fix
Jun 2, 2026

@michaelneale

What this fixes

Models stopped showing up in the console because /api/models was hanging ~9s on every call and silently serving a stale (or empty) catalog.

Each /api/models request walks ensure_catalog(), which saw the staleness marker as expired and kicked off a synchronous HuggingFace download of the meshllm/catalog dataset. That download failed on the very first small file with a spurious size mismatch:

failed to refresh stale meshllm/catalog; using already-loaded stale catalog:
downloaded 818 bytes but expected 341 bytes for entries/.../*.json

The refresh aborted before touching the .last_refresh marker, so ensure_catalog() fell back to the stale cache and returned Ok — but the marker stayed stale, so the next request re-attempted the same multi-second download. That loop repeated forever, ~9s per call.

This was pre-existing on main (verified by building and timing a clean main checkout), not a regression from #773.

Root cause + the two fixes

1. The actual bug (hf-hub fork). The cache download path HEADs the resolve endpoint with a no-redirect client and gets a 307 whose own Content-Length (341) is the length of the redirect body, not of the file. With no X-Linked-Size header present, the fork trusted that length, then rejected the correctly downloaded 818-byte file. Fixed in Mesh-LLM/hf-hub#2: extract_file_size() no longer trusts a redirect's Content-Length. This PR points hf-hub at that fix branch.

2. Defense-in-depth (this crate). Add a 5-minute refresh backoff so that any future refresh failure (HF outage, auth, etc.) can't turn every request into a fresh network download when a stale catalog is already loaded. After a failed refresh with a loaded catalog, retries are suppressed for 5 minutes; a successful refresh clears the backoff immediately.
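
The size-detection rule from item 1 can be sketched roughly as follows. This is not the fork's actual code: the `HeadResponse` struct and this signature for `extract_file_size()` are hypothetical stand-ins, and returning `None` is assumed to mean "size unknown, skip the post-download size check".

```rust
/// Hypothetical, simplified view of the headers returned by the HEAD request.
struct HeadResponse<'a> {
    status: u16,
    x_linked_size: Option<&'a str>,
    content_length: Option<&'a str>,
}

/// Returns the expected size of the target file, or None when no header
/// can be trusted to describe it (assumed behaviour, not the fork's code).
fn extract_file_size(resp: &HeadResponse<'_>) -> Option<u64> {
    // X-Linked-Size, when present, describes the target file even on a redirect.
    if let Some(size) = resp.x_linked_size.and_then(|v| v.trim().parse::<u64>().ok()) {
        return Some(size);
    }
    // On a 3xx response, Content-Length is the length of the redirect body,
    // not of the file, so it must not be used as the expected size.
    if (300..400).contains(&resp.status) {
        return None;
    }
    // On a direct (non-redirect) response, Content-Length is the file size.
    resp.content_length.and_then(|v| v.trim().parse::<u64>().ok())
}
```

With the values from the log above, a 307 carrying only Content-Length: 341 yields no expected size, so the 818-byte download is no longer rejected.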

Validation

  • cargo fmt --all -- --check and cargo clippy -p mesh-llm-host-runtime --all-targets -- -D warnings are both clean.
  • New unit test refresh_backoff_suppresses_then_clears passes.
  • New ignored networked test refresh_catalog_live now downloads the live meshllm/catalog dataset end-to-end (80 entries); it fails on entry #1 without the hf-hub fix.

Follow-up

Cargo.toml temporarily tracks the fork's micn/fix-redirect-content-length-size branch. Once Mesh-LLM/hf-hub#2 merges into the fork's mesh-llm branch, repoint hf-hub back to branch = "mesh-llm" and run cargo update -p hf-hub.

The catalog refresh failed on every request because the hf-hub fork mistook a
307 redirect's Content-Length for the target file size, tripping the
post-download size check. ensure_catalog() then silently fell back to the
stale cache without refreshing the staleness marker, so every /api/models call
re-attempted the multi-second download forever (~9s per call).

- Point hf-hub at the redirect Content-Length fix branch (Mesh-LLM/hf-hub#2).
- Add a 5-minute refresh backoff so a failing refresh can't turn every request
  into a fresh network download when a stale catalog is already loaded.
- Add a unit test for the backoff and an ignored networked test that verifies
  the live meshllm/catalog dataset now downloads end-to-end.

Copilot AI review requested due to automatic review settings June 2, 2026 01:52

Copilot AI left a comment

Pull request overview

Fixes /api/models re-downloading the HuggingFace catalog on every call by (a) pointing hf-hub at a fork branch that fixes a redirect Content-Length mis-detection, and (b) adding a 5-minute refresh backoff so any future refresh failure (with a stale catalog already loaded) doesn't trigger a network attempt per request.

Changes:

  • Add CATALOG_REFRESH_BACKOFF_UNTIL state + helpers; consult it in ensure_catalog() when stale and a cached catalog exists; clear on successful refresh.
  • Repoint hf-hub patch to the micn/fix-redirect-content-length-size fork branch.
  • Add unit test for backoff lifecycle and an ignored networked test for live catalog refresh.
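
A minimal sketch of how such a backoff could be wired in, assuming a `Mutex<Option<Instant>>` for the state. Only `CATALOG_REFRESH_BACKOFF_UNTIL` and `ensure_catalog()` are named in this PR; the helper names, the `ensure_catalog_when_stale` outline, and its boolean return value are hypothetical.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Backoff window after a failed refresh (5 minutes, per the PR description).
const CATALOG_REFRESH_BACKOFF: Duration = Duration::from_secs(5 * 60);

/// Deadline before which refresh attempts are skipped; None means no backoff.
static CATALOG_REFRESH_BACKOFF_UNTIL: Mutex<Option<Instant>> = Mutex::new(None);

/// Hypothetical helper: is a backoff currently in effect at `now`?
fn refresh_backoff_active(now: Instant) -> bool {
    let until = CATALOG_REFRESH_BACKOFF_UNTIL.lock().unwrap();
    matches!(*until, Some(deadline) if now < deadline)
}

/// Hypothetical helper: arm the backoff after a failed refresh.
fn note_refresh_failure(now: Instant) {
    *CATALOG_REFRESH_BACKOFF_UNTIL.lock().unwrap() = Some(now + CATALOG_REFRESH_BACKOFF);
}

/// Hypothetical helper: a successful refresh clears the backoff immediately.
fn note_refresh_success() {
    *CATALOG_REFRESH_BACKOFF_UNTIL.lock().unwrap() = None;
}

/// Hypothetical outline of the stale-catalog path in ensure_catalog().
/// Returns Ok(true) when the stale catalog is served, Ok(false) after a
/// successful refresh, and Err only when there is nothing to fall back to.
fn ensure_catalog_when_stale(
    now: Instant,
    have_cached: bool,
    refresh: impl FnOnce() -> Result<(), String>,
) -> Result<bool, String> {
    // With a cached catalog and an active backoff, skip the network entirely.
    if have_cached && refresh_backoff_active(now) {
        return Ok(true);
    }
    match refresh() {
        Ok(()) => {
            note_refresh_success();
            Ok(false)
        }
        // Refresh failed but a stale catalog exists: serve it and back off.
        Err(_) if have_cached => {
            note_refresh_failure(now);
            Ok(true)
        }
        Err(e) => Err(e),
    }
}
```

Passing `now` in explicitly keeps the lifecycle testable without sleeping: one failed refresh arms the backoff, calls inside the window never invoke `refresh`, and the first successful refresh after the window clears it.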

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| crates/mesh-llm-host-runtime/src/models/remote_catalog.rs | Implements refresh backoff with helpers, integrates into ensure_catalog(), and adds tests. |
| Cargo.toml | Temporarily repoints hf-hub patch to the fix branch (follow-up to revert noted in description). |
| Cargo.lock | Updated to pick up the new hf-hub commit (and incidental dependency churn). |

The hf-hub redirect fix makes the remote catalog actually downloadable in
CI again. parse_exact_model_ref consults the live catalog before the
Hugging Face parser branches, so the parse_exact_model_ref_accepts_* tests
now match a real catalog entry (e.g. unsloth/gemma-4-31B-it-GGUF) and return
ExactModelRef::Catalog instead of the HuggingFace ref they assert. These
tests were only passing because catalog downloads were broken.

Install an empty catalog override (and mark serial, since the override is
global) so the parser branches are tested in isolation.
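
The isolation idea can be sketched as below. Only `parse_exact_model_ref`, `ExactModelRef::Catalog`, and the catalog entry name come from this PR; the `HuggingFace` variant name, `CATALOG_OVERRIDE`, `install_catalog_override`, `OverrideGuard`, and `live_catalog` are hypothetical, and a plain `Mutex` stands in for whatever mechanism the real tests use to run serially.

```rust
use std::sync::{Mutex, MutexGuard};

#[derive(Debug, PartialEq)]
enum ExactModelRef {
    Catalog(String),
    HuggingFace(String),
}

/// Hypothetical process-wide override; None means "use the live catalog".
static CATALOG_OVERRIDE: Mutex<Option<Vec<String>>> = Mutex::new(None);
/// Serializes tests that touch the global override.
static OVERRIDE_TEST_LOCK: Mutex<()> = Mutex::new(());

/// Holds the serial lock and removes the override when dropped.
struct OverrideGuard {
    _serial: MutexGuard<'static, ()>,
}

impl Drop for OverrideGuard {
    fn drop(&mut self) {
        // Runs before the serial lock field is released.
        *CATALOG_OVERRIDE.lock().unwrap() = None;
    }
}

fn install_catalog_override(entries: Vec<String>) -> OverrideGuard {
    // Tolerate poisoning so one failed test does not cascade into others.
    let serial = OVERRIDE_TEST_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    *CATALOG_OVERRIDE.lock().unwrap() = Some(entries);
    OverrideGuard { _serial: serial }
}

/// Stand-in for the downloaded catalog (assumption for this sketch only).
fn live_catalog() -> Vec<String> {
    vec!["unsloth/gemma-4-31B-it-GGUF".to_string()]
}

/// Simplified parser: the catalog is consulted before the Hugging Face branch.
fn parse_exact_model_ref(input: &str) -> ExactModelRef {
    let overridden: Option<Vec<String>> = (*CATALOG_OVERRIDE.lock().unwrap()).clone();
    let entries = overridden.unwrap_or_else(live_catalog);
    if entries.iter().any(|e| e == input) {
        ExactModelRef::Catalog(input.to_string())
    } else {
        ExactModelRef::HuggingFace(input.to_string())
    }
}
```

A test that binds `let _guard = install_catalog_override(Vec::new());` sees the Hugging Face branch for the same input that would otherwise resolve to a catalog entry, and the override is removed when the guard goes out of scope.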

Copilot AI review requested due to automatic review settings June 2, 2026 03:05

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated no new comments.

@michaelneale michaelneale merged commit c221017 into main Jun 2, 2026
36 checks passed
@michaelneale michaelneale deleted the micn/model-load-last-fix branch June 2, 2026 03:31