Don't trust redirect Content-Length as file size#2
Merged
Conversation
extract_file_size fell back to a response's Content-Length when X-Linked-Size was absent. On a 3xx redirect from the HF resolve endpoint, Content-Length describes the redirect body (e.g. 341 bytes), not the target file (e.g. 818 bytes). The cache download path does its HEAD with a no-redirect client, so it saw the 307 and recorded the wrong size. The post-download bytes_read != total_bytes check then failed for every such file. Only fall back to Content-Length for non-redirect responses; on a redirect without X-Linked-Size, return None so the size is treated as unknown and the strict size check is skipped.
There was a problem hiding this comment.
Pull request overview
Fixes a bug where extract_file_size() would incorrectly use the Content-Length header from a 3xx redirect response (which describes the redirect body, not the target file), causing spurious size-mismatch errors for small files downloaded via the HF resolve endpoint.
Changes:
extract_file_size()now prefersX-Linked-Size, returnsNoneon redirects lacking it, and only falls back toContent-Lengthon non-redirect responses.- Added 4 unit tests for the new behavior, with a
http = "1"dev-dependency to construct test responses.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| hf-hub/src/repository/files.rs | Rework extract_file_size logic and add unit tests for redirect/linked-size handling. |
| hf-hub/Cargo.toml | Add http = "1" to dev-dependencies for constructing test reqwest::Responses. |
| Cargo.lock | Lockfile update reflecting the new http dev-dependency. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
i386
approved these changes
Jun 2, 2026
i386
left a comment
Collaborator
There was a problem hiding this comment.
@michaelneale can you also open a PR upstream too please
michaelneale
added a commit
to Mesh-LLM/mesh-llm
that referenced
this pull request
Jun 2, 2026
…talog every call (#776) * fix: stop /api/models from re-downloading the catalog on every request The catalog refresh failed on every request because the hf-hub fork mistook a 307 redirect's Content-Length for the target file size, tripping the post-download size check. ensure_catalog() then silently fell back to the stale cache without refreshing the staleness marker, so every /api/models call re-attempted the multi-second download forever (~9s per call). - Point hf-hub at the redirect Content-Length fix branch (Mesh-LLM/hf-hub#2). - Add a 5-minute refresh backoff so a failing refresh can't turn every request into a fresh network download when a stale catalog is already loaded. - Add a unit test for the backoff and an ignored networked test that verifies the live meshllm/catalog dataset now downloads end-to-end. * test: isolate model-ref parser tests from live catalog The hf-hub redirect fix makes the remote catalog actually downloadable in CI again. parse_exact_model_ref consults the live catalog before the Hugging Face parser branches, so the parse_exact_model_ref_accepts_* tests now match a real catalog entry (e.g. unsloth/gemma-4-31B-it-GGUF) and return ExactModelRef::Catalog instead of the HuggingFace ref they assert. These tests were only passing because catalog downloads were broken. Install an empty catalog override (and mark serial, since the override is global) so the parser branches are tested in isolation. * deps: track merged hf-hub redirect size fix on fork mesh-llm branch
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this fixes
Small files downloaded through the HF
resolveendpoint could fail with a spurious size-mismatch error like:The cache download path issues a HEAD with a no-redirect client. The
resolveendpoint answers with a 307 redirect whose ownContent-Lengthdescribes the redirect body (e.g.341), not the target file. When the 307 carries noX-Linked-Size,extract_file_size()fell back to thatContent-Length, so the laterbytes_read != total_bytescheck rejected the correctly-downloaded file (e.g.818bytes).The change
extract_file_size()now:X-Linked-Sizeon any response (authoritative file size, safe even on redirects).Noneon a 3xx response whenX-Linked-Sizeis absent — we must not treat the redirect body'sContent-Lengthas the file size.Content-Lengthfor non-redirect responses.All callers already
.unwrap_or(0)and the post-download size check guards ontotal_bytes > 0, so returningNone(size unknown) is safe.Tests
Added 4 unit tests covering linked-size preference, non-redirect
Content-Length, redirectContent-Lengthrejection, and linked-size-on-redirect. Full suite (179 tests) passes.