Upload validation: format, size, codec#100
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR adds early magic-byte validation to the video upload endpoint. Before any storage quota checks or R2 operations, the handler inspects the first 32 bytes of chunk 0 to detect the container format (MP4, WebM, MKV, MOV, 3GP, AVI, MPEG, Ogg, or FLV) and verifies it matches the filename extension, rejecting mismatches and unsupported container types with specific error codes. ChangesVideo Upload Magic Bytes Validation
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsThese MCP integrations need to be re-authenticated in the Integrations settings: Sentry Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
There was a problem hiding this comment.
Code Review
This pull request implements magic byte validation for video uploads to prevent rename-attacks and ensure file content matches supported containers such as MP4, MOV, MKV, AVI, and WebM. It introduces sniffContainer and validateMagicBytes utilities along with corresponding tests. A review comment identifies that a size guard in the upload route allows very small files to bypass these checks and recommends its removal to ensure strict validation for all uploads.
| if (rawFile.size >= MAGIC_BYTES_MIN) { | ||
| const headBuf = await rawFile.slice(0, MAGIC_BYTES_MIN).arrayBuffer(); | ||
| const magicError = validateMagicBytes({ | ||
| fileName: rawFile.name, | ||
| head: new Uint8Array(headBuf), | ||
| }); | ||
| if (magicError) { | ||
| return c.json({ error: magicError.message, code: magicError.code }, 400); | ||
| } | ||
| } |
There was a problem hiding this comment.
The if (rawFile.size >= MAGIC_BYTES_MIN) guard prevents magic byte validation for files smaller than 12 bytes. This allows very small files (e.g., a 1-byte file named video.mp4) to bypass the container sniffing check entirely, which contradicts the goal of ensuring the file content matches its extension. Since validateMagicBytes already handles headers shorter than MAGIC_BYTES_MIN by returning a header_too_short error, this guard should be removed to ensure all uploads are strictly validated.
const headBuf = await rawFile.slice(0, MAGIC_BYTES_MIN).arrayBuffer();
const magicError = validateMagicBytes({
fileName: rawFile.name,
head: new Uint8Array(headBuf),
});
if (magicError) {
return c.json({ error: magicError.message, code: magicError.code }, 400);
}There was a problem hiding this comment.
Code Review
This pull request implements magic byte validation for video uploads to prevent file extension spoofing (e.g., renaming an executable to .mp4). It introduces container sniffing for MP4, MOV, MKV, and AVI formats and integrates this check into the upload route. Feedback suggests removing a manual file size check before validation in the upload handler to allow the validation logic to handle short files consistently and return the correct error code.
| if (rawFile.size >= MAGIC_BYTES_MIN) { | ||
| const headBuf = await rawFile.slice(0, MAGIC_BYTES_MIN).arrayBuffer(); | ||
| const magicError = validateMagicBytes({ | ||
| fileName: rawFile.name, | ||
| head: new Uint8Array(headBuf), | ||
| }); | ||
| if (magicError) { | ||
| return c.json({ error: magicError.message, code: magicError.code }, 400); | ||
| } | ||
| } |
There was a problem hiding this comment.
The if (rawFile.size >= MAGIC_BYTES_MIN) guard prevents the validateMagicBytes function from returning the header_too_short error for files smaller than 12 bytes. Since validateMagicBytes already handles the length check internally and rawFile.slice() is safe to call even if the file is shorter than the requested length, this guard should be removed to ensure all files are properly validated against their claimed extension and to provide consistent error reporting for malformed or tiny files.
const headBuf = await rawFile.slice(0, MAGIC_BYTES_MIN).arrayBuffer();
const magicError = validateMagicBytes({
fileName: rawFile.name,
head: new Uint8Array(headBuf),
});
if (magicError) {
return c.json({ error: magicError.message, code: magicError.code }, 400);
}There was a problem hiding this comment.
💡 Codex Review
spooool/src/workers/upload-validation.ts
Line 197 in b16fe88
For MP4/MOV detection, brand validation is only enforced when sniff.brand is truthy. Because sniffContainer trims the 4-byte brand field, an attacker can use a header like ....ftyp (blank brand) to get family: 'mp4' with brand: '', which skips the allowlist and is accepted for .mp4/.mov. This creates an easy bypass of the new codec-brand guard for crafted non-video payloads.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // actually one of our supported containers. Rejects rename-attacks | ||
| // (e.g. .exe → .mp4) and obscure quasi-mp4 containers before we pay | ||
| // for an R2 write or open a multipart upload. | ||
| if (rawFile.size >= MAGIC_BYTES_MIN) { |
There was a problem hiding this comment.
Reject short first chunks instead of skipping header validation
The new container gate is bypassable because this branch skips validateMagicBytes whenever chunk 0 is under 12 bytes. A client can submit chunkIndex=0 with a tiny first chunk (for example 1 byte) and chunkCount>1, then upload arbitrary bytes in later chunks; the upload session is created and R2 writes proceed without any magic-byte/container check. That defeats the intended rename-attack protection and leaves the new header_too_short path unreachable in production.
Useful? React with 👍 / 👎.
…O-140) Reject non-video uploads (renamed images, malformed files, unsupported ISO BMFF brands) before R2 writes by sniffing the first 32 bytes of chunk 0 and matching the detected container against the file extension. - sniffContainer recognizes mp4/mov/3gp via 'ftyp', webm/mkv via EBML, plus AVI, MPEG (PS/TS), Ogg, FLV signatures. - ISO BMFF major-brand allowlist gates the codec surface (avc1/av01/hev1/...). - validateMagicBytes wires brand + container into the chunk-0 init path in videos.ts before createMultipartUpload or single-shot R2.put runs.
b16fe88 to
b138091
Compare
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (6)
src/workers/videos.test.ts (1)
8-24: 💤 Low value
mp4Bytestruncation branch is unreachable from current callers.
mp4Bytes(64)andmp4Bytes(200)are both ≥header.length(32), so thetotalSize >= header.length ? out : new Uint8Array(out.buffer.slice(0, totalSize))ternary always takes the first branch. If you intend to add a "header too short" test in the future, that path will be useful; otherwise it's dead code that could be simplified to justreturn out;.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/workers/videos.test.ts` around lines 8 - 24, The truncation branch in mp4Bytes is unreachable because callers pass totalSize values (e.g., 64 and 200) that are always >= header.length (32); simplify mp4Bytes by removing the ternary and always returning the constructed Uint8Array (out), or if you need the short-header behavior keep the ternary but ensure tests call mp4Bytes with totalSize < header.length to cover the truncation branch; update the function mp4Bytes accordingly (and remove dead slice logic if you choose the simplification).src/workers/upload-validation.test.ts (2)
278-280: 💤 Low valueMPEG sequence-header (0xb3) branch is uncovered.
sniffContaineraccepts both0xba(program stream) and0xb3(video sequence header) as MPEG markers, but the test fixture only exercises0xba. A one-line variant would cover the second branch and prevent regressions if either marker is dropped.🧪 Suggested addition
it('detects an mpeg program-stream header', () => { expect(sniffContainer(mpegPsHeader()).container).toBe('mpeg'); }); + + it('detects an mpeg video sequence header', () => { + expect( + sniffContainer(Uint8Array.from([0x00, 0x00, 0x01, 0xb3, 0x00, 0x00])).container, + ).toBe('mpeg'); + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/workers/upload-validation.test.ts` around lines 278 - 280, The test only exercises the MPEG program-stream marker 0xBA; add a second test that covers the MPEG sequence-header marker 0xB3 so the sniffContainer branch for sequence headers is exercised. Create a small fixture (e.g., mpegSequenceHeader()) or extend mpegPsHeader to accept a marker param, then add an it(...) that calls expect(sniffContainer(mpegSequenceHeader()).container).toBe('mpeg') (or equivalent) to cover the 0xB3 branch.
304-378: 💤 Low valueSolid coverage; consider rounding out a few extension/format pairs.
The mismatch and brand-unsupported paths are well covered for the most common pairs (mp4↔webm, PNG-with-mp4, jpm brand). A few small additions would make the matrix complete:
- Acceptance cases for
.3gp,.flv,.ogv,.mpeg/.mpg(these helpers already exist).- A mismatch case where a non-ISO container is uploaded with a
.movextension, sinceEXT_TO_CONTAINERS.mov = {mov, mp4}is the only multi-entry value and worth pinning.Not blocking — happy with the current 16 added cases.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/workers/upload-validation.test.ts` around lines 304 - 378, Add tests expanding format/extension coverage by using existing header helpers (isoBmffHeader, ebmlHeader, mpegTsHeader, aviHeader and the 3gp/flv/ogv/mpeg helpers you already have) to assert acceptance for .3gp, .flv, .ogv and .mpeg/.mpg extensions (expect validateMagicBytes(...) toBeNull()), and add one mismatch test where a non-ISO container header (e.g., ebmlHeader()) is given a .mov filename to assert validateMagicBytes(...) returns code 'magic_bytes_mismatch' (this pins the special-case EXT_TO_CONTAINERS.mov mapping). Ensure each test references validateMagicBytes and the appropriate header helper so they sit alongside the existing cases.src/workers/upload-validation.ts (2)
215-217: 💤 Low value
readBrandassumesbytes[offset+3]exists.The only call site is at offset 8 inside the
bytes.length >= 12branch, so this is safe today. IfreadBrandis ever called from a new branch, the lack of an internal length check would silently produce a string withNaN-derived characters (becauseString.fromCharCode(undefined)returns'\u0000'after NaN coercion). A small guard or a contract-style assertion would future-proof this helper.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/workers/upload-validation.ts` around lines 215 - 217, readBrand currently reads four bytes without verifying they exist; add a guard inside the readBrand(bytes: Uint8Array, offset: number) helper to ensure bytes.length >= offset + 4 (or assert the same) before calling String.fromCharCode, and if the condition fails either throw a clear Error or return a safe default (e.g., empty string) so callers (including the existing offset 8 usage inside the bytes.length >= 12 branch) cannot produce NaN/undefined-derived characters.
285-292: ⚖️ Poor tradeoffMPEG-TS detection is effectively a single-byte check.
A 32-byte header is too short to verify the second sync byte at offset 188, so any file whose first byte happens to be
0x47and whose extension is.ts(or.mpg/.mpeg) will passvalidateMagicBytes. The downstream extension match doesn't strengthen this — the extension is the only thing that lets it through. Two options worth considering:
- Bump
MAGIC_HEADER_BYTESto ≥189 (e.g., 256 or 512) and verify the second0x47at offset 188; this materially tightens TS detection.- Or accept the current heuristic and rely on Stream to reject non-TS uploads, but document the gap explicitly as a known false-positive surface.
The current code does call out the limitation in the comment, so this is a hardening suggestion rather than a defect.
src/workers/videos.ts (1)
385-396: 💤 Low valueEarly magic-byte gate is wired correctly.
Sliced read is non-consuming, so
rawFile.stream()further down still emits the full payload; the gate runs after extension/MIME validation but before any R2 work, which is the right order. Returning{ error, code }keeps the response shape consistent with the other validation paths.One residual gap worth flagging in the PR description (not a blocker): only chunk 0 is sniffed, so a multipart upload that prefixes a valid
ftypand then writes arbitrary bytes for later parts will still complete multipart. Stream-side rejection eventually catches this, but the multipart abort path is more expensive than the chunk-0 reject. Consider whether that's worth addressing in a follow-up (e.g., final-chunk re-sniff or moov-atom check on completion).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/workers/videos.ts` around lines 385 - 396, Current code only sniffs chunk 0 via rawFile.slice(0, MAGIC_HEADER_BYTES) and validateMagicBytes, which leaves a gap for multipart uploads that can prepend a valid header and then send bad bytes in later parts; to fix, add a final validation step during multipart completion (e.g., in the multipart upload finalizer or the upload completion handler) that re-sniffs the assembled file (or the final chunk) for magic bytes and/or performs a moov-atom sanity check before committing the multipart upload—use the same validateMagicBytes function or a new validateMoovAtom check and ensure it rejects and aborts the multipart upload if validation fails, referencing rawFile.stream(), validateMagicBytes, MAGIC_HEADER_BYTES, and the multipart completion logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/workers/upload-validation.ts`:
- Around line 186-191: The allowlist ISO_BMFF_MP4_BRANDS contains audio-only
ftyp brands that should be removed; update the Set<string> named
ISO_BMFF_MP4_BRANDS to drop the audio-only entries 'M4A ', 'M4P ', and 'M4B '
while keeping video brands such as 'M4V ' intact so that audio-only files
renamed to .mp4 won't bypass the video-only validation in upload-validation.ts.
---
Nitpick comments:
In `@src/workers/upload-validation.test.ts`:
- Around line 278-280: The test only exercises the MPEG program-stream marker
0xBA; add a second test that covers the MPEG sequence-header marker 0xB3 so the
sniffContainer branch for sequence headers is exercised. Create a small fixture
(e.g., mpegSequenceHeader()) or extend mpegPsHeader to accept a marker param,
then add an it(...) that calls
expect(sniffContainer(mpegSequenceHeader()).container).toBe('mpeg') (or
equivalent) to cover the 0xB3 branch.
- Around line 304-378: Add tests expanding format/extension coverage by using
existing header helpers (isoBmffHeader, ebmlHeader, mpegTsHeader, aviHeader and
the 3gp/flv/ogv/mpeg helpers you already have) to assert acceptance for .3gp,
.flv, .ogv and .mpeg/.mpg extensions (expect validateMagicBytes(...)
toBeNull()), and add one mismatch test where a non-ISO container header (e.g.,
ebmlHeader()) is given a .mov filename to assert validateMagicBytes(...) returns
code 'magic_bytes_mismatch' (this pins the special-case EXT_TO_CONTAINERS.mov
mapping). Ensure each test references validateMagicBytes and the appropriate
header helper so they sit alongside the existing cases.
In `@src/workers/upload-validation.ts`:
- Around line 215-217: readBrand currently reads four bytes without verifying
they exist; add a guard inside the readBrand(bytes: Uint8Array, offset: number)
helper to ensure bytes.length >= offset + 4 (or assert the same) before calling
String.fromCharCode, and if the condition fails either throw a clear Error or
return a safe default (e.g., empty string) so callers (including the existing
offset 8 usage inside the bytes.length >= 12 branch) cannot produce
NaN/undefined-derived characters.
In `@src/workers/videos.test.ts`:
- Around line 8-24: The truncation branch in mp4Bytes is unreachable because
callers pass totalSize values (e.g., 64 and 200) that are always >=
header.length (32); simplify mp4Bytes by removing the ternary and always
returning the constructed Uint8Array (out), or if you need the short-header
behavior keep the ternary but ensure tests call mp4Bytes with totalSize <
header.length to cover the truncation branch; update the function mp4Bytes
accordingly (and remove dead slice logic if you choose the simplification).
In `@src/workers/videos.ts`:
- Around line 385-396: Current code only sniffs chunk 0 via rawFile.slice(0,
MAGIC_HEADER_BYTES) and validateMagicBytes, which leaves a gap for multipart
uploads that can prepend a valid header and then send bad bytes in later parts;
to fix, add a final validation step during multipart completion (e.g., in the
multipart upload finalizer or the upload completion handler) that re-sniffs the
assembled file (or the final chunk) for magic bytes and/or performs a moov-atom
sanity check before committing the multipart upload—use the same
validateMagicBytes function or a new validateMoovAtom check and ensure it
rejects and aborts the multipart upload if validation fails, referencing
rawFile.stream(), validateMagicBytes, MAGIC_HEADER_BYTES, and the multipart
completion logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: f8af27fb-4231-471b-9a12-788455dc6c26
📒 Files selected for processing (4)
src/workers/upload-validation.test.tssrc/workers/upload-validation.tssrc/workers/videos.test.tssrc/workers/videos.ts
M4A/M4P/M4B are audio brands (AAC / protected AAC / audiobook). Including them let an .m4a renamed to .mp4 slip past the magic-byte gate even though the upload pipeline targets video. M4V is the video brand and stays. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
ECC bundle files are already tracked in this repository. Skipping generation of another bundle PR. |
|
Re: the ecc-tools bot comment — that's an informational message from automation reporting that the ECC bundle files are already tracked in this repo, so no bundle PR is needed. There is no actionable change requested. Re-verified locally: |
Closes ALO-140.
Summary
createMultipartUpload. Recognizes ISO BMFF (mp4/mov/3gp viaftyp), EBML (webm/mkv), RIFF AVI, MPEG (PS / TS), Ogg, and FLV.isom,mp42,avc1,av01,hev1,hvc1,qt,3gp4, …) so the codec surface is sane and renamed-extension uploads are rejected up front.evil.mp4containing PNG bytes is rejected withmagic_bytes_unrecognized;fake.mp4containing webm is rejected withmagic_bytes_mismatch;weird.mp4with brandjpmis rejected withmagic_bytes_brand_unsupported.Implementation notes
src/workers/upload-validation.ts:sniffContainer,validateMagicBytes,MAGIC_HEADER_BYTES,DetectedContainer.src/workers/videos.tslives inside thechunkIndex === 0block, aftervalidateInitialFileand before the storage-quota check / R2 writes.rawFile.slice(0, 32).arrayBuffer()doesn't consume the underlying Blob, so the subsequentrawFile.stream()still streams the full payload.Test plan
npm test— 516 passing (44 new cases acrosssniffContainerandvalidateMagicBytes)npm run lintnpm run type-check🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests