feat: optional size limits on streaming CAR decode and encode#186
Open
tabcat wants to merge 12 commits into
Open
feat: optional size limits on streaming CAR decode and encode#186tabcat wants to merge 12 commits into
tabcat wants to merge 12 commits into
Conversation
Add opt-in maxHeaderLength, maxBlockLength and maxCidLength options to the four streaming fromIterable decoders (CarBlockIterator, CarCIDIterator, CarReader, CarIndexer). Each caps a single contiguous read and rejects an oversized declared length from its varint prefix, before the bytes are buffered, so a malicious CAR can no longer force a near-arbitrary allocation. All three default to undefined (off); existing callers and the in-memory fromBytes paths are unaffected. Refs ipld#185
- All six size-cap checks (decode + encode) throw RangeError, matching the capacity-violation precedent in buffer-writer.js, so callers can catch them via instanceof RangeError. Messages are otherwise unchanged. - maxCidLength now bounds the full CID (version + codec + multihash): via cid.bytes.length on encode, version+codec+mhLength on decode, rather than just the multihash. Matches the unit CARs store and go-car's MaxIndexCidSize. CIDv0's bare multihash stays skipped on both ends.
- New README Size Limits section: the three caps, what each measures, the streaming-only scope, no-defaults note, and how to choose values (with go-car reference numbers). - CarCodecOptions reordered to header, cid, block; JSDoc on maxCidLength noting it bounds the full CID, not the digest. - options @param on the six streaming entry points points at the new section; CarBufferWriter option lines left untouched.
Drops the options = {} default on the streaming decode/encode functions and
guards each cap check with options?.x. Avoids allocating a throwaway empty
object per readCid/readBlockHead call when no caps are passed, and the param
type now matches the optional [options] annotation. No behavior change.
- decode rejections use assert.isRejected(p, RangeError, 'substring'), matching test-errors.spec.js (asserts both type and message in one call) - encode writer rejections use expect(...).to.eventually.be.rejectedWith(...), matching test-writer.spec.js, which avoids the documented case where assert.isRejected on a writer.put can throw an uncatchable error in Chrome - reuse the collector / concatBytes helper shape from test-writer.spec.js - drop the standalone RangeError tests; the type is now checked inline per cap
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Decoding a CAR from a streaming source (
*.fromIterable) reads a varint length, then buffers that many bytes into one contiguous allocation viareader.exactly(length)before yielding anything. This happens for the header, each CID, and each block body, and none of those lengths is bounded, so a malicious CAR can declare a huge length and force a near-arbitrary allocation. The caller cannot prevent it: a block is only visible once fully materialized, so a downstream size check runs too late. (The in-memoryfromBytespath is unaffected; it slices an already-bounded buffer.)The encode side is a conformance concern rather than allocation safety: a caller who decodes untrusted CARs under a chosen set of caps could still write a CAR that violates that same profile, and so emit output it would refuse to read back. Applying the same caps on the streaming writer gives one size profile enforced on both ends.
Change
Adds a shared, off-by-default
CarCodecOptionsto the four streaming decoders and the two streaming writer entry points:The three caps, all in bytes:
maxHeaderLength: the CAR header (the dag-cbor header block; on decode also covers the v2 pragma and the inner v1 header)maxCidLength: each CIDv1 CID, measured as the full CID (version + codec + multihash). CIDv0's bare multihash is not checked.maxBlockLength: each block bodyA length strictly greater than a cap throws a
RangeError(matching the capacity-violation precedent inCarBufferWriter): on decode before the bytes are buffered, on encode before they are written. All three default toundefined, so existing callers, thefromBytespaths, andCarBufferWriterare unaffected. Setting the sameCarCodecOptionson both ends gives one size profile that holds on read and write.Closes #185