Skip to content

feat: optional size limits on streaming CAR decode and encode#186

Open
tabcat wants to merge 12 commits into
ipld:masterfrom
tabcat:feat/streaming-read-limits
Open

feat: optional size limits on streaming CAR decode and encode#186
tabcat wants to merge 12 commits into
ipld:masterfrom
tabcat:feat/streaming-read-limits

Conversation

@tabcat

@tabcat tabcat commented Jun 23, 2026

Copy link
Copy Markdown

Problem

Decoding a CAR from a streaming source (*.fromIterable) reads a varint length, then buffers that many bytes into one contiguous allocation via reader.exactly(length) before yielding anything. This happens for the header, each CID, and each block body, and none of those lengths is bounded, so a malicious CAR can declare a huge length and force a near-arbitrary allocation. The caller cannot prevent it: a block is only visible once fully materialized, so a downstream size check runs too late. (The in-memory fromBytes path is unaffected; it slices an already-bounded buffer.)

The encode side is a conformance concern rather than allocation safety: a caller who decodes untrusted CARs under a chosen set of caps could still write a CAR that violates that same profile, and so emit output it would refuse to read back. Applying the same caps on the streaming writer gives one size profile enforced on both ends.

Change

Adds a shared, off-by-default CarCodecOptions to the four streaming decoders and the two streaming writer entry points:

// decode
CarReader.fromIterable(stream, { maxHeaderLength, maxCidLength, maxBlockLength })
CarBlockIterator.fromIterable(stream, options)
CarCIDIterator.fromIterable(stream, options)
CarIndexer.fromIterable(stream, options)

// encode
CarWriter.create(roots, options)
CarWriter.createAppender(options)

The three caps, all in bytes:

  • maxHeaderLength: the CAR header (the dag-cbor header block; on decode also covers the v2 pragma and the inner v1 header)
  • maxCidLength: each CIDv1 CID, measured as the full CID (version + codec + multihash). CIDv0's bare multihash is not checked.
  • maxBlockLength: each block body

A length strictly greater than a cap throws a RangeError (matching the capacity-violation precedent in CarBufferWriter): on decode before the bytes are buffered, on encode before they are written. All three default to undefined, so existing callers, the fromBytes paths, and CarBufferWriter are unaffected. Setting the same CarCodecOptions on both ends gives one size profile that holds on read and write.

Closes #185

tabcat added 10 commits June 24, 2026 01:31
Add opt-in maxHeaderLength, maxBlockLength and maxCidLength options to the
four streaming fromIterable decoders (CarBlockIterator, CarCIDIterator,
CarReader, CarIndexer). Each caps a single contiguous read and rejects an
oversized declared length from its varint prefix, before the bytes are
buffered, so a malicious CAR can no longer force a near-arbitrary allocation.

All three default to undefined (off); existing callers and the in-memory
fromBytes paths are unaffected.

Refs ipld#185
- All six size-cap checks (decode + encode) throw RangeError, matching the
  capacity-violation precedent in buffer-writer.js, so callers can catch them
  via instanceof RangeError. Messages are otherwise unchanged.
- maxCidLength now bounds the full CID (version + codec + multihash): via
  cid.bytes.length on encode, version+codec+mhLength on decode, rather than
  just the multihash. Matches the unit CARs store and go-car's MaxIndexCidSize.
  CIDv0's bare multihash stays skipped on both ends.
- New README Size Limits section: the three caps, what each measures, the
  streaming-only scope, no-defaults note, and how to choose values (with go-car
  reference numbers).
- CarCodecOptions reordered to header, cid, block; JSDoc on maxCidLength noting
  it bounds the full CID, not the digest.
- options @param on the six streaming entry points points at the new section;
  CarBufferWriter option lines left untouched.
@tabcat tabcat changed the title feat: optional read-size limits on streaming CAR decode feat: optional size limits on streaming CAR decode and encode Jun 26, 2026
tabcat added 2 commits June 27, 2026 01:09
Drops the options = {} default on the streaming decode/encode functions and
guards each cap check with options?.x. Avoids allocating a throwaway empty
object per readCid/readBlockHead call when no caps are passed, and the param
type now matches the optional [options] annotation. No behavior change.
- decode rejections use assert.isRejected(p, RangeError, 'substring'), matching
  test-errors.spec.js (asserts both type and message in one call)
- encode writer rejections use expect(...).to.eventually.be.rejectedWith(...),
  matching test-writer.spec.js, which avoids the documented case where
  assert.isRejected on a writer.put can throw an uncatchable error in Chrome
- reuse the collector / concatBytes helper shape from test-writer.spec.js
- drop the standalone RangeError tests; the type is now checked inline per cap
@tabcat tabcat marked this pull request as ready for review June 26, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optional max read size on streaming CAR decode

1 participant