fix: decode APP0 segments using declared segment length #27

Open: byrnehollander wants to merge 1 commit into diegomura:main from byrnehollander:fix/app0-segment-length



byrnehollander commented Jun 10, 2026


Summary

Fix APP0/JFIF decoding so jay-peg consumes the full APP0 segment payload described by the JPEG segment length, instead of assuming the canonical fixed-size JFIF header is the entire segment.

Previously, any APP0 bytes beyond the fixed-size JFIF struct (thumbnails, JFXX payloads, trailing data) were left in the stream, so the parser desynchronized and misread the next marker:

  • JPEGs produced by iOS/Apple HEIC conversion carry trailing Apple MPF bytes (AMPF) in the APP0/JFIF segment, ahead of APP1/Exif. The leftover bytes 0x414d (AM, which is 16717 in decimal) were read as the next marker type, so decoding failed with Unknown version 16717.
  • The existing tests/images/ExifTool.jpg fixture (which carries a JFXX segment) failed the same way with Unknown version 1542, which is why it was excluded from the snapshot suite.
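
A minimal sketch of the desync, assuming a parser that treats APP0 as exactly the canonical 16-byte JFIF block (2 length bytes plus the 14-byte JFIF\0 header). The byte layout follows the JFIF specification; FIXED_JFIF_SIZE and buildApp0 are hypothetical names, not part of jay-peg. Skipping a fixed 16 bytes lands on the trailing AMPF bytes, while skipping the declared length lands on the APP1 marker:

```javascript
// Hypothetical illustration, not jay-peg code.
// Canonical APP0/JFIF segment after the FF E0 marker: 2 length bytes + 14-byte header.
const FIXED_JFIF_SIZE = 2 + 14;

// Build FF E0 + big-endian length (which counts itself) + JFIF header + extra bytes.
function buildApp0(trailing) {
  const header = [
    0x4a, 0x46, 0x49, 0x46, 0x00, // "JFIF\0"
    0x01, 0x01, // version 1.01
    0x00, // density units
    0x00, 0x01, 0x00, 0x01, // X/Y density
    0x00, 0x00, // thumbnail width/height: no thumbnail
  ];
  const length = 2 + header.length + trailing.length;
  return [0xff, 0xe0, length >> 8, length & 0xff, ...header, ...trailing];
}

const ampf = [0x41, 0x4d, 0x50, 0x46]; // "AMPF" trailing bytes
const bytes = Uint8Array.from([...buildApp0(ampf), 0xff, 0xe1]); // followed by an APP1 marker

// Fixed-size parser: skip the marker plus 16 bytes, then read the "next marker".
const naiveAt = 2 + FIXED_JFIF_SIZE;
const naiveNext = (bytes[naiveAt] << 8) | bytes[naiveAt + 1];

// Length-framed parser: skip the marker plus the declared length.
const declared = (bytes[2] << 8) | bytes[3];
const framedAt = 2 + declared;
const framedNext = (bytes[framedAt] << 8) | bytes[framedAt + 1];

console.log(naiveNext, naiveNext.toString(16)); // 16717 414d
console.log(framedNext.toString(16)); // ffe1
```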

Fixes #21. Fixes #9 (the first regression test below embeds the exact segment bytes reported there). Also implements the JFXX parsing requested in #17: the extension code and raw payload are exposed (extensionCode, data), though thumbnail contents are not decoded further.

This failure mode also surfaces downstream when react-pdf embeds JPEGs: diegomura/react-pdf#2734 reports Unknown version errors thrown from this same decode path for images that render fine elsewhere, and those errors are likely caused by this desync.

What changed

  • Decode APP0 as a length-framed payload, advancing the stream by the declared segment length — the same approach exif.js already uses for APP1.
  • Preserve the existing JFIF fields for JFIF\0 payloads; expose thumbnail bytes as thumbnail and any extra trailing bytes as data.
  • Handle JFXX\0 payloads (name: "JFXX", extensionCode, data) without desynchronizing the stream.
  • Preserve unknown APP0 payloads as APP0 with their identifier and raw data.
  • Throw descriptive errors for malformed segments: Invalid APP0 length N when the declared length is smaller than the 2 length bytes themselves, and Truncated APP0 segment: declared length N exceeds remaining buffer when it points past the end of the input.
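
The length-framed approach can be sketched as follows. This is a hypothetical illustration, not jay-peg's code: decodeApp0, its return shape, and the JFIF field names (versionMajor, units, xDensity, and so on) are assumptions, while the name/identifier/extensionCode/thumbnail/data fields and the two error messages mirror the list above. The JFIF header layout and the 3-bytes-per-pixel RGB thumbnail come from the JFIF specification. The point of the sketch is that the next offset always comes from the declared length, never from how many bytes the recognized header fields happened to consume.

```javascript
// Hypothetical sketch of length-framed APP0 decoding; NOT jay-peg's actual code.
// `bytes` is a Uint8Array (a Node Buffer works too); `offset` points at the
// 2-byte big-endian length field that follows the FF E0 marker.
function decodeApp0(bytes, offset) {
  if (offset + 2 > bytes.length) {
    throw new Error("Truncated APP0 segment: missing length field"); // assumed message
  }
  // The declared length counts its own 2 bytes but not the FF E0 marker.
  const length = (bytes[offset] << 8) | bytes[offset + 1];
  if (length < 2) throw new Error(`Invalid APP0 length ${length}`);
  if (offset + length > bytes.length) {
    throw new Error(`Truncated APP0 segment: declared length ${length} exceeds remaining buffer`);
  }

  const payload = bytes.subarray(offset + 2, offset + length);
  const next = offset + length; // resume after the declared segment, whatever it holds

  // Identifier: NUL-terminated ASCII at the start of the payload (searched in the first 16 bytes).
  const nul = payload.subarray(0, 16).indexOf(0);
  const identifier = nul === -1 ? "" : String.fromCharCode(...payload.subarray(0, nul));

  let segment;
  if (identifier === "JFIF" && payload.length >= 14) {
    // "JFIF\0", version (2), units (1), X/Y density (2 + 2), thumbnail width/height (1 + 1),
    // then an optional uncompressed RGB thumbnail of width * height * 3 bytes.
    const thumbSize = 3 * payload[12] * payload[13];
    segment = {
      name: "JFIF",
      versionMajor: payload[5],
      versionMinor: payload[6],
      units: payload[7],
      xDensity: (payload[8] << 8) | payload[9],
      yDensity: (payload[10] << 8) | payload[11],
      thumbnail: payload.subarray(14, 14 + thumbSize),
      data: payload.subarray(14 + thumbSize), // trailing bytes, e.g. Apple "AMPF"
    };
  } else if (identifier === "JFXX" && payload.length >= 6) {
    // "JFXX\0", a 1-byte extension code, then the raw extension payload.
    segment = { name: "JFXX", extensionCode: payload[5], data: payload.subarray(6) };
  } else {
    // Unknown (or too short) APP0 payload: keep the identifier and the raw bytes.
    segment = { name: "APP0", identifier, data: payload.subarray(nul + 1) };
  }
  return { segment, next };
}

// Usage: a JFIF header with no thumbnail plus 4 trailing "AMPF" bytes, then APP1.
const bytes = Uint8Array.from([
  0x00, 0x14, // declared length 20 = 2 + 14 + 4
  0x4a, 0x46, 0x49, 0x46, 0x00, 0x01, 0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00,
  0x41, 0x4d, 0x50, 0x46, // "AMPF"
  0xff, 0xe1, // next marker: APP1
]);
const { segment, next } = decodeApp0(bytes, 0);
console.log(segment.name, String.fromCharCode(...segment.data), next); // JFIF AMPF 20
```

Because the trailing bytes are sliced into data instead of being left in the stream, bytes[next] and bytes[next + 1] are FF E1 regardless of what the APP0 payload contained.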

Behavior notes

  • Output compatibility: all 122 pre-existing snapshot entries are byte-identical. The snapshot diff only adds the two new ExifTool.jpg entries.
  • The legacy JFIF type behavior is intentionally preserved to avoid output churn: JFIF markers still expose the JFIF version in type after decode() maps the internal version field. This quirk already exists on main (the old struct's version field had the same collision); I'm happy to address it in a follow-up if you'd like.
  • Truncated/invalid APP0 segments now throw a descriptive error up front; previously they produced garbage field values or an opaque restructure error downstream. Well-formed input is unaffected.

Tests

  • ExifTool.jpg is re-enabled in the full snapshot suite, and the now-empty nonPassingImages exclusion list has been removed; the file decodes through JFXX/APP0 all the way to EOI.
  • Focused APP0 unit tests, each run against both Buffer and Uint8Array inputs:
    • JFIF ... AMPF followed by APP1/Exif, matching the reported failure mode (segment bytes taken from #9, Parsing fails with non-standard APP0)
    • JFIF thumbnail payload consumption
    • JFXX payload consumption
    • empty APP0 segments (declared length of exactly 2)
    • APP0 lengths smaller than the length field itself
    • truncated APP0 payloads
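
Running each case over both input types matters because a Node.js Buffer is a Uint8Array subclass that may be a view into a larger, shared ArrayBuffer (Node pools small allocations), so any decoder code that wraps bytes.buffer in a DataView must honor byteOffset. A hypothetical sketch of that check for the empty-APP0 case (declared length of exactly 2); readUint16BE and the harness are illustrative names, not jay-peg's tests, and the Buffer global assumes Node.js:

```javascript
// Hypothetical harness, not jay-peg's test suite. Requires Node.js (Buffer global).

// Read a big-endian uint16 from any Uint8Array view (including Buffer),
// honoring the view's byteOffset into its backing ArrayBuffer.
function readUint16BE(bytes, offset) {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength).getUint16(offset);
}

// Length field of an empty APP0 segment: declared length 2 (just itself).
const emptyApp0 = [0x00, 0x02];

// Put the bytes at offset 3 of a larger ArrayBuffer to mimic a pooled Buffer.
const backing = new ArrayBuffer(16);
new Uint8Array(backing).set([0xaa, 0xbb, 0xcc, ...emptyApp0]);

const inputs = {
  Uint8Array: new Uint8Array(backing, 3, emptyApp0.length),
  Buffer: Buffer.from(backing, 3, emptyApp0.length), // shares memory with `backing`
};

const results = {};
for (const [kind, bytes] of Object.entries(inputs)) {
  results[kind] = {
    declared: readUint16BE(bytes, 0), // correct: reads 00 02
    naive: new DataView(bytes.buffer).getUint16(0), // bug: ignores byteOffset, reads aa bb
  };
}
console.log(results);
```

Both views report a declared length of 2 through readUint16BE, while the offset-ignoring DataView reads unrelated bytes from the start of the shared buffer.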

Validation

yarn test --run   # 168 tests pass
yarn build        # parcel build succeeds
yarn prettier --check src/markers/jfif.js tests/index.test.js

Disclosure

I used GPT-5.5 and Claude (Fable) to help write and test this PR.

byrnehollander force-pushed the fix/app0-segment-length branch 2 times, most recently from 21a64b7 to 5f9a11c on June 10, 2026 00:33.
APP0 was decoded as a fixed-size JFIF struct, leaving any remaining
segment bytes (thumbnails, JFXX payloads, trailing data such as Apple
MPF "AMPF") in the stream and desynchronizing the parser on the next
marker. Decode the payload as length-framed instead, matching how the
EXIF marker is handled, and re-enable ExifTool.jpg in the snapshot
suite.

Successfully merging this pull request may close these issues:

  • Unknown version 16717
  • Parsing fails with non-standard APP0
