Some PDF streams not detected #19

@samcowger

When I run polyfile on an off-the-shelf PDF, it generally detects all streams. When I use macOS Preview to highlight/annotate the file (5 times, in this case), a number of streams are added. (Many of them appear to be ICC color spaces and, oddly, many are identical: seemingly 20 in this case.) polyfile fails to detect most of the newly added streams:

> ls -lh optician*
-rw-r--r--@ 1 sam  staff   515K Jan  8 10:41 optician-with-annots.pdf
-rw-r--r--@ 1 sam  staff   370K Jan  8 10:41 optician.pdf
> strings optician.pdf | grep -c endstream
158
> polyfile -q optician.pdf | jq | grep -c "\"type\": \"EndStream\""
158
> strings optician-with-annots.pdf | grep -c endstream
198
> polyfile -q optician-with-annots.pdf | jq | grep -c "\"type\": \"EndStream\""
158
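
For anyone trying to reproduce this without `strings`, `grep`, and `jq`, here is a rough Python sketch of the same comparison. It is not part of polyfile; the function names are made up for illustration. It assumes only what the commands above show: that the JSON printed by `polyfile -q` contains objects with a `"type": "EndStream"` field somewhere in the tree, and that this output has been saved to a file first. Note that it counts every occurrence of the `endstream` keyword, whereas `grep -c` counts matching lines, so the numbers could differ slightly on some files.

```python
import json
import re

ENDSTREAM = re.compile(rb"endstream")


def count_endstream_keywords(data: bytes) -> int:
    """Count every occurrence of the `endstream` keyword in raw PDF bytes."""
    return len(ENDSTREAM.findall(data))


def count_nodes_of_type(node, type_name: str) -> int:
    """Recursively count JSON objects whose "type" field equals type_name.

    Walks dicts and lists generically, so it makes no assumption about
    how the tool nests its output beyond the presence of a "type" key.
    """
    if isinstance(node, dict):
        own = 1 if node.get("type") == type_name else 0
        return own + sum(count_nodes_of_type(v, type_name) for v in node.values())
    if isinstance(node, list):
        return sum(count_nodes_of_type(v, type_name) for v in node)
    return 0


def compare_counts(pdf_path: str, json_path: str) -> int:
    """Return (keywords in the PDF) minus (EndStream nodes in the saved JSON)."""
    with open(pdf_path, "rb") as f:
        raw = count_endstream_keywords(f.read())
    with open(json_path) as f:
        detected = count_nodes_of_type(json.load(f), "EndStream")
    return raw - detected
```

With the output saved via `polyfile -q optician-with-annots.pdf > out.json`, calling `compare_counts("optician-with-annots.pdf", "out.json")` should give the number of streams that were not reported, provided the keyword count matches the `grep` count above.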

The files in question:
optician-with-annots.pdf
optician.pdf
