This repository was archived by the owner on Jun 15, 2023. It is now read-only.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte #48

@Helias

Running pdfx file.pdf -v > output.txt fails with this traceback:

  File "/home/helias/.local/bin/pdfx", line 8, in <module>
    sys.exit(main())
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/cli.py", line 158, in main
    pdf = pdfx.PDFx(args.pdf)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/__init__.py", line 128, in __init__
    self.reader = PDFMinerBackend(self.stream)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 236, in __init__
    refs = self.resolve_PDFObjRef(page.annots)
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 273, in resolve_PDFObjRef
    return [self.resolve_PDFObjRef(item) for item in obj_ref]
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 273, in <listcomp>
    return [self.resolve_PDFObjRef(item) for item in obj_ref]
  File "/home/helias/.local/lib/python3.8/site-packages/pdfx/backends.py", line 305, in resolve_PDFObjRef
    return Reference(obj_resolved["A"]["URI"].decode("utf-8"), self.curpage)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 62: invalid continuation byte

It looks like the URI of a link annotation in the PDF is not valid UTF-8, so decoding it fails. Is there a way to solve it?

The failing call appears to be this line: https://github.com/metachris/pdfx/blob/master/pdfx/backends.py#L305
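
A possible workaround, as a minimal sketch only: try UTF-8 first and fall back to Latin-1 when the bytes are not valid UTF-8. The helper name decode_uri is hypothetical and not part of pdfx, and Latin-1 is an assumption about how the URI was encoded (byte 0xe9 is "é" in Latin-1); the real encoding in a given PDF may differ.

```python
def decode_uri(raw: bytes) -> str:
    """Decode URI bytes, preferring UTF-8 and falling back to Latin-1.

    Hypothetical helper for illustration; not part of pdfx.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 URIs are Latin-1. Latin-1 maps every
        # byte to a code point, so this fallback never raises.
        return raw.decode("latin-1")


# A lone 0xe9 byte is invalid UTF-8 but decodes as "é" in Latin-1.
print(decode_uri(b"http://example.com/caf\xe9"))
```

If something like this were used in place of the bare .decode("utf-8") call shown in the traceback, the reference would be kept instead of aborting the whole run. Passing errors="replace" to decode is another option when the exact characters do not matter.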
