Issue with PDFs containing Arabic script/RTL script #101

@florisre

Current behavior:

The selection highlight does not line up with where the text actually sits in the document. Clicking and dragging to select produces the following selection:
[Screenshot: current behavior]
Right-clicking the selection and copying it to the clipboard produces the following output:

د ه د ا ب ش ب ه ا ی ش ا ع ر ا ن و ن و ی س ن د گ ا ن د ر ا ن ج م ن ف ر ه س گ ی ا ب ر ا ن ۹ آ ل م ا

Correct behavior:

Chromium's PDFium (I believe that is what actually renders PDFs in Chromium), and therefore every Chromium-based browser I have tried, handles this correctly:
[Screenshot: Chromium's behavior]
The selected text copies correctly as:


ده د
شبهای شاعران ونویسندگان اب
درانجمن فرهسگی ابران ۹آلمان 

Bigger scope

This is a well-known problem related to how RTL documents are handled in the PDF standards. See also this contribution in the Adobe community and this discussion of the issue in the tesseract project.

For further evaluation, I have attached the first page of the document shown in the screenshots here: https://bwsyncandshare.kit.edu/s/ZwQ7zyWXmKLHpdH
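
To make the symptom easier to check when evaluating the attached page, here is a minimal Python sketch that flags clipboard output resembling the broken case above. The function names `rtl_ratio` and `looks_fragmented` and the 0.9 threshold are my own assumptions for illustration. They are not part of this project or of PDFium, and the check only detects the symptom (RTL letters copied as isolated, space-separated characters). It does not reorder or repair the text.

```python
import unicodedata


def rtl_ratio(text: str) -> float:
    """Fraction of letters whose Unicode bidi class is right-to-left (R or AL)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    rtl = sum(1 for ch in letters if unicodedata.bidirectional(ch) in ("R", "AL"))
    return rtl / len(letters)


def looks_fragmented(text: str, threshold: float = 0.9) -> bool:
    """True if almost every whitespace-separated token is a single character.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    tokens = text.split()
    if len(tokens) < 2:
        return False
    single = sum(1 for token in tokens if len(token) == 1)
    return single / len(tokens) >= threshold


# Excerpts taken from the two outputs quoted in this issue.
broken = "د ه د ا ب ش ب ه ا ی"
expected = "شبهای شاعران ونویسندگان"

for label, sample in (("broken", broken), ("expected", expected)):
    print(label, rtl_ratio(sample), looks_fragmented(sample))
```

A check like this could serve as a rough regression test: copy the selection from the attached page, paste it into `looks_fragmented`, and expect `False` once selection and copying work as they do in Chromium.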
