The current wrapper implementation only provides access to the page->text method results.
The original Poppler code has a similar text_list method (since version 0.63.0?) that provides access to individual words and their bounding boxes. This makes it possible to select a clipping region, re-order the text, or filter out text that is too small. It roughly corresponds to the -bbox option of the CLI.
It would be great if the Python wrapper could provide access to the words with their bounding boxes for further post-processing.
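To illustrate the kind of post-processing I have in mind, here is a rough sketch in plain Python. The `Word` class and the helper functions are hypothetical and not part of the wrapper or of Poppler; they only assume that each word comes with a bounding box (x, y, width, height) whose origin is at the top-left of the page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical word record: text plus its bounding box on the page."""
    text: str
    x: float       # left edge
    y: float       # top edge (assumes the origin is at the top-left)
    width: float
    height: float


def clip(words: List[Word], region: Tuple[float, float, float, float]) -> List[Word]:
    """Keep only words lying entirely inside region = (x, y, width, height)."""
    rx, ry, rw, rh = region
    return [
        w for w in words
        if w.x >= rx and w.y >= ry
        and w.x + w.width <= rx + rw
        and w.y + w.height <= ry + rh
    ]


def drop_small(words: List[Word], min_height: float) -> List[Word]:
    """Filter out words whose bounding box is lower than min_height."""
    return [w for w in words if w.height >= min_height]


def reading_order(words: List[Word], line_tolerance: float = 5.0) -> List[Word]:
    """Sort top-to-bottom, then left-to-right.

    Words whose top edges fall into the same bucket of size line_tolerance
    are treated as one line. This is a crude heuristic, not a layout analysis.
    """
    return sorted(words, key=lambda w: (round(w.y / line_tolerance), w.x))


# Made-up sample data standing in for what the wrapper might return.
sample = [
    Word("world", 60, 100, 40, 10),
    Word("Hello", 10, 101, 40, 10),
    Word("footnote", 10, 700, 30, 4),
    Word("margin", 500, 100, 40, 10),
]

selected = reading_order(drop_small(clip(sample, (0, 0, 400, 800)), min_height=6))
print(" ".join(w.text for w in selected))
```

With access to the word boxes, all of this could be done on the Python side without any further changes to the wrapper.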