First of all, thanks for the handy module!
I'd be interested in having access to more of the features offered by pdftotext/xpdf to tune the quality of the extracted text.
As far as I know, it is not possible to pass arbitrary arguments through to pdftotext; only a few hardcoded parameters (password, raw) are exposed.
Would you be open to adding more of them?
I'm not fluent in C++, but it looks like I could use the existing code as a guide and try adding these arguments myself.
The options I'm most interested in are nodiag, lineprinter, linespacing and fixed. The full list can be found here: http://www.xpdfreader.com/pdftotext-man.html
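
To make the request more concrete, here is a rough sketch of the kind of interface I have in mind. It does not use this module at all: it assumes the xpdf `pdftotext` command-line tool is installed and on the PATH, and it only maps the four options above onto the flags from the man page. The function names (`build_pdftotext_args`, `extract_text`) and keyword arguments are hypothetical, made up for illustration.

```python
import subprocess


def build_pdftotext_args(pdf_path, nodiag=False, lineprinter=False,
                         linespacing=None, fixed=None):
    """Build an argument list for the xpdf pdftotext command-line tool.

    Hypothetical helper: only the four options discussed in this issue
    are covered, using the flag names from the xpdf man page.
    """
    args = ["pdftotext"]
    if nodiag:
        args.append("-nodiag")          # discard diagonal text
    if lineprinter:
        args.append("-lineprinter")     # fixed-pitch, fixed-line-spacing layout
    if linespacing is not None:
        args += ["-linespacing", str(linespacing)]  # line spacing in points
    if fixed is not None:
        args += ["-fixed", str(fixed)]  # character pitch in points
    # Using "-" as the output file sends the extracted text to stdout.
    args += [pdf_path, "-"]
    return args


def extract_text(pdf_path, **options):
    """Run pdftotext and return the extracted text (assumes it is on PATH)."""
    result = subprocess.run(
        build_pdftotext_args(pdf_path, **options),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like `extract_text("doc.pdf", lineprinter=True, linespacing=12, fixed=6)` is roughly what I would like to be able to express through the module itself, ideally as keyword arguments next to the existing password and raw parameters.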