The library is organized so that most integrations go through a few simple entry points.
`ProcessorConfig`: main configuration object.
Most useful fields:
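- `input_dir`: directory containing the PDFs.
- `output_excel`: path of the final Excel file.
- `invoice_number_pattern`: regular expression for the invoice number.
- `invoice_date_pattern`: regular expression for the invoice date.
- `worksheet_name`: name of the Excel worksheet.
- `status_completed`: status text used for successfully processed files.
- `persist_to_database`: enables MySQL persistence.
- `database`: a `DatabaseConfig` instance.
- `recursive`: enables recursive search of `input_dir`.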
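The two pattern fields hold regular expressions. The sketch below shows one way such patterns could be written and applied with Python's standard `re` module. The sample patterns, the sample text and the helper name `first_match` are all hypothetical. It also assumes the value of interest sits in the first capture group, which the library does not document.

```python
import re
from typing import Optional

# Hypothetical example patterns; the library's real defaults may differ.
# Both assume the value of interest is captured in group 1.
INVOICE_NUMBER_PATTERN = r"Invoice\s+No\.?\s*:?\s*([A-Z0-9-]+)"
INVOICE_DATE_PATTERN = r"Date\s*:?\s*(\d{2}/\d{2}/\d{4})"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1) if match else None


sample_text = "ACME Ltd\nInvoice No: INV-2024-001\nDate: 15/03/2024\nTotal: 120.00"
print(first_match(INVOICE_NUMBER_PATTERN, sample_text))  # INV-2024-001
print(first_match(INVOICE_DATE_PATTERN, sample_text))    # 15/03/2024
```

Testing patterns against text copied from a real invoice, as above, is a quick way to validate them before passing them to the configuration.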
`DatabaseConfig`: database connection settings, used only when `persist_to_database=True`.
Fields:
- `host`
- `user`
- `password`
- `database`
- `table`
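When persistence is enabled, each record presumably becomes a row in the table named by `table`. The sketch below shows how such an insert could be parameterized for a MySQL driver that uses `%s` placeholders, as mysql-connector-python and PyMySQL do. The column names mirror the documented `InvoiceRecord` fields, but the library's actual schema is not documented here, so treat them as an assumption. The helper `build_insert` is also hypothetical.

```python
import re

# Assumed column names mirroring the documented InvoiceRecord fields;
# the library's real table schema may differ.
COLUMNS = ("invoice_number", "invoice_date", "file_name", "status")


def build_insert(table: str) -> str:
    """Build a parameterized INSERT for a MySQL DB-API driver (%s placeholders).

    Table names cannot be bound as query parameters, so the name is
    checked against a conservative identifier pattern before use.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    columns = ", ".join(COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    return f"INSERT INTO `{table}` ({columns}) VALUES ({placeholders})"


sql = build_insert("invoices")
print(sql)
# With a real connection the values would be bound by the driver, e.g.:
# cursor.execute(sql, (rec.invoice_number, rec.invoice_date, rec.file_name, rec.status))
```

Binding values through the driver, rather than formatting them into the SQL string, keeps text extracted from PDFs from being interpreted as SQL.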
`InvoiceProcessor`: main library class.
Most important method:
process() -> ProcessingResult
`ProcessingResult`: consolidated processing result.
Properties:
- `records`
- `output_excel`
- `success_count`
- `error_count`
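Because `records` keeps every processed PDF, the counts can be broken down further in memory. The sketch below lists the files that did not finish successfully. It assumes a record counts as successful when its `status` equals the configured `status_completed` text. It uses `SimpleNamespace` stand-ins so it runs without the library, and the helper name `failed_files` is hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List


def failed_files(records: Iterable, status_completed: str) -> List[str]:
    """Return file names of records whose status is not the success text.

    Assumption: success is signalled by status == status_completed.
    """
    return [r.file_name for r in records if r.status != status_completed]


# Stand-in records carrying the documented InvoiceRecord fields (hypothetical values).
records = [
    SimpleNamespace(invoice_number="INV-1", invoice_date="01/02/2024",
                    file_name="a.pdf", status="Completed"),
    SimpleNamespace(invoice_number=None, invoice_date=None,
                    file_name="b.pdf", status="Error"),
]
print(failed_files(records, "Completed"))  # ['b.pdf']
```

With the real API, the call would be `failed_files(result.records, config.status_completed)`.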
`InvoiceRecord`: represents one processed PDF.
Fields:
- `invoice_number`
- `invoice_date`
- `file_name`
- `status`
Extracts text from the first page of the PDF.
Processes a single PDF and returns an InvoiceRecord.
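The per-file step combines text extraction with the two configured patterns. The sketch below starts from text that has already been extracted, so it needs no PDF library, and shows one plausible way to assemble a record. The `InvoiceRecord` dataclass here is a local stand-in with the documented fields, not the library's class. The `"Error"` status for incomplete matches and the helper name `build_record` are assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceRecord:
    """Local stand-in mirroring the documented InvoiceRecord fields."""
    invoice_number: Optional[str]
    invoice_date: Optional[str]
    file_name: str
    status: str


def build_record(file_name: str, text: str, number_pattern: str,
                 date_pattern: str, status_completed: str,
                 status_error: str = "Error") -> InvoiceRecord:
    """Apply both patterns to the page text and set the status accordingly."""
    number = re.search(number_pattern, text)
    date = re.search(date_pattern, text)
    found_both = number is not None and date is not None
    return InvoiceRecord(
        invoice_number=number.group(1) if number else None,
        invoice_date=date.group(1) if date else None,
        file_name=file_name,
        status=status_completed if found_both else status_error,
    )


record = build_record(
    "invoice_001.pdf",
    "Invoice No: INV-2024-001\nDate: 15/03/2024",
    number_pattern=r"Invoice No:\s*(\S+)",
    date_pattern=r"Date:\s*(\S+)",
    status_completed="Completed",
)
print(record.status)  # Completed
```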
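Minimal usage example:

```python
from pydf import InvoiceProcessor, ProcessorConfig

# Point the processor at a folder of PDFs and choose the Excel output path
config = ProcessorConfig(
    input_dir="examples/pdf_invoices",
    output_excel="output/api_usage.xlsx",
)

# Process the whole folder and print how many PDFs succeeded
result = InvoiceProcessor(config).process()
print(result.success_count)
```

Use the API when: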
- you want to integrate processing into another Python system;
- you need to inspect the `ProcessingResult` in memory;
- you want more control over customization.
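When the result is consumed in memory, a common next step is handing the records to another system. The sketch below serializes records to JSON using only the documented `InvoiceRecord` field names. `records_to_json` is a hypothetical helper, and the stand-in record lets it run without the library. With the real API, the argument would be `result.records`.

```python
import json
from types import SimpleNamespace
from typing import Iterable

FIELDS = ("invoice_number", "invoice_date", "file_name", "status")


def records_to_json(records: Iterable) -> str:
    """Serialize records to a JSON array, keeping only the documented fields.

    default=str converts non-JSON values (for example date objects, in case
    the library stores invoice dates that way) to strings.
    """
    rows = [{name: getattr(r, name) for name in FIELDS} for r in records]
    return json.dumps(rows, default=str, ensure_ascii=False)


# Stand-in for one entry of result.records (hypothetical values).
sample = [SimpleNamespace(invoice_number="INV-1", invoice_date="01/02/2024",
                          file_name="a.pdf", status="Completed")]
print(records_to_json(sample))
```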