pypdfium2

Here are 4 public repositories matching this topic...

vugarfamiloglu / multimodal-document-ocr

python ocr ai vision structured-output multimodal document-extraction pydantic fastapi tool-use document-ai pypdfium2

Updated May 19, 2026
Python

sherozshaikh / paperflight

Preflight checks for document extraction pipelines — validate, render, and screen PDFs before they reach your LLM. Pure-Python wheel, in-memory only.

python pdf python-library pdf-to-image pdf-processing invoice-processing document-ai ocr-preprocessing pypdfium2 llm-preprocessing blank-detection

Updated Apr 13, 2026
Python

No0Bitah / PDF-Highlight-Extractor

A Python tool for extracting highlighted text from PDF files while preserving formatting attributes (headers, bold, italic) and removing unwanted line breaks and page breaks. Perfect for integrating with content management systems.

pdf opencv automation numpy documentation-tool crm pillow python3 scrapping pymupdf pdf-document-processor pypdfium2

Updated May 15, 2025
Python

elsheraey / ocr-or-not

20 PDF classifiers, one verdict matrix: should this PDF go through fast text extraction, or do we need OCR?

python pdf benchmark ocr text-extraction evaluation-framework pdfminer claude pikepdf pypdfium2 ai-collaboration pdf-classification

Updated May 25, 2026
Python
