Updated May 19, 2026 - Python
pypdfium2
Here are 4 public repositories matching this topic...
Preflight checks for document extraction pipelines — validate, render, and screen PDFs before they reach your LLM. Pure-Python wheel, in-memory only.
Updated Apr 13, 2026 - Python
A Python tool for extracting highlighted text from PDF files while preserving formatting attributes (headers, bold, italic) and removing unwanted line breaks and page breaks. Perfect for integrating with content management systems.
Updated May 15, 2025 - Python
20 PDF classifiers, one verdict matrix: should this PDF go through fast text extraction, or do we need OCR?
Updated May 25, 2026 - Python