GitHub - gzmerel/OCR_PDF_TXT_extractor: A simple yet powerful tool to extract and convert text from PDF files using Optical Character Recognition

OCR_PDF_TXT_extractor

A simple, user-friendly Python desktop app that extracts text from PDF files, whether they contain selectable text or scanned images. It tries built-in PDF parsing first and uses OCR (Optical Character Recognition) as a fallback.

Features

- Easy-to-use graphical interface (Tkinter)
- Extracts text from standard, selectable PDFs
- Automatically uses OCR for scanned/image-based PDFs
- Saves extracted text to .txt files
- Progress bar and file status updates
- Works on Windows (requires Tesseract and Poppler)

Requirements

- Python 3.x
- PyPDF2
- pdf2image
- pytesseract
- Pillow
- Poppler (for Windows)
- Tesseract OCR (set path in code)

Installation

Install Python dependencies:

    pip install PyPDF2 pdf2image pytesseract Pillow

Install Tesseract OCR: Download and install from here. Update the path in the script if Tesseract is not in your PATH:

    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

Install Poppler for Windows: Download from here and update the script's poppler_path accordingly.
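The Tesseract path step above can be automated to some degree. The following is a minimal sketch, not the script's actual code: `find_tesseract` and `configure_tesseract` are hypothetical helper names, and the only install location assumed is the default one quoted above. It checks PATH first and falls back to a list of candidate locations.

```python
import os
import shutil
from typing import Iterable, Optional

# Default install location quoted in this README; adjust if yours differs.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates: Iterable[str] = (DEFAULT_TESSERACT,)) -> Optional[str]:
    """Return a usable Tesseract executable path, or None if none is found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise try the explicit candidate locations in order.
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_tesseract() -> str:
    """Point pytesseract at the located executable (hypothetical helper)."""
    path = find_tesseract()
    if path is None:
        raise FileNotFoundError(
            "Tesseract not found; install it or add its location to the candidates."
        )
    import pytesseract  # imported lazily so the lookup works without it

    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_tesseract()` once at startup would replace the hard-coded assignment, assuming Tesseract is installed in one of the checked locations.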

Usage

Run the script:

    python OCR_PDF_TXT_extractor.py

1. Click "Browse PDF" to select a PDF file.
2. The app will try to extract text directly. If the PDF is image-based, it will automatically use OCR.
3. Review and edit the extracted text as needed.
4. Click "Save As" to save the output as a .txt file.
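The direct-extraction-then-OCR behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the script's actual implementation: the function names and the `min_chars` threshold are hypothetical, and the PyPDF2 calls assume version 2.x or later (`PdfReader`, `extract_text`).

```python
from typing import Callable, List, Optional


def needs_ocr(page_texts: List[str], min_chars: int = 20) -> bool:
    """Return True when direct extraction produced too little text.

    min_chars is an assumed threshold; the real script may decide differently.
    """
    total = sum(len(text.strip()) for text in page_texts)
    return total < min_chars


def extract_direct(pdf_path: str) -> List[str]:
    """Read selectable text page by page with PyPDF2 (2.x+ API)."""
    from PyPDF2 import PdfReader  # lazy import

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def extract_with_ocr(pdf_path: str, poppler_path: Optional[str] = None) -> List[str]:
    """Render each page to an image with pdf2image, then OCR it with pytesseract."""
    from pdf2image import convert_from_path  # lazy imports
    import pytesseract

    images = convert_from_path(pdf_path, poppler_path=poppler_path)
    return [pytesseract.image_to_string(image) for image in images]


def extract_text(
    pdf_path: str,
    direct: Callable[[str], List[str]] = extract_direct,
    ocr: Callable[[str], List[str]] = extract_with_ocr,
) -> str:
    """Try direct extraction first; fall back to OCR for image-based PDFs."""
    pages = direct(pdf_path)
    if needs_ocr(pages):
        pages = ocr(pdf_path)
    return "\n\n".join(pages)
```

Passing the extractors as parameters keeps the fallback decision separate from the libraries, which makes it easy to try the logic without Tesseract or Poppler installed.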

Notes

For large PDFs or image-heavy files, OCR may take longer.
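One way to keep long OCR runs manageable is to process one page at a time and report progress after each page. The sketch below assumes this approach; whether the script works this way is not stated here, and `ocr_pages`, `render_with_pdf2image`, and `recognize_with_tesseract` are hypothetical names. The `first_page` and `last_page` arguments are real pdf2image parameters.

```python
from typing import Any, Callable, Iterator, Optional


def ocr_pages(
    pdf_path: str,
    page_count: int,
    render_page: Callable[[str, int], Any],
    recognize: Callable[[Any], str],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Iterator[str]:
    """Yield OCR text one page at a time so only one page image is held in memory."""
    for number in range(1, page_count + 1):
        image = render_page(pdf_path, number)
        yield recognize(image)
        # Report progress once the caller has consumed this page's text.
        if on_progress is not None:
            on_progress(number, page_count)


def render_with_pdf2image(pdf_path: str, number: int) -> Any:
    """Render a single 1-based page number to a PIL image."""
    from pdf2image import convert_from_path  # lazy import

    return convert_from_path(pdf_path, first_page=number, last_page=number)[0]


def recognize_with_tesseract(image: Any) -> str:
    """Run Tesseract on one page image."""
    import pytesseract  # lazy import

    return pytesseract.image_to_string(image)
```

The page count can be read with `pdf2image.pdfinfo_from_path(pdf_path)["Pages"]`, and the `on_progress` callback is where a Tkinter progress bar would be updated.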

This app is intended for Windows; running it on macOS or Linux requires minor edits (adjust the Tesseract and Poppler paths).
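A platform check is one possible way to make the path adjustment automatic. In this sketch the `WINDOWS_POPPLER_BIN` value and the `poppler_path_for` helper are hypothetical placeholders, not taken from the script. It relies on pdf2image searching PATH for Poppler when `poppler_path` is `None`, which is usually enough on macOS and Linux once Poppler is installed through a package manager.

```python
import platform
from typing import Optional

# Hypothetical Windows location; replace it with wherever you unpacked Poppler.
WINDOWS_POPPLER_BIN = r"C:\poppler\Library\bin"


def poppler_path_for(system: Optional[str] = None) -> Optional[str]:
    """Return the poppler_path argument to pass to pdf2image for this platform."""
    system = system or platform.system()
    if system == "Windows":
        return WINDOWS_POPPLER_BIN
    # macOS ("Darwin") and Linux: let pdf2image find Poppler on PATH.
    return None
```

The result can be passed straight through, for example `convert_from_path(pdf_path, poppler_path=poppler_path_for())`.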

License

MIT License

