AI HTR TEXT EXTRACTOR

Multi-Provider AI Text Extractor

A Python utility that extracts text from images and PDF files using multiple AI providers (Google Gemini, OpenAI GPT, and Anthropic Claude).

Features

Multi-Provider Support: Automatically detects and uses available AI providers based on your API keys
Flexible File Support: Handles images (PNG, JPG, JPEG, WEBP, BMP, GIF, TIF, TIFF) and PDF files
Customizable Prompts: Specify custom extraction prompts for different use cases
Model Selection: Choose specific models for each provider
Auto-Save: Automatically saves extracted text to timestamped files
PDF Multi-Page: Processes all pages in PDF files with per-page extraction

Prerequisites

Required Python Packages

pip install python-dotenv pillow pdf2image google-generativeai openai anthropic
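The pip package names above differ from the names you import, which can make "Import errors" confusing. This minimal sketch checks for the usual import names: python-dotenv imports as dotenv, pillow as PIL, and google-generativeai as google.generativeai. The helper name missing_packages is hypothetical and is not part of multi_extractor.py.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED = {
    "python-dotenv": "dotenv",
    "pillow": "PIL",
    "pdf2image": "pdf2image",
    "google-generativeai": "google.generativeai",
    "openai": "openai",
    "anthropic": "anthropic",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") imports the parent "a" first and raises if it is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gaps = missing_packages()
    print("Missing: " + ", ".join(gaps) if gaps else "All required packages found.")
```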

Additional Requirements

For PDF processing: install the Poppler utilities, which pdf2image uses to convert PDF pages into images
- Windows: download Poppler, then either add its bin directory to PATH or pass it with --poppler_path
- macOS: brew install poppler
- Linux: sudo apt-get install poppler-utils
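pdf2image runs Poppler's pdfinfo and pdftoppm command-line tools, so a failed PDF conversion usually means those tools cannot be found. Here is a minimal check, assuming the tool passes --poppler_path through to pdf2image. The helper name poppler_available is hypothetical.

```python
import shutil


def poppler_available(poppler_path=None):
    """Return True if Poppler's pdfinfo and pdftoppm can be located.

    If poppler_path is given (as with --poppler_path), only that directory
    is searched; otherwise the system PATH is used.
    """
    return all(
        shutil.which(tool, path=poppler_path) is not None
        for tool in ("pdfinfo", "pdftoppm")
    )


if __name__ == "__main__":
    print("Poppler found on PATH:", poppler_available())
```

On Windows, call poppler_available(r"C:\poppler\bin") with the same directory you would pass to --poppler_path.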

API Keys Setup

Create a .env file in the same directory as multi_extractor.py, containing your API keys:

GOOGLE_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here

You only need to provide keys for the providers you want to use.
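Provider auto-detection comes down to checking which of these keys are set. This sketch assumes the variables have already been loaded into the environment. In a real script, python-dotenv's load_dotenv() would typically copy the .env entries into os.environ first. detect_providers is a hypothetical helper name.

```python
import os

# Provider name -> environment variable holding its API key (matches the .env above)
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def detect_providers(env=None):
    """Return provider names whose API key is set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    print("Auto-detected providers:", ", ".join(detect_providers()) or "none")
```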

Usage

Basic Usage

# Extract text using all available providers
python multi_extractor.py document.pdf

# Extract text from an image
python multi_extractor.py image.png

Advanced Usage

# Use a specific provider
python multi_extractor.py document.pdf --provider openai

# Use a specific model
python multi_extractor.py image.jpg --provider gemini --model gemini-1.5-flash-latest

# Custom extraction prompt
python multi_extractor.py receipt.png --prompt "Extract all prices and item names from this receipt"

# Windows with custom Poppler path
python multi_extractor.py document.pdf --poppler_path "C:\poppler\bin"

Command Line Arguments

file_path: Path to the image or PDF file (required)
--provider: Specify AI provider (gemini, openai, anthropic)
--prompt: Custom extraction prompt (default: general text extraction)
--model: Specific model name (applies only when a single provider is selected with --provider)
--poppler_path: Path to Poppler bin directory (Windows only)
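These arguments map naturally onto Python's argparse. This sketch only mirrors the documented interface and is not the script's actual parser. The default prompt wording shown here is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Extract text from images and PDFs with AI providers."
    )
    parser.add_argument("file_path", help="Path to the image or PDF file")
    parser.add_argument(
        "--provider",
        choices=["gemini", "openai", "anthropic"],
        help="Use only this provider (default: every provider with an API key)",
    )
    parser.add_argument(
        "--prompt",
        default="Extract all text from this image.",  # placeholder wording
        help="Custom extraction prompt",
    )
    parser.add_argument("--model", help="Model name; used together with --provider")
    parser.add_argument("--poppler_path", help="Poppler bin directory (Windows)")
    return parser
```

For example, build_parser().parse_args(["document.pdf", "--provider", "openai"]) yields provider="openai" and model=None.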

Default Models

Gemini: gemini-2.0-flash
OpenAI: gpt-4o
Anthropic: claude-3-5-sonnet-20241022
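The defaults and the --model override can be combined in a small lookup. This sketch assumes that --model is ignored when no --provider is given, which is consistent with "when using single provider" above. resolve_models is a hypothetical helper.

```python
DEFAULT_MODELS = {
    "gemini": "gemini-2.0-flash",
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-20241022",
}


def resolve_models(available, provider=None, model=None):
    """Map each provider that will run onto the model it should use."""
    if provider is not None:
        if provider not in available:
            raise ValueError(f"No API key found for provider: {provider}")
        # A custom --model only overrides the default for a single chosen provider
        return {provider: model or DEFAULT_MODELS[provider]}
    # Auto-detect mode: every available provider runs with its default model
    return {name: DEFAULT_MODELS[name] for name in available}
```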

Output

The tool writes a separate timestamped text file for each provider:

Format: {filename}_{provider}_extracted_{timestamp}.txt
Example: document_gemini_extracted_20241225-143022.txt
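The example timestamp follows the pattern YYYYMMDD-HHMMSS. The minimal sketch below reproduces the naming scheme. It assumes {filename} is the input file name without its extension, as in the example. output_filename is a hypothetical helper.

```python
import os
from datetime import datetime


def output_filename(file_path, provider, when=None):
    """Build '{filename}_{provider}_extracted_{timestamp}.txt'."""
    when = when or datetime.now()
    stem = os.path.splitext(os.path.basename(file_path))[0]  # "document.pdf" -> "document"
    timestamp = when.strftime("%Y%m%d-%H%M%S")  # e.g. 20241225-143022
    return f"{stem}_{provider}_extracted_{timestamp}.txt"
```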

Supported File Types

Images

PNG, JPG/JPEG, WEBP, BMP, GIF, TIF/TIFF

Documents

PDF (multi-page support)
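One straightforward way to route files is to check the extension without regard to case. This is a sketch only. The README does not say whether the script inspects extensions or file contents, and file_kind is a hypothetical name.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif", ".tif", ".tiff"}


def file_kind(file_path):
    """Return 'image', 'pdf', or None for an unsupported file type."""
    suffix = Path(file_path).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    return None  # unsupported types are skipped (see Error Handling)
```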

Error Handling

Gracefully handles missing API keys
Continues processing with available providers if one fails
Provides detailed error messages for troubleshooting
Skips unsupported file types
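Continuing after one provider fails amounts to catching errors separately for each provider. In this sketch, extractors is a hypothetical mapping from provider name to a function that takes a file path and returns text. The real script's function names and error messages may differ.

```python
def run_all(extractors, file_path):
    """Run each provider's extractor, continuing if one of them raises."""
    results, errors = {}, {}
    for name, extract in extractors.items():
        try:
            results[name] = extract(file_path)
        except Exception as exc:  # broad on purpose: one failure must not stop the rest
            errors[name] = str(exc)
            print(f"Error with {name.upper()}: {exc}")
    return results, errors
```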

Example Output

Auto-detected providers: gemini, openai. Using default models.

--- Processing with: GEMINI (Model: gemini-2.0-flash) ---
  Processing PDF page 1/3 with gemini...
  Processing PDF page 2/3 with gemini...
  Processing PDF page 3/3 with gemini...

--- Extracted Text (using GEMINI) ---
[Extracted text content here]

Extracted text from GEMINI saved to: document_gemini_extracted_20241225-143022.txt

--- Processing Complete ---
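The log shows one request per PDF page. This sketch of that loop assumes pages is a list of page images, such as the list of PIL images returned by pdf2image's convert_from_path. Here, extract_page is a hypothetical per-page call to the provider. The blank-line separator between pages is also an assumption.

```python
def extract_pdf_pages(pages, provider, extract_page):
    """Extract text one page at a time and join the results."""
    texts = []
    total = len(pages)
    for number, page in enumerate(pages, start=1):
        # Progress line in the same shape as the example output above
        print(f"  Processing PDF page {number}/{total} with {provider}...")
        texts.append(extract_page(page))
    return "\n\n".join(texts)
```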

Tips

For receipts/invoices: Use custom prompts like "Extract all line items, prices, and totals"
For forms: Try "Extract all field labels and their values"
For handwritten text: Accuracy varies by provider, so try several and compare the results
Large PDFs: Processing may take time as each page is analyzed separately

Troubleshooting

Import errors: Ensure all required packages are installed
PDF conversion fails: Check Poppler installation
API errors: Verify API keys and rate limits
No providers detected: Check .env file configuration

This software is freely distributed under the OSI-approved BSD 3-Clause License. Please see the LICENSE file for more information.
