A Python utility that extracts text from images and PDF files using multiple AI providers (Google Gemini, OpenAI GPT, and Anthropic Claude).
- Multi-Provider Support: Automatically detects and uses available AI providers based on your API keys
- Flexible File Support: Handles images (PNG, JPG, JPEG, WEBP, BMP, GIF, TIF, TIFF) and PDF files
- Customizable Prompts: Specify custom extraction prompts for different use cases
- Model Selection: Choose specific models for each provider
- Auto-Save: Automatically saves extracted text to timestamped files
- PDF Multi-Page: Processes all pages in PDF files with per-page extraction
pip install python-dotenv pillow pdf2image google-generativeai openai anthropic
- For PDF processing: Install Poppler utilities
  - Windows: Download Poppler and set --poppler_path or add to PATH
  - macOS: brew install poppler
  - Linux: sudo apt-get install poppler-utils
Create a .env file in the same directory with your API keys:
GOOGLE_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
You only need to provide keys for the providers you want to use.
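As a rough illustration of how key-based auto-detection might work, the sketch below maps each provider to the environment variable named in the .env example and keeps the providers whose key is non-empty. The names `PROVIDER_KEYS` and `detect_providers` are assumptions made for this example and are not taken from multi_extractor.py.

```python
import os

# Variable names come from the .env example above; the mapping itself
# and the function name are hypothetical, chosen for illustration.
PROVIDER_KEYS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def detect_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [
        name
        for name, key in PROVIDER_KEYS.items()
        if (env.get(key) or "").strip()
    ]


if __name__ == "__main__":
    try:
        # python-dotenv: load_dotenv() looks for a .env file and copies
        # its entries into os.environ.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        pass  # fall back to variables already present in the environment
    found = detect_providers()
    print("Auto-detected providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary instead of `os.environ` makes the function easy to check without touching real keys.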
# Extract text using all available providers
python multi_extractor.py document.pdf
# Extract text from an image
python multi_extractor.py image.png
# Use a specific provider
python multi_extractor.py document.pdf --provider openai
# Use a specific model
python multi_extractor.py image.jpg --provider gemini --model gemini-1.5-flash-latest
# Custom extraction prompt
python multi_extractor.py receipt.png --prompt "Extract all prices and item names from this receipt"
# Windows with custom Poppler path
python multi_extractor.py document.pdf --poppler_path "C:\poppler\bin"
- file_path: Path to the image or PDF file (required)
- --provider: Specify AI provider (gemini, openai, anthropic)
- --prompt: Custom extraction prompt (default: general text extraction)
- --model: Specific model name (when using a single provider)
- --poppler_path: Path to Poppler bin directory (Windows only)
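The arguments listed above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the tool's actual parser: the help strings, the `None` defaults, and the function name `build_parser` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Extract text from an image or PDF with one or more AI providers."
    )
    parser.add_argument("file_path", help="Path to the image or PDF file")
    parser.add_argument(
        "--provider",
        choices=["gemini", "openai", "anthropic"],
        help="Use only this provider (assumed default: every provider with a key)",
    )
    parser.add_argument("--prompt", default=None, help="Custom extraction prompt")
    parser.add_argument(
        "--model", default=None, help="Model name, used together with --provider"
    )
    parser.add_argument(
        "--poppler_path", default=None, help="Path to the Poppler bin directory"
    )
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(
    ["receipt.png", "--provider", "gemini", "--prompt", "Extract all prices"]
)
print(args.file_path, args.provider, args.prompt)
```

Restricting `--provider` with `choices` lets argparse reject misspelled provider names before any API call is made.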
Default models:
- Gemini: gemini-2.0-flash
- OpenAI: gpt-4o
- Anthropic: claude-3-5-sonnet-20241022
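One way to combine these defaults with the --model option is a simple lookup with an override. The model names are the ones listed above; `DEFAULT_MODELS` and `resolve_model` are hypothetical names used only in this sketch.

```python
# Default model per provider, as listed above.
DEFAULT_MODELS = {
    "gemini": "gemini-2.0-flash",
    "openai": "gpt-4o",
    "anthropic": "claude-3-5-sonnet-20241022",
}


def resolve_model(provider, requested=None):
    """Return the requested model if one was given, else the provider default."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider: {provider}")
    return requested or DEFAULT_MODELS[provider]


print(resolve_model("openai"))
print(resolve_model("gemini", "gemini-1.5-flash-latest"))
```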
The tool creates timestamped text files for each provider:
- Format: {filename}_{provider}_extracted_{timestamp}.txt
- Example: document_gemini_extracted_20241225-143022.txt
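The naming pattern can be reproduced with `pathlib` and `strftime`. This sketch assumes that {filename} means the input name without its extension and that the timestamp uses the format `%Y%m%d-%H%M%S`, both inferred from the example name; `output_filename` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def output_filename(file_path, provider, now=None):
    """Build {filename}_{provider}_extracted_{timestamp}.txt for an input file."""
    now = now or datetime.now()
    stem = Path(file_path).stem  # "document.pdf" -> "document" (assumed)
    timestamp = now.strftime("%Y%m%d-%H%M%S")
    return f"{stem}_{provider}_extracted_{timestamp}.txt"


# A fixed time reproduces the example name from the list above.
print(output_filename("document.pdf", "gemini", datetime(2024, 12, 25, 14, 30, 22)))
# document_gemini_extracted_20241225-143022.txt
```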
- PNG, JPG/JPEG, WEBP, BMP, GIF, TIF/TIFF
- PDF (multi-page support)
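A file-type check for the formats above might look at the extension only, as in this sketch. Whether the tool inspects extensions or file contents is not stated, so treat the approach and the names `IMAGE_EXTENSIONS` and `classify_file` as assumptions.

```python
from pathlib import Path

# Image extensions from the supported-formats list above.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif", ".tif", ".tiff"}


def classify_file(file_path):
    """Return "pdf", "image", or None for an unsupported extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


for name in ["scan.TIFF", "report.pdf", "notes.docx"]:
    print(name, "->", classify_file(name))
```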
- Gracefully handles missing API keys
- Continues processing with available providers if one fails
- Provides detailed error messages for troubleshooting
- Skips unsupported file types
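The "continue with available providers if one fails" behavior can be sketched as a loop that catches errors per provider instead of aborting the run. Here `run_providers` and the `extract` callable are hypothetical stand-ins for the tool's real provider calls.

```python
def run_providers(providers, extract):
    """Call extract(provider) for each provider, collecting results and errors.

    A failure in one provider is recorded and reported, and the loop
    moves on to the next provider.
    """
    results, errors = {}, {}
    for provider in providers:
        try:
            results[provider] = extract(provider)
        except Exception as exc:  # broad on purpose: any provider error is non-fatal
            errors[provider] = str(exc)
            print(f"Error with {provider}: {exc}")
    return results, errors


def fake_extract(provider):
    """Stand-in for a real API call; fails for one provider to show the flow."""
    if provider == "openai":
        raise RuntimeError("rate limit exceeded")
    return f"text from {provider}"


results, errors = run_providers(["gemini", "openai", "anthropic"], fake_extract)
print(sorted(results), sorted(errors))
```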
Auto-detected providers: gemini, openai. Using default models.
--- Processing with: GEMINI (Model: gemini-2.0-flash) ---
Processing PDF page 1/3 with gemini...
Processing PDF page 2/3 with gemini...
Processing PDF page 3/3 with gemini...
--- Extracted Text (using GEMINI) ---
[Extracted text content here]
Extracted text from GEMINI saved to: document_gemini_extracted_20241225-143022.txt
--- Processing Complete ---
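The per-page progress lines in the sample output suggest a loop over rendered PDF pages. A minimal sketch, assuming pages are rendered with pdf2image's `convert_from_path` (which accepts a `poppler_path` argument) and that page texts are joined with blank lines; `load_pdf_pages`, `extract_pages`, and the `extract_page` callable are hypothetical.

```python
def load_pdf_pages(pdf_path, poppler_path=None):
    """Render each PDF page to a PIL image (needs pdf2image and Poppler)."""
    # Imported lazily so the rest of this sketch runs without pdf2image.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, poppler_path=poppler_path)


def extract_pages(pages, extract_page, provider):
    """Run extract_page on every page and join the page texts."""
    texts = []
    total = len(pages)
    for number, page in enumerate(pages, start=1):
        print(f"Processing PDF page {number}/{total} with {provider}...")
        texts.append(extract_page(page))
    # Joining with a blank line is an assumption about the output layout.
    return "\n\n".join(texts)


# Strings stand in for page images so the loop can be tried without a PDF.
print(extract_pages(["page one", "page two"], str.upper, "gemini"))
```

With a real file, `extract_pages(load_pdf_pages("document.pdf"), some_provider_call, "gemini")` would follow the same flow, one provider request per page.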
- For receipts/invoices: Use custom prompts like "Extract all line items, prices, and totals"
- For forms: Try "Extract all field labels and their values"
- For handwritten text: Some providers perform better than others; try multiple
- Large PDFs: Processing may take time as each page is analyzed separately
- Import errors: Ensure all required packages are installed
- PDF conversion fails: Check Poppler installation
- API errors: Verify API keys and rate limits
- No providers detected: Check .env file configuration
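Several of the checks above can be automated. The sketch below tests whether the packages from the install command are importable and whether Poppler's `pdftoppm` binary is on PATH. It is a hypothetical helper, not part of the tool, and a Poppler directory passed through --poppler_path would not be found by the PATH check.

```python
import importlib.util
import shutil

# Import names for the packages in the pip install command.
REQUIRED_MODULES = [
    "dotenv",               # python-dotenv
    "PIL",                  # pillow
    "pdf2image",
    "google.generativeai",  # google-generativeai
    "openai",
    "anthropic",
]


def module_available(name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (for example "google") is missing.
        return False


def check_environment():
    """Map each requirement to True (found) or False (missing)."""
    report = {name: module_available(name) for name in REQUIRED_MODULES}
    report["poppler (pdftoppm on PATH)"] = shutil.which("pdftoppm") is not None
    return report


for item, ok in check_environment().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```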
This software is freely distributed under the BSD 3-Clause License, an OSI-approved open-source license. See the LICENSE file for more information.