A Python tool that processes multiple files through the Google Gemini API in parallel, applying a custom system prompt to each file and saving the outputs as markdown files.
- Batch Processing: Process multiple files at once from an input folder
- Parallel Processing: Send multiple API requests concurrently to save time
- Multiple File Types: Supports text files (txt, md), PDFs, DOCX, and other document formats
- Custom System Prompts: Define your own system prompt to customize LLM behavior
- Configurable: All settings in one configuration file
- Cross-Platform: Works on Windows, macOS, and Linux
- Python 3.8 or higher
- Google Gemini API key (available from Google AI Studio)
Download this repository to your local machine.
Windows:

```
cd path\to\llm-batch-processor
python -m venv venv
venv\Scripts\activate
```

macOS/Linux:

```
cd path/to/llm-batch-processor
python3 -m venv venv
source venv/bin/activate
```

Then install the dependencies:

```
pip install -r requirements.txt
```
Copy `.env.example` to `.env`:

```
# Windows
copy .env.example .env

# macOS/Linux
cp .env.example .env
```

Edit `.env` and add your Google Gemini API key:

```
GOOGLE_API_KEY=your_actual_api_key_here
```
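At runtime the batch processor has to pull this key into the process environment. Tools like this commonly use the `python-dotenv` package; as a rough illustration of what that step does, here is a minimal stdlib stand-in (this simplified parser is not the tool's actual loader):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: one KEY=value per line, '#' comments ignored."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            # Don't clobber variables already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())
```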
Edit `config.py` to customize your settings:

- Set Input/Output Folders: Update the paths to your input and output folders:

  ```python
  # Windows example
  INPUT_FOLDER = Path(r"C:\Users\YourName\Documents\llm_inputs")
  OUTPUT_FOLDER = Path(r"C:\Users\YourName\Documents\llm_outputs")

  # macOS/Linux example
  INPUT_FOLDER = Path("/Users/YourName/Documents/llm_inputs")
  OUTPUT_FOLDER = Path("/Users/YourName/Documents/llm_outputs")
  ```

- Choose Model: Set your preferred Gemini model:

  ```python
  MODEL_TYPE = "gemini-1.5-pro"  # or "gemini-1.5-flash" for faster processing
  ```

- Set Max Workers: Adjust the number of parallel processing threads:

  ```python
  MAX_WORKERS = 5  # increase for faster processing, decrease if hitting rate limits
  ```

- Set System Prompt: Specify which prompt file to use:

  ```python
  SYSTEM_PROMPT_FILE = "default_prompt.md"  # or create your own in prompts/
  ```
Create the folders you specified in `config.py`:

Windows:

```
mkdir C:\Users\YourName\Documents\llm_inputs
mkdir C:\Users\YourName\Documents\llm_outputs
```

macOS/Linux:

```
mkdir -p /Users/YourName/Documents/llm_inputs
mkdir -p /Users/YourName/Documents/llm_outputs
```

Edit `prompts/default_prompt.md` or create a new prompt file in the `prompts/` folder, then update `SYSTEM_PROMPT_FILE` in `config.py` to use your custom prompt.
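As an illustration (not a file shipped with the tool), a summarization-oriented system prompt might look like:

```markdown
You are a careful technical summarizer.

For the document provided, produce:
1. A one-paragraph overview.
2. A bullet list of the key points.

Respond in markdown only.
```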
Place the files you want to process in your input folder (the one you configured in config.py).
Supported file types:
- Text files: `.txt`, `.md`
- Documents: `.pdf`, `.docx`, `.doc`
- Other file types supported by Gemini
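The file extension determines how a file is sent to the API (see "How it works" below: text is inlined, everything else is uploaded). A sketch of that dispatch, where the set of text extensions is an assumption based on the list above:

```python
from pathlib import Path

# Extensions treated as plain text (assumption based on the supported-types list)
TEXT_EXTENSIONS = {".txt", ".md"}

def is_text_file(path: Path) -> bool:
    """Text files get read and inlined; everything else is uploaded as binary."""
    return path.suffix.lower() in TEXT_EXTENSIONS
```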
Make sure your virtual environment is activated, then run:

```
python batch_processor.py
```

Processed files will be saved as markdown (`.md`) files in your output folder, with the same name as the input file. For example:
- Input: `document.pdf` → Output: `document.md`
- Input: `notes.txt` → Output: `notes.md`
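The naming rule above can be sketched with `pathlib` (the helper name here is illustrative, not the tool's actual function):

```python
from pathlib import Path

def output_path_for(input_file: Path, output_folder: Path) -> Path:
    # Keep the input file's stem, swap the extension for .md
    return output_folder / (input_file.stem + ".md")
```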
All configuration is done in `config.py`:

| Setting | Description | Default |
|---|---|---|
| `MODEL_TYPE` | Gemini model to use | `gemini-1.5-pro` |
| `SYSTEM_PROMPT_FILE` | System prompt filename (in `prompts/`) | `default_prompt.md` |
| `INPUT_FOLDER` | Folder containing files to process | User-configured |
| `OUTPUT_FOLDER` | Folder to save processed outputs | User-configured |
| `MAX_WORKERS` | Number of parallel API requests | `5` |
| `GENERATION_CONFIG` | Model generation parameters | See `config.py` |
- Loads Configuration: Reads settings from `config.py` and the API key from `.env`
- Loads System Prompt: Reads the system prompt from the specified file in `prompts/`
- Scans Input Folder: Finds all files in the input folder
- Parallel Processing:
  - For text files (`.txt`, `.md`): Reads the content and sends it directly to Gemini
  - For binary files (`.pdf`, `.docx`, etc.): Uploads them to Gemini and references them in the prompt
- Saves Outputs: Saves each LLM response as a `.md` file in the output folder
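The parallel step maps naturally onto `concurrent.futures.ThreadPoolExecutor`. A sketch of that pattern with a stand-in worker (`process_file` here is a placeholder, not the tool's real Gemini call):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Dict, List

MAX_WORKERS = 5  # mirrors the MAX_WORKERS setting in config.py

def process_file(path: Path) -> str:
    # Placeholder worker: the real version would call the Gemini API here
    return "processed " + path.name

def run_batch(files: List[Path]) -> Dict[Path, str]:
    """Submit every file at once; collect each result as it finishes."""
    results: Dict[Path, str] = {}
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(process_file, f): f for f in files}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads (rather than processes) fit here because each worker spends most of its time waiting on network I/O, so the GIL is not a bottleneck.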
- Make sure you created a `.env` file (not `.env.example`)
- Verify your API key is set correctly in `.env`
- Check that the folder path in `config.py` is correct
- Make sure the folder exists on your system
- On Windows, use raw strings: `Path(r"C:\path\to\folder")`
- Verify that `SYSTEM_PROMPT_FILE` in `config.py` matches a file in the `prompts/` folder
- Make sure the file has a `.md` extension
- Decrease `MAX_WORKERS` in `config.py` to send fewer parallel requests
- Check your Gemini API quota and limits
- Check that your files are not corrupted
- For PDFs and DOCX files, ensure they are valid documents
- Check the console output for specific error messages
You can create multiple system prompt files for different use cases:
- Create new prompt files in `prompts/`:
  - `prompts/summarize.md` for summarization tasks
  - `prompts/analyze.md` for analysis tasks
  - `prompts/translate.md` for translation tasks
- Switch between them by changing `SYSTEM_PROMPT_FILE` in `config.py`
Adjust the `GENERATION_CONFIG` in `config.py` to control output:

```python
GENERATION_CONFIG = {
    "temperature": 0.7,         # lower = more focused, higher = more creative
    "top_p": 0.95,              # nucleus sampling threshold
    "top_k": 40,                # top-k sampling value
    "max_output_tokens": 8192,  # maximum length of the response
}
```

This project is provided as-is for educational and personal use.
For issues or questions:
- Check the Troubleshooting section above
- Review the Google Gemini API documentation
- Verify your API key and quota at Google AI Studio