joe-salamy/llm-batch-processor
LLM Batch Processor

A Python tool that processes multiple files through the Google Gemini API in parallel, applying a custom system prompt to each file and saving the outputs as Markdown files.

Features

  • Batch Processing: Process multiple files at once from an input folder
  • Parallel Processing: Send multiple API requests concurrently to save time
  • Multiple File Types: Supports text files (txt, md), PDFs, DOCX, and other document formats
  • Custom System Prompts: Define your own system prompt to customize LLM behavior
  • Configurable: All settings in one configuration file
  • Cross-Platform: Works on Windows, macOS, and Linux

Prerequisites

  • Python 3 installed and available on your PATH
  • A Google Gemini API key (you can create one in Google AI Studio)

Installation & Setup

1. Clone or Download the Repository

Download this repository to your local machine.

2. Create a Virtual Environment

Windows:

cd path\to\llm-batch-processor
python -m venv venv
venv\Scripts\activate

macOS/Linux:

cd path/to/llm-batch-processor
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Environment Variables

  1. Copy .env.example to .env:

    # Windows
    copy .env.example .env
    
    # macOS/Linux
    cp .env.example .env
  2. Edit .env and add your Google Gemini API key:

    GOOGLE_API_KEY=your_actual_api_key_here
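The key is read from this file at startup, typically via a library such as python-dotenv. As a rough illustration of the KEY=value format only, a stdlib parser might look like this (read_env is a hypothetical helper, not part of the tool):

```python
# Stdlib-only sketch of parsing KEY=value pairs from a .env file.
# The tool itself may use python-dotenv; read_env is illustrative only.
from pathlib import Path

def read_env(path: str = ".env") -> dict[str, str]:
    env = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        # Skip blanks and comments; keep only KEY=value lines.
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```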
    

5. Configure Settings

Edit config.py to customize your settings:

  1. Set Input/Output Folders: Update the paths to your input and output folders

    # Windows example
    INPUT_FOLDER = Path(r"C:\Users\YourName\Documents\llm_inputs")
    OUTPUT_FOLDER = Path(r"C:\Users\YourName\Documents\llm_outputs")
    
    # macOS/Linux example
    INPUT_FOLDER = Path("/Users/YourName/Documents/llm_inputs")
    OUTPUT_FOLDER = Path("/Users/YourName/Documents/llm_outputs")
  2. Choose Model: Set your preferred Gemini model

    MODEL_TYPE = "gemini-1.5-pro"  # or "gemini-1.5-flash" for faster processing
  3. Set Max Workers: Adjust parallel processing threads

    MAX_WORKERS = 5  # Increase for faster processing, decrease if hitting rate limits
  4. Set System Prompt: Specify which prompt file to use

    SYSTEM_PROMPT_FILE = "default_prompt.md"  # or create your own in prompts/

6. Create Your Input and Output Folders

Create the folders you specified in config.py:

Windows:

mkdir C:\Users\YourName\Documents\llm_inputs
mkdir C:\Users\YourName\Documents\llm_outputs

macOS/Linux:

mkdir -p /Users/YourName/Documents/llm_inputs
mkdir -p /Users/YourName/Documents/llm_outputs
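If you prefer, the same folder creation can be done from Python with pathlib; this mirrors the behavior of mkdir -p (the paths in the usage comment are illustrative):

```python
# Create folders, tolerating pre-existing ones (like mkdir -p).
from pathlib import Path

def ensure_folders(*folders: Path) -> None:
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)

# Example (illustrative paths):
# ensure_folders(Path("llm_inputs"), Path("llm_outputs"))
```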

7. Customize Your System Prompt (Optional)

Edit prompts/default_prompt.md or create a new prompt file in the prompts/ folder. Then update SYSTEM_PROMPT_FILE in config.py to use your custom prompt.

Usage

1. Add Files to Process

Place the files you want to process in your input folder (the one you configured in config.py).

Supported file types:

  • Text files: .txt, .md
  • Documents: .pdf, .docx, .doc
  • Other file types supported by Gemini

2. Run the Processor

Make sure your virtual environment is activated, then run:

python batch_processor.py

3. Check the Results

Processed files will be saved as markdown (.md) files in your output folder, with the same name as the input file.

For example:

  • Input: document.pdf → Output: document.md
  • Input: notes.txt → Output: notes.md
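In pathlib terms, the output name is simply the input name with its suffix swapped for .md:

```python
from pathlib import Path

def output_name(input_file: str) -> str:
    # Replace the original extension with .md, keeping the stem.
    return Path(input_file).with_suffix(".md").name
```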

Configuration Options

All configuration is done in config.py:

| Setting | Description | Default |
| --- | --- | --- |
| MODEL_TYPE | Gemini model to use | gemini-1.5-pro |
| SYSTEM_PROMPT_FILE | System prompt filename (in prompts/) | default_prompt.md |
| INPUT_FOLDER | Folder containing files to process | User-configured |
| OUTPUT_FOLDER | Folder to save processed outputs | User-configured |
| MAX_WORKERS | Number of parallel API requests | 5 |
| GENERATION_CONFIG | Model generation parameters | See config.py |

How It Works

  1. Loads Configuration: Reads settings from config.py and API key from .env
  2. Loads System Prompt: Reads the system prompt from the specified file in prompts/
  3. Scans Input Folder: Finds all files in the input folder
  4. Parallel Processing:
    • For text files (.txt, .md): Reads content and sends directly to Gemini
    • For binary files (.pdf, .docx, etc.): Uploads to Gemini and references in prompt
  5. Saves Outputs: Saves each LLM response as a .md file in the output folder
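The steps above can be sketched roughly as follows. Here process_file is a stand-in for the actual Gemini request (the real tool reads text files directly and uploads binary ones); all names and structure in this sketch are assumptions, not the tool's actual code:

```python
# Rough sketch of the scan -> parallel dispatch -> save loop described above.
# process_file stands in for the real Gemini call; MAX_WORKERS mirrors the
# config setting of the same name.
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

MAX_WORKERS = 5
TEXT_EXTENSIONS = {".txt", ".md"}

def process_file(path: Path) -> str:
    if path.suffix.lower() in TEXT_EXTENSIONS:
        content = path.read_text(encoding="utf-8")  # text is sent directly
    else:
        content = f"<uploaded {path.name}>"  # binary files would be uploaded
    return f"LLM response for: {content[:50]}"  # placeholder for the API reply

def run_batch(input_folder: Path, output_folder: Path) -> list[Path]:
    output_folder.mkdir(parents=True, exist_ok=True)
    files = [p for p in input_folder.iterdir() if p.is_file()]
    written = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        futures = {pool.submit(process_file, p): p for p in files}
        for future in as_completed(futures):
            source = futures[future]
            target = output_folder / (source.stem + ".md")  # same name, .md
            target.write_text(future.result(), encoding="utf-8")
            written.append(target)
    return written
```

A thread pool (rather than multiprocessing) fits here because the work is I/O-bound: each worker spends most of its time waiting on the API.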

Troubleshooting

"GOOGLE_API_KEY not found"

  • Make sure you created a .env file (not .env.example)
  • Verify your API key is set correctly in .env

"Input folder not found"

  • Check that the folder path in config.py is correct
  • Make sure the folder exists on your system
  • On Windows, use raw strings: Path(r"C:\path\to\folder")

"System prompt file not found"

  • Verify the SYSTEM_PROMPT_FILE in config.py matches a file in the prompts/ folder
  • Make sure the file has a .md extension
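A quick way to verify this from a Python shell (the helper below is illustrative; pass whatever filename you set in config.py):

```python
# Check that the configured prompt file actually exists under prompts/.
from pathlib import Path

def prompt_exists(filename: str, prompts_dir: str = "prompts") -> bool:
    return (Path(prompts_dir) / filename).is_file()
```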

Rate Limit Errors

  • Decrease MAX_WORKERS in config.py to send fewer parallel requests
  • Check your Gemini API quota and limits

File Processing Errors

  • Check that your files are not corrupted
  • For PDFs and DOCX files, ensure they are valid documents
  • Check the console output for specific error messages

Advanced Usage

Multiple System Prompts

You can create multiple system prompt files for different use cases:

  1. Create new prompt files in prompts/:

    • prompts/summarize.md - For summarization tasks
    • prompts/analyze.md - For analysis tasks
    • prompts/translate.md - For translation tasks
  2. Switch between them by changing SYSTEM_PROMPT_FILE in config.py

Custom Generation Settings

Adjust the GENERATION_CONFIG in config.py to control output:

GENERATION_CONFIG = {
    "temperature": 0.7,        # Lower = more focused, Higher = more creative
    "top_p": 0.95,             # Nucleus sampling threshold
    "top_k": 40,               # Top-k sampling value
    "max_output_tokens": 8192, # Maximum length of response
}

License

This project is provided as-is for educational and personal use.

Support

For issues or questions:

  1. Check the Troubleshooting section above
  2. Review the Google Gemini API documentation
  3. Verify your API key and quota at Google AI Studio
