AutoPrompt v2.0 is a high-performance, asynchronous LLM prompt optimization and benchmarking framework powered by Groq and Llama-3.1. It automates the search for optimal data extraction prompts using evolutionary meta-prompting and rigorous train/test validation.
AutoPrompt v2.0 transforms manual prompt engineering into an automated, 3-phase optimization pipeline:
- Phase 1: Meta-Prompt Optimization: A Meta-LLM generates 4 diverse prompt strategies. These are tested against a Train Set using Ground Truth labels. The system locks the highest-scoring candidate as the "God Prompt" and collects perfect responses for Few-Shot RAG injection.
- Phase 2: Asynchronous Execution: The framework fires the Baseline and Optimized pipelines concurrently using `asyncio`. It utilizes Semaphore pacing to maximize throughput while staying within API rate limits.
- Phase 3: Formal Evaluation: Results are compared on a hidden Test Set to measure real-world performance, overall accuracy, and hallucinatory failure rates.
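The Semaphore pacing described in Phase 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fake_extract`, `paced`, `run_batch`, and `MAX_CONCURRENCY` are hypothetical names, and the LLM call is replaced by a short sleep so the example runs without network access. Note that a semaphore caps the number of in-flight requests; it does not by itself enforce a requests-per-minute quota.

```python
import asyncio

MAX_CONCURRENCY = 4  # assumed cap on in-flight requests; tune to your rate limits


async def fake_extract(review: str) -> dict:
    """Stand-in for a real LLM call (hypothetical; no network access)."""
    await asyncio.sleep(0.01)
    return {"review": review, "length": len(review)}


async def paced(semaphore: asyncio.Semaphore, review: str) -> dict:
    # At most MAX_CONCURRENCY coroutines are inside this block at once.
    async with semaphore:
        return await fake_extract(review)


async def run_batch(reviews: list[str]) -> list[dict]:
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    # gather runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(paced(semaphore, r) for r in reviews))


if __name__ == "__main__":
    print(asyncio.run(run_batch(["great", "bad", "okay"])))
```

Creating the semaphore inside `run_batch` keeps it bound to the event loop that actually runs the batch.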
- ⚡ Async Scalability: Built with `AsyncGroq` and `asyncio.gather` for 10x faster concurrent processing.
- 🧠 Meta-Prompting Engine: Uses AI to engineer its own prompts, evolving strategies dynamically based on training success.
- 🔒 Native JSON Mode: Enforces `response_format={"type": "json_object"}` at the backend level to guarantee 100% parseable structured data.
- 🧪 Train/Test Splitting: Prevents "Data Leakage" by optimizing on a training batch and validating on an independent test batch.
- 🛡️ Robust Fault Tolerance: Implements `tenacity` retries with exponential backoff for a zero-crash extraction pipeline.
- 📖 Industry-Level Documentation: Includes a Technical Deep Dive detailing the architecture and logic flow.
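The train/test idea behind the optimizer can be illustrated with a small sketch. It assumes each row is a dict with `text` and `label` keys and models each prompt candidate as a plain function from text to a predicted label; both are simplifications chosen for illustration, and `split_rows`, `accuracy`, and `pick_best` are hypothetical names rather than the project's API.

```python
import random


def split_rows(rows, train_limit=5, test_limit=30, seed=0):
    """Shuffle a copy once, then take disjoint train and test slices."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:train_limit]
    test = shuffled[train_limit:train_limit + test_limit]
    return train, test


def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the ground-truth label."""
    if not rows:
        return 0.0
    return sum(predict(r["text"]) == r["label"] for r in rows) / len(rows)


def pick_best(candidates, train):
    """Return the name of the candidate with the highest train accuracy."""
    return max(candidates, key=lambda name: accuracy(candidates[name], train))


if __name__ == "__main__":
    rows = [{"text": f"review {i}", "label": "pos" if i % 2 == 0 else "neg"}
            for i in range(40)]
    candidates = {
        "always_pos": lambda text: "pos",
        "parity": lambda text: "pos" if int(text.split()[-1]) % 2 == 0 else "neg",
    }
    train, test = split_rows(rows)
    best = pick_best(candidates, train)
    # The winner is chosen on train only; test rows are scored afterwards.
    print(best, accuracy(candidates[best], test))
```

Because the test slice is never consulted while choosing the winner, its score is an honest estimate of how the selected prompt behaves on unseen rows.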
- Clone & Navigate:

      git clone https://github.com/iussg/auto-prompting
      cd auto-prompting

- Environment Setup:

      python -m venv venv
      .\venv\Scripts\Activate.ps1   # Windows
      # source venv/bin/activate    # Linux/Mac

- Install Dependencies:

      pip install -r requirements.txt

- Configure API Keys: Create a `.env` file in the root directory:

      GROQ_API_KEY=your_gsk_key_here
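Before running anything, it can help to confirm that the key is actually visible to Python. The sketch below only inspects the process environment; how the project itself loads `.env` is not stated here, so `require_groq_key` is a hypothetical helper and not part of the framework.

```python
import os


def require_groq_key(env=None):
    """Return GROQ_API_KEY from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        # Fail early with a readable message instead of a later auth error.
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or export it.")
    return key


if __name__ == "__main__":
    try:
        require_groq_key()
        print("GROQ_API_KEY found")
    except RuntimeError as err:
        print(err)
```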
Check your API connectivity and available Groq models:

    python check_model.py

Run the v2.0 Optimization Pipeline:

    python main.py

- `--train-limit`: Number of rows to use for prompt evolution (default: 5).
- `--test-limit`: Number of rows for the final benchmark (default: 30).
- `--data`: Path to custom reviews CSV/JSON.
- `--model`: Override the model (e.g., `llama-3.1-70b-versatile`).
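For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that mirrors the options and defaults listed above. It is an illustration only: the real parser in `main.py` may be structured differently, and the `None` defaults for `--data` and `--model` are assumptions, since no defaults are documented for them.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")
    parser.add_argument("--train-limit", type=int, default=5,
                        help="rows used for prompt evolution")
    parser.add_argument("--test-limit", type=int, default=30,
                        help="rows used for the final benchmark")
    # Assumed defaults: None means "use whatever the project ships with".
    parser.add_argument("--data", default=None,
                        help="path to a custom reviews CSV/JSON")
    parser.add_argument("--model", default=None,
                        help="model override, e.g. llama-3.1-70b-versatile")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this sketch, `--train-limit 10 --model llama-3.1-70b-versatile` would override two values and leave the others at their defaults.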
Once complete, the system generates a detailed audit trail in logs/run.log and a final performance comparison in results/benchmark_report.json.