GitHub - paoloanzn/minigepa: Mini GEPA implementation https://arxiv.org/pdf/2507.19457

Mini GEPA implementation https://arxiv.org/pdf/2507.19457 in ~1000 lines of code.

SETUP

  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt

CONFIGURATION

  Create a .env file with API keys for the providers you use:

    ANTHROPIC_OAUTH_TOKEN=...    (for Anthropic API; install Claude Code and run `claude setup-token`)
    OPENROUTER_API_KEY=...       (for OpenRouter API)
    OPENAI_API_KEY=...           (for OpenAI API)

  Defaults use Anthropic for the teacher model and OpenRouter for the student model:

    STUDENT_MODEL=...    model used for running the target prompt during optimization (default: nvidia/nemotron-3-nano-30b-a3b)
    TEACHER_MODEL=...    model used for dataset generation and grading (default: claude-haiku-4-5)

  Prefix a model with a provider to override the default client:

    STUDENT_MODEL=openai:gpt-4.1
    TEACHER_MODEL=openrouter:<model-id>

  Valid provider prefixes are anthropic, openrouter, and openai. If no prefix is provided,
  STUDENT_MODEL uses OpenRouter and TEACHER_MODEL uses Anthropic.
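
  The prefix rule above can be sketched as a small parser. This is an illustration only, not
  the code in client.py: parse_model_spec and both constants are hypothetical names, and
  passing an unrecognized prefix through as part of the model id is an assumption (the README
  does not say how such values are handled).

```python
VALID_PROVIDERS = {"anthropic", "openrouter", "openai"}

# Defaults stated in this README; the helper below is hypothetical.
DEFAULT_PROVIDER = {"STUDENT_MODEL": "openrouter", "TEACHER_MODEL": "anthropic"}


def parse_model_spec(var_name: str, value: str) -> tuple[str, str]:
    """Split 'provider:model' into (provider, model), applying the documented defaults."""
    prefix, sep, rest = value.partition(":")
    # Only treat the text before the first colon as a provider when it is a
    # known one, so model ids that themselves contain ':' are left intact.
    if sep and prefix in VALID_PROVIDERS:
        return prefix, rest
    return DEFAULT_PROVIDER[var_name], value


print(parse_model_spec("STUDENT_MODEL", "openai:gpt-4.1"))
print(parse_model_spec("STUDENT_MODEL", "nvidia/nemotron-3-nano-30b-a3b"))
print(parse_model_spec("TEACHER_MODEL", "claude-haiku-4-5"))
```

  Splitting on the first colon only, and only for known providers, keeps a model id that
  contains its own colon from being mistaken for a provider prefix.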

  Required keys depend on the command:

    repl/generate     teacher provider key
    evaluate/optimize teacher provider key and student provider key
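
  To check a .env setup before running a command, the table above can be turned into a small
  lookup. A minimal sketch, assuming the variable names listed under CONFIGURATION;
  missing_keys and the two dictionaries are hypothetical and not part of the repository.

```python
import os

# Environment variable per provider, as listed under CONFIGURATION.
KEY_FOR_PROVIDER = {
    "anthropic": "ANTHROPIC_OAUTH_TOKEN",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
}

# Which model roles each command needs, per the table above.
ROLES_FOR_COMMAND = {
    "repl": ("teacher",),
    "generate": ("teacher",),
    "evaluate": ("teacher", "student"),
    "optimize": ("teacher", "student"),
}


def missing_keys(command, teacher="anthropic", student="openrouter", env=None):
    """Return the env var names the command needs but that are unset or empty."""
    env = os.environ if env is None else env
    providers = {"teacher": teacher, "student": student}
    needed = {KEY_FOR_PROVIDER[providers[role]] for role in ROLES_FOR_COMMAND[command]}
    return sorted(name for name in needed if not env.get(name))


# With default providers and only the Anthropic token set, optimize still
# needs the OpenRouter key for the student model.
print(missing_keys("optimize", env={"ANTHROPIC_OAUTH_TOKEN": "token"}))
```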

PROMPT FOLDER

  A prompt folder must contain exactly three files:

    target_prompt.txt (or .md)   - the prompt to evaluate/optimize, must contain {{task}}
    dataset_prompt.txt (or .md)  - instructs the model to generate test cases as JSON
    grader_prompt.txt (or .md)   - evaluates model outputs, produces a 1-5 score and a feedback object

  Example: example-prompts/
  Use the files in example-prompts/ as the reference structure for new prompt folders.
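
  The folder requirements above can be checked before spending any API calls. A minimal
  sketch: check_prompt_folder is a hypothetical helper, not something the repository
  provides. It does not reject extra files, and it looks at .txt before .md when both exist;
  both choices are assumptions.

```python
from pathlib import Path

REQUIRED = ("target_prompt", "dataset_prompt", "grader_prompt")


def check_prompt_folder(folder):
    """Return a list of problems found in a prompt folder (empty if none)."""
    folder = Path(folder)
    problems = []
    for stem in REQUIRED:
        # Each prompt may be stored as .txt or .md.
        found = [folder / f"{stem}{ext}" for ext in (".txt", ".md")]
        found = [p for p in found if p.is_file()]
        if not found:
            problems.append(f"missing {stem}.txt (or .md)")
        elif stem == "target_prompt" and "{{task}}" not in found[0].read_text(encoding="utf-8"):
            # The target prompt must carry the {{task}} placeholder.
            problems.append(f"{found[0].name} does not contain {{{{task}}}}")
    return problems


for problem in check_prompt_folder("example-prompts"):
    print(problem)
```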

PROMPT CREATION REPL

  Interactive tool-using assistant for creating and editing minigepa prompt folders:

    python repl.py

  The REPL uses the teacher client and can read, write, edit, search files, and run shell commands.
  Ask it to inspect example-prompts/ first, then create or update target_prompt.txt,
  dataset_prompt.txt, and grader_prompt.txt in your prompt folder.

  Commands inside the REPL:
    /q, quit, exit         quit
    /c                     clear the conversation

  You can also ask it to run CLI commands directly, for example:
    .venv/bin/python cli.py generate --prompts my-prompts
    .venv/bin/python cli.py evaluate --prompts my-prompts --dataset .output/dataset-abc123.json
    .venv/bin/python cli.py optimize --prompts my-prompts --budget 120

USAGE

  python cli.py <command> [options]

Commands:

  generate    Generate a dataset from the dataset prompt
  evaluate    Evaluate a target prompt against a dataset
  optimize    Run GEPA optimization to improve a target prompt

Examples:

  Generate a dataset:
    python cli.py generate --prompts example-prompts --output .output

  Evaluate a prompt (generates a dataset automatically):
    python cli.py evaluate --prompts example-prompts --output .output

  Evaluate using an existing dataset:
    python cli.py evaluate --prompts example-prompts --output .output --dataset .output/dataset-abc123.json

  Run GEPA optimization (default budget: 120):
    python cli.py optimize --prompts example-prompts --output .output

  Run with custom settings and a pre-existing dataset:
    python cli.py optimize --prompts example-prompts --output .output --dataset .output/dataset-abc123.json --budget 300 --minibatch 6 --pareto-ratio 0.4

Flags:

  All commands:
    --prompts <dir>        prompt folder (default: example-prompts)
    --output <dir>         output folder (default: .output)

  evaluate:
    --dataset <path>       load existing dataset JSON (generates one if omitted)

  optimize:
    --dataset <path>       load existing dataset JSON (generates one if omitted)
    --budget <int>         rollout budget (default: 120)
    --minibatch <int>      minibatch size (default: 3)
    --pareto-ratio <float> pareto set ratio (default: 0.4)
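
  For scripting around the CLI, the commands, flags, and defaults listed above map onto an
  argparse layout like the one below. This is a sketch of the documented interface only;
  build_parser is a hypothetical name and the actual cli.py may be structured differently.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the commands and flags documented above."""
    parser = argparse.ArgumentParser(prog="cli.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Flags shared by every command.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--prompts", default="example-prompts")
    common.add_argument("--output", default=".output")

    sub.add_parser("generate", parents=[common])

    evaluate = sub.add_parser("evaluate", parents=[common])
    evaluate.add_argument("--dataset", default=None)

    optimize = sub.add_parser("optimize", parents=[common])
    optimize.add_argument("--dataset", default=None)
    optimize.add_argument("--budget", type=int, default=120)
    optimize.add_argument("--minibatch", type=int, default=3)
    optimize.add_argument("--pareto-ratio", type=float, default=0.4)
    return parser


args = build_parser().parse_args(["optimize", "--budget", "300", "--minibatch", "6"])
print(args.budget, args.minibatch, args.pareto_ratio)  # prints: 300 6 0.4
```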

OUTPUT

  All results are saved to .output/ by default:
    dataset-<id>.json          generated datasets
    evaluation-<id>.json       evaluation results
    gepa-<id>.json             full GEPA run data
    optimized-prompt-<id>.txt  best optimized prompt
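
  To pick up the result of the latest run from another script, the file naming above is
  enough. A minimal sketch: latest_optimized_prompt is a hypothetical helper, and choosing
  by modification time is an assumption, since the README does not say whether <id> sorts
  chronologically.

```python
from pathlib import Path


def latest_optimized_prompt(output_dir=".output"):
    """Return the text of the most recently modified optimized prompt, or None."""
    candidates = sorted(
        Path(output_dir).glob("optimized-prompt-*.txt"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    return candidates[-1].read_text(encoding="utf-8") if candidates else None


prompt = latest_optimized_prompt()
print(prompt if prompt is not None else "no optimized prompt found")
```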