paoloanzn/minigepa

A mini implementation of GEPA (https://arxiv.org/pdf/2507.19457) in ~1000 lines of code.

SETUP

  python -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt

CONFIGURATION

  Create a .env file with API keys for the providers you use:

    ANTHROPIC_OAUTH_TOKEN=...    (for Anthropic API; install Claude Code and run `claude setup-token`)
    OPENROUTER_API_KEY=...       (for OpenRouter API)
    OPENAI_API_KEY=...           (for OpenAI API)

  Both models are configurable. By default, the teacher model uses Anthropic and the student model uses OpenRouter:

    STUDENT_MODEL=...    model used for running the target prompt during optimization (default: nvidia/nemotron-3-nano-30b-a3b)
    TEACHER_MODEL=...    model used for dataset generation and grading (default: claude-haiku-4-5)

  Prefix a model with a provider to override the default client:

    STUDENT_MODEL=openai:gpt-4.1
    TEACHER_MODEL=openrouter:<model-id>

  Valid provider prefixes are anthropic, openrouter, and openai. If no prefix is provided,
  STUDENT_MODEL uses OpenRouter and TEACHER_MODEL uses Anthropic.
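
  The prefix rule above can be sketched in a few lines of Python. This is a minimal
  illustration, not minigepa's actual parsing code: resolve_model and VALID_PROVIDERS
  are hypothetical names, and treating an unrecognised prefix as part of the model id
  is an assumption.

```python
# Hypothetical helper; minigepa's own parsing may differ.
VALID_PROVIDERS = {"anthropic", "openrouter", "openai"}


def resolve_model(spec: str, default_provider: str) -> tuple[str, str]:
    """Split 'provider:model' into (provider, model), falling back to the default."""
    prefix, sep, rest = spec.partition(":")
    if sep and prefix in VALID_PROVIDERS:
        return prefix, rest
    # No recognised prefix (assumption): keep the whole string as the model id.
    return default_provider, spec


student = resolve_model("openai:gpt-4.1", default_provider="openrouter")
teacher = resolve_model("claude-haiku-4-5", default_provider="anthropic")
print(student)  # ('openai', 'gpt-4.1')
print(teacher)  # ('anthropic', 'claude-haiku-4-5')
```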

  Required keys depend on the command:

    repl/generate     teacher provider key
    evaluate/optimize teacher provider key and student provider key
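
  To check a .env before running a command, the table above can be expressed as a
  small lookup. This is a sketch under assumptions: the environment variable names
  come from this README, but required_keys and missing_keys are hypothetical helpers
  and not part of minigepa.

```python
import os

# Env var names are taken from the CONFIGURATION section; the helpers are hypothetical.
KEY_FOR_PROVIDER = {
    "anthropic": "ANTHROPIC_OAUTH_TOKEN",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
}
TEACHER_ONLY = {"repl", "generate"}


def required_keys(command, teacher_provider="anthropic", student_provider="openrouter"):
    """List the env vars a command needs, teacher key first."""
    keys = [KEY_FOR_PROVIDER[teacher_provider]]
    if command not in TEACHER_ONLY:
        student_key = KEY_FOR_PROVIDER[student_provider]
        if student_key not in keys:  # same provider for both roles needs one key
            keys.append(student_key)
    return keys


def missing_keys(command, env=os.environ, **providers):
    """Return the required env vars that are unset or empty in env."""
    return [k for k in required_keys(command, **providers) if not env.get(k)]


print(required_keys("generate"))  # ['ANTHROPIC_OAUTH_TOKEN']
print(required_keys("optimize"))  # ['ANTHROPIC_OAUTH_TOKEN', 'OPENROUTER_API_KEY']
```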

PROMPT FOLDER

  A prompt folder must contain exactly three files:

    target_prompt.txt (or .md)   - the prompt to evaluate/optimize, must contain {{task}}
    dataset_prompt.txt (or .md)  - instructs the model to generate test cases as JSON
    grader_prompt.txt (or .md)   - evaluates model outputs, produces a 1-5 score and a feedback object

  Example: example-prompts/
  Use the files in example-prompts/ as the reference structure for new prompt folders.
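
  A quick pre-flight check for a new prompt folder might look like the following.
  It is a sketch only: load_prompt_folder and render_target are hypothetical names,
  the plain string replacement of {{task}} is an assumption about how the placeholder
  is filled, and the check does not reject extra files in the folder.

```python
from pathlib import Path

REQUIRED = ("target_prompt", "dataset_prompt", "grader_prompt")


def load_prompt_folder(folder):
    """Return {name: text} for the three prompts, raising ValueError on problems."""
    folder = Path(folder)
    prompts = {}
    for name in REQUIRED:
        candidates = [folder / f"{name}{ext}" for ext in (".txt", ".md")]
        found = [p for p in candidates if p.is_file()]
        if not found:
            raise ValueError(f"missing {name}.txt (or .md) in {folder}")
        prompts[name] = found[0].read_text(encoding="utf-8")
    if "{{task}}" not in prompts["target_prompt"]:
        raise ValueError("target prompt must contain {{task}}")
    return prompts


def render_target(target_prompt, task):
    """Fill the {{task}} placeholder (assumed to be a plain string substitution)."""
    return target_prompt.replace("{{task}}", task)
```

  For example, load_prompt_folder("example-prompts") would return the three prompt
  texts, and render_target(prompts["target_prompt"], "some task") would produce the
  text sent to the student model under the assumption above.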

PROMPT CREATION REPL

  Interactive, tool-using assistant for creating and editing minigepa prompt folders:

    python repl.py

  The REPL uses the teacher client and can read, write, edit, search files, and run shell commands.
  Ask it to inspect example-prompts/ first, then create or update target_prompt.txt,
  dataset_prompt.txt, and grader_prompt.txt in your prompt folder.

  Commands inside the REPL:
    /q, quit, exit         quit
    /c                     clear the conversation

  You can also ask it to run CLI commands directly, for example:
    .venv/bin/python cli.py generate --prompts my-prompts
    .venv/bin/python cli.py evaluate --prompts my-prompts --dataset .output/dataset-abc123.json
    .venv/bin/python cli.py optimize --prompts my-prompts --budget 120

USAGE

  python cli.py <command> [options]

Commands:

  generate    Generate a dataset from the dataset prompt
  evaluate    Evaluate a target prompt against a dataset
  optimize    Run GEPA optimization to improve a target prompt

Examples:

  Generate a dataset:
    python cli.py generate --prompts example-prompts --output .output

  Evaluate a prompt (generates a dataset automatically):
    python cli.py evaluate --prompts example-prompts --output .output

  Evaluate using an existing dataset:
    python cli.py evaluate --prompts example-prompts --output .output --dataset .output/dataset-abc123.json

  Run GEPA optimization (default budget: 120):
    python cli.py optimize --prompts example-prompts --output .output

  Run with custom settings and a pre-existing dataset:
    python cli.py optimize --prompts example-prompts --output .output --dataset .output/dataset-abc123.json --budget 300 --minibatch 6 --pareto-ratio 0.4

Flags:

  All commands:
    --prompts <dir>        prompt folder (default: example-prompts)
    --output <dir>         output folder (default: .output)

  evaluate:
    --dataset <path>       load existing dataset JSON (generates one if omitted)

  optimize:
    --dataset <path>       load existing dataset JSON (generates one if omitted)
    --budget <int>         rollout budget (default: 120)
    --minibatch <int>      minibatch size (default: 3)
    --pareto-ratio <float> pareto set ratio (default: 0.4)
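
  For readers who want to see how the flags and defaults above fit together, here is
  one way such a command line could be declared with argparse. This is an
  illustrative sketch built only from the flag table; it is not the repository's
  cli.py, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Mirror the documented commands, flags, and defaults (sketch, not cli.py)."""
    parser = argparse.ArgumentParser(prog="cli.py")

    # Flags shared by all commands.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--prompts", default="example-prompts")
    common.add_argument("--output", default=".output")

    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("generate", parents=[common])

    evaluate = sub.add_parser("evaluate", parents=[common])
    evaluate.add_argument("--dataset")  # None means "generate one"

    optimize = sub.add_parser("optimize", parents=[common])
    optimize.add_argument("--dataset")
    optimize.add_argument("--budget", type=int, default=120)
    optimize.add_argument("--minibatch", type=int, default=3)
    optimize.add_argument("--pareto-ratio", type=float, default=0.4)
    return parser


args = build_parser().parse_args(["optimize", "--budget", "300", "--minibatch", "6"])
print(args.budget, args.minibatch, args.pareto_ratio)  # 300 6 0.4
```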

OUTPUT

  All results are saved to .output/ by default:
    dataset-<id>.json          generated datasets
    evaluation-<id>.json       evaluation results
    gepa-<id>.json             full GEPA run data
    optimized-prompt-<id>.txt  best optimized prompt
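
  Since each run writes a new optimized-prompt-<id>.txt, a small helper can pick up
  the most recent one. This sketch relies only on the file naming shown above;
  latest_optimized_prompt is a hypothetical name, and choosing the newest file by
  modification time is an assumption about what "latest" should mean.

```python
from pathlib import Path


def latest_optimized_prompt(output_dir=".output"):
    """Return the text of the newest optimized-prompt-<id>.txt, or None if absent."""
    files = sorted(
        Path(output_dir).glob("optimized-prompt-*.txt"),
        key=lambda p: p.stat().st_mtime,  # oldest first, newest last
    )
    return files[-1].read_text(encoding="utf-8") if files else None
```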
