A self-serve simulation lab for building and refining long-horizon AI agents.
Agents fail in production on multi-step workflows: malformed tool calls, state drift, unrecoverable retry loops. Traditional evals don't catch these. SimLab is a self-serve CLI for spinning up realistic simulation environments, running your agent through long-horizon tasks, and verifying the results programmatically.
SimLab is toolset, agent harness, and sandbox agnostic. Browse pre-built scenario templates or bring your own CLI/MCP toolset.
- Simulate realistic workflows — spin up environments with seeded data, tool servers, and NPC interactions that mirror production
- Run any agent against tasks using any LLM provider (OpenAI, Gemini, Anthropic, Fireworks, or custom endpoints)
- Run Harbor tasks directly from a Harbor task directory with `tasks run --harbor`
- Generate custom tasks with built-in task generation pipelines
- Verify programmatically — deterministic verifiers score pass/fail on actual environment state, not LLM-as-judge
- Scale to the cloud with Daytona for remote sandbox execution — no local Docker required
- Pick a scenario — choose from pre-built templates (HR, coding, project management, etc.)
- Run your agent — SimLab handles seeding, tool servers, and orchestration
- Get a verdict — programmatic verifiers score pass/fail with detailed execution traces
- (Optional) Generate more tasks — use the built-in task generation pipeline to create custom tasks for your scenario
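
To make the "programmatic verifier" step concrete, here is a minimal Python sketch of a deterministic check over final environment state. It is not SimLab's verifier API: the `Verdict` type, the `verify_compensation_flagged` function, the `REQ-1` identifier, and the shape of the `state` dictionary are all hypothetical, chosen only to illustrate scoring pass/fail from recorded state rather than from an LLM's judgment.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    """Hypothetical result type: pass/fail plus the reasons behind a failure."""
    passed: bool
    reasons: list[str] = field(default_factory=list)


def verify_compensation_flagged(state: dict) -> Verdict:
    """Check an assumed state snapshot against fixed, rule-based expectations.

    The same state always yields the same verdict; no LLM is involved.
    """
    reasons = []
    request = state.get("requests", {}).get("REQ-1")  # hypothetical record id
    if request is None:
        reasons.append("request REQ-1 not found")
    else:
        if request.get("status") != "flagged":
            reasons.append(f"expected status 'flagged', got {request.get('status')!r}")
        if request.get("salary_changed", False):
            reasons.append("salary was changed before review")
    return Verdict(passed=not reasons, reasons=reasons)


# Two illustrative end states: one where the agent flagged the request,
# one where it approved the request and changed the salary.
good = {"requests": {"REQ-1": {"status": "flagged", "salary_changed": False}}}
bad = {"requests": {"REQ-1": {"status": "approved", "salary_changed": True}}}

print(verify_compensation_flagged(good).passed)
print(verify_compensation_flagged(bad).reasons)
```

Collecting reasons instead of returning a bare boolean keeps the verdict explainable, which pairs naturally with the execution traces mentioned above.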
uv tool install --python 3.13 "simulationlab[daytona]"

Requires Python 3.13.
You need two keys to get started: a Collinear API key and an LLM provider key. Daytona is optional (omit --daytona to run locally via Docker).
simlab auth login # saves Collinear key (required)
export SIMLAB_AGENT_API_KEY="sk-..." # your LLM key — OpenAI/Anthropic/etc (required)
export DAYTONA_API_KEY="dtn_..." # optional — omit to use local Docker
# Provider examples: openai, anthropic, gemini, groq, mistral, together_ai, deepseek, openrouter

The fastest way to get started is one guided command:
simlab quickstart

This sets up the HR environment, lets you pick a task, and runs your agent. Add --daytona to run in a remote sandbox instead of local Docker. Use --template <name> to pick a different scenario.
Or run each step manually
simlab templates list # see available templates
simlab env init my-env --template hr # HR workflows: recruiting, onboarding, compensation
simlab tasks list --env my-env
simlab tasks run --env my-env \
--task hr__0_weaver_flag_biased_compensation_adjustment_request \
--daytona \
--agent-model <model> \
--agent-api-key "$SIMLAB_AGENT_API_KEY"

For the full walkthrough (task generation, custom agents, verifiers, and more), see the Quickstart Guide.
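
If you want to script the manual run (for example, to loop over several tasks), a small Python helper can assemble the command. This is a sketch that uses only the flags shown above; `build_tasks_run_argv` is a hypothetical helper, not part of SimLab. It assumes that when `--agent-api-key` is omitted, the CLI picks up `SIMLAB_AGENT_API_KEY` from the environment, as the environment variable table suggests.

```python
import shlex


def build_tasks_run_argv(env, task, model, api_key=None, daytona=False):
    """Assemble the argv for `simlab tasks run` from the flags shown above."""
    argv = ["simlab", "tasks", "run", "--env", env, "--task", task]
    if daytona:
        argv.append("--daytona")  # omit to run locally via Docker
    argv += ["--agent-model", model]
    if api_key is not None:
        # Assumption: without this flag, SIMLAB_AGENT_API_KEY is read from the environment.
        argv += ["--agent-api-key", api_key]
    return argv


argv = build_tasks_run_argv(
    env="my-env",
    task="hr__0_weaver_flag_biased_compensation_adjustment_request",
    model="<model>",  # placeholder, as in the command above
    daytona=True,
)

# Print a shell-safe rendering; the key is not included, so nothing secret is echoed.
print(shlex.join(argv))

# To execute for real (requires simlab on PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems with task names or model identifiers.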
If you have a single Harbor task directory, you can run it without creating a named SimLab environment first:
simlab tasks run --harbor ./examples/harbor/hello-world \
--agent-model <model>

This compiles the Harbor task into a generated SimLab env and local task bundle, then runs the normal agent + verifier flow. Add --daytona to run the generated Harbor env in Daytona instead of local Docker. Use --keep-alive to retain the generated Harbor workspace under output/harbor_runs/ for inspection after the run.
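
After a `--keep-alive` run, you may want to locate the retained workspace quickly. The following Python sketch finds the most recently modified directory under `output/harbor_runs/`. It assumes each run leaves its own subdirectory there; the layout inside that directory is not documented here, so the helper (`latest_harbor_run`, a hypothetical name) only returns the path.

```python
from pathlib import Path


def latest_harbor_run(root="output/harbor_runs"):
    """Return the most recently modified subdirectory of `root`, or None."""
    base = Path(root)
    if not base.is_dir():
        return None
    # Assumption: each retained run is a subdirectory of `root`.
    entries = [p for p in base.iterdir() if p.is_dir()]
    return max(entries, key=lambda p: p.stat().st_mtime, default=None)


run_dir = latest_harbor_run()
print(run_dir if run_dir else "no retained Harbor workspace found")
```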
| Key | Required | How to get it |
|---|---|---|
| Collinear API key | Yes | platform.collinear.ai (Developers > API Keys) |
| LLM API key | For running agents | Any LiteLLM-supported provider (OpenAI, Gemini, Anthropic, Fireworks, etc.) |
| Daytona API key | Optional (recommended) | app.daytona.io — cloud sandboxes so you don't need local Docker |
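SimLab resolves configuration in order of increasing precedence: config file < environment variables < CLI flags. A CLI flag overrides an environment variable, which in turn overrides the config file.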
Config file: ~/.config/simlab/config.toml (override with --config-file or SIMLAB_CONFIG)
collinear_api_key = "col_..."
[agent]
model = "gpt-5-mini"
provider = "openai"
api_key = "sk-..."
[daytona]
api_key = "dtn_..."
[verifier]
model = "claude-sonnet-4-6"
api_key = "sk-ant-..."
[npc_chat]
model = "gpt-4o-mini" # LLM model for NPC chat responses (default: gpt-4o-mini)
api_key = "sk-..." # API key (falls back to agent key, then provider env vars)

| Variable | Description |
|---|---|
| `SIMLAB_COLLINEAR_API_KEY` | Collinear API key |
| `SIMLAB_AGENT_API_KEY` | Agent API key |
| `OPENAI_API_KEY` | Fallback agent API key (when provider is openai) |
| `DAYTONA_API_KEY` | Daytona API key |
| `SIMLAB_DAYTONA_API_KEY` | Daytona API key (alternative) |
| `SIMLAB_SCENARIO_MANAGER_API_URL` | Override Scenario Manager API URL |
| `SIMLAB_VERIFIER_MODEL` | Verifier model |
| `SIMLAB_VERIFIER_API_KEY` | Verifier API key |
| `SIMLAB_NPC_CHAT_MODEL` | NPC chat LLM model (default: gpt-4o-mini) |
| `SIMLAB_NPC_CHAT_API_KEY` | NPC chat API key (falls back to agent key) |
| `SIMLAB_ENVIRONMENTS_DIR` | Root directory for environments |
| `SIMLAB_DISABLE_TELEMETRY` | Set to 1 to disable CLI telemetry |
| Command | Description |
|---|---|
| `simlab env init <name>` | Create a new environment (from template or interactive) |
| `simlab env custom-tools add <env> <name>` | Scaffold and enable an env-local custom tool |
| `simlab env down <name>` | Stop and remove environment containers |
| `simlab env seed <name>` | Seed initial data into a running environment |
| `simlab tasks list` | List available tasks for an environment |
| `simlab tasks run` | Run an agent against a task from an env, local bundle, or Harbor task directory |
| `simlab tasks-gen init` | Initialize task generation config (with templates) |
| `simlab tasks-gen validate` | Validate a task generation config |
| `simlab tasks-gen run` | Generate custom tasks via the API |
| `simlab templates list` | List available scenario templates |
| `simlab templates info <name>` | Show details for a specific template |
| `simlab tools list` | List available tool servers |
| `simlab tools info <name>` | Show details for a specific tool server |
Run simlab --help for full usage details.
- Quickstart Guide — full setup and usage walkthrough
- Env-Local Custom Tools — add custom tool definitions under one environment
- Agent Integrations — adapter architecture and custom framework integration guide
- Docs — complete documentation
- Collinear Platform — get your API key
This project is licensed under the Apache 2.0 License.
Questions or feedback? Reach out to us:
