collinear-ai/simlab

SimLab

A self-serve simulation lab for building and refining long-horizon AI agents.

Agents fail in production on multi-step workflows: malformed tool calls, state drift, unrecoverable retry loops. Traditional evals don't catch these. SimLab is a self-serve CLI for spinning up realistic simulation environments, running your agent through long-horizon tasks, and verifying the results programmatically.

SimLab is agnostic to toolset, agent harness, and sandbox. Browse pre-built scenario templates or bring your own CLI/MCP toolset.

  • Simulate realistic workflows — spin up environments with seeded data, tool servers, and NPC interactions that mirror production
  • Run any agent against tasks using any LLM provider (OpenAI, Gemini, Anthropic, Fireworks, or custom endpoints)
  • Run Harbor tasks directly from a Harbor task directory with simlab tasks run --harbor
  • Generate custom tasks with built-in task generation pipelines
  • Verify programmatically — deterministic verifiers score pass/fail on actual environment state, not LLM-as-judge
  • Scale to the cloud with Daytona for remote sandbox execution — no local Docker required

How it works

  1. Pick a scenario — choose from pre-built templates (HR, coding, project management, etc.)
  2. Run your agent — SimLab handles seeding, tool servers, and orchestration
  3. Get a verdict — programmatic verifiers score pass/fail with detailed execution traces
  4. (Optional) Generate more tasks — use the built-in task generation pipeline to create custom tasks for your scenario

Quickstart

Install

uv tool install --python 3.13 "simulationlab[daytona]"

Requires Python 3.13.

Authenticate

You need two keys to get started: a Collinear API key and an LLM provider key. Daytona is optional (omit --daytona to run locally via Docker).

simlab auth login                          # saves Collinear key (required)
export SIMLAB_AGENT_API_KEY="sk-..."       # your LLM key — OpenAI/Anthropic/etc (required)
export DAYTONA_API_KEY="dtn_..."           # optional — omit to use local Docker
# Provider examples: openai, anthropic, gemini, groq, mistral, together_ai, deepseek, openrouter
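
Before a run, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of SimLab: `check_keys` is a hypothetical helper, and it only inspects the two environment variables named in this README. The Collinear key is saved by `simlab auth login`, so it is not checked here.

```python
import os

# Variable names come from this README; the helper itself is hypothetical.
REQUIRED_VARS = ("SIMLAB_AGENT_API_KEY",)
DAYTONA_VAR = "DAYTONA_API_KEY"


def check_keys(env):
    """Return (missing_required, use_daytona) for a mapping of environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    use_daytona = bool(env.get(DAYTONA_VAR))
    return missing, use_daytona


missing, use_daytona = check_keys(os.environ)
if missing:
    print("Missing required variables:", ", ".join(missing))
else:
    print("Sandbox:", "Daytona (--daytona)" if use_daytona else "local Docker")
```

An empty or unset `DAYTONA_API_KEY` is treated as "run locally via Docker", matching the optional status described above.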

Run your first task

The fastest way to get started — one guided command:

simlab quickstart

This sets up the HR environment, lets you pick a task, and runs your agent. Add --daytona to run in a remote sandbox instead of local Docker. Use --template <name> to pick a different scenario.

Or run each step manually

simlab templates list                      # see available templates
simlab env init my-env --template hr       # HR workflows: recruiting, onboarding, compensation
simlab tasks list --env my-env
simlab tasks run --env my-env \
  --task hr__0_weaver_flag_biased_compensation_adjustment_request \
  --daytona \
  --agent-model <model> \
  --agent-api-key "$SIMLAB_AGENT_API_KEY"

For the full walkthrough — task generation, custom agents, verifiers, and more — see the Quickstart Guide.
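
When the manual `simlab tasks run` step is driven from a script (for example, to loop over several tasks), the argument list can be assembled programmatically. This is a sketch under assumptions: `build_tasks_run_argv` is a hypothetical helper, and it uses only the flags shown in the manual command in this section.

```python
import shlex


def build_tasks_run_argv(env_name, task, model, api_key, daytona=False):
    """Assemble the argument list for `simlab tasks run` from the flags shown in this README."""
    argv = ["simlab", "tasks", "run", "--env", env_name, "--task", task]
    if daytona:
        # Omit this flag to run locally via Docker.
        argv.append("--daytona")
    argv += ["--agent-model", model, "--agent-api-key", api_key]
    return argv


argv = build_tasks_run_argv(
    "my-env",
    "hr__0_weaver_flag_biased_compensation_adjustment_request",
    "<model>",
    "sk-placeholder",
    daytona=True,
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once `simlab` is installed and authenticated. Building a list rather than a single string avoids shell-quoting problems with task names and keys.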

Run a Harbor task directly

If you have a single Harbor task directory, you can run it without creating a named SimLab environment first:

simlab tasks run --harbor ./examples/harbor/hello-world \
  --agent-model <model>

This compiles the Harbor task into a generated SimLab env and local task bundle, then runs the normal agent + verifier flow. Add --daytona to run the generated Harbor env in Daytona instead of local Docker. Use --keep-alive to retain the generated Harbor workspace under output/harbor_runs/ for inspection after the run.
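
After a run with `--keep-alive`, the retained workspace can be located for inspection. The sketch below assumes that `output/harbor_runs/` contains one subdirectory per run. That layout is not documented here, so treat it as an assumption. `latest_run_dir` is a hypothetical helper.

```python
from pathlib import Path


def latest_run_dir(root="output/harbor_runs"):
    """Return the most recently modified subdirectory of root, or None if there is none."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    # Assumption: each retained Harbor run lives in its own subdirectory.
    run_dirs = [p for p in root_path.iterdir() if p.is_dir()]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


print(latest_run_dir())
```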

API Keys

| Key | Required | How to get it |
|---|---|---|
| Collinear API key | Yes | platform.collinear.ai (Developers > API Keys) |
| LLM API key | For running agents | Any LiteLLM-supported provider (OpenAI, Gemini, Anthropic, Fireworks, etc.) |
| Daytona API key | Optional (recommended) | app.daytona.io (cloud sandboxes, so you don't need local Docker) |

Configuration

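SimLab resolves configuration in increasing order of precedence: config file < environment variables < CLI flags. A CLI flag overrides an environment variable, which in turn overrides the config file.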
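
The resolution order can be illustrated with a small lookup that checks the highest-precedence source first. This is a sketch of the general technique, not SimLab's implementation: `resolve` is a hypothetical function, and it assumes the three sources have already been normalized to the same key names.

```python
def resolve(key, cli_flags, env_vars, config_file, default=None):
    """Return the value for key from the highest-precedence source that defines it."""
    # Checked from highest to lowest precedence: CLI flags, environment, config file.
    for source in (cli_flags, env_vars, config_file):
        value = source.get(key)
        if value is not None:
            return value
    return default


config_file = {"model": "gpt-5-mini"}
env_vars = {}
cli_flags = {"model": "from-cli-flag"}
print(resolve("model", cli_flags, env_vars, config_file))
```

Here the CLI value wins. With an empty `cli_flags`, the lookup would fall through to the environment and then to the config file.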

Config file: ~/.config/simlab/config.toml (override with --config-file or SIMLAB_CONFIG)

collinear_api_key = "col_..."

[agent]
model = "gpt-5-mini"
provider = "openai"
api_key = "sk-..."

[daytona]
api_key = "dtn_..."

[verifier]
model = "claude-sonnet-4-6"
api_key = "sk-ant-..."

[npc_chat]
model = "gpt-4o-mini"       # LLM model for NPC chat responses (default: gpt-4o-mini)
api_key = "sk-..."           # API key (falls back to agent key, then provider env vars)
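
The NPC chat key fallback described in the comments above (NPC chat key, then agent key, then provider environment variable) can be sketched as follows. `resolve_npc_chat_key` is a hypothetical function. The ordering between config-file and environment values within each step is an assumption, and only the `openai` provider mapping is listed because it is the only one this README documents.

```python
# Only the provider mapping documented in this README; others are unknown here.
PROVIDER_ENV_VARS = {"openai": "OPENAI_API_KEY"}


def resolve_npc_chat_key(config, env):
    """Return the NPC chat API key using the documented fallback chain, or None."""
    npc = config.get("npc_chat", {})
    agent = config.get("agent", {})
    candidates = [
        env.get("SIMLAB_NPC_CHAT_API_KEY"),  # NPC-specific key, environment
        npc.get("api_key"),                  # NPC-specific key, config file
        env.get("SIMLAB_AGENT_API_KEY"),     # agent key, environment
        agent.get("api_key"),                # agent key, config file
    ]
    for value in candidates:
        if value:
            return value
    # Last resort: the provider's own environment variable, if one is known.
    env_name = PROVIDER_ENV_VARS.get(agent.get("provider"))
    return env.get(env_name) if env_name else None


config = {"agent": {"provider": "openai", "api_key": "sk-agent"}, "npc_chat": {"model": "gpt-4o-mini"}}
print(resolve_npc_chat_key(config, {}))
```

The `config` dictionary mirrors the TOML layout above. On Python 3.11 or later it could be produced by `tomllib.load` on the config file opened in binary mode.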

Environment Variables

| Variable | Description |
|---|---|
| `SIMLAB_COLLINEAR_API_KEY` | Collinear API key |
| `SIMLAB_AGENT_API_KEY` | Agent API key |
| `OPENAI_API_KEY` | Fallback agent API key (when provider is openai) |
| `DAYTONA_API_KEY` | Daytona API key |
| `SIMLAB_DAYTONA_API_KEY` | Daytona API key (alternative) |
| `SIMLAB_SCENARIO_MANAGER_API_URL` | Override Scenario Manager API URL |
| `SIMLAB_VERIFIER_MODEL` | Verifier model |
| `SIMLAB_VERIFIER_API_KEY` | Verifier API key |
| `SIMLAB_NPC_CHAT_MODEL` | NPC chat LLM model (default: gpt-4o-mini) |
| `SIMLAB_NPC_CHAT_API_KEY` | NPC chat API key (falls back to agent key) |
| `SIMLAB_ENVIRONMENTS_DIR` | Root directory for environments |
| `SIMLAB_DISABLE_TELEMETRY` | Set to 1 to disable CLI telemetry |

CLI Reference

| Command | Description |
|---|---|
| `simlab env init <name>` | Create a new environment (from template or interactive) |
| `simlab env custom-tools add <env> <name>` | Scaffold and enable an env-local custom tool |
| `simlab env down <name>` | Stop and remove environment containers |
| `simlab env seed <name>` | Seed initial data into a running environment |
| `simlab tasks list` | List available tasks for an environment |
| `simlab tasks run` | Run an agent against a task from an env, local bundle, or Harbor task directory |
| `simlab tasks-gen init` | Initialize task generation config (with templates) |
| `simlab tasks-gen validate` | Validate a task generation config |
| `simlab tasks-gen run` | Generate custom tasks via the API |
| `simlab templates list` | List available scenario templates |
| `simlab templates info <name>` | Show details for a specific template |
| `simlab tools list` | List available tool servers |
| `simlab tools info <name>` | Show details for a specific tool server |

Run simlab --help for full usage details.

License

This project is licensed under the Apache 2.0 License.

Contact

Questions or feedback? Reach out to us:
