A standalone CLI for running autonomous improvement experiments on any measurable artifact.
Based on Karpathy's AutoResearch pattern, generalized for any domain.
```sh
# From source
cargo install --path .

# Or download a prebuilt binary
curl -L https://github.com/hmbldv/sia/releases/latest/download/sia-linux-x86_64 -o ~/.local/bin/sia
chmod +x ~/.local/bin/sia
```
```sh
# Improve a prompt file, measuring test success rate
sia run \
  --target prompt.md \
  --evaluate "python test_prompt.py 2>&1" \
  --metric "success_rate: ([\d.]+)" \
  --direction maximize \
  --max-iterations 20
```
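The `--metric` regex is applied to the evaluation command's output, and its first capture group is parsed as the score. Any command works as long as it prints a matching line; below is a minimal sketch of an evaluation script for the example above (the counting logic and numbers are placeholders, not part of sia):

```sh
#!/usr/bin/env bash
# Hypothetical evaluation script: run your checks, then print one line that
# the --metric regex "success_rate: ([\d.]+)" can capture.
passed=17   # placeholder: count passing cases however your harness does
total=20
awk -v p="$passed" -v t="$total" 'BEGIN { printf "success_rate: %.2f\n", p / t }'
```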
```sh
# Improve code, measuring test pass count
sia run \
  --target src/lib.rs \
  --evaluate "cargo test 2>&1 | tail -1" \
  --metric "(\d+) passed" \
  --direction maximize \
  --timeout 1h

# Dry run (show plan without executing)
sia plan --target config.yaml --goal "reduce memory usage"

# Resume interrupted session
sia resume --session abc123

# View experiment history
sia history --last 10
```

sia includes a guard layer that runs before every proposed change is applied. This is a first-class feature, not an afterthought.
What the guard blocks:

- Forbidden paths: writes to `/.ssh`, `/.aws`, `/.env`, and other sensitive directories are rejected outright
- Secret pattern detection: API keys, private keys, and JWTs in proposed changes are caught before they reach disk (see the sketch after this list)
- Dangerous commands: `rm -rf`, `curl | sh`, `sudo`, and similar patterns are flagged and blocked
- Metacharacter injection: token-level rejection of shell metacharacters in LLM-generated content
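As a rough illustration of the kind of check the secret-pattern guard performs (this is not sia's implementation, and the patch filename is just a stand-in):

```sh
# Illustration only, not sia's actual guard code: scan a proposed change for
# secret-shaped strings (AWS key IDs, PEM private keys, JWTs) before it is
# allowed to touch disk.
if grep -Eq 'AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----|eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+' proposed_change.patch
then
    echo "guard: secret-like pattern found, change rejected" >&2
    exit 1
fi
```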
Checkpointing: every change, whether kept or reverted, is committed to git (or backed up if the target is not in a git repo) before it is applied. Full rollback is always available via `sia rollback --session <id>`.
```text
LOOP until (max_iterations OR timeout OR stop_signal):
  1. Read current state (target files, experiment history)
  2. Generate hypothesis (LLM proposes change)
  3. Checkpoint (git commit)
  4. Apply change to target
  5. Run evaluation (external command → numeric metric)
  6. Compare to baseline:
     - IF improved → keep change, update baseline
     - ELSE → revert to checkpoint
  7. Log result to experiment history
```
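Steps 5 and 6 amount to a regex capture over the evaluator's output plus a numeric comparison against the best score so far. Roughly, as an illustration of the control flow only (this is not sia's code; the script name and revert command are stand-ins):

```sh
# Illustration of steps 5-6 only, not sia's implementation.
score=$(./evaluate.sh 2>&1 | grep -oE 'score: [0-9.]+' | grep -oE '[0-9.]+')

# "Improved" means strictly better in the configured direction (maximize here).
if awk -v s="$score" -v b="$baseline" 'BEGIN { exit !(s > b) }'; then
    baseline="$score"          # keep the change, raise the bar
else
    git checkout .             # revert the working tree to the checkpoint
fi
```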
```yaml
# sia.yaml
target: src/prompt.md
evaluate: "./measure.sh"
metric: "score: ([\d.]+)"
direction: maximize

limits:
  max_iterations: 50
  timeout_seconds: 3600
  max_tokens_per_iteration: 4096
  max_eval_seconds: 300

llm:
  provider: anthropic   # anthropic | openai | ollama
  model: claude-sonnet-4
  # api_key read from ANTHROPIC_API_KEY env var

goal: "Improve the prompt to get higher task completion rates"
constraints:
  - "Keep the prompt under 2000 tokens"
  - "Maintain the existing structure"
```

| Provider | Config | API Key Env |
|---|---|---|
| Anthropic | `provider: anthropic` | `ANTHROPIC_API_KEY` |
| OpenAI | `provider: openai` | `OPENAI_API_KEY` |
| Ollama | `provider: ollama`, `base_url: http://localhost:11434` | — |
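Keys for hosted providers come from the environment, never from `sia.yaml`; Ollama needs none. For example (the key value is a placeholder):

```sh
# Set before running sia; use OPENAI_API_KEY instead for provider: openai.
export ANTHROPIC_API_KEY="<your-key>"
```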
sia runs standalone. It can be wired to any scheduler or event system via its exit codes and structured JSON output — a non-zero exit means no improvement was found this iteration.
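For example, a nightly cron job (or any CI step) can branch on the exit code. The wrapper below is a sketch; the notification command and the redirect of the JSON report to a file are assumptions about your environment, not anything sia mandates:

```sh
#!/usr/bin/env bash
# Hypothetical scheduler hook: exit 0 means sia kept an improvement,
# non-zero means nothing beat the baseline this run.
if sia run \
     --target prompt.md \
     --evaluate "./measure.sh" \
     --metric "score: ([\d.]+)" \
     --direction maximize \
     --max-iterations 20 > sia-report.json; then
    notify-send "sia: improvement found"   # swap for mail, a Slack webhook, etc.
fi
```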
- Every change is checkpointed (git commit or file backup)
- Full rollback always possible (`sia rollback --session <id>`)
- Resource limits enforced (iterations, time, tokens, eval time)
- No network access during evaluation by default
MIT