hmbldv/sia

SIA — Self-Improving Agents

A standalone CLI for running autonomous improvement experiments on any measurable artifact.

Based on Karpathy's AutoResearch pattern, generalized for any domain.

Installation

# From source
cargo install --path .

# Or download binary
curl -L https://github.com/hmbldv/sia/releases/latest/download/sia-linux-x86_64 -o ~/.local/bin/sia
chmod +x ~/.local/bin/sia

Quick Start

# Improve a prompt file, measuring test success rate
sia run \
  --target prompt.md \
  --evaluate "python test_prompt.py 2>&1" \
  --metric "success_rate: ([\d.]+)" \
  --direction maximize \
  --max-iterations 20

# Improve code, measuring test pass count
sia run \
  --target src/lib.rs \
  --evaluate "cargo test 2>&1 | grep 'test result'" \
  --metric "(\d+) passed" \
  --direction maximize \
  --timeout 1h

# Dry run (show plan without executing)
sia plan --target config.yaml --goal "reduce memory usage"

# Resume interrupted session
sia resume --session abc123

# View experiment history
sia history --last 10
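
The --metric patterns above each contain one capture group that isolates a number in the evaluation output. The following is a minimal sketch of how such a pattern could be applied, assuming the first capture group of the first match is parsed as a float; the helper names extract_metric and is_improvement are hypothetical, and sia's actual matching rules may differ.

```python
import re
from typing import Optional


def extract_metric(output: str, pattern: str) -> Optional[float]:
    """Return the first capture group of the first match as a float, or None."""
    match = re.search(pattern, output)
    if match is None:
        return None
    try:
        return float(match.group(1))
    except ValueError:
        # A pattern like ([\d.]+) can capture text such as "1.2.3".
        return None


def is_improvement(new: float, baseline: float, direction: str) -> bool:
    """Compare two metric values; direction is 'maximize' or 'minimize'."""
    if direction == "maximize":
        return new > baseline
    return new < baseline


print(extract_metric("success_rate: 0.85", r"success_rate: ([\d.]+)"))
print(extract_metric("test result: ok. 12 passed; 0 failed", r"(\d+) passed"))
```

Returning None on a missing or unparseable value lets the caller treat a broken evaluation as "no improvement" rather than crash mid-run.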

Security

sia includes a guard layer that runs before every proposed change is applied. This is a first-class feature, not an afterthought.

What the guard blocks:

  • Forbidden paths: writes to /.ssh, /.aws, /.env, and other sensitive paths are rejected outright
  • Secret pattern detection: API keys, private keys, and JWTs in proposed changes are caught before they reach disk
  • Dangerous commands: rm -rf, curl|sh, sudo, and similar patterns are flagged and blocked
  • Metacharacter injection: token-level rejection of shell metacharacters in LLM-generated content
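
To make the first three checks concrete, here is a rough sketch of a pre-apply guard. It is an illustration only, not sia's implementation: the function name guard_violations, the pattern lists, and the violation labels are all assumptions, and a real guard would need far broader patterns.

```python
import re
from pathlib import PurePosixPath
from typing import List

# Hypothetical rule sets; a real guard would cover many more cases.
FORBIDDEN_PARTS = {".ssh", ".aws", ".env"}

SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT-like token
]

DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bcurl\b[^|\n]*\|\s*(sh|bash)\b"),
    re.compile(r"\bsudo\b"),
]


def guard_violations(path: str, content: str) -> List[str]:
    """Return the list of rule categories that a proposed write would violate."""
    violations = []
    if FORBIDDEN_PARTS & set(PurePosixPath(path).parts):
        violations.append("forbidden path")
    if any(p.search(content) for p in SECRET_PATTERNS):
        violations.append("secret pattern")
    if any(p.search(content) for p in DANGEROUS_PATTERNS):
        violations.append("dangerous command")
    return violations


print(guard_violations("/home/u/.ssh/config", "Host example"))
print(guard_violations("run.sh", "curl https://example.com/i.sh | sh"))
```

An empty list means the change may proceed; any entry would cause the change to be rejected before it is written.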

Checkpointing: before each change is applied, the current state is committed to git (or backed up to a file if the target is not in a git repo), whether the change is later kept or reverted. Full rollback is always available via sia rollback --session <id>.

Core Pattern

LOOP until (max_iterations OR timeout OR stop_signal):
  1. Read current state (target files, experiment history)
  2. Generate hypothesis (LLM proposes change)
  3. Checkpoint (git commit)
  4. Apply change to target
  5. Run evaluation (external command → numeric metric)
  6. Compare to baseline:
     - IF improved → keep change, update baseline
     - ELSE → revert to checkpoint
  7. Log result to experiment history

Configuration

# sia.yaml
target: src/prompt.md
evaluate: "./measure.sh"
metric: 'score: ([\d.]+)'  # single quotes: \d is not a valid escape in a double-quoted YAML string
direction: maximize

limits:
  max_iterations: 50
  timeout_seconds: 3600
  max_tokens_per_iteration: 4096
  max_eval_seconds: 300

llm:
  provider: anthropic  # anthropic | openai | ollama
  model: claude-sonnet-4
  # api_key read from ANTHROPIC_API_KEY env var

goal: "Improve the prompt to get higher task completion rates"
constraints:
  - "Keep the prompt under 2000 tokens"
  - "Maintain the existing structure"
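
As an illustration of how a limit such as max_eval_seconds might be enforced, the sketch below runs an evaluation command under a wall-clock timeout. The function name run_evaluation is hypothetical, and treating a timed-out evaluation as "no output" is an assumption rather than documented sia behavior.

```python
import subprocess
from typing import Optional


def run_evaluation(command: str, max_eval_seconds: float) -> Optional[str]:
    """Run a shell command; return its combined output, or None on timeout."""
    try:
        completed = subprocess.run(
            command,
            shell=True,            # the evaluate setting is a shell command string
            capture_output=True,
            text=True,
            timeout=max_eval_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout + completed.stderr


print(run_evaluation("echo 'score: 0.5'", max_eval_seconds=5))
```

The returned text is what a metric pattern like the one in sia.yaml would then be matched against.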

LLM Providers

Provider  | Config                                             | API Key Env
Anthropic | provider: anthropic                                | ANTHROPIC_API_KEY
OpenAI    | provider: openai                                   | OPENAI_API_KEY
Ollama    | provider: ollama, base_url: http://localhost:11434 | (none listed)

Integrations

sia runs standalone. It can be wired to any scheduler or event system via its exit codes and structured JSON output — a non-zero exit means no improvement was found this iteration.
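
A scheduler hook built on the exit code could look like the sketch below. The wrapper name run_once is hypothetical, and the sketch reads only the exit status: the JSON output schema is not described here, so it is not parsed. It also cannot tell a failed run apart from a run that found no improvement, since both are non-zero.

```python
import subprocess
from typing import List


def run_once(command: List[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# Stand-in commands; in practice the list would hold a sia invocation such as
# ["sia", "run", "--target", "prompt.md", ...] with the flags from Quick Start.
print(run_once(["sh", "-c", "exit 0"]))
print(run_once(["sh", "-c", "exit 3"]))
```

A cron job or CI step could call this and, for example, open a review only when it returns True.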

Safety

  • Every change is checkpointed (git commit or file backup)
  • Full rollback always possible (sia rollback --session <id>)
  • Resource limits enforced (iterations, time, tokens, eval time)
  • No network access during evaluation by default

License

MIT
