שַׁמָּה (Shammah) - Hebrew: "watchman" or "guardian"
A local-first AI coding assistant that actually works offline.
Quick Start • Download • Features • Documentation
Problem: Cloud AI assistants require constant internet, cost money per query, and can't learn your specific coding patterns.
Solution: Shammah runs entirely on your machine with:
- ✅ Works offline after initial setup
- ✅ Instant responses from local models
- ✅ Learns your style through weighted LoRA fine-tuning
- ✅ Privacy-first - your code never leaves your machine
- ✅ Free to run - no per-query costs
Unlike training a model from scratch (months of work and expensive GPUs), Shammah uses pre-trained models that work well immediately and adapt to your needs over time.
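The efficiency of LoRA-style adaptation is easy to see in the parameter counts: instead of updating a full d×d weight matrix, LoRA trains two small rank-r factors. A toy calculation (the hidden size and rank below are assumed, illustrative values, not Shammah's actual configuration):

```rust
/// Parameter count for a LoRA adapter of rank r on a d x d weight
/// matrix: two low-rank factors, B (d x r) and A (r x d), versus
/// the d*d parameters touched by full fine-tuning.
fn lora_params(d: u64, r: u64) -> u64 {
    2 * d * r
}

fn main() {
    let (d, r) = (4096, 8); // assumed hidden size and rank, for illustration
    let full = d * d;
    let lora = lora_params(d, r);
    println!(
        "full: {} params, LoRA r={}: {} params ({:.2}% of full)",
        full,
        r,
        lora,
        100.0 * lora as f64 / full as f64
    );
}
```

With these example numbers the adapter is well under 1% of the layer's parameters, which is why per-user fine-tuning is cheap enough to run locally.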
Option 1: One-Liner Install (Easiest)
curl -sSL https://raw.githubusercontent.com/schancel/shammah/main/install.sh | bash

This will:
- Detect your platform automatically
- Download the latest release
- Install to ~/.local/bin/shammah
- Verify the installation
Option 2: Download Pre-Built Binary
# macOS (Apple Silicon)
curl -L https://github.com/schancel/shammah/releases/latest/download/shammah-macos-aarch64.tar.gz | tar xz
./shammah --version
# macOS (Intel)
curl -L https://github.com/schancel/shammah/releases/latest/download/shammah-macos-x86_64.tar.gz | tar xz
./shammah --version
# Linux
curl -L https://github.com/schancel/shammah/releases/latest/download/shammah-linux-x86_64.tar.gz | tar xz
./shammah --version

Option 3: Build from Source
git clone https://github.com/schancel/shammah
cd shammah
cargo build --release
./target/release/shammah --version

# 1. Run setup wizard (interactive)
./shammah setup
# Enter:
# - Your Claude API key (from console.anthropic.com) - for fallback
# - Your HuggingFace token (from huggingface.co/settings/tokens) - for model downloads
# - Choose model size (auto-selected based on your RAM)
# 2. Start using it!
./shammah
# REPL appears instantly - you can start asking questions right away
> How do I implement a binary search tree in Rust?
# First time: Model downloads in background (1-14GB depending on RAM)
# You get Claude responses while model loads
# Once ready: Future queries use fast local model
> Explain Rust lifetimes
# Now using local Qwen model! ⚡

That's it! 🎉
Works from day 1 - no training period required.
- Multiple model support - Qwen, Llama, Mistral, Phi via ONNX
- Adaptive sizing - Auto-selects based on your RAM:
- 8GB → 1.5B model (fast, 500ms responses)
- 16GB → 3B model (balanced)
- 32GB → 7B model (powerful)
- 64GB+ → 14B model (maximum capability)
- Instant startup - REPL ready in <100ms
- Hardware acceleration - Uses Metal (Apple Silicon), CUDA, or CPU
- Offline capable - No internet needed after first download
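The RAM-based tier selection above can be sketched as a simple lookup (an illustrative sketch; Shammah's actual selection logic may differ):

```rust
/// Pick a model size tier from available system RAM (GiB).
/// Thresholds mirror the adaptive-sizing table above; this is an
/// illustrative sketch, not Shammah's actual selection code.
fn select_model_size(ram_gib: u64) -> &'static str {
    match ram_gib {
        0..=7 => "unsupported (8GB minimum)",
        8..=15 => "1.5B",
        16..=31 => "3B",
        32..=63 => "7B",
        _ => "14B",
    }
}

fn main() {
    for ram in [8, 16, 32, 64] {
        println!("{} GiB -> {} model", ram, select_model_size(ram));
    }
}
```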
Model adapts to YOUR coding style and patterns.
How it works:
> /feedback high
This is a critical error - never use .unwrap() in production.
Always handle errors properly.
# This feedback has 10x impact on future responses
# Model learns to avoid this pattern strongly

Three feedback levels:
- 🔴 High (10x): Critical errors, anti-patterns, security issues
- 🟡 Medium (3x): Style preferences, better approaches
- 🟢 Normal (1x): Good examples to remember
Benefits:
- Specializes to your frameworks and libraries
- Remembers your architectural preferences
- Learns from mistakes without degrading base quality
- Efficient - trains only 0.1-1% of parameters
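One way the 10x/3x/1x weighting can enter training is as per-example loss weights, so a high-severity correction pulls gradients ten times harder than a normal example. A minimal sketch (function names are hypothetical, not Shammah's API):

```rust
/// Map a feedback level to its training weight, matching the
/// 10x / 3x / 1x scheme described above (illustrative only).
fn feedback_weight(level: &str) -> f32 {
    match level {
        "high" => 10.0,
        "medium" => 3.0,
        _ => 1.0,
    }
}

/// Weighted mean loss over a batch of (loss, feedback level) pairs:
/// high-weight feedback dominates the average.
fn weighted_loss(losses: &[(f32, &str)]) -> f32 {
    let (num, den) = losses.iter().fold((0.0, 0.0), |(n, d), (loss, lvl)| {
        let w = feedback_weight(lvl);
        (n + w * loss, d + w)
    });
    num / den
}

fn main() {
    // One critical correction and one ordinary example with equal raw loss:
    // the weighted mean is (10*0.5 + 1*0.5) / 11, far closer to the critical one.
    let batch = [(0.5_f32, "high"), (0.5_f32, "normal")];
    println!("weighted batch loss: {}", weighted_loss(&batch));
}
```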
AI can inspect and modify your code:
> Read my Cargo.toml and suggest dependency updates
🔧 Tool: Read (approved)
File: Cargo.toml
✅ Success
> Find all TODO comments in Rust files
🔧 Tool: Glob (approved)
Pattern: **/*.rs
Found: 15 files
🔧 Tool: Grep (approved)
Pattern: TODO.*
23 matches found
> Run the test suite
🔧 Tool: Bash (requires confirmation)
Command: cargo test
Approve? [y/N/always]: y
✅ All tests passed

Available tools:
- Read - Inspect files
- Glob - Find files by pattern
- Grep - Search with regex
- WebFetch - Get documentation
- Bash - Run commands
- Restart - Self-improvement
Safety built-in:
- Approve once or save patterns
- Session or persistent approvals
- Wildcards and regex matching
- Manage with the /patterns command
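A saved approval pattern like `cargo *` boils down to wildcard matching against the proposed command. A minimal sketch with `*`-only globbing (a hypothetical helper; per the list above, the real matcher also supports regex):

```rust
/// Minimal '*' wildcard matcher, sketching how a saved approval
/// pattern (e.g. "cargo *") could be checked against a command.
/// Illustrative only; not Shammah's actual matching code.
fn pattern_matches(pattern: &str, input: &str) -> bool {
    // Recursive match over char slices: '*' consumes zero or more chars.
    fn go(p: &[char], i: &[char]) -> bool {
        match (p.first().copied(), i.first().copied()) {
            (None, None) => true,
            (Some('*'), _) => go(&p[1..], i) || (!i.is_empty() && go(p, &i[1..])),
            (Some(a), Some(b)) if a == b => go(&p[1..], &i[1..]),
            _ => false,
        }
    }
    let p: Vec<char> = pattern.chars().collect();
    let i: Vec<char> = input.chars().collect();
    go(&p, &i)
}

fn main() {
    // A pattern approved once for cargo commands keeps matching them,
    // while unrelated commands still require confirmation.
    println!("{}", pattern_matches("cargo *", "cargo test"));
    println!("{}", pattern_matches("cargo *", "rm -rf /"));
}
```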
Run as an OpenAI-compatible API server:
# Start daemon
./shammah daemon --bind 127.0.0.1:11435
# Use from any OpenAI-compatible client
curl http://127.0.0.1:11435/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-20250514",
"messages": [{"role": "user", "content": "Hello!"}]
}'

Features:
- OpenAI-compatible API (drop-in replacement)
- Tool execution on client side (proper context/security)
- Session management with auto-cleanup
- Prometheus metrics for monitoring
- Production-ready (run as service)
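The "session management with auto-cleanup" item can be pictured as a table of last-activity timestamps pruned against a TTL. A toy sketch (all names here are hypothetical, not the daemon's actual types):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Toy session table with idle-timeout cleanup, sketching the
/// daemon's session auto-cleanup. Illustrative only.
struct Sessions {
    ttl: Duration,
    last_seen: HashMap<String, Instant>,
}

impl Sessions {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last_seen: HashMap::new() }
    }

    /// Record activity for a session id.
    fn touch(&mut self, id: &str) {
        self.last_seen.insert(id.to_string(), Instant::now());
    }

    /// Drop sessions idle longer than the TTL; returns how many were removed.
    fn cleanup(&mut self) -> usize {
        let before = self.last_seen.len();
        let ttl = self.ttl;
        self.last_seen.retain(|_, t| t.elapsed() < ttl);
        before - self.last_seen.len()
    }
}

fn main() {
    let mut s = Sessions::new(Duration::from_millis(10));
    s.touch("a");
    std::thread::sleep(Duration::from_millis(30)); // "a" goes idle past the TTL
    s.touch("b");
    println!("removed {} stale session(s)", s.cleanup());
}
```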
User Query
    ↓
Is local model ready?
    ├─ NO  → Forward to Claude API (graceful fallback)
    └─ YES → Use local model + LoRA adapters
    ↓
Response to User
    ↓
User provides feedback (optional)
    ├─ 🔴 High-weight (10x)   → Critical issues
    ├─ 🟡 Medium-weight (3x)  → Improvements
    └─ 🟢 Normal-weight (1x)  → Good examples
    ↓
Background LoRA training (non-blocking)
    ↓
Future responses incorporate learnings
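The routing decision at the top of that flow is a simple readiness check. A minimal sketch (illustrative; the real daemon presumably also weighs errors and configuration):

```rust
/// Which backend answers a query, following the routing flow above.
#[derive(Debug, PartialEq)]
enum Backend {
    Local,     // local model + LoRA adapters
    ClaudeApi, // graceful fallback while the local model loads
}

/// Route purely on model readiness. Illustrative sketch only,
/// not Shammah's actual routing code.
fn route(local_model_ready: bool) -> Backend {
    if local_model_ready {
        Backend::Local
    } else {
        Backend::ClaudeApi
    }
}

fn main() {
    println!("{:?}", route(false)); // before the download finishes
    println!("{:?}", route(true));  // once the local model is loaded
}
```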
./shammah
> How do I use lifetimes in Rust?
> Read my src/main.rs and suggest improvements
> Run the tests to see if my changes work
> /feedback high - Never use unsafe without documenting why
> /train - Manually trigger LoRA training
> /model status - Check current model
> /patterns - Manage tool approvals

./shammah query "What's the best way to handle errors in Rust?"
# Or pipe input
echo "Explain closures" | ./shammah

# Start daemon
./shammah daemon-start
# Check status
./shammah daemon-status
# Stop daemon
./shammah daemon-stop

Config file: ~/.shammah/config.toml
streaming_enabled = true
tui_enabled = true
[backend]
enabled = true
execution_target = "coreml" # or "cpu", "cuda"
model_family = "Qwen2"
model_size = "Medium" # or "Small", "Large", "XLarge"
[[teachers]]
provider = "claude"
api_key = "sk-ant-..." # Your Claude API key
model = "claude-sonnet-4-20250514"
name = "Claude (Primary)"
[client]
use_daemon = true
daemon_address = "127.0.0.1:11435"
auto_spawn = true

Day 1:
- ✅ High-quality responses (pre-trained Qwen)
- ✅ All coding queries work well
- 📈 Start collecting feedback
Week 1:
- ✅ Learns your code style
- ✅ Adapts to preferred libraries
- 📈 Building specialized adapter
Month 1:
- ✅ Specialized for your domain
- ✅ Remembers critical feedback
- ✅ Handles codebase patterns
Month 3+:
- ✅ Highly specialized to your work
- ✅ Multiple domain adapters
- ✅ Recognizes anti-patterns
| Metric | Value |
|---|---|
| REPL startup | <100ms |
| Model loading (cached) | 2-3 seconds |
| First download | 1.5-14GB |
| Local response time | 500ms-2s |
| LoRA overhead | +50-100ms |
| RAM usage | 3-28GB (model dependent) |
| Disk space | Model + ~5MB per adapter |
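The RAM figures in the table are consistent with a back-of-envelope estimate: parameters × 2 bytes (fp16) plus some runtime overhead. A sketch under those assumptions (the 20% overhead factor is a guess for illustration, not a measured value):

```rust
/// Rough RAM estimate for loading an N-billion-parameter model,
/// assuming 2 bytes per parameter (fp16) plus ~20% overhead for
/// activations and runtime buffers. Illustrative arithmetic only.
fn est_ram_gib(params_billion: f64) -> f64 {
    let bytes = params_billion * 1e9 * 2.0 * 1.2;
    bytes / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    for b in [1.5, 3.0, 7.0, 14.0] {
        println!("{:>4}B params -> ~{:.1} GiB", b, est_ram_gib(b));
    }
}
```

This lands near the table's 3-28GB range for the 1.5B-14B tiers; quantized builds would need proportionally less.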
- ✅ Works offline after setup
- ✅ Faster local responses
- ✅ Learns your patterns
- ✅ Privacy - code stays local
- ✅ No per-query costs
- ✅ Immediate quality (day 1)
- ✅ No training period
- ✅ Efficient LoRA learning
- ✅ Trains on your machine
- ✅ Full tool execution
- ✅ Weighted feedback
- ✅ Instant startup
- ✅ Apple Silicon GPU acceleration
- macOS (Apple Silicon or Intel), Linux, or Windows
- Rust 1.70+ (for building from source)
- 8GB+ RAM (16GB+ recommended)
- 2-15GB disk space (for models)
- Claude API key (free tier works) - for fallback
- HuggingFace account (free) - for model downloads
# Check HuggingFace token
cat ~/.cache/huggingface/token
# Should show: hf_...
# If not, get token from https://huggingface.co/settings/tokens

# Switch to smaller model
./shammah
> /model select 1.5B

# Check if using GPU
> /model status
# Should show: Device: Metal ✅ (on Mac)
# If not, try:
> /model device metal

# Run setup again to reconfigure
./shammah setup
# Or manually edit config
vim ~/.shammah/config.toml

- Architecture - System design
- Contributing - Development guide
- Changelog - Release history
- Roadmap - Future plans
- Issues: https://github.com/schancel/shammah/issues
- Discussions: https://github.com/schancel/shammah/discussions
- Discord: [Coming soon]
We welcome contributions! Areas of interest:
- Additional model backends
- LoRA training optimizations
- Multi-GPU support
- Quantization for lower memory
- Additional tool implementations
See CONTRIBUTING.md for guidelines.
MIT OR Apache-2.0
Shammah - Your AI coding watchman that learns and improves with you. 🛡️
Download β’ Docs β’ Report Bug
Made with ❤️ for developers who value privacy and local-first tools