The full-stack agent framework that runs entirely on your hardware.
Manages the complete agent lifecycle — tool execution in containers, network-level security, voice I/O, persistent memory with semantic search, and multi-node cluster routing. No cloud required.
The security architecture is described in our whitepaper:
*The Capability-Container Pattern: Infrastructure-Level Security for Autonomous AI Agents*. Ricardo Ledan, 2026. DOI: 10.5281/zenodo.18614503
```bash
pip install harombe
harombe init           # detects hardware, writes config
ollama pull qwen2.5:7b
harombe chat           # autonomous agent with tools
```

Or use it as a library:
```python
import asyncio

from harombe.agent.loop import Agent
from harombe.llm.ollama import OllamaClient
from harombe.tools.registry import get_enabled_tools


async def main():
    # Local model served by Ollama.
    llm = OllamaClient(model="qwen2.5:7b")
    # Enable only the tools this agent needs.
    tools = get_enabled_tools(shell=True, filesystem=True, web_search=True)
    agent = Agent(llm=llm, tools=tools, system_prompt="You are a helpful assistant.")

    # The agent plans, calls tools, and returns its final answer.
    response = await agent.run("Find all Python files in src/ and count them.")
    print(response)


if __name__ == "__main__":
    asyncio.run(main())
```

See examples/ for more.
harombe manages the full stack for AI agents — hardware detection, tool execution (in containers), security (network-level), I/O (voice + CLI + API), memory (SQLite + vector search), and networking (multi-node clusters with service discovery). Think of it as an operating system for AI agents — as a pip install.
harombe's security is infrastructure-level, not bolt-on:
- Every tool runs in its own Docker container with resource limits
- Per-container network egress filtering (iptables, DNS allowlists)
- Audit logging with automatic credential redaction
- Human-in-the-loop approval gates with risk classification
- Secret management via Vault/SOPS
- ZKP-based audit proofs (experimental)
| Responsibility | What it does |
|---|---|
| Execution | ReAct agent loop, tool calling (shell, filesystem, web search, browser, code exec) |
| Voice I/O | Whisper STT + Piper TTS, push-to-talk, VAD, WebSocket streaming |
| Memory | SQLite conversations, ChromaDB vectors, cross-session semantic search |
| Security | Container isolation, network filtering, audit logging, HITL gates, anomaly detection |
| Networking | Multi-node clusters, mDNS discovery, complexity-based routing, circuit breakers |
| Privacy | PII detection, data sanitization, local/hybrid/cloud routing modes |
| Extensibility | MCP server + client, container-isolated plugins with auto-generated MCP scaffolds |
```
┌─────────────────────────────────────┐
│ Layer 6: Clients                    │  Voice, iOS, Web, CLI
├─────────────────────────────────────┤
│ Layer 5: Privacy Router             │  Hybrid local/cloud AI
│ PII detection, context sanitizer    │  local-only / hybrid / cloud
├─────────────────────────────────────┤
│ Layer 4: Agent & Memory             │  ReAct loop, tools, memory
├─────────────────────────────────────┤
│ Layer 3: Security                   │  Defense-in-depth
│ MCP Gateway, container isolation    │  Credential vault, audit log
│ Per-tool egress, secret scanning    │  HITL gates, browser pre-auth
├─────────────────────────────────────┤
│ Layer 2: Orchestration              │  Smart routing, health monitoring
│ Cluster config, mDNS discovery      │  Circuit breakers, metrics
├─────────────────────────────────────┤
│ Layer 1: Runtimes                   │  llama.cpp, Whisper, TTS, embeddings
└─────────────────────────────────────┘
```
Each layer only talks to its neighbors. Security (Layer 3) wraps every tool invocation — there is no path from the agent to a tool that bypasses the gateway. See ARCHITECTURE.md for full design documentation.
harombe enforces the Capability-Container Pattern: agents never execute tools directly. Every tool call goes through the MCP Gateway, which routes it to an isolated Docker container.
```
Agent ──→ MCP Gateway ──→ [ Container: shell ]
                      ──→ [ Container: browser ]
                      ──→ [ Container: web_search ]
```
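A minimal sketch of the routing rule above, in plain Python rather than harombe's actual gateway code: each tool name maps to exactly one container, and a call for an unregistered tool is refused instead of falling back to host execution. The `ToolCall` type, the `CONTAINERS` table, and the container names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    tool: str
    arguments: dict = field(default_factory=dict)


# Hypothetical registry: one dedicated container per tool.
CONTAINERS: dict[str, str] = {
    "shell": "harombe-tool-shell",
    "browser": "harombe-tool-browser",
    "web_search": "harombe-tool-web_search",
}


def route_to_container(call: ToolCall) -> str:
    """Return the container that must execute `call`; there is no host fallback."""
    container = CONTAINERS.get(call.tool)
    if container is None:
        raise PermissionError(f"no container registered for tool {call.tool!r}")
    return container
```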
Container Isolation — Each tool runs in its own Docker container with CPU/memory limits, read-only filesystem mounts, and no host network access.
Network Egress — Per-container iptables rules and DNS allowlists. A web_search container can reach DuckDuckGo; a filesystem container can reach nothing.
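The allowlist decision itself is simple; the sketch below shows only that check (exact domain or subdomain match, default-deny for unknown tools). Actual enforcement in harombe happens at the iptables/DNS layer described above, and the `EGRESS_ALLOWLIST` table here is a hypothetical example.

```python
# Hypothetical per-tool allowlists; an empty set means no egress at all.
EGRESS_ALLOWLIST: dict[str, set[str]] = {
    "web_search": {"duckduckgo.com"},
    "filesystem": set(),
}


def egress_allowed(tool: str, hostname: str) -> bool:
    """True if `tool` may resolve/connect to `hostname`."""
    allowed = EGRESS_ALLOWLIST.get(tool, set())  # unknown tools: default-deny
    host = hostname.lower().rstrip(".")
    # Permit the allowlisted domain itself and its subdomains, nothing else.
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```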
Audit Logging — Every tool call, approval decision, and security event is logged to SQLite with automatic redaction of API keys, passwords, and tokens.
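A sketch of redact-before-write audit logging using only the standard library (`sqlite3`, `re`, `json`). The table schema, regex patterns, and function names are assumptions; the patterns cover a few common shapes (`key=value` secrets, `Bearer` tokens, sensitive field names) and are far from exhaustive.

```python
import json
import re
import sqlite3
import time
from typing import Any

# Illustrative patterns only (assumption): real redaction needs a broader set.
_KEY_VALUE = re.compile(r"(?i)\b(api[_-]?key|password|passwd|token|secret)(\s*[=:]\s*)(\S+)")
_BEARER = re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._~+/-]+=*")
_SENSITIVE_KEYS = re.compile(r"(?i)(api[_-]?key|password|passwd|token|secret)")


def redact(value: Any, key: str = "") -> Any:
    """Recursively replace secrets in strings, lists, and dicts."""
    if key and _SENSITIVE_KEYS.search(key):
        return "[REDACTED]"  # the field name itself marks a secret
    if isinstance(value, str):
        value = _KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", value)
        return _BEARER.sub(lambda m: m.group(1) + "[REDACTED]", value)
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def open_audit_log(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audit_log ("
        "id INTEGER PRIMARY KEY, ts REAL NOT NULL, event TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def log_event(conn: sqlite3.Connection, event: str, payload: dict) -> None:
    # Redact before serializing, so secrets never reach disk.
    conn.execute(
        "INSERT INTO audit_log (ts, event, payload) VALUES (?, ?, ?)",
        (time.time(), event, json.dumps(redact(payload))),
    )
    conn.commit()
```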
HITL Gates — Operations are risk-classified (LOW/MEDIUM/HIGH/CRITICAL). High-risk operations require explicit human approval with default-deny on timeout.
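The default-deny behavior can be sketched with `asyncio.wait_for`: operations below a threshold pass automatically, and anything at or above it waits for a human answer that counts as a denial if it never arrives. The `Risk` enum mirrors the four levels named above; the threshold, timeout, and `ask_human` callback are assumptions.

```python
import asyncio
from enum import IntEnum
from typing import Awaitable, Callable


class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


async def approve(
    risk: Risk,
    ask_human: Callable[[], Awaitable[bool]],
    *,
    threshold: Risk = Risk.HIGH,
    timeout_s: float = 60.0,
) -> bool:
    """Return True if the operation may proceed."""
    if risk < threshold:
        return True  # below the gate: no human needed
    try:
        return bool(await asyncio.wait_for(ask_human(), timeout=timeout_s))
    except asyncio.TimeoutError:
        return False  # no answer in time: default-deny
```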
Anomaly Detection — Per-agent Isolation Forest models learn baseline behavior and flag deviations. Integrated with SIEM (Splunk, Elasticsearch, Datadog).
See docs/security-quickstart.md for setup instructions.
ReAct agent loop with autonomous planning, tool calling, and multi-step execution. Tools: shell, read/write files, web search, browser automation. Configurable step limits and confirmation gates.
Whisper STT (tiny to large-v3) + Piper TTS. Push-to-talk CLI (harombe voice), REST + WebSocket API, voice activity detection. Cross-platform audio I/O.
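Voice activity detection can be done many ways; as a simple stand-in (not necessarily what harombe uses), the sketch below flags frames whose RMS energy exceeds a threshold, with a short hangover so brief pauses do not cut speech off. Samples are assumed to be floats in [-1, 1]; 480 samples is 30 ms at 16 kHz. The threshold and hangover values are illustrative.

```python
import math


def frame_rms(frame: list[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_speech(
    samples: list[float],
    frame_size: int = 480,    # 30 ms at 16 kHz
    threshold: float = 0.02,  # illustrative energy threshold
    hangover: int = 2,        # frames to stay "speech" after energy drops
) -> list[bool]:
    """Return one flag per frame: True while speech is likely present."""
    flags: list[bool] = []
    remaining = 0
    for start in range(0, len(samples), frame_size):
        if frame_rms(samples[start:start + frame_size]) >= threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1  # bridge short pauses between words
            flags.append(True)
        else:
            flags.append(False)
    return flags
```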
SQLite conversation persistence with token-based context windowing. ChromaDB vector store with sentence-transformers embeddings (local, no API calls). Semantic search across sessions. RAG-enabled agents auto-inject relevant context.
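Token-based context windowing means keeping the newest messages that fit a token budget. The sketch assumes `{"role", "content"}` message dicts, always keeps system messages, and uses a crude four-characters-per-token estimate in place of the model's real tokenizer; all names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep system messages plus the most recent others that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: list[dict] = []
    for msg in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break  # stop at the first message that does not fit
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))  # restore chronological order
```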
Define nodes in YAML, route queries by complexity. mDNS auto-discovery, health monitoring, circuit breakers, load balancing. Works with any hardware mix: Apple Silicon, NVIDIA, AMD, CPU.
JSON-RPC 2.0 MCP server and client. Gateway-mediated tool execution with per-tool container isolation. Compatible with the broader MCP ecosystem.
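For reference, an MCP `tools/call` exchange is ordinary JSON-RPC 2.0. The sketch below builds a request and extracts text from a response following the shapes in the MCP specification (`params.name` and `params.arguments`; `result.content` items with `type: "text"`; `result.isError`). Whether harombe's gateway adds fields beyond these is not shown here, and the tool name is only an example.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call_request(name: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def parse_response(raw: str) -> str:
    """Return the concatenated text content, raising on protocol or tool errors."""
    msg = json.loads(raw)
    if "error" in msg:  # JSON-RPC-level failure
        raise RuntimeError(f"JSON-RPC error {msg['error']['code']}: {msg['error']['message']}")
    result = msg["result"]
    text = "\n".join(c["text"] for c in result.get("content", []) if c.get("type") == "text")
    if result.get("isError"):  # tool ran but reported failure
        raise RuntimeError(f"tool error: {text}")
    return text
```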
Container-isolated plugins with auto-generated MCP scaffolds. ZKP audit proofs, compliance reporting, and container-based extensions ship as built-in plugins. See src/harombe/plugins/ for examples.
PII detection and data sanitization. Three routing modes: local-only (nothing leaves your machine), hybrid (sensitive data stays local, general queries can use cloud), and cloud. Configurable per-agent.
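A sketch of the hybrid-mode rule: scan the prompt for PII and keep it local if anything is found. The two regex detectors (email addresses and US SSN-shaped numbers) are illustrative assumptions; real PII detection needs far broader coverage, and `choose_backend` is a hypothetical name.

```python
import re
from enum import Enum


class Mode(Enum):
    LOCAL_ONLY = "local-only"
    HYBRID = "hybrid"
    CLOUD = "cloud"


# Illustrative detectors only (assumption).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> set[str]:
    return {kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)}


def choose_backend(text: str, mode: Mode) -> str:
    if mode is Mode.LOCAL_ONLY:
        return "local"  # nothing leaves the machine
    if mode is Mode.CLOUD:
        return "cloud"
    return "local" if detect_pii(text) else "cloud"  # hybrid: sensitive data stays local
```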
| # | Example | Description |
|---|---|---|
| 01 | simple_agent.py | Basic single-node agent with all tools |
| 02 | api_usage.py | Programmatic agent creation and tool usage |
| 03 | data_pipeline.py | Data processing with autonomous agents |
| 04 | code_review.py | Automated code review workflows |
| 05 | research_agent.py | Research automation with web search |
| 06 | memory_conversation.py | Persistent conversation history |
| 07 | cluster_routing.py | Task-based routing across nodes |
| 08 | semantic_memory.py | Semantic search and RAG |
| 09 | voice_assistant.py | Voice-enabled assistant (STT + TTS) |
Configuration lives at ~/.harombe/harombe.yaml. Minimal example:

```yaml
model:
  name: qwen2.5:7b
  temperature: 0.7

tools:
  shell: true
  filesystem: true
  web_search: true
  confirm_dangerous: true

memory:
  enabled: true
  storage_path: ~/.harombe/memory.db
```

All fields have sensible defaults, so you can run with no config at all. See harombe.yaml.example for the full reference.
| Production | Experimental | Planned |
|---|---|---|
| ReAct agent loop | Hardware security modules | iOS/Web clients |
| Tool execution (shell, fs, web) | Distributed cryptography | Multi-modal vision |
| Container isolation + egress | | |
| Code execution sandbox | | |
| ZKP audit proofs | | |
| Audit logging + secret management | | |
| HITL approval gates | | |
| Conversation memory + RAG | | |
| Voice (STT + TTS) | | |
| Multi-node clusters | | |
| MCP server + client | | |
| Privacy router + PII detection | | |
| Anomaly detection + SIEM | | |
2400+ tests. Python 3.11-3.13. CI on Ubuntu + macOS. See docs/roadmap.md for full phase history.
```bash
git clone https://github.com/smallthinkingmachines/harombe.git
cd harombe && pip install -e ".[dev]"
pytest
```

See docs/DEVELOPMENT.md for detailed setup and docs/CONTRIBUTING.md for contribution guidelines.
Apache 2.0 — see LICENSE.