Applied Epistemic Engineering toolkit for AI-assisted development.
Intelligence proposes. Constraints decide. The ledger remembers.
specsmith treats belief systems like code: codable, testable, and deployable. It scaffolds epistemically-governed projects, stress-tests requirements as BeliefArtifacts, runs cryptographically-sealed trace vaults, and orchestrates AI agents under formal AEE governance.
0.11.0 — EU AI Act / NIST AI RMF compliance, context window management, and governance tools panel. Specsmith now ships a full compliance and auditability layer aligned to the EU AI Act (2024/1689) and the NIST AI Risk Management Framework 1.0. Every agent action is cryptographically sealed, every AI-generated output is disclosed, context windows are GPU-aware and protected against overflow, and a dedicated governance tools panel in Kairos surfaces compliance settings per-session and per-project.
specsmith governance-serve --port 7700 # Kairos governance REST API
specsmith sync # sync YAML → JSON → MD (YAML-first mode)
specsmith generate docs # regenerate REQUIREMENTS.md + TESTS.md from YAML
specsmith validate --strict # YAML schema checks: dup IDs, orphans, coverage
specsmith agent permissions-check git_push # check tool permission (REQ-012)
specsmith ollama gpu # detect GPU VRAM, recommend context size
specsmith export # generate full compliance report
# Update channel management (REQ-248)
specsmith channel set stable # pin to stable releases
specsmith channel set dev # opt in to dev/pre-release builds
specsmith channel get --json # show current channel + source
# ESDB extended lifecycle (REQ-249..253)
specsmith esdb export --json # dump all records to JSON snapshot
specsmith esdb import backup.json # validate + stage an import
specsmith esdb backup # create timestamped snapshot
specsmith esdb rollback --steps 2 # report WAL rollback (stub)
specsmith esdb compact # request WAL compaction
# Skills lifecycle (REQ-254..255)
specsmith skills deactivate <skill-id> # set active=false in skill.json
specsmith skills delete <skill-id> --yes # permanently remove skill
# MCP config generation (REQ-256)
specsmith mcp generate "Search USPTO patents" --json # JSON config stub
# Agent ask dispatcher — no LLM required (REQ-257)
specsmith agent ask "show esdb status" --json-output
specsmith agent ask "build skill for summarizing"
It also co-installs the standalone epistemic Python library for direct use in any project:
from epistemic import AEESession # works in any Python 3.10+ project
from epistemic import BeliefArtifact, StressTester, CertaintyEngine
AEE treats requirements, decisions, and assumptions (the beliefs your project depends on) as engineering artifacts subject to the same discipline as code: version control, testing, and refactoring.
The 4-step core method: Frame → Disassemble → Stress-Test → Reconstruct
The 5 foundational axioms:
- Observability — every belief must be inspectable
- Falsifiability — every belief must be challengeable
- Irreducibility — beliefs decompose to atomic primitives
- Reconstructability — every failed belief can be rebuilt
- Convergence — stress-test + recovery always reaches Equilibrium
specsmith tracks your project through the full AEE development cycle:
🌱 Inception → 🏗 Architecture → 📋 Requirements → ✅ Test Spec
→ ⚙ Implementation → 🔬 Verification → 🚀 Release
specsmith phase # show current phase + readiness checklist
specsmith phase next # advance to the next phase (runs checks first)
specsmith phase set requirements # jump to a specific phase
specsmith phase list # list all phases
The current phase is persisted in scaffold.yml as aee_phase and displayed in the
Kairos Governance page. Each phase has a checklist of file/command criteria, recommended
commands, and a readiness percentage.
Recommended — via pipx (works with Kairos, any terminal, and CI):
pipx install specsmith # core CLI + epistemic library
pipx inject specsmith anthropic # + Claude support
pipx inject specsmith openai # + GPT / O-series support
pipx inject specsmith google-generativeai # + Gemini support
Or with pip:
pip install specsmith # core
pip install "specsmith[anthropic]" # + Claude
pip install "specsmith[openai]" # + GPT/O-series
pip install "specsmith[gemini]" # + Gemini
Update:
pipx upgrade specsmith
specsmith self-update
# New project (interactive)
specsmith init
# Adopt an existing project
specsmith import --project-dir ./my-project
# Check governance health
specsmith audit --project-dir ./my-project
# Run AEE stress-test on requirements
specsmith stress-test --project-dir ./my-project
# Full epistemic audit (certainty + logic knots + recovery proposals)
specsmith epistemic-audit --project-dir ./my-project
# Start the agentic REPL
specsmith run --project-dir ./my-project
# AG2 agent shell — Planner/Builder/Verifier over Ollama
specsmith agent status # check agent config + Ollama
specsmith agent plan "add logging" # plan only (no execution)
specsmith agent run "fix lint errors" # full Plan → Build → Verify
specsmith agent improve "add tests" # self-improvement with reports
specsmith agent verify # run Verifier on current state
specsmith agent reports # list improvement reports
# Check current AEE workflow phase
specsmith phase --project-dir ./my-project
As of v0.12, specsmith uses YAML-first governance: docs/requirements/*.yml
and docs/tests/*.yml are the canonical sources. REQUIREMENTS.md and TESTS.md
are generated artifacts — do not hand-edit them.
# YAML-first pipeline (v0.12+)
specsmith sync # YAML → .specsmith/*.json → docs/*.md (all in one)
specsmith generate docs # regenerate only the Markdown artifacts from YAML
specsmith generate docs --check # dry-run: report what would change
specsmith validate --strict # enforce schema: dup IDs, orphans, missing fields
specsmith validate --strict --json # machine-readable validation result
# CI guard (already in .github/workflows/ci.yml)
specsmith sync --check # exits 1 if JSON cache is out of sync with YAML
To add a new requirement, edit the appropriate docs/requirements/<domain>.yml
file and run specsmith sync. Never hand-edit docs/REQUIREMENTS.md, because it will
be overwritten by the next sync.
Domain files:
| File | REQ range | Domain |
|---|---|---|
| docs/requirements/governance.yml | REQ-001..064 | Core AEE governance |
| docs/requirements/agent.yml | REQ-065..129 | Nexus + CI |
| docs/requirements/harness.yml | REQ-130..160 | Slash commands + subagents |
| docs/requirements/intelligence.yml | REQ-161..220 | Instinct, eval, memory |
| docs/requirements/context.yml | REQ-244..247 | Context window |
| docs/requirements/esdb.yml | REQ-248..262 | ESDB + skills + MCP |
| docs/requirements/ai_intelligence.yml | REQ-263..299 | AI model intelligence |
| docs/requirements/yaml_governance.yml | REQ-300..399 | YAML governance layer |
Migration from Markdown-primary: Run
scripts/migrate_governance_to_yaml.py once to convert an existing project.
Idempotent — safe to re-run.
specsmith agent permissions # show active permission profile
specsmith agent permissions-check git_push # check if git_push is allowed
specsmith agent permissions-check git_push --no-log # dry-run (no ledger write)
Configure in docs/SPECSMITH.yml:
agent:
  permissions:
    preset: standard # read_only | standard | extended | admin
    # Or custom:
    allow: [read_file, write_file, run_shell, git_status]
    deny: [git_push, git_create_pr]
specsmith is designed from the ground up for auditable, explainable, and human-overseen AI. It implements concrete compliance mechanisms mapped to the two major regulatory frameworks that govern AI systems in production today.
EU AI Act (Regulation 2024/1689) — The world's first comprehensive legal framework for AI, enforced across the European Union. High-risk AI systems must provide transparency, auditability, human oversight, and robustness. specsmith implements:
| EU AI Act Requirement | specsmith Mechanism |
|---|---|
| Art. 9 — Risk Management System | AEE verification loop with confidence scoring and equilibrium checks |
| Art. 12 — Logging & Record-Keeping | TraceVault SHA-256 chained ledger (tamper-evident, append-only) |
| Art. 13 — Transparency & Explainability | ai_disclosure block in every preflight response; /why in Nexus REPL |
| Art. 14 — Human Oversight | Human escalation threshold (--escalate-threshold); kill-switch CLI |
| Art. 15 — Accuracy & Robustness | Bounded retry (max 3×), confidence gates, hard context ceiling (REQ-247) |
| Art. 53 — GPAI Model Transparency | Provider + model name emitted in every ai_disclosure block |
NIST AI Risk Management Framework 1.0 (AI RMF) — The US standard for managing AI risk across the AI lifecycle. specsmith addresses all four core functions:
| NIST AI RMF Function | specsmith Mechanism |
|---|---|
| GOVERN — Policies & accountability | Governance rules (H1–H13), permissions profile, scaffold.yml policy |
| MAP — Risk identification | AEE stress-test, belief graph, contradictions and uncertainty metrics |
| MEASURE — Risk analysis | Confidence scoring, epistemic equilibrium, specsmith epistemic-audit |
| MANAGE — Risk treatment | Kill-switch, escalation, bounded retry, safe-write backup, permissions deny-list |
Every agent action, decision, milestone, and audit gate is recorded as a JSONL entry in
.specsmith/trace.jsonl. Each entry contains a SHA-256 hash of its own content plus the
hash of the previous entry, forming a cryptographic chain:
{"seq":1, "type":"DECISION", "description":"...", "hash":"a3f9...", "prev":"genesis"}
{"seq":2, "type":"MILESTONE", "description":"...", "hash":"7c2b...", "prev":"a3f9..."}
Any modification to a past entry breaks every subsequent hash. specsmith trace verify
detects and reports the first corrupted entry. The file is append-only — overwrites are
blocked by safe_write. This satisfies EU AI Act Art. 12 (logging and record-keeping)
and NIST AI RMF GOVERN (accountability trail).
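
The chaining described above can be sketched in a few lines of Python. This is an illustration only: the canonical serialization (sorted-key JSON concatenated with the previous hash) and the helper names `entry_hash`, `append_entry`, and `first_corrupted` are assumptions, not specsmith's actual on-disk format or API.

```python
import hashlib
import json

GENESIS = "genesis"  # value of "prev" for the first entry, as in the sample above


def entry_hash(body: dict, prev: str) -> str:
    """SHA-256 over the entry body plus the previous hash (assumed scheme)."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")) + prev
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list, entry_type: str, description: str) -> dict:
    """Append a new sealed entry to an in-memory chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain) + 1, "type": entry_type, "description": description}
    entry = dict(body, hash=entry_hash(body, prev), prev=prev)
    chain.append(entry)
    return entry


def first_corrupted(chain: list):
    """Return the seq of the first entry that fails verification, or None."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k not in ("hash", "prev")}
        if entry["prev"] != prev or entry["hash"] != entry_hash(body, prev):
            return entry["seq"]
        prev = entry["hash"]
    return None
```

Because each hash covers the previous one, editing entry 2 in place makes verification fail at entry 2 even if later entries are left untouched.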
Every preflight response includes a mandatory ai_disclosure block:
{
  "ai_disclosure": {
    "governed_by": "specsmith",
    "governance_gated": true,
    "provider": "ollama",
    "model": "qwen2.5:14b",
    "spec_version": "0.11.0"
  }
}
This ensures every AI-generated output is traceable to its source model and version, meeting EU AI Act Art. 13 (transparency) and Art. 53 (GPAI transparency). It cannot be suppressed by the client: the field is injected at the governance layer before any response is returned.
When an action's confidence is below the escalation threshold, specsmith sets
escalation_required: true and includes an escalation_reason in the preflight payload.
Kairos surfaces this as a confirmation dialog before execution proceeds.
specsmith preflight "deploy to production" --escalate-threshold 0.85 --json
# → escalation_required: true, escalation_reason: "confidence 0.71 < threshold 0.85"
This implements EU AI Act Art. 14 (human oversight) and NIST AI RMF MANAGE.
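
As a rough illustration of the escalation rule, the comparison could look like the sketch below. The function name `escalation_fields` is hypothetical, and the real preflight payload carries more fields than shown here.

```python
def escalation_fields(confidence: float, threshold: float) -> dict:
    """Build the escalation part of a preflight payload (illustrative only)."""
    if confidence < threshold:
        return {
            "escalation_required": True,
            "escalation_reason": (
                f"confidence {confidence:.2f} < threshold {threshold:.2f}"
            ),
        }
    # At or above the threshold, no human confirmation is requested.
    return {"escalation_required": False}
```

With a confidence of 0.71 and a threshold of 0.85, this produces the same reason string as the CLI example above.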
A kill-session CLI command and keyboard shortcut (surfaced in Kairos) immediately
terminates all active agent sessions and records a timestamped kill event in LEDGER.md:
specsmith kill-session # terminate all sessions, log kill event
specsmith kill-session --session abc123 # terminate a specific session
This satisfies EU AI Act Art. 14 §4 (ability to intervene and stop the AI system) and is required for certification of high-risk AI systems.
All governance file writes go through safe_write, which:
- Appends to LEDGER.md and .specsmith/ledger.jsonl and never truncates them
- Backs up any file before overwriting it (timestamped .bak copy)
- Prevents accidental destruction of audit history
This satisfies EU AI Act Art. 12 (records must be kept for the lifetime of the system) and provides recovery capability per NIST AI RMF MANAGE.
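
The backup-before-overwrite and append-only behaviours can be approximated as follows. This is a minimal sketch under stated assumptions: the function names, the backup file naming, and the timestamp format are invented for illustration and are not taken from specsmith's `safe_write` implementation.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def safe_write_sketch(path: Path, content: str) -> Optional[Path]:
    """Overwrite `path`, first copying any existing file to a timestamped .bak."""
    backup = None
    if path.exists():
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
        backup = path.with_name(f"{path.name}.{stamp}.bak")  # assumed naming
        shutil.copy2(path, backup)
    path.write_text(content, encoding="utf-8")
    return backup


def append_ledger_line(path: Path, line: str) -> None:
    """Append one line to a ledger file; mode 'a' never truncates."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line.rstrip("\n") + "\n")
```

The first write to a new file returns `None` (nothing to back up); every later write returns the path of the backup it created.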
Every agent tool call is gated through a permission profile. Tools outside the active profile are denied with exit code 3 and a ledger entry:
specsmith agent permissions-check git_push # exit 0 = allowed, exit 3 = denied
specsmith agent permissions # show active profile
Four built-in presets (read_only, standard, extended, admin) plus full
custom allow/deny lists in .specsmith/config.yml. This implements NIST AI RMF GOVERN
(policy enforcement) and principle of least privilege per standard security practice.
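
A custom allow/deny profile could be evaluated roughly as in the sketch below. The exit codes 0 and 3 come from the text above; the rule that a deny entry overrides an allow entry, and the function name `permission_exit_code`, are assumptions.

```python
EXIT_ALLOWED = 0
EXIT_DENIED = 3


def permission_exit_code(tool: str, allow: list, deny: list) -> int:
    """Return the exit code a permissions check would report for `tool`."""
    if tool in deny:
        return EXIT_DENIED  # assumed: explicit deny always wins
    if tool in allow:
        return EXIT_ALLOWED
    return EXIT_DENIED  # tools outside the active profile are denied


# Lists copied from the custom profile example earlier in this document.
ALLOW = ["read_file", "write_file", "run_shell", "git_status"]
DENY = ["git_push", "git_create_pr"]
```

Defaulting to "denied" for unknown tools is what makes the profile least-privilege: a new tool has to be added to the allow list before an agent can call it.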
Before any shell command is executed, agent.safety.is_safe_command() classifies it
against a deny list of destructive patterns (rm -rf, git push origin main,
kubectl apply, cat .env, etc.). Denied commands are blocked and logged.
This implements NIST AI RMF MANAGE (risk treatment at the action level).
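
A deny-list classifier of this kind can be sketched with regular expressions. The four patterns below mirror the examples named above; the regexes themselves and the function name `looks_safe` are illustrative assumptions, not the contents of `agent.safety`.

```python
import re

# Hypothetical patterns for the destructive commands listed in the text.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\s+origin\s+main\b"),
    re.compile(r"\bkubectl\s+apply\b"),
    re.compile(r"\bcat\s+\.env\b"),
]


def looks_safe(command: str) -> bool:
    """Return False if the command matches any deny pattern."""
    return not any(pattern.search(command) for pattern in DENY_PATTERNS)
```

Pattern matching like this is a coarse filter. It catches the obvious cases but can be bypassed by quoting or aliasing, which is why it complements rather than replaces the permission profile.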
specsmith export generates a full compliance report containing:
- AI System Inventory — all providers, models, and versions used
- Risk Classification — AEE phase, confidence scores, open work items
- Human Oversight Controls — active permission profile, escalation settings, kill-switch state
- Audit Trail Summary — TraceVault chain length, last verification, any tampering
specsmith export --format markdown > compliance-report.md
specsmith export --format json > compliance-report.json
This report is suitable for submission to regulators, internal audit teams, or SOC-2 / ISO-42001 reviewers.
Compliance settings are layered:
- Global defaults: ~/.specsmith/config.yml (user-level defaults)
- Per-project policy: .specsmith/config.yml (committed to the repo)
- Per-session overrides: Kairos Governance panel or CLI flags
The Kairos Governance Tools Panel (Settings → Governance) exposes all compliance
controls in a live UI: escalation threshold, permission profile, kill-switch, audit log
viewer, and context window settings. Changes take effect immediately for the active
session and can optionally be written back to the per-project .specsmith/config.yml.
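
One way to picture the three layers is as a lookup chain in which the most specific layer wins. The sketch below assumes that precedence (session over project over global) and uses hypothetical setting names; it does not reflect how specsmith actually loads its configuration.

```python
from collections import ChainMap


def resolve_settings(global_defaults: dict, project_policy: dict,
                     session_overrides: dict) -> dict:
    """Merge the layers; ChainMap searches left to right, so session wins."""
    return dict(ChainMap(session_overrides, project_policy, global_defaults))


# Hypothetical keys, for illustration only.
effective = resolve_settings(
    {"escalation_threshold": 0.70, "preset": "standard"},
    {"preset": "read_only"},
    {"escalation_threshold": 0.85},
)
```

Here `effective` takes the threshold from the session layer and the preset from the project layer.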
specsmith enforces safe, efficient use of LLM context windows — especially critical when running local models via Ollama where the context limit directly affects GPU VRAM.
specsmith ollama gpu # detect GPU VRAM (NVIDIA + AMD supported)
specsmith ollama available # show models within your VRAM budget
VRAM tiers and recommended context sizes:
| VRAM | Recommended Context |
|---|---|
| < 6 GB (CPU or low-end GPU) | 4,096 tokens |
| 6–11 GB | 8,192 tokens |
| 12–19 GB | 16,384 tokens |
| 20 GB+ | 32,768 tokens |
Override via SPECSMITH_OLLAMA_CONTEXT_LENGTH or ollama.context_length in .specsmith/config.yml.
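
The tier table and the environment override can be expressed as a small lookup. This is a sketch: the function name is hypothetical, and treating fractional values such as 11.5 GB as part of the 6–11 GB tier is an assumption, since the table does not say how they are handled.

```python
import os


def recommended_context(vram_gb: float, env=None) -> int:
    """Map detected VRAM (in GB) to a context size, honouring the env override."""
    env = os.environ if env is None else env
    override = env.get("SPECSMITH_OLLAMA_CONTEXT_LENGTH")
    if override:
        return int(override)
    if vram_gb < 6:
        return 4096
    if vram_gb < 12:
        return 8192
    if vram_gb < 20:
        return 16384
    return 32768
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.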
The context fill tracker emits real-time JSONL events consumed by Kairos:
{"type": "context_fill", "used": 27500, "limit": 32768, "pct": 83.9}
Kairos displays a compact fill bar in the agent footer. When fill reaches the compression threshold (default 80%), specsmith signals that context summarization should run before the next turn.
When fill reaches the compression threshold, specsmith automatically triggers conversation summarization — the current context is condensed to a compact summary that preserves key decisions and facts while freeing window space. This happens transparently before the next agent turn.
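
The fill event and the threshold check can be reproduced with simple arithmetic. The sketch below assumes the percentage is rounded to one decimal place, which matches the sample event shown earlier in this section; the function names are hypothetical.

```python
def context_fill_event(used: int, limit: int) -> dict:
    """Build a context_fill event with the fill percentage rounded to 0.1."""
    pct = round(100.0 * used / limit, 1)
    return {"type": "context_fill", "used": used, "limit": limit, "pct": pct}


def needs_compression(event: dict, threshold_pct: float = 80.0) -> bool:
    """True once fill reaches the compression threshold (default 80%)."""
    return event["pct"] >= threshold_pct


event = context_fill_event(27500, 32768)
```

For 27,500 of 32,768 tokens the fill is about 83.9%, so this event would trigger summarization under the default threshold.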
Configure in .specsmith/config.yml:
context:
  compression_threshold_pct: 80 # trigger summarization at 80% fill
  auto_compress: true # enable automatic compression
A hard reservation of 15% of the context window (minimum 2,048 tokens) is always
held back for the governance layer. Attempts to fill beyond the effective ceiling raise
ContextFullError, so a session can never reach a state where even a compression
request cannot be processed. This is a safety invariant, not a configuration option.
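
The effective ceiling follows directly from the reservation rule. In this sketch the reserve is rounded up to a whole token using integer arithmetic, which is an assumption; `ContextFullError` is redefined locally for illustration and `ensure_fits` is a hypothetical helper.

```python
class ContextFullError(RuntimeError):
    """Local stand-in for the error named in the text."""


def effective_ceiling(limit: int, reserve_pct: int = 15,
                      min_reserve: int = 2048) -> int:
    """Tokens usable by the conversation after the governance reservation."""
    reserve = max((limit * reserve_pct + 99) // 100, min_reserve)  # ceil
    return limit - reserve


def ensure_fits(used: int, requested: int, limit: int) -> int:
    """Return the new fill level, or raise if it would cross the ceiling."""
    new_total = used + requested
    if new_total > effective_ceiling(limit):
        raise ContextFullError(
            f"{new_total} tokens exceeds ceiling {effective_ceiling(limit)}"
        )
    return new_total
```

For a 32,768-token window the reserve is 4,916 tokens, leaving 27,852 usable. For a 4,096-token window the 2,048-token minimum applies, so only half of the window is available to the conversation.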
Kairos is the companion Rust terminal runtime (BitConcepts/kairos). specsmith
acts as the governance backend: Kairos spawns specsmith governance-serve at startup
and routes all preflight and verify calls through it.
# Start the governance REST API (Kairos calls this automatically)
specsmith governance-serve --port 7700 --project-dir .
# Classify a natural-language utterance under Specsmith governance
specsmith preflight "fix the cleanup dry-run regression" --json
# Start the agentic REPL
specsmith run
> what does the cleanup module do? # read-only ask -> answered
> fix the cleanup dry-run regression # change -> Specsmith approves, runs
> delete the entire dist directory # destructive -> needs clarification
The Nexus runtime is specsmith's local-first agentic REPL, a governance-gated broker that sits between you and the LLM.
Every utterance passes through specsmith preflight before execution.
The broker classifies intent, matches requirements, and gates the action.
After execution, specsmith verify checks equilibrium. The /why command
shows the full governance trace.
# Interactive REPL with governance
specsmith run
nexus> fix the cleanup bug # broker classifies → accepts → executes → verifies
nexus> /why # show governance trace for last action
nexus> /exit
The Nexus broker:
- Preflight gate: every change goes through specsmith preflight
- Bounded retry: failed actions retry up to 3× with strategy classification
- Execution trace: every action is sealed in the cryptographic trace vault
- /why toggle: shows governance rationale in human-readable form
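
The bounded-retry behaviour can be sketched as a loop with a fixed attempt budget. This sketch interprets "up to 3×" as three attempts in total, which is an assumption, and it omits the strategy classification step; `run_bounded` and `RetriesExhausted` are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RetriesExhausted(RuntimeError):
    """Raised when every attempt in the budget has failed."""


def run_bounded(action: Callable[[int], T], max_attempts: int = 3) -> T:
    """Call `action(attempt)` until it succeeds or the budget is spent."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action(attempt)
        except Exception as exc:  # a real harness would classify the failure
            last_error = exc
    raise RetriesExhausted(f"gave up after {max_attempts} attempts") from last_error
```

The hard upper bound is the point of the design: a failing action surfaces as an error after a known number of tries instead of looping indefinitely.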
**How it works.** A natural-language **broker** classifies intent, infers scope from
your requirements, and asks Specsmith to **preflight** the request. Only when the
preflight decision is `accepted` does Nexus drive the AG2 orchestrator — and it does so
through a **bounded-retry harness** so you can never accidentally run away. By default,
Nexus speaks plain English; toggle `/why` in the REPL to surface the underlying
requirement, test, and work-item identifiers Specsmith assigned.
**Pieces in this repo.**
- `specsmith preflight` — CLI subcommand emitting a deterministic governance JSON payload
(`decision`, `requirement_ids`, `test_case_ids`, `confidence_target`, `instruction`).
- `src/specsmith/agent/broker.py` — natural-language broker (intent + scope + narration).
- `src/specsmith/agent/repl.py` — Nexus REPL with the `/why` toggle and execution gate.
- `docker-compose.yml` — pinned vLLM `l1-nexus` model server with the Hermes tool-call parser.
- `scripts/nexus_smoke.py` — opt-in live smoke test (`NEXUS_LIVE=1` to run against
a running container).
---
## AI Model Intelligence
specsmith ships a complete AI model intelligence layer for tracking, scoring, and routing
to the best available LLM for each task type.
### HF Open LLM Leaderboard Sync (REQ-263..REQ-269)
Syncs benchmark data from the HuggingFace Open LLM Leaderboard and computes three
task-specific bucket scores — **reasoning**, **conversational**, and **longform** — for
every model. A 40+ model static fallback ensures scores are always available even without
network access.
```bash
specsmith model-intel sync # sync from HF leaderboard (static fallback if offline)
specsmith model-intel scores # list all cached bucket scores
specsmith model-intel scores --model gpt-4o # show scores for a specific model
specsmith model-intel recommendations # top-10 models for reasoning bucket
specsmith model-intel recommendations --bucket conversational # or longform
specsmith model-intel connection # test HF API connectivity + token status
```
Set SPECSMITH_HF_TOKEN for authenticated access (1000 req/5min instead of 500).
Scores persist to ~/.specsmith/model_scores.json. Background sync runs 15s after startup
then daily.
Bucket formulas (normalised 0-100):
- Reasoning = 0.35×MATH + 0.30×GPQA + 0.25×BBH + 0.10×IFEval
- Conversational = 0.40×IFEval + 0.35×MMLU-PRO + 0.25×BBH
- Longform = 0.35×MUSR + 0.35×IFEval + 0.30×MMLU-PRO
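
The three formulas translate directly into a weighted sum. The sketch below assumes every benchmark value is already normalised to 0-100; the function name and the input dictionary layout are illustrative, not the format of `model_scores.json`.

```python
# Weights copied from the bucket formulas above; each row sums to 1.0.
BUCKET_WEIGHTS = {
    "reasoning": {"MATH": 0.35, "GPQA": 0.30, "BBH": 0.25, "IFEval": 0.10},
    "conversational": {"IFEval": 0.40, "MMLU-PRO": 0.35, "BBH": 0.25},
    "longform": {"MUSR": 0.35, "IFEval": 0.35, "MMLU-PRO": 0.30},
}


def bucket_scores(benchmarks: dict) -> dict:
    """Compute the three bucket scores from normalised benchmark values."""
    return {
        bucket: round(sum(w * benchmarks[name] for name, w in weights.items()), 2)
        for bucket, weights in BUCKET_WEIGHTS.items()
    }
```

Because each set of weights sums to 1.0, a model scoring the same value on every benchmark gets that same value in all three buckets, which is a quick sanity check on the weights.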
40+ pre-built model profiles cover all major providers (OpenAI, Anthropic, Google, Mistral,
Meta Llama, Qwen, DeepSeek, and local Ollama variants). Each profile specifies:
max_tokens, prompt_style (sections/xml/markdown), supports_vision,
supports_tool_calls, reasoning_mode, and context_window.
Context-aware history trimming preserves system messages while summarising older turns when the token budget is exceeded:
from specsmith.agent.model_profiles import get_profile, trim_history
profile = get_profile("qwen2.5:14b") # exact or prefix match; returns default if unknown
messages = trim_history(messages, budget_chars=12000)
LLMClient wraps multiple providers with automatic fallback on 429 / 401 errors,
O-series parameter translation (max_completion_tokens, temperature=1, developer role),
and vLLM guided-JSON payload injection:
from specsmith.agent.llm_client import LLMClient
client = LLMClient([
{"provider_type": "cloud", "model": "gpt-4o", ...},
{"provider_type": "ollama", "model": "qwen2.5:14b", ...}, # local fallback
])
result = client.chat([{"role": "user", "content": "hello"}])
A registry of 10+ pre-configured endpoint presets for common cloud and local LLM providers:
specsmith agent endpoint-presets # list all presets (vllm, lm_studio, openrouter, etc.)
specsmith agent endpoint-presets --json # machine-readable output
specsmith agent suggest-profiles # suggest optimal profiles based on env (API keys, hardware)
specsmith agent suggest-profiles --json # structured suggestions with bucket/role annotations
Suggestions are read-only (never persisted) and inspect OPENAI_API_KEY, ANTHROPIC_API_KEY,
GOOGLE_API_KEY, and local Ollama availability.
The Kairos Agents > AI Providers table gained three new columns — R (reasoning), C (conversational), L (longform) — showing each provider's HF bucket scores inline. A Sync Scores button triggers a background sync from the HF leaderboard without interrupting the active session.
Kairos is the recommended terminal client for specsmith. Kairos spawns specsmith as a managed governance child process at startup and routes all preflight, verify, and BYOE proxy calls through it. The Governance settings page shows live specsmith status, version, and one-click update.
# Kairos starts specsmith automatically; or run manually:
specsmith governance-serve --port 7700 --project-dir .
The VS Code extension (specsmith-vscode) has been deprecated in favour of Kairos.
Use pipx install specsmith for standalone CLI usage from any terminal.
specsmith is open source and built by a small team. Every bit of support helps:
- ⭐ Star specsmith and kairos on GitHub
- 📣 Tell your friends and colleagues — word of mouth is our best marketing
- 🐛 Report bugs via GitHub Issues — even small ones help
- 💡 Suggest features via GitHub Discussions — we read every suggestion
- 🔧 Fix bugs and contribute — see CONTRIBUTING.md; PRs welcome
- 📝 Write about specsmith — blog posts, tutorials, and talks help the community grow
- ❤️ Sponsor BitConcepts — directly funds development
specsmith has first-class Ollama support, including:
specsmith ollama gpu # detect GPU and VRAM tier
specsmith ollama available # show catalog filtered by VRAM budget
specsmith ollama available --task code # filter by task type
specsmith ollama pull qwen2.5:14b # download a model
specsmith ollama suggest requirements # task-based recommendations
specsmith ollama list # show installed models
GPU-aware context sizing: 4K/8K/16K/32K tokens based on detected VRAM.
Override via SPECSMITH_OLLAMA_CONTEXT_LENGTH env var or ollama.context_length in .specsmith/config.yml.
specsmith supports FPGA-specific project types with full governance:
# scaffold.yml
type: fpga-rtl-amd # or fpga-rtl-intel / fpga-rtl-lattice / fpga-rtl
fpga_tools:
- vivado
- gtkwave
- vsg
- ghdl
- verilator
Supported tools: Synthesis: vivado, quartus, radiant, diamond, gowin. Simulation: ghdl, iverilog, verilator, modelsim, questasim, xsim. Waveform: gtkwave, surfer. Linting: vsg, verible, svlint. Formal: symbiyosys. OSS flow: yosys, nextpnr, openFPGALoader.
Governance: init import audit validate diff upgrade compress doctor export architect
AEE Epistemic: stress-test epistemic-audit belief-graph trace seal/verify/log integrate
Workflow: phase show/set/next/list ledger add/list req list/add/gaps/trace
Agent: run agent run/plan/status/verify/improve/reports agent providers/tools/skills agent suggest-profiles agent endpoint-presets
Model Intel: model-intel sync model-intel scores model-intel recommendations model-intel connection
Ollama: ollama list/available/gpu/pull/suggest
Workspace: workspace init/audit/export
VCS: commit push sync branch pr status
Tools: tools scan [--fpga] tools install <tool> tools rules [--tool] [--list]
Tools: exec ps abort watch optimize credits self-update
Auth: auth set/list/remove/check
Patent: patent search/prior-art
Software: Python CLI/lib/web, Rust, Go, C/C++, .NET, Node.js/TypeScript, mobile, microservices, data/ML.
Hardware/Embedded: FPGA/RTL (Xilinx, Intel, Lattice, generic), Yocto BSP, embedded C/C++.
Documents: Technical specs, research papers, API specs, requirements management.
Business/Legal: Business plans, patent applications, compliance frameworks.
The standalone epistemic Python library works in any Python 3.10+ project — no specsmith coupling:
from epistemic import AEESession, BeliefArtifact, StressTester
session = AEESession("my-project", threshold=0.70)
session.add_belief(
artifact_id="HYP-001",
propositions=["The API always returns valid JSON"],
epistemic_boundary=["Valid auth token required"],
)
session.accept("HYP-001")
result = session.run()
print(result.summary())
# certainty=0.55, failures=2, equilibrium=False
Use cases: linguistics research, compliance pipelines, AI alignment, patent prosecution.
13 hard rules enforced by specsmith validate:
- H11: Every loop or blocking wait must have a timeout, fallback exit, and diagnostic message.
- H12: Windows multi-step automation goes into .cmd files, not inline shell invocations.
- H13: Agent tools must declare epistemic contracts (what they claim and what they cannot detect).
specsmith governs itself — the specsmith repo is a specsmith-managed project. Run specsmith audit
in this repo to check its governance health. This means every feature we add to specsmith is
immediately dogfooded on specsmith itself. Kairos
is the companion terminal and flagship client.
specsmith.readthedocs.io — Full manual: AEE primer, command reference, project types, tool registry, governance model, Ollama guide, Kairos integration.
MIT — Copyright (c) 2026 BitConcepts, LLC.