| layout | default |
|---|---|
| title | The Core |
The core category is the load-bearing set — the 15 skills that make Aeon autonomous rather than just scheduled. Everything else in the catalog is a workload; these are the machine. They group into three clusters: self-evolution & self-healing, fleet / self-replication, and autonomous real-world action.
If you're building a derivative architecture, this is the set to keep and validate first. It doesn't need to be 100% identical — but each skill below earns its place with a specific mechanism, and those mechanisms are what to preserve.
autoresearch — evolves an existing skill
Input: var = a skill name (required; aborts if empty)
- Loads the target SKILL.md, parses its purpose / data sources / output format / dependencies, and saves the original for the diff.
- Researches improvements — web-searches for better / alternative APIs, best practices, and known failure modes, plus reviews recent
memory/logs/runs andcron-state.jsonto see if it's been failing. - Generates 4 variations, each with a fixed thesis: A better inputs (data sources), B sharper output (format / quality), C more robust (fallbacks, edge cases, error handling), D rethink (a fundamentally different approach).
- Scores them against a rubric and ships the winner as a PR — tagging the chosen lineage as an HTML comment (
<!-- autoresearch: variation X -->) at the top of the file.
That comment is why you can see it already ran across the library: skill-repair is variation D, skill-health C, and create-skill / deploy-prototype / vuln-scanner B.
create-skill — generates a brand-new skill from one sentence
Input: a natural-language description (required)
- Parses the request into action verb + data source + output format + configurable param + cadence.
- Deep duplicate detection — keyword-greps existing skills, reads the top 3 candidates, and if one already does it, exits
CREATE_SKILL_DUPLICATE, suggesting you run the existing skill with a differentvar=. - Researches the data sources (WebSearch + WebFetch the canonical docs, cross-checked against a 2nd source to confirm the endpoint isn't deprecated) and records the URLs.
- New-secret guard — reads existing secret names (never values) via
gh api; if the new skill needs a secret that isn't configured, it forces the generated skill to degrade gracefully and documents the requirement in the PR. - Ships a complete, production-ready skill as a PR (never commits to main).
skill-health — the detector · daily 18:00
Audits every enabled skill from cron-state.json + per-run Haiku quality scores (memory/skill-health/*.json) + a skill-runs fallback for sandbox-blocked runs. Classifies each as CRITICAL / DEGRADED / FLAPPING / WARNING / HEALTHY / NO-DATA via first-matching-rule, computes a severity score, and detects systemic patterns (≥2 skills sharing an API host or error signature).
It files issues into memory/issues/ISS-NNN.md, resolves them when a skill recovers (drops it from affected_skills, flips status to resolved), and notifies only on state change. It won't touch the issue tracker unless the operator has opted in by creating INDEX.md.
skill-repair — the fixer · reactive, depends_on: skill-health
Phases: PREFLIGHT → TRIAGE → DIAGNOSE → REPAIR → VERIFY → LOG, with a one-shot exit taxonomy (REPAIR_OK_FIXED, REPAIR_OK_SYSTEMIC, REPAIR_DIAGNOSED_NO_FIX, REPAIR_NO_TARGETS, REPAIR_DRY_RUN, REPAIR_BLOCKED).
- Triage auto-picks the worst fixable target from open issues + cron-state (success-rate < 0.5, low quality score), and clusters by normalized error signature — if 2+ skills share one, it switches to systemic mode: one shared issue, one shared fix, instead of N patches.
- Guards against looping: 24h per-skill cooldown (tracked in
skill-repair-history.json), and caps itself at 3 repair PRs/day. - Builds a diagnostic dossier, applies the fix, opens a PR, and verifies.
dry-run:NAMEdiagnoses without writing.
skill-evals — the regression catcher · Sun 06:00
Runs an assertion manifest (evals.json) against each skill's latest output: empty / stale file checks, word-count floors, required / forbidden regex patterns, numeric range checks, and a quality cross-check against skill-health/*.json.
The lede is the diff vs. the previous run — each skill classified NEW_FAIL / FIXED / STILL_FAIL / NEW_PASS / STABLE, so it catches quality degradation between runs rather than just snapshotting. Files issues for every NEW_FAIL, runs a coverage audit (eval-audit) to surface enabled skills with no eval spec, and queues concrete fixes.
self-improve — broad self-tuning · every other day
Reads the last 2 days of logs + cron-state.json for the highest-impact, smallest fix (failing skills, timeouts, truncated notifications, low-quality output), makes one minimal targeted change (tighten a prompt, add backoff, fix a config), and opens a PR with Problem / Fix / Evidence.
Backpressure: if 3+ improvement PRs are already open, it exits without creating more debt. Explicitly forbidden from rewriting skills wholesale or touching architecture / secrets.
skill-health / skill-evals detect → file issue → skill-repair fixes → PR → merge → cron-state recovers → skill-health resolves the issue. CLAUDE.md codifies the contract: health skills file issues, repair skills close them.
spawn-instance — clones the agent into a new repo
Input: var = "name: purpose"
Forks the repo (using the upstream parent if this is itself a fork), sanitizes the name into aeon-<name>, configures its skill plan for the stated purpose, validates, enables Actions, and registers it in memory/instances.json. Full exit taxonomy with idempotent recovery (SPAWN_FORK_EXISTS_RECOVERED vs. ..._REGISTERED, SPAWN_PUSH_FAILED, etc.) and preflight checks (gh auth, rate limit ≥50).
Seeds the repo but never propagates secrets — each clone is inert until its owner adds their own API keys, giving billing isolation and blast-radius containment.
fleet-control — operates the managed fleet · twice daily 9/15
Three modes via var: Health Check (default), Status (status), Dispatch (dispatch <instance|*> <skill> [var=…]). For each registered instance it runs 3 parallel gh calls (repo metadata, last-24h workflow runs, child cron-state) into /tmp, classifies each as healthy / reachable, and emits a verdict-first report with a per-instance next-action column.
Dispatch mode lets the parent trigger a skill on one child — or all healthy / degraded children at once. State-change-gated notify; bails on missing gh auth or low rate limit.
fleet-scorecard — fleet economics · daily 13:00
Discovers the fleet at runtime (self + every non-archived instance — never hardcoded). All data is gathered by scripts/prefetch-fleet-scorecard.sh outside the sandbox, so the skill just reads /tmp/fleet-scorecard/* and writes the report — no network needed.
Aggregates runs / failures / generations / tokens / est. cost / cache discount (tokens in OpenRouter shape, cached ⊆ prompt), builds an Alerts block (any skill with ≥25% fail rate over 14d, cost spikes > 1.5× median daily delta, failure jumps > 10), writes memory/scorecard.md and appends a trend row to scorecard-history.csv.
contributor-reward → distribute-tokens — the pay-your-contributors flywheel
contributor-reward (Mon 09:30) reads the latest fork-contributor leaderboard, scores each contributor against a tier table (rank 1 = 25 USDC … rank 5 = 5, +5 first-PR bonus tracked once-ever per login, eligibility floor score ≥10 + must have an @handle), and writes the plan into memory/distributions.yml with a one-command run line. It deliberately stops short of sending — keeping a human-visible git diff as the audit trail.
distribute-tokens then does the actual on-chain send via the Bankr Wallet API with serious money-safety engineering: two-phase RESOLVE → EXECUTE (validate config / key / balance, resolve @handles → addresses, build plan; then send), per-recipient idempotency key + txHash so nothing double-sends across re-runs, dry-run mode, and recovery from partial runs. Wallet API for transfers only; read-only keys → 403 guard.
external-feature / feature — ships code to watched repos unprompted
Input: owner/repo, owner/repo#N, or empty (auto-pick)
Clones the repo, deeply reads it (CLAUDE.md, manifests, recent commits, open issues / PRs, test setup), then picks one change by priority: (1) fix an open issue, (2) code improvement — TODOs, missing error handling, untested critical paths, security fixes, (3) a new feature / DX improvement if the codebase is clean.
Implements it matching the repo's exact style on a branch (ai/...), commits with conventional-commit format (Closes #N), pushes, and opens a PR.
Hard rules: one enhancement per run, never push to main, no unrelated refactors, "if nothing's worth doing, log and exit." feature is the batch version that does this across the whole watched-repo list, preferring yesterday's repo.
deploy-prototype — generates a live web app and ships it to Vercel
Input: empty (auto-pick from signals), a plain brief, or a typed type:slug descriptor
Scans memory/topics/ and recent logs for a prototype-worthy signal, scores candidates on leverage / concreteness / novelty (must clear 9/15 or it exits DEPLOY_PROTOTYPE_EMPTY). Commits to a shape (slug, tagline, primary action, static-vs-API-vs-Next), then writes the files into .pending-deploy/files/ against a strict quality bar: self-contained, sub-1s load, mobile-first, OG tags, real data from public no-auth endpoints (no lorem), light / dark via prefers-color-scheme, no secrets.
Runs pre-flight checks (≤20 files, ≤4MB, slug regex, greps for leaked tokens and for TODO / placeholder), writes a prototype record + prototypes.md row. The actual GitHub-repo-create + Vercel deploy is handled by scripts/postprocess-deploy.sh (reads .pending-deploy/, uses VERCEL_TOKEN / GH_GLOBAL) — the skill flags DEPLOY_PROTOTYPE_NO_POSTPROCESS if that script is missing.
vuln-scanner — finds real vulns and discloses responsibly
Input: owner/repo, or auto-select from github-trending
Selection filters skip CTF / teaching repos, repos scanned in the last 30 days, and repos with no safe disclosure channel. Forks + shallow-clones, then runs purpose-built scanners (not grep): Semgrep, TruffleHog --only-verified (filesystem + git history), osv-scanner for dependency CVEs, Slither if Solidity is present — recording each tool's ok/fail status (empty ≠ clean).
Triages every hit by hand (read the code at the line, write one sentence on what an attacker controls, check reachability, assign severity; drop test / example findings). Then routes by finding type: dependency CVEs → public PR (already-public, net-positive); code flaws / verified secrets / contract bugs → a Private Vulnerability Report via gh api .../security-advisories (optionally a fix pushed to its own fork only, linked in the advisory — never a public PR). If no PVR and no SECURITY channel, it does nothing public and logs "no safe channel."
Core principles: do no harm, never post exploit chains publicly, all-scanners-failed ≠ clean (report as error). Dedup state in memory/vuln-scanned.json.