barryroodt/sleuthly

Sleuthly

Slack-triggered incident-investigation agent. Right-click an alert in Slack → "Investigate" → a Block Kit report appears in the thread.

Sleuthly is self-hosted. There is no shared service. Each org runs its own Cloudflare Worker + GitHub Actions against its own fork of this repo.

How it works

Slack message shortcut
   ↓
Cloudflare Worker  (verify HMAC, ack <3s, send working msg, repository_dispatch)
   ↓
GitHub Action      (pull GHCR image, mount runbooks/ + .sleuthly.yml, run container)
   ↓
Docker container   (parse → match runbook → render prompt → agent.run → post)
   ↓
Slack thread reply (chat.postMessage with Block Kit blocks)
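The "verify HMAC" step in the Worker can be sketched as follows. This is a minimal illustration of Slack's documented v0 request-signing scheme, not the Worker's actual code: the function name `verifySlackSignature` and its parameters are assumptions, and it uses Node's `node:crypto` for brevity, whereas a Worker might use Web Crypto instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: Slack signs each request as HMAC-SHA256 over
// "v0:<timestamp>:<raw body>", hex-encoded and prefixed with "v0=".
function verifySlackSignature(
  signingSecret: string,
  timestamp: string, // X-Slack-Request-Timestamp header
  rawBody: string,   // unparsed request body
  signature: string, // X-Slack-Signature header, "v0=<hex>"
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  // Reject stale requests to limit replay; 5 minutes is the window Slack suggests.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 300) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The signature must be computed over the raw body bytes, before any form or JSON parsing; re-serializing a parsed body usually changes the bytes and fails verification.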

Features

  • Agent-agnostic. RuntimeAgent interface; ships with Pi (default) and Claude Code runners. Pick per-deployment via .sleuthly.yml.
  • Runbooks as code. Markdown files in runbooks/, matched against the Slack message by exact title substring or by alert-name regex.
  • Pluggable integrations. Each integration is a self-contained directory under integrations/ with a plugin.yml describing its CLI install, environment variables, and agent-facing usage. The enabled set is a comma-separated list in .sleuthly.yml and a build-arg on the image.
  • Image caching. Pre-built on GHCR per integration-set hash + commit SHA. Runbook edits do not require a rebuild — they are bind-mounted at runtime.
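
The runbook matching described above could look roughly like this. It is a sketch under assumptions: the `Runbook` shape, the `alertPattern` field name, and the rule that title matches win over regex matches are all hypothetical; the real frontmatter schema is the one in docs/writing-runbooks.md.

```typescript
// Hypothetical runbook shape; field names are illustrative only.
interface Runbook {
  file: string;
  title: string;         // matched as an exact substring of the message
  alertPattern?: string; // optional regex source tested against the message
}

function matchRunbook(messageText: string, runbooks: Runbook[]): Runbook | undefined {
  // Pass 1: exact, case-sensitive title substring (assumed to take precedence).
  const byTitle = runbooks.find(
    (rb) => rb.title.length > 0 && messageText.includes(rb.title),
  );
  if (byTitle) return byTitle;

  // Pass 2: alert-name regex. An invalid pattern would throw here.
  return runbooks.find(
    (rb) => rb.alertPattern !== undefined && new RegExp(rb.alertPattern).test(messageText),
  );
}
```

With runbooks titled "High latency" (pattern `Latency(P99|P95)`) and "Boom", the message `[FIRING:1] Boom` resolves to the second by title, while `[FIRING:1] LatencyP99 over budget` resolves to the first by regex.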

Self-hosting

docs/self-hosting.md walks the org admin through fork → Slack App → Cloudflare Worker → first dispatch.

docs/adoption.md describes the user experience for oncall engineers triggering investigations.

Writing runbooks

docs/writing-runbooks.md describes the frontmatter schema and "Instructions for Agents" structure.

Writing integrations

docs/writing-integrations.md describes how to add a new CLI-based integration (Datadog, Sentry, your in-house tool, …).

Local development

pnpm install
pnpm typecheck
pnpm test                       # 60 tests, ~700 ms
(cd cloudflare-worker && pnpm install && pnpm typecheck && pnpm test)

# Build the container locally:
docker build --build-arg AGENT=pi --build-arg ENABLED_INTEGRATIONS=slack -t sleuthly:dev .

# Run the container against a fixture in DRY_RUN (no Slack post):
DISPATCH_PAYLOAD='{"callback_id":"investigate_incident","channel_id":"C","message_ts":"1","message_text":"[FIRING:1] Boom","message_attachments":[],"user":{"id":"U","name":"n"},"response_url":"https://hooks.slack.com/x","team_id":"T"}' \
AGENT_API_TOKEN=tok SLACK_BOT_TOKEN=xoxb-stub AXIOM_TOKEN=stub AXIOM_ORG_ID=stub DRY_RUN=true \
docker run --rm -i \
  -e DISPATCH_PAYLOAD -e AGENT_API_TOKEN -e SLACK_BOT_TOKEN -e AXIOM_TOKEN -e AXIOM_ORG_ID -e DRY_RUN \
  sleuthly:dev
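
For reference, the shape of the DISPATCH_PAYLOAD fixture above can be written down as a type with a small validator. The field names come straight from the fixture; the types are inferred from its values, and `parseDispatchPayload` is a hypothetical helper (the project itself lists Zod schemas under src/config/, which this plain type guard does not attempt to reproduce).

```typescript
// Field names mirror the fixture; types are inferred from the fixture's values.
interface DispatchPayload {
  callback_id: string;
  channel_id: string;
  message_ts: string;
  message_text: string;
  message_attachments: unknown[];
  user: { id: string; name: string };
  response_url: string;
  team_id: string;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical helper: parse and validate the JSON string passed via DISPATCH_PAYLOAD.
function parseDispatchPayload(raw: string): DispatchPayload {
  const data: unknown = JSON.parse(raw);
  if (!isRecord(data)) throw new Error("payload must be a JSON object");

  const stringFields = [
    "callback_id", "channel_id", "message_ts",
    "message_text", "response_url", "team_id",
  ];
  for (const key of stringFields) {
    if (typeof data[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  if (!Array.isArray(data["message_attachments"])) {
    throw new Error("message_attachments must be an array");
  }
  const user = data["user"];
  if (!isRecord(user) || typeof user["id"] !== "string" || typeof user["name"] !== "string") {
    throw new Error("user must be an object with string id and name");
  }
  return data as unknown as DispatchPayload;
}
```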

Project layout

sleuthly/
├── src/                container TypeScript
│   ├── config/         Zod schemas (env, .sleuthly.yml, types)
│   ├── parse/          Slack payload + incident detection
│   ├── runbooks/       loader + matcher
│   ├── integrations/   plugin manifest registry
│   ├── agent/          RuntimeAgent + runners (pi, claude-code)
│   ├── prompt/         system + user prompt templates
│   ├── post/           Block Kit extract + Slack post + failure fallback
│   └── workflow/       Mastra-style step chain
├── cloudflare-worker/  webhook receiver (verify + ack + dispatch)
├── integrations/       production plugin manifests (slack, axiom, …)
├── runbooks/           production runbooks (markdown + frontmatter)
├── scripts/            install-agent.sh, install-integrations.sh, replay.ts
├── tests/              vitest (mirrors src layout)
├── Dockerfile          multi-stage (base/deps/integrations/build/runtime)
└── .github/workflows/  test, smoke, build-and-publish, investigate
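
To give a feel for how the container stages (parse → render prompt → agent.run) might compose as a step chain, here is a minimal sketch. It is not the code in src/workflow/: the `Step` type, `chain` helper, the single-method `RuntimeAgent` interface, and the `[FIRING:n]` prefix stripping are all assumptions made for illustration, and runbook matching and Slack posting are omitted.

```typescript
// A step maps one stage's output to the next stage's input.
type Step<I, O> = (input: I) => Promise<O>;

// Compose two steps into one that runs them in order.
function chain<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input) => second(await first(input));
}

// Hypothetical agent interface, named after the RuntimeAgent mentioned under Features.
interface RuntimeAgent {
  run(prompt: string): Promise<string>;
}

// Assumed parsing rule: strip a "[FIRING:n]" prefix like the one in the dev fixture.
const parse: Step<string, { alert: string }> = async (text) => ({
  alert: text.replace(/^\[FIRING:\d+\]\s*/, ""),
});

const renderPrompt: Step<{ alert: string }, string> = async ({ alert }) =>
  `Investigate the alert: ${alert}`;

// Adapt an agent into a step so runners can be swapped without touching the chain.
const runWith = (agent: RuntimeAgent): Step<string, string> => (prompt) =>
  agent.run(prompt);

function buildPipeline(agent: RuntimeAgent): Step<string, string> {
  return chain(chain(parse, renderPrompt), runWith(agent));
}
```

Because the agent only enters through `runWith`, a stub agent can stand in for a real runner in tests, which is one plausible reason to keep the agent behind an interface.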
