4,107 events/sec at p95 8ms. Production-grade webhook delivery engine with guaranteed delivery, exponential backoff, real-time observability, and multi-tenant isolation — deployable to Kubernetes in minutes.
| Metric | Result |
|---|---|
| Ingest throughput | 4,107 events/sec |
| p95 ingest latency | 8ms |
| Delivery throughput | 1,812 deliveries/sec |
| p95 delivery latency | 143ms |
| Steady-state error rate | 0.00% |
k6 load test — 4 vCPU / 8 GB, 1 API + 3 worker replicas. Full results: benchmarks/results/baseline-2025-02-13.txt
Hookstream is a standalone webhook delivery service. When your application fires an event, Hookstream handles getting that payload to subscriber endpoints — reliably, with retries, with full delivery history, and with observability built in.
Think of it as a self-hosted Svix or Hookdeck. Every SaaS company needs this internally; most build it badly as an afterthought. This is the correct version.
Built for systems where delivery failure is not acceptable.
Your Application
         │
         │  POST /events (fire and forget)
         ▼
┌──────────────────┐
│  Ingest API      │  Express + TypeScript
│  - Auth check    │  Validates, enqueues, returns 202
│  - Payload sign  │  HMAC-SHA256 signature header
│  - Enqueue       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Redis / BullMQ  │  Durable job queue
│  - Per-tenant    │  Separate queues per tenant
│  - Priority      │  Configurable concurrency
│  - Persistence   │
└────────┬─────────┘
         │
         ▼
┌──────────────────────────────────────────┐
│  Delivery Workers                        │
│                                          │
│  ┌─────────────────────────────────────┐ │
│  │ For each subscriber endpoint:       │ │
│  │  1. POST payload with HMAC header   │ │
│  │  2. Expect 2xx within timeout       │ │
│  │  3. On failure → exponential retry  │ │
│  │     Attempt 1: immediate            │ │
│  │     Attempt 2: +30s                 │ │
│  │     Attempt 3: +5min                │ │
│  │     Attempt 4: +30min               │ │
│  │     Attempt 5: +2hr → dead letter   │ │
│  └─────────────────────────────────────┘ │
│                                          │
│  Circuit breaker per endpoint:           │
│  5 consecutive failures → open circuit   │
│  Auto-retry after 10min cooling period   │
└────────┬─────────────────────────────────┘
         │
    ┌────┴──────────────┐
    ▼                   ▼
PostgreSQL          Prometheus
(delivery log,      (metrics: delivery
 dead letter,        rate, latency,
 subscriber cfg)     failure rate,
                     queue depth)
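
The retry schedule in the diagram can be sketched as a small lookup. This is a minimal illustration, not the project's actual worker code: the names `RETRY_DELAYS_MS`, `NextStep` and `nextStep` are hypothetical, and it assumes each delay is measured from the previous failed attempt and that a failed fifth attempt sends the delivery to the dead letter store.

```typescript
// Delay before each attempt, copied from the schedule in the diagram.
const RETRY_DELAYS_MS: readonly number[] = [
  0,                  // Attempt 1: immediate
  30 * 1000,          // Attempt 2: +30s
  5 * 60 * 1000,      // Attempt 3: +5min
  30 * 60 * 1000,     // Attempt 4: +30min
  2 * 60 * 60 * 1000, // Attempt 5: +2hr
];

type NextStep =
  | { kind: 'retry'; attempt: number; delayMs: number }
  | { kind: 'dead-letter' };

// Given the 1-based number of the attempt that just failed,
// decide whether to schedule another attempt or give up.
function nextStep(failedAttempt: number): NextStep {
  const delayMs = RETRY_DELAYS_MS[failedAttempt];
  if (delayMs === undefined) {
    // No further entry in the schedule: the delivery is undeliverable.
    return { kind: 'dead-letter' };
  }
  return { kind: 'retry', attempt: failedAttempt + 1, delayMs };
}
```

Keeping the schedule as data rather than a formula makes it straightforward to expose as configuration, which matches the "configurable retry schedule" goal.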
| Layer | Technology | Why |
|---|---|---|
| Runtime | Node.js 20 + TypeScript | Type-safe message contracts across producers and workers |
| Queue | BullMQ on Redis | Persistent, ordered, retry-aware job queue with dashboard |
| Database | PostgreSQL 15 | Delivery log, subscriber registry, dead letter store |
| Metrics | Prometheus + Grafana | Real-time delivery visibility; Grafana dashboard included |
| Containerisation | Docker + Docker Compose | Dev stack up in one command |
| Orchestration | Kubernetes + Helm | Production deployment with horizontal scaling of workers |
| CI/CD | GitHub Actions | Lint → test → build → push to registry |
| Logging | Pino | Structured JSON logs, low overhead |
- Guaranteed at-least-once delivery
- Exponential backoff with configurable retry schedule
- Per-endpoint circuit breaker — failed endpoints are paused, not hammered
- Dead letter queue for undeliverable events with manual replay
- HMAC-SHA256 signature on every delivery (`X-Hookstream-Signature` header)
- Configurable delivery timeout per endpoint
- Register endpoints via API with per-endpoint event type filters
- Endpoint verification challenge (similar to Stripe's webhook registration)
- Per-tenant endpoint isolation — tenants cannot see each other's delivery logs
- Pause / resume individual endpoints
- Rotate signing secrets without downtime
- Prometheus metrics exported at `/metrics`:
  - `hookstream_deliveries_total` (labels: tenant, event_type, status)
  - `hookstream_delivery_duration_seconds`
  - `hookstream_queue_depth` (per tenant)
  - `hookstream_circuit_breaker_state` (per endpoint)
- Grafana dashboard definition included (`dashboards/hookstream.json`)
- Structured JSON logging on every delivery attempt
- Health endpoint: `GET /health` (liveness + queue connectivity)
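
The per-endpoint circuit breaker listed above (5 consecutive failures open the circuit, retried after a 10 minute cooling period) might look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `CircuitBreaker.ts`: the class name, method names and the `half-open` state label are hypothetical, and it assumes a single trial delivery is allowed once the cooling period has elapsed.

```typescript
// Thresholds taken from the architecture diagram.
const FAILURE_THRESHOLD = 5;
const COOL_DOWN_MS = 10 * 60 * 1000;

type CircuitState = 'closed' | 'open' | 'half-open';

// Hypothetical breaker, one instance per subscriber endpoint.
class EndpointCircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  // State at time `now` (milliseconds since epoch).
  state(now: number): CircuitState {
    if (this.openedAt === null) return 'closed';
    return now - this.openedAt >= COOL_DOWN_MS ? 'half-open' : 'open';
  }

  // Deliveries are blocked only while the circuit is open and still cooling down.
  canAttempt(now: number): boolean {
    return this.state(now) !== 'open';
  }

  // Any success resets the failure streak and closes the circuit.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // Opens the circuit at the threshold; a failed half-open trial re-opens it.
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= FAILURE_THRESHOLD) {
      this.openedAt = now;
    }
  }
}
```

Passing `now` in explicitly keeps the breaker deterministic in tests; a worker would typically supply `Date.now()`.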
POST /api/v1/events # Ingest event
GET /api/v1/events/:id # Event detail + delivery attempts
POST /api/v1/subscribers # Register endpoint
GET /api/v1/subscribers # List endpoints
DELETE /api/v1/subscribers/:id # Remove endpoint
PATCH /api/v1/subscribers/:id/pause # Pause delivery
POST /api/v1/subscribers/:id/rotate-secret # Rotate HMAC secret
GET /api/v1/deliveries # Delivery log (filterable)
POST /api/v1/deliveries/:id/replay # Replay a failed delivery
POST /api/v1/deliveries/diagnose # AI-powered failure diagnosis
GET /api/v1/dead-letter # Dead letter queue
GET /metrics # Prometheus metrics
GET /health # Health check
POST /api/v1/deliveries/diagnose sends an endpoint's recent failure history to Claude and returns a structured root-cause diagnosis — useful when a subscriber endpoint is in the dead-letter queue and you need to understand why without manually querying delivery logs.
curl -X POST http://localhost:3000/api/v1/deliveries/diagnose \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"subscriber_id": "sub_abc123"}'

{
"data": {
"summary": "Endpoint returning 401 consistently across all 14 attempts over 6 hours.",
"likely_cause": "Authentication token has expired or been rotated on the receiving end.",
"pattern": "auth",
"recommended_action": "Rotate the signing secret on the subscriber and update the token on the receiving endpoint.",
"confidence": "high"
}
}

Requires ANTHROPIC_API_KEY in .env. Returns 501 if not configured.
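
A caller may want to distinguish "diagnosis not configured" from other failures. The sketch below shows one way to interpret the response. The `Diagnosis` field types are inferred from the single sample response above rather than from a published schema, and `interpretDiagnoseResponse` and `DiagnoseResult` are hypothetical names.

```typescript
// Shape inferred from the sample response; treat field types as assumptions.
interface Diagnosis {
  summary: string;
  likely_cause: string;
  pattern: string;
  recommended_action: string;
  confidence: string;
}

type DiagnoseResult =
  | { ok: true; diagnosis: Diagnosis }
  | { ok: false; reason: 'not-configured' | 'http-error' | 'bad-body'; status: number };

// Hypothetical helper: maps an HTTP status and parsed JSON body to a typed result.
function interpretDiagnoseResponse(status: number, body: unknown): DiagnoseResult {
  // 501 means ANTHROPIC_API_KEY is not set on the server.
  if (status === 501) return { ok: false, reason: 'not-configured', status };
  if (status < 200 || status >= 300) return { ok: false, reason: 'http-error', status };

  // The diagnosis is wrapped in a top-level "data" object.
  const data =
    typeof body === 'object' && body !== null
      ? (body as { data?: Diagnosis }).data
      : undefined;
  if (data === undefined) return { ok: false, reason: 'bad-body', status };
  return { ok: true, diagnosis: data };
}
```

With the built-in `fetch` in Node.js 20, the helper would be called with `res.status` and the parsed JSON body of the response.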
git clone https://github.com/ykachala/hookstream.git
cd hookstream
cp .env.example .env
docker compose up

Services:
- API: http://localhost:3000
- BullMQ Board: http://localhost:3001
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3002 (admin / admin)
# Run tests
npm test
# Run load test (k6)
k6 run tests/load/ingest.js

# Register your endpoint
curl -X POST http://localhost:3000/api/v1/subscribers \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-endpoint.com/hooks", "events": ["payment.completed", "user.created"]}'
# Fire an event
curl -X POST http://localhost:3000/api/v1/events \
-H "Authorization: Bearer $API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"type": "payment.completed", "payload": {"amount": 5000, "currency": "ZAR"}}'

On your endpoint, verify the `X-Hookstream-Signature` header:
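import crypto from 'crypto';

// Returns true only if the header matches the HMAC-SHA256 of the raw request body.
function verifySignature(payload: string, signature: string, secret: string): boolean {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const received = Buffer.from(signature);
  const wanted = Buffer.from(`sha256=${expected}`);
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return received.length === wanted.length && crypto.timingSafeEqual(received, wanted);
}

Benchmarked on a 4-vCPU / 8GB instance with 3 worker replicas: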
| Metric | Value |
|---|---|
| Ingest throughput | 4,107 events/sec |
| Delivery throughput (200 OK endpoints) | 1,812 deliveries/sec |
| p95 ingest latency | 8ms |
| p95 delivery latency (fast endpoint) | 143ms |
| Queue drain time (10k backlog) | ~6 seconds |
Load test: k6 run tests/load/ingest.js — no sleep, targets ingest saturation. Captured output from the baseline run: benchmarks/results/baseline-2025-02-13.txt.
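
As a rough consistency check, the queue drain time follows from the baseline delivery throughput of 1,812 deliveries/sec. This assumes the backlog drains at the steady-state rate and that each queued item results in one delivery:

```latex
t_{\text{drain}} \approx \frac{10\,000\ \text{deliveries}}{1\,812\ \text{deliveries/sec}} \approx 5.5\ \text{s}
```

This agrees with the roughly 6 seconds reported for a 10k backlog.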
helm install hookstream ./helm/hookstream \
--set workers.replicas=3 \
--set redis.url=$REDIS_URL \
--set database.url=$DATABASE_URL

Workers scale independently of the API. Scale them horizontally to increase delivery throughput without touching the ingest layer.
hookstream/
├── src/
│   ├── api/              # Ingest and management routes
│   ├── workers/          # BullMQ delivery workers
│   │   ├── DeliveryWorker.ts
│   │   └── CircuitBreaker.ts
│   ├── queue/            # Queue definitions and producers
│   ├── db/               # PostgreSQL queries (no ORM; raw queries for perf)
│   ├── metrics/          # Prometheus client setup and counters
│   └── config/
├── tests/
│   ├── unit/
│   ├── integration/
│   └── load/             # k6 scripts
├── helm/                 # Kubernetes Helm chart
├── dashboards/           # Grafana dashboard JSON
├── docker-compose.yml
└── .github/workflows/
- nexus-scheduler — uses Hookstream to deliver booking events to subscribers
- saas-multitenant-kit — Hookstream integrates as the outbound event layer
Author: Yoweli Kachala | LinkedIn | Cape Town, South Africa