
fix(triggers): durable completion-hook delivery — retry, dead-letter, redeliver (closes #97)#123

Open: umi-appcoder[bot] wants to merge 1 commit into main from fix/durable-completion-hooks


umi-appcoder (bot, Contributor) commented Jun 17, 2026

What

Makes completion-hook delivery durable: bounded retries, a dead-letter record on exhaustion, and a redeliver endpoint. Closes #97. This is PR 5 (the finale) of the reliability-hardening series and builds on the merged #96 reconcile loop.

Why

_fire_completion_hook was fire-and-forget — a single network blip dropped the callback, which is silent data loss for webhook/cron triggers whose whole purpose is the response_url POST.

Backend (server.py)

  • _deliver_hook — bounded exponential-backoff retry (KC_HOOK_MAX_ATTEMPTS, default 4); a permanent 4xx (except 429) isn't retried. Records delivery state on the task meta:
    • success → hook_delivery = {state:'delivered', attempts, status, delivered_at}
    • exhaustion → hook_delivery = {state:'failed', attempts, last_error, last_attempt_at} — the dead-letter.
  • _build_hook_request — extracted so fire + redeliver share payload/HMAC.
  • redeliver_hook + POST /api/claude/tasks/{id}/redeliver-hook (auth + readonly gated) to re-attempt a failed delivery.
  • At-most-once on success is preserved (hook_fired_at marks the initial fire under lock); retries run in the daemon thread, off the request path.
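
For reviewers who want the retry policy in one place, here is a minimal sketch of the loop described above. It is not the code in server.py: the names deliver_with_retry, is_permanent, send, base_delay and sleep are hypothetical, the 0.5 s base delay is an assumption, and treating non-2xx, non-4xx statuses as retryable is also an assumption. Only the default of 4 attempts, the "4xx except 429 is permanent" rule, and the hook_delivery field names come from this PR.

```python
import time
from datetime import datetime, timezone
from typing import Callable

MAX_ATTEMPTS = 4  # mirrors the KC_HOOK_MAX_ATTEMPTS default named in this PR


def is_permanent(status: int) -> bool:
    """A 4xx other than 429 is treated as permanent and is not retried."""
    return 400 <= status < 500 and status != 429


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def deliver_with_retry(
    send: Callable[[], int],            # hypothetical: performs one POST, returns the HTTP status
    max_attempts: int = MAX_ATTEMPTS,
    base_delay: float = 0.5,            # assumed value, not taken from server.py
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try send() up to max_attempts times and return a hook_delivery record."""
    attempts = 0
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        attempts = attempt
        try:
            status = send()
        except OSError as exc:          # connection errors and timeouts are retryable
            last_error = f"{type(exc).__name__}: {exc}"
        else:
            if 200 <= status < 300:
                return {
                    "state": "delivered",
                    "attempts": attempts,
                    "status": status,
                    "delivered_at": _now(),
                }
            last_error = f"HTTP {status}"
            if is_permanent(status):
                break                   # retrying a permanent 4xx cannot help
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
    # Exhausted or permanently rejected: this record is the dead-letter.
    return {
        "state": "failed",
        "attempts": attempts,
        "last_error": last_error,
        "last_attempt_at": _now(),
    }
```

Passing sleep as a parameter keeps the backoff testable without real waiting; a caller would run this in the daemon thread so the request path never blocks on it.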

SPA

TaskDetail Info tab surfaces hook_delivery state (delivered / failed + attempts + error, with the redeliver endpoint noted); TaskDetail type gains HookDelivery.

Tests

tests/completion_hook_delivery_test.py (7 tests): success→delivered, retry→dead-letter, permanent-4xx no-retry, 429 retried, redeliver missing/no-url/runs-delivery. Python suite 473, SPA 120, OK.
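
The three redeliver cases (missing, no-url, runs-delivery) map onto a guard sequence like the one sketched here. This is an illustration only: the function signature, the in-memory tasks dict, the injected deliver callable, and the 404/400/200 status codes are all assumptions, not the behaviour of the actual redeliver_hook in server.py.

```python
from typing import Callable, Dict, Tuple


def redeliver_hook(
    tasks: Dict[str, dict],             # hypothetical in-memory task store
    task_id: str,
    deliver: Callable[[str], dict],     # hypothetical: runs delivery, returns a hook_delivery record
) -> Tuple[int, dict]:
    """Re-attempt delivery for one task; returns (assumed HTTP status, body)."""
    task = tasks.get(task_id)
    if task is None:
        return 404, {"error": "task not found"}            # "missing" case
    url = task.get("response_url")
    if not url:
        return 400, {"error": "task has no response_url"}  # "no-url" case
    # "runs-delivery" case: record the fresh outcome on the task meta.
    task["hook_delivery"] = deliver(url)
    return 200, task["hook_delivery"]
```

A test can pass a stub for deliver and assert on both the returned pair and the hook_delivery value written back to the task.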

Closes #97

🤖 Generated with Claude Code

… redeliver (#97)

_fire_completion_hook was fire-and-forget: a single network blip dropped
the callback, which is silent data loss for triggers whose whole point is
the response_url POST.

Backend (server.py):
- _deliver_hook: bounded exponential-backoff retry (KC_HOOK_MAX_ATTEMPTS,
  default 4); a permanent 4xx (except 429) is not retried. Records
  delivery state on the task meta: hook_delivery={state:'delivered',
  attempts, status, delivered_at} on success, or {state:'failed',
  attempts, last_error, last_attempt_at} (the dead-letter) on exhaustion.
- _build_hook_request: extracted so fire + redeliver share payload/HMAC.
- redeliver_hook + POST /api/claude/tasks/{id}/redeliver-hook
  (auth + readonly gated) to re-attempt a failed delivery.
- At-most-once on *success* is preserved (hook_fired_at marks the initial
  fire under lock); retries run in the daemon thread, off the request path.

SPA: TaskDetail Info tab surfaces hook_delivery state (delivered / failed
+ attempts + error), with the redeliver endpoint noted; TaskDetail type
gains HookDelivery.

Tests: tests/completion_hook_delivery_test.py (7) — success records
delivered, retry→dead-letter, permanent-4xx no-retry, 429 retried,
redeliver missing/no-url/runs-delivery. Python suite 473, SPA 120, OK.

Closes #97
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>