Skip to content

feat: update alert & implement alert.destination.disabled#672

Open
alexluong wants to merge 9 commits intomainfrom
alert
Open

feat: update alert & implement alert.destination.disabled#672
alexluong wants to merge 9 commits intomainfrom
alert

Conversation

@alexluong
Copy link
Collaborator

@alexluong alexluong commented Feb 3, 2026

Changes

Alert Payload Schema

  • Restructure consecutive_failures into a nested object with current, max, threshold
  • Remove will_disable field (threshold == 100 implies disable)
  • Remove custom MarshalJSON methods in favor of standard json.Marshal
  • Add tenant_id to top-level of alert data
  • Expand AlertDestination with filter, metadata, updated_at
  • Simplify DeliveryAttempt struct: pass Event, Destination, Attempt directly

New alert.destination.disabled Callback

  • Sent when destination is auto-disabled after consecutive failures
  • Invariant check ensures DisabledAt is set on returned destination
  • Uses a reason field (e.g., "consecutive_failure") instead of embedding trigger-specific data, keeping the payload extensible for future disable mechanisms (e.g., error rate)
  • When the threshold is reached, both alert.destination.disabled and alert.destination.consecutive_failure (with threshold: 100) are sent. The disabled alert is sent first, but ordering is not guaranteed.

Consecutive Failure Alert at Threshold 100

  • The destination in the consecutive failure alert reflects the post-disable state (includes disabled_at)

Error Handling

  • Notifications are best-effort (logged, not propagated)
  • DestinationDisabler returns disabled destination for timestamp consistency

Notification Delivery

Alert notifications are sent synchronously via HTTP within HandleAttempt, which itself is called asynchronously from the delivery pipeline (go h.handleAlertAttempt(...)). This means notifications don't block event delivery, but a slow notification (e.g., the disabled alert) will delay subsequent notifications (e.g., the consecutive failure alert) within the same handler call.

Future consideration: If notification latency becomes an issue, we could decouple the two notifications — either by sending them concurrently or by introducing a notification queue. For now, the synchronous approach keeps the implementation simple and the best-effort error handling makes the current behavior acceptable.


Alert Payloads

event is a full models.Event and attempt is a full models.Attempt.

alert.destination.consecutive_failure

{
  "topic": "alert.destination.consecutive_failure",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "event": { ... },
    "attempt": { ... },
    "consecutive_failures": {
      "current": 10,
      "max": 20,
      "threshold": 50
    },
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:00Z",
      "disabled_at": null
    }
  }
}

Note: At threshold: 100, destination.disabled_at will be set (reflects post-disable state).

alert.destination.disabled

{
  "topic": "alert.destination.disabled",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "disabled_at": "2025-01-15T10:30:00Z",
    "reason": "consecutive_failure",
    "event": { ... },
    "attempt": { ... },
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-15T10:30:00Z",
      "disabled_at": "2025-01-15T10:30:00Z"
    }
  }
}

@vercel
Copy link

vercel bot commented Feb 3, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
outpost-docs Ready Ready Preview, Comment Feb 6, 2026 10:21am
outpost-website Ready Ready Preview, Comment Feb 6, 2026 10:21am

Request Review

@alexbouchardd
Copy link
Contributor

I think we should rename alert.consecutive_failure to something like alert.destination.consecutive_failure or alert.destination.failure.

One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

  1. Should we include attempt instead of event?

Yes, your proposition makes more sense

2

No strong opinion, but the current payload does lack progress, which is the threshold that was triggered

@alexluong
Copy link
Collaborator Author

One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

yes, let's do alert.destination.consecutive_failure in case we want to expand in the future?

but the current payload does lack progress

that's just the computed value of current / max tho, right? Or you want the threshold itself, so 50/70/90/100?

also do you know current can be higher than max? In that case, would progress be fixed at 100 (or 1) or would we continue adding it up?

@alexbouchardd
Copy link
Contributor

I would represent the threshold itself which may not line up with the current / max exactly.

alexluong and others added 4 commits February 6, 2026 14:23
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TDD setup - tests will pass once feature is implemented.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Send alert when destination is auto-disabled after reaching
consecutive failure threshold.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…bject

Replace flat fields (ConsecutiveFailures, MaxConsecutiveFailures, Progress,
WillDisable) with a scoped ConsecutiveFailures struct containing Current,
Max, and Threshold. This produces a cleaner JSON payload structure and
removes the redundant WillDisable field (threshold == 100 implies disable).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace ConsecutiveFailures struct in DestinationDisabledData with a
Reason string field. This decouples the disabled alert from the specific
trigger mechanism, allowing future disable reasons (e.g., error rate)
without restructuring the payload.

Also update e2e types and assertions for the nested ConsecutiveFailures
struct introduced in the previous commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d 100

After disabling a destination, update the consecutive failure alert's
destination to include DisabledAt, so consumers see the post-disable
state. Add test asserting this behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use generic "failed to send alert" / "alert sent" messages with a topic
field instead of alert-type-specific message strings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexluong
Copy link
Collaborator Author

@alexbouchardd updated, the PR description is up-to-date if you want to review the schema, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants