Add adaptive alert noise suppression with scoring and dedup windows #15

Merged
harishconti merged 2 commits into main from claude/add-alert-noise-suppression-zeoRN
Apr 11, 2026

@Sruthi-ng

What does this PR do?

Implements a comprehensive alert noise suppression system that automatically extends deduplication windows for frequently-firing alerts, preventing alert fatigue without manual intervention.

The system tracks alert frequency across three time windows (1h, 24h, 7d) and computes a noise score (0–100) that drives an exponential dedup window multiplier. High-noise alerts are automatically throttled, and severity trends (worsening/improving/stable) are tracked to provide context.

Key Changes

Core Noise Suppression Logic (alerts/base.py)

  • _compute_noise_score(): Composite scoring function weighting recent alerts more heavily (1h×20, 24h×2, 7d×0.5), capped at 100
  • _compute_severity_trend(): Compares average severity weights of recent vs. older alerts to detect worsening/improving patterns
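  • _get_adaptive_dedup_window(): Converts the noise score to a dedup window with an exponential formula: min(max_window, min_window × 2^(score/25)), where min_window and max_window come from min_dedup_window_minutes and max_dedup_window_minutes. With the defaults (60 and 480):
    • score=0 → 60 min (baseline)
    • score=25 → 120 min (2×)
    • score=50 → 240 min (4×)
    • score=75 → 480 min (8×, reaches the cap)
    • score=100 → 480 min (uncapped value would be 960 min, 16×; held at the cap)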
  • _refresh_noise_record(): Recalculates and persists the AlertNoiseRecord after each successful dispatch, counting alerts from the audit log
  • is_alert_deduped(): Enhanced to use adaptive window from noise record instead of fixed window; respects noise_suppression.enabled config flag
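
For reviewers who want to check the numbers by hand, here is a minimal sketch of the three pure functions described above. It is not the code in alerts/base.py. It assumes the composite score is a plain weighted sum of the three counts, and the numeric severity weights (info=1, warn=2, fail=3) are hypothetical, since the description does not state them.

```python
from statistics import mean

# Weights and cap as listed in the PR description.
WEIGHT_1H, WEIGHT_24H, WEIGHT_7D = 20.0, 2.0, 0.5
MAX_SCORE = 100.0

# Hypothetical numeric weights; the PR only names the levels fail/warn/info.
SEVERITY_WEIGHTS = {"info": 1, "warn": 2, "fail": 3}


def compute_noise_score(count_1h: int, count_24h: int, count_7d: int) -> float:
    """Weighted sum of the window counts, capped at 100 (assumed to be a plain sum)."""
    raw = count_1h * WEIGHT_1H + count_24h * WEIGHT_24H + count_7d * WEIGHT_7D
    return min(MAX_SCORE, raw)


def adaptive_dedup_window(
    score: float, min_minutes: int = 60, max_minutes: int = 480
) -> float:
    """min(max, min * 2^(score/25)): doubles every 25 points until the ceiling."""
    return min(max_minutes, min_minutes * 2 ** (score / 25))


def severity_trend(recent: list[str], older: list[str]) -> str:
    """Compare mean severity of recent vs. older alerts; empty groups are 'stable'."""
    if not recent or not older:
        return "stable"
    recent_avg = mean(SEVERITY_WEIGHTS[s] for s in recent)
    older_avg = mean(SEVERITY_WEIGHTS[s] for s in older)
    if recent_avg > older_avg:
        return "worsening"
    if recent_avg < older_avg:
        return "improving"
    return "stable"
```

With the default 60/480 settings the window reaches the ceiling at score=75, so any score from 75 to 100 yields the same 480 minute window.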

Data Model (backend/models.py)

  • AlertLog: Added severity column (fail/warn/info) to track alert severity for trend analysis
  • AlertNoiseRecord: New table tracking per-(table, alert_type) noise metrics:
    • Rolling counts: count_1h, count_24h, count_7d
    • Computed metrics: noise_score, severity_trend, is_throttled
    • Unique constraint on (table_name, alert_type) for upsert semantics
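
To illustrate why the unique constraint matters, the sketch below reproduces the upsert behaviour with the standard-library sqlite3 module rather than the project's ORM. The column types and the helper names create_schema and upsert_noise_record are assumptions for illustration; only the table and column names come from this PR.

```python
import sqlite3

SCHEMA_SQL = """
CREATE TABLE alert_noise_records (
    id INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    alert_type TEXT NOT NULL,
    count_1h INTEGER NOT NULL DEFAULT 0,
    count_24h INTEGER NOT NULL DEFAULT 0,
    count_7d INTEGER NOT NULL DEFAULT 0,
    noise_score REAL NOT NULL DEFAULT 0,
    severity_trend TEXT NOT NULL DEFAULT 'stable',
    is_throttled INTEGER NOT NULL DEFAULT 0,
    UNIQUE (table_name, alert_type)
)
"""

UPSERT_SQL = """
INSERT INTO alert_noise_records
    (table_name, alert_type, count_1h, count_24h, count_7d,
     noise_score, severity_trend, is_throttled)
VALUES
    (:table_name, :alert_type, :count_1h, :count_24h, :count_7d,
     :noise_score, :severity_trend, :is_throttled)
ON CONFLICT (table_name, alert_type) DO UPDATE SET
    count_1h = excluded.count_1h,
    count_24h = excluded.count_24h,
    count_7d = excluded.count_7d,
    noise_score = excluded.noise_score,
    severity_trend = excluded.severity_trend,
    is_throttled = excluded.is_throttled
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the noise table with the (table_name, alert_type) unique constraint."""
    conn.execute(SCHEMA_SQL)


def upsert_noise_record(conn: sqlite3.Connection, record: dict) -> None:
    """Insert the record, or overwrite the existing row for the same key."""
    conn.execute(UPSERT_SQL, record)
```

Calling upsert_noise_record twice for the same (table_name, alert_type) pair leaves a single row holding the latest metrics, which is the idempotent behaviour the test suite checks for.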

API Endpoints (backend/routers/alert_noise.py)

New /alerts/noise router with:

  • GET / — List all noise records sorted by score (descending)
  • GET /summary — High-level stats: total tracked, throttled count, worsening trends, top-5 noisiest
  • GET /{table}/{alert_type} — Fetch noise record for specific alert
  • POST /{table}/{alert_type}/reset — Zero out score and throttle flag (for post-incident cleanup)
  • GET /{table}/{alert_type}/history — Recent AlertLog rows for pattern analysis
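
As a rough guide to what the summary and reset endpoints compute, here is a framework-free sketch. The NoiseRecord dataclass, the function names, and the response keys (total_tracked, throttled_count, worsening_count, top_noisiest) are hypothetical; the actual response schema in backend/routers/alert_noise.py may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NoiseRecord:
    """Stand-in for the AlertNoiseRecord fields the endpoints need."""
    table_name: str
    alert_type: str
    noise_score: float
    severity_trend: str
    is_throttled: bool


def summarize(records: list[NoiseRecord]) -> dict:
    """High-level stats: totals, throttled, worsening, and the five noisiest."""
    ranked = sorted(records, key=lambda r: r.noise_score, reverse=True)
    return {
        "total_tracked": len(records),
        "throttled_count": sum(r.is_throttled for r in records),
        "worsening_count": sum(r.severity_trend == "worsening" for r in records),
        "top_noisiest": [
            {
                "table": r.table_name,
                "alert_type": r.alert_type,
                "noise_score": r.noise_score,
            }
            for r in ranked[:5]
        ],
    }


def reset_record(record: NoiseRecord) -> NoiseRecord:
    """Zero the score and clear the throttle flag, leaving other fields as-is."""
    return replace(record, noise_score=0.0, is_throttled=False)
```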

Configuration (config/kit.yml)

New alerts.noise_suppression section with:

  • enabled: Toggle for the entire system (default: true)
  • min_dedup_window_minutes: Baseline window (default: 60)
  • max_dedup_window_minutes: Hard ceiling (default: 480)
  • auto_throttle_threshold: 24h count that triggers throttling (default: 10)
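
One way to read these settings with their defaults is sketched below. It assumes config/kit.yml has already been parsed into a nested dict; the NoiseSuppressionConfig class and should_throttle helper are hypothetical names, not part of this PR. The >= comparison follows the commit message (count_24h >= auto_throttle_threshold).

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass(frozen=True)
class NoiseSuppressionConfig:
    """Defaults mirror the values documented for alerts.noise_suppression."""
    enabled: bool = True
    min_dedup_window_minutes: int = 60
    max_dedup_window_minutes: int = 480
    auto_throttle_threshold: int = 10

    @classmethod
    def from_mapping(cls, config: Mapping[str, Any]) -> "NoiseSuppressionConfig":
        """Read alerts.noise_suppression, falling back to defaults for missing keys."""
        alerts = config.get("alerts") or {}
        section = alerts.get("noise_suppression") or {}
        known = {f.name for f in fields(cls)}
        # Unknown keys are ignored here; a real loader might reject them instead.
        return cls(**{k: v for k, v in section.items() if k in known})


def should_throttle(count_24h: int, cfg: NoiseSuppressionConfig) -> bool:
    """Throttle once the 24h count reaches the configured threshold."""
    return count_24h >= cfg.auto_throttle_threshold
```

An empty or missing noise_suppression section yields the documented defaults, so existing kit.yml files would keep working under this reading.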

Database Migration (alembic/versions/003_add_alert_noise_suppression.py)

  • Adds severity column to alert_logs
  • Creates alert_noise_records table with indexes on table_name and alert_type

Testing

Comprehensive test suite in tests/test_noise_suppression.py (386 lines) covering:

  • Noise scoring: Zero counts, low/high rates, cap enforcement, per-window contributions
  • Severity trending: Stable/worsening/improving detection, empty group handling
  • Adaptive windows: Score-to-window conversion, custom min/max, exponential scaling
  • Noise record refresh: Creation, count accuracy, throttle flag logic, idempotent upserts
  • Integration: Dedup with adaptive windows, noise suppression toggle, high-noise window extension

All tests use

https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR

claude added 2 commits April 11, 2026 07:09
Addresses the #1 reason observability tools get abandoned: alert fatigue
from high-frequency alerts that engineers learn to ignore.

What ships:
- AlertNoiseRecord model tracks per-(table, alert_type) firing counts
  across 1h, 24h, and 7d rolling windows with a composite noise score (0-100)
- Adaptive dedup window: score=0→60min, score=25→2×, score=50→4×,
  score=75→8× (capped at configurable max, default 8h)
- Severity trending: compares recent vs older alert severity groups
  to surface worsening/improving/stable trends
- Auto-throttle flag set when count_24h >= auto_throttle_threshold (default 10)
- Noise score refreshed after every successful dispatch so the next
  dedup window reflects current firing rate accurately
- AlertLog.severity column added so trend analysis has raw data
- GET/POST /alerts/noise API: list all records, summary stats, per-alert
  history, and POST .../reset to give a clean slate after root-cause fix
- noise_suppression config block in kit.yml with documented score formula
  and all thresholds tunable (enabled, min/max window, throttle threshold)
- 26 unit + integration tests covering scoring, trending, adaptive window
  math, throttle flag, idempotent upsert, and noise-aware is_alert_deduped

https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR

- Remove forward-reference return type annotation that referenced
  AlertNoiseRecord before its import (F821)
- Remove unused pytest import from test file (F401)
- Sort imports to satisfy isort order (I001)

https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR
@harishconti harishconti merged commit 3e31c7d into main Apr 11, 2026
2 checks passed

3 participants