Add adaptive alert noise suppression with scoring and dedup windows #15
Merged
harishconti merged 2 commits into main on Apr 11, 2026
Addresses the #1 reason observability tools get abandoned: alert fatigue from high-frequency alerts that engineers learn to ignore.

What ships:
- AlertNoiseRecord model tracks per-(table, alert_type) firing counts across 1h, 24h, and 7d rolling windows with a composite noise score (0-100)
- Adaptive dedup window: score=0→60min, score=25→2×, score=50→4×, score=75→8× (capped at configurable max, default 8h)
- Severity trending: compares recent vs older alert severity groups to surface worsening/improving/stable trends
- Auto-throttle flag set when count_24h >= auto_throttle_threshold (default 10)
- Noise score refreshed after every successful dispatch so the next dedup window reflects current firing rate accurately
- AlertLog.severity column added so trend analysis has raw data
- GET/POST /alerts/noise API: list all records, summary stats, per-alert history, and POST .../reset to give a clean slate after root-cause fix
- noise_suppression config block in kit.yml with documented score formula and all thresholds tunable (enabled, min/max window, throttle threshold)
- 26 unit + integration tests covering scoring, trending, adaptive window math, throttle flag, idempotent upsert, and noise-aware is_alert_deduped

https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR
- Remove forward-reference return type annotation that referenced AlertNoiseRecord before its import (F821)
- Remove unused pytest import from test file (F401)
- Sort imports to satisfy isort order (I001)

https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR
What does this PR do?
Implements a comprehensive alert noise suppression system that automatically extends deduplication windows for frequently-firing alerts, preventing alert fatigue without manual intervention.
The system tracks alert frequency across three time windows (1h, 24h, 7d) and computes a noise score (0–100) that drives an exponential dedup window multiplier. High-noise alerts are automatically throttled, and severity trends (worsening/improving/stable) are tracked to provide context.
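The scoring and window math described here can be sketched in a few lines. This is a minimal illustration, not the code from alerts/base.py: the function names and signatures below are hypothetical, while the weights (1h×20, 24h×2, 7d×0.5), the cap at 100, the formula min(max, min × 2^(score/25)), and the defaults (60 and 480 minutes, throttle threshold 10) are taken from this PR description.

```python
def compute_noise_score(count_1h: int, count_24h: int, count_7d: int) -> float:
    """Weighted sum of firing counts, capped at 100.

    Weights come from the PR description; how the three windows overlap
    is not stated there, so the counts are simply taken as given.
    """
    raw = count_1h * 20 + count_24h * 2 + count_7d * 0.5
    return min(100.0, raw)


def adaptive_dedup_window(
    score: float,
    min_minutes: int = 60,   # min_dedup_window_minutes default
    max_minutes: int = 480,  # max_dedup_window_minutes default
) -> float:
    """Double the baseline window for every 25 points of noise score."""
    return min(max_minutes, min_minutes * 2 ** (score / 25))


def should_throttle(count_24h: int, threshold: int = 10) -> bool:
    """Auto-throttle once the 24h count reaches the configured threshold."""
    return count_24h >= threshold


if __name__ == "__main__":
    for score in (0, 25, 50, 75, 100):
        print(score, adaptive_dedup_window(score))
```

With the defaults, scores of 0, 25, 50, and 75 map to 60, 120, 240, and 480 minutes, matching the 1×, 2×, 4×, 8× progression in the commit message; a score of 100 would give 960 minutes uncapped, so it is clamped to the 480-minute ceiling.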
Key Changes
Core Noise Suppression Logic (alerts/base.py)
- _compute_noise_score(): Composite scoring function weighting recent alerts more heavily (1h×20, 24h×2, 7d×0.5), capped at 100
- _compute_severity_trend(): Compares average severity weights of recent vs. older alerts to detect worsening/improving patterns
- _get_adaptive_dedup_window(): Converts noise score to dedup window using exponential formula: min(max, min × 2^(score/25))
- _refresh_noise_record(): Recalculates and persists AlertNoiseRecord after each successful dispatch, counting alerts from the audit log
- is_alert_deduped(): Enhanced to use the adaptive window from the noise record instead of a fixed window; respects the noise_suppression.enabled config flag

Data Model (backend/models.py)
- AlertLog: Added severity column (fail/warn/info) to track alert severity for trend analysis
- AlertNoiseRecord: New table tracking per-(table, alert_type) noise metrics:
  - count_1h, count_24h, count_7d
  - noise_score, severity_trend, is_throttled

API Endpoints (backend/routers/alert_noise.py)
New /alerts/noise router with:
- GET /: List all noise records sorted by score (descending)
- GET /summary: High-level stats: total tracked, throttled count, worsening trends, top-5 noisiest
- GET /{table}/{alert_type}: Fetch noise record for a specific alert
- POST /{table}/{alert_type}/reset: Zero out score and throttle flag (for post-incident cleanup)
- GET /{table}/{alert_type}/history: Recent AlertLog rows for pattern analysis

Configuration (config/kit.yml)
New alerts.noise_suppression section with:
- enabled: Toggle for the entire system (default: true)
- min_dedup_window_minutes: Baseline window (default: 60)
- max_dedup_window_minutes: Hard ceiling (default: 480)
- auto_throttle_threshold: 24h count that triggers throttling (default: 10)

Database Migration (alembic/versions/003_add_alert_noise_suppression.py)
- severity column added to alert_logs
- alert_noise_records table created, with indexes on table_name and alert_type

Testing
Comprehensive test suite in tests/test_noise_suppression.py (386 lines) covering:
All tests use
https://claude.ai/code/session_018pzrXAhbmudF2BCP3xxqVR
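The severity trend comparison listed under Core Noise Suppression Logic is described only as comparing average severity weights of recent vs. older alerts. A rough sketch of that idea follows. Everything specific in it is an assumption for illustration: the numeric weights (info=1, warn=2, fail=3), the tolerance of 0.25, the "stable" fallback for empty groups, and the function name are not taken from the PR.

```python
from statistics import mean

# Assumed weights; the PR names the severities (fail/warn/info) but not their values.
SEVERITY_WEIGHTS = {"info": 1, "warn": 2, "fail": 3}


def severity_trend(recent: list[str], older: list[str], tolerance: float = 0.25) -> str:
    """Classify a trend by comparing mean severity weight of two alert groups.

    `recent` and `older` are lists of severity labels. How the real
    implementation splits alerts into these groups is not stated in the PR.
    """
    if not recent or not older:
        # Not enough data on one side to compare (assumed behaviour).
        return "stable"
    delta = mean(SEVERITY_WEIGHTS[s] for s in recent) - mean(
        SEVERITY_WEIGHTS[s] for s in older
    )
    if delta > tolerance:
        return "worsening"
    if delta < -tolerance:
        return "improving"
    return "stable"


if __name__ == "__main__":
    print(severity_trend(["fail", "fail"], ["warn", "info"]))
```

A tolerance band around zero keeps small fluctuations from flipping the label between worsening and improving on every refresh; whether the merged code uses such a band is not stated.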