
Grafana dashboards and alerting rules #248

@test3207


Parent Epic: #190

Status: ✅ COMPLETED (2025-12-28)

Grafana dashboards and alert rules are deployed to Grafana Cloud through Terraform (IaC).


Overview

Create Grafana dashboards and alerting rules for M3W production monitoring.

Deployed Resources

| Resource | URL | Status |
|---|---|---|
| System Overview | https://test3207.grafana.net/d/m3w-system-overview | ✅ Working |
| Application Dashboard | https://test3207.grafana.net/d/m3w-application | ⚠️ Partial (needs structured logs) |
| Log Explorer | https://test3207.grafana.net/d/m3w-log-explorer | ⏳ Pending (needs the logging PR to be merged) |
| Alert Rules | https://test3207.grafana.net/alerting/list | ✅ Working |

Dashboards

1. System Overview ✅ Working Now

  • Node health (all 4 VMs)
  • CPU, Memory, Disk, Network per node
  • Container count
  • System load, Uptime

2. Application Dashboard ⚠️ Partial (needs structured logs)

3. Log Explorer ⏳ Pending (needs the logging PR to be merged)

Alert Rules ✅ Deployed

Critical (Email immediately)

| Alert | Condition | For | Status |
|---|---|---|---|
| Node Down | No metrics for 2min | 2m | ✅ Active |
| Disk Usage Critical | >90% usage | 5m | ✅ Active |
| Memory Usage Critical | >90% usage | 5m | ✅ Active |
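A minimal sketch of how one of the critical rules (Disk Usage Critical) could be expressed with the Grafana Terraform provider's `grafana_rule_group` resource. It assumes node_exporter-style metrics (`node_filesystem_avail_bytes`, `node_filesystem_size_bytes`) reach a Prometheus-compatible data source whose UID is passed in through a hypothetical `prometheus_datasource_uid` variable. The folder title, group name, `severity` label, and 60-second evaluation interval are illustrative, not taken from the actual `grafana.tf`. The `for = "5m"` argument corresponds to the For column: the rule stays Pending until the threshold has been breached for five minutes.

```hcl
# Hypothetical input: UID of the data source that stores node metrics.
variable "prometheus_datasource_uid" {
  type        = string
  description = "UID of the Prometheus-compatible data source (assumption)"
}

# Illustrative folder to hold the alert rule group.
resource "grafana_folder" "alerts" {
  title = "M3W Alerts"
}

resource "grafana_rule_group" "m3w_critical" {
  name             = "m3w-critical"
  folder_uid       = grafana_folder.alerts.uid
  interval_seconds = 60 # evaluate every minute (assumed)

  rule {
    name      = "Disk Usage Critical"
    condition = "C" # the threshold expression below decides firing
    for       = "5m" # must stay breached 5 minutes before firing

    # Assumed label, used by a notification policy to route by severity.
    labels = {
      severity = "critical"
    }

    # A: root filesystem usage in percent, per node (instant query).
    data {
      ref_id         = "A"
      datasource_uid = var.prometheus_datasource_uid
      relative_time_range {
        from = 600
        to   = 0
      }
      model = jsonencode({
        refId   = "A"
        expr    = "100 * (1 - node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"})"
        instant = true
        range   = false
      })
    }

    # C: server-side threshold expression, true when A > 90.
    data {
      ref_id         = "C"
      datasource_uid = "__expr__"
      relative_time_range {
        from = 0
        to   = 0
      }
      model = jsonencode({
        refId      = "C"
        type       = "threshold"
        expression = "A"
        conditions = [{
          evaluator = {
            type   = "gt"
            params = [90]
          }
        }]
      })
    }
  }
}
```

A rule like Node Down could additionally set `no_data_state = "Alerting"` on its `rule` block, so that missing metrics fire the alert instead of resolving to the No Data state.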

Warning (Email digest)

| Alert | Condition | For | Status |
|---|---|---|---|
| High CPU | >80% for 10min | 10m | ✅ Active |
| High Memory | >80% for 10min | 10m | ✅ Active |
| Disk Usage Warning | >80% for 5min | 5m | ✅ Active |
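The split between critical alerts (email immediately) and warnings (email digest) could be implemented with a single email contact point plus a notification policy that routes on a `severity` label. This is a sketch: it assumes each rule carries a `severity` label, and the contact point name `m3w-email`, the hypothetical `alert_email_addresses` variable, and all timing values are assumptions rather than the deployed configuration. Note that `grafana_notification_policy` manages the stack's entire notification policy tree, so applying it replaces any policies configured by hand in the UI.

```hcl
# Hypothetical input: recipients for all M3W alert emails.
variable "alert_email_addresses" {
  type = list(string)
}

resource "grafana_contact_point" "email" {
  name = "m3w-email"

  email {
    addresses = var.alert_email_addresses
  }
}

resource "grafana_notification_policy" "m3w" {
  # Default route: anything unmatched goes to email, grouped per alert.
  contact_point = grafana_contact_point.email.name
  group_by      = ["alertname"]

  # Critical: short initial wait, frequent reminders while firing.
  policy {
    contact_point   = grafana_contact_point.email.name
    group_wait      = "30s"
    repeat_interval = "1h"

    matcher {
      label = "severity"
      match = "="
      value = "critical"
    }
  }

  # Warning: batch all concurrent warnings into one digest-style email.
  policy {
    contact_point   = grafana_contact_point.email.name
    group_by        = ["severity"]
    group_wait      = "10m"
    group_interval  = "1h"
    repeat_interval = "12h"

    matcher {
      label = "severity"
      match = "="
      value = "warning"
    }
  }
}
```

Grouping warnings by `severity` alone collapses every concurrent warning into one notification per `group_interval`, which approximates a digest; critical alerts inherit the default per-`alertname` grouping and are sent after a short `group_wait`.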

Tasks

  • Create System Overview dashboard
  • Create Application dashboard
  • Create Log Explorer dashboard
  • Configure alert contact point (Email)
  • Create alert rules
  • Test alerts (trigger manually)
  • Document dashboard access (in SECRETS.md)

IaC Files (m3w-k8s)

terraform/
├── grafana.tf           # Dashboard, alert, contact point resources
├── grafana/
│   ├── system-overview.json
│   ├── application.json
│   └── log-explorer.json
├── providers.tf         # grafana provider v4.21.0
└── variables.tf         # grafana_url, grafana_service_account_token
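A minimal sketch of how `providers.tf`, `variables.tf`, and the dashboard part of `grafana.tf` might fit together, pinning the provider version listed above. The folder title `M3W` and the `dashboards` map are illustrative. The dashboard URLs in the table above (for example `/d/m3w-system-overview`) would come from the `uid` field inside each JSON file, which this sketch assumes is set.

```hcl
terraform {
  required_providers {
    grafana = {
      source  = "grafana/grafana"
      version = "4.21.0"
    }
  }
}

# variables.tf
variable "grafana_url" {
  type        = string
  description = "Grafana Cloud stack URL, e.g. https://test3207.grafana.net"
}

variable "grafana_service_account_token" {
  type      = string
  sensitive = true # keep the token out of plan output
}

# providers.tf
provider "grafana" {
  url  = var.grafana_url
  auth = var.grafana_service_account_token
}

# grafana.tf (dashboards only)
resource "grafana_folder" "m3w" {
  title = "M3W"
}

locals {
  # Terraform key -> dashboard JSON file under terraform/grafana/
  dashboards = {
    system_overview = "system-overview.json"
    application     = "application.json"
    log_explorer    = "log-explorer.json"
  }
}

resource "grafana_dashboard" "m3w" {
  for_each    = local.dashboards
  folder      = grafana_folder.m3w.uid
  config_json = file("${path.module}/grafana/${each.value}")
  overwrite   = true # replace a dashboard with the same uid if it exists
}
```

Using `for_each` over a map keeps one resource block for all three dashboards, so adding another JSON file needs only one new map entry.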
