API Documentation

Overview

The Sentinel Backend API provides endpoints to monitor service health, manage incidents, and receive AI analysis reports.

Base URL: http://localhost:4000 (development)

Authentication: Currently none (future: JWT tokens)


📊 Endpoints

Health & Status

Get System Status

GET /api/status

Description: Get current health status of all monitored services

Response:

{
  "services": {
    "auth": {
      "status": "healthy",
      "code": 200,
      "lastUpdated": "2026-02-01T10:30:00Z"
    },
    "payment": {
      "status": "healthy",
      "code": 200,
      "lastUpdated": "2026-02-01T10:30:00Z"
    },
    "notification": {
      "status": "degraded",
      "code": 429,
      "lastUpdated": "2026-02-01T10:29:55Z"
    }
  },
  "aiAnalysis": "DEGRADED: Notification service experiencing rate limiting",
  "lastUpdated": "2026-02-01T10:30:00Z"
}

Status Values:

  • healthy - Service responding normally (HTTP 200-299)
  • degraded - Service experiencing issues (HTTP 400-499)
  • critical - Service down (HTTP 500+)
  • unknown - Never checked or unreachable
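The status values above can be expressed as a small client-side mapping. This is a sketch, not the backend's own code. It assumes a missing code (`null` or `undefined`) means the service was never checked or was unreachable. It also maps codes outside the documented ranges (1xx, 3xx) to `unknown`, which the documentation does not specify. `classifyStatus` and `ServiceHealth` are hypothetical names.

```typescript
type ServiceHealth = "healthy" | "degraded" | "critical" | "unknown";

// Maps an HTTP status code to the documented status values.
// Assumption: null/undefined means the service was never checked or was unreachable.
function classifyStatus(code: number | null | undefined): ServiceHealth {
  if (code === null || code === undefined) return "unknown";
  if (code >= 200 && code <= 299) return "healthy";
  if (code >= 400 && code <= 499) return "degraded";
  if (code >= 500) return "critical";
  // Assumption: codes outside the documented ranges (1xx, 3xx) are reported as "unknown".
  return "unknown";
}
```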

Activity & Events

Get Activity Log

GET /api/activity

Description: Get recent activity events and alerts

Query Parameters:

  • limit (optional): Number of events to return (default: 50, max: 100)
  • offset (optional): Pagination offset (default: 0)
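This sketch builds the query string for `GET /api/activity` using the documented defaults. The documentation gives only the defaults and the maximum of 100. It does not say how the server treats out-of-range values. The client-side clamping, the lower bound of 1 for `limit`, and the helper name `activityUrl` are therefore assumptions.

```typescript
const BASE_URL = "http://localhost:4000"; // development base URL

// Builds a GET /api/activity URL. `limit` is clamped to 1..100 (documented max: 100)
// and `offset` to >= 0; non-finite values fall back to the documented defaults.
function activityUrl(limit: number = 50, offset: number = 0): string {
  const safeLimit = Number.isFinite(limit) ? Math.min(Math.max(Math.trunc(limit), 1), 100) : 50;
  const safeOffset = Number.isFinite(offset) ? Math.max(Math.trunc(offset), 0) : 0;
  const url = new URL("/api/activity", BASE_URL);
  url.searchParams.set("limit", String(safeLimit));
  url.searchParams.set("offset", String(safeOffset));
  return url.toString();
}
```

For example, `await fetch(activityUrl(20, 40))` would request the third page of 20 events.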

Response:

{
  "activity": [
    {
      "id": 1769941800000,
      "message": "AUTH service is DOWN (HTTP 500)",
      "type": "alert",
      "severity": "critical",
      "timestamp": "2026-02-01T10:30:00Z"
    },
    {
      "id": 1769941750000,
      "message": "PAYMENT service is DEGRADED (HTTP 429)",
      "type": "alert",
      "severity": "warning",
      "timestamp": "2026-02-01T10:29:10Z"
    }
  ]
}

Fields:

  • id - Unique event ID (creation time as a Unix timestamp in milliseconds)
  • message - Human-readable event message
  • type - Event type (alert, info, warning)
  • severity - Alert severity (critical, warning, info)
  • timestamp - ISO 8601 timestamp
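Clients often care only about events above a certain severity. The sketch below filters activity events by a minimum severity and orders them newest first. The ranking `critical > warning > info` and the names `eventsAtLeast` and `SEVERITY_RANK` are assumptions made for illustration. `timestamp` is kept as the ISO 8601 string found in the response body.

```typescript
type Severity = "critical" | "warning" | "info";

interface ActivityEvent {
  id: number;
  message: string;
  type: "alert" | "info" | "warning";
  severity: Severity;
  timestamp: string; // ISO 8601 string as received in the JSON body
}

// Assumed ranking: critical > warning > info.
const SEVERITY_RANK: Record<Severity, number> = { info: 0, warning: 1, critical: 2 };

// Keeps events at or above `minSeverity`, newest first (filter copies, so the input is not mutated).
function eventsAtLeast(events: ActivityEvent[], minSeverity: Severity): ActivityEvent[] {
  return events
    .filter((e) => SEVERITY_RANK[e.severity] >= SEVERITY_RANK[minSeverity])
    .sort((a, b) => Date.parse(b.timestamp) - Date.parse(a.timestamp));
}
```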

AI Insights

Get AI Analysis

GET /api/insights

Description: Get AI analysis from Kestra workflow executions

Query Parameters:

  • limit (optional): Number of insights to return (default: 20, max: 100)

Response:

{
  "insights": [
    {
      "id": "exec-001",
      "analysis": "DEGRADED: Payment service at 85% CPU. Recommend scaling workers.",
      "metrics": {
        "auth": { "code": 200 },
        "payment": { "code": 200 },
        "notification": { "code": 200 }
      },
      "summary": "System health at 98%. Minor CPU spike on payment service.",
      "timestamp": "2026-02-01T10:30:00Z"
    }
  ]
}

Fields:

  • id - Execution ID from Kestra
  • analysis - Raw AI analysis text
  • metrics - Snapshot of each service's HTTP status code at analysis time
  • summary - Concise summary
  • timestamp - When analysis was generated
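The `analysis` strings in the examples follow a `LEVEL: message` pattern (`DEGRADED: ...`, `HEALTHY: ...`). If the AI output keeps that convention, which the API does not guarantee, a client can split the level from the message as shown below. `parseAnalysis` is a hypothetical helper. It returns `level: null` when no upper-case prefix is present.

```typescript
interface ParsedAnalysis {
  level: string | null; // e.g. "HEALTHY" or "DEGRADED"; null when no prefix is found
  text: string;
}

// Splits "LEVEL: message" into its parts; the `s` flag lets the message span several lines.
function parseAnalysis(analysis: string): ParsedAnalysis {
  const trimmed = analysis.trim();
  const match = /^([A-Z]+):\s*(.*)$/s.exec(trimmed);
  if (match === null) return { level: null, text: trimmed };
  return { level: match[1] ?? null, text: match[2] ?? "" };
}
```

The same helper applies to the `aiAnalysis` field of `GET /api/status` and the `aiReport` field of the webhook body.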

Webhooks

Kestra Webhook

POST /api/kestra-webhook
Content-Type: application/json

Description: Receive status updates from Kestra workflow

Request Body:

{
  "aiReport": "HEALTHY: All services operating normally",
  "metrics": {
    "auth": { "code": 200 },
    "payment": { "code": 200 },
    "notification": { "code": 200 }
  }
}

Response:

{
  "success": true
}

Used By: Kestra workflow at kestra-flows/intelligent-monitor.yaml
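If you extend the webhook handler, validating the body before storing it guards against malformed Kestra output. The type guard below checks the documented shape. The exact rules (every `metrics` entry must carry an integer `code`, and arrays are rejected) are assumptions. `isKestraWebhookPayload` is a hypothetical name and is not part of the current backend.

```typescript
interface KestraWebhookPayload {
  aiReport: string;
  metrics: Record<string, { code: number }>;
}

// Type guard for the documented request body (validation rules are assumptions).
function isKestraWebhookPayload(body: unknown): body is KestraWebhookPayload {
  if (typeof body !== "object" || body === null) return false;
  const { aiReport, metrics } = body as { aiReport?: unknown; metrics?: unknown };
  if (typeof aiReport !== "string") return false;
  if (typeof metrics !== "object" || metrics === null || Array.isArray(metrics)) return false;
  return Object.values(metrics).every((entry: unknown) => {
    if (typeof entry !== "object" || entry === null) return false;
    const code = (entry as { code?: unknown }).code;
    return typeof code === "number" && Number.isInteger(code);
  });
}
```

A handler would typically respond with HTTP 400 when the guard returns `false`. The current API does not document an error response for this endpoint.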


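🔄 Data Models

Fields typed as Date below are transmitted as ISO 8601 strings in JSON (for example "2026-02-01T10:30:00Z").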

Service Status

interface ServiceStatus {
  status: "healthy" | "degraded" | "critical" | "unknown";
  code: number;  // HTTP status code
  lastUpdated: Date;
}

Activity Event

interface ActivityEvent {
  id: number;
  message: string;
  type: "alert" | "info" | "warning";
  severity: "critical" | "warning" | "info";
  timestamp: Date;
}

AI Insight

interface AIInsight {
  id: string;
  analysis: string;  // Raw AI response
  metrics: Record<string, { code: number }>;
  summary: string;
  timestamp: Date;
}
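The following is a minimal sketch of converting the ISO 8601 strings in JSON bodies into the `Date` fields these interfaces use, via a `JSON.parse` reviver. Treating the keys `lastUpdated` and `timestamp` as dates is an assumption based on the examples. `reviveDates`, `parseStatusResponse`, and `StatusResponse` are hypothetical names.

```typescript
interface ServiceStatus {
  status: "healthy" | "degraded" | "critical" | "unknown";
  code: number;
  lastUpdated: Date;
}

interface StatusResponse {
  services: Record<string, ServiceStatus>;
  aiAnalysis: string;
  lastUpdated: Date;
}

const DATE_KEYS = new Set(["lastUpdated", "timestamp"]);

// JSON.parse reviver: converts parseable strings under known date keys into Date objects.
function reviveDates(key: string, value: unknown): unknown {
  if (DATE_KEYS.has(key) && typeof value === "string") {
    const ms = Date.parse(value);
    return Number.isNaN(ms) ? value : new Date(ms);
  }
  return value;
}

function parseStatusResponse(json: string): StatusResponse {
  return JSON.parse(json, reviveDates) as StatusResponse;
}
```

With `fetch`, read the body with `await response.text()` and pass it to `parseStatusResponse` instead of calling `response.json()`.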

📋 Status Codes

Code  Meaning              Reported status  Example
200   Service healthy      healthy          API responds normally
429   Rate limited         degraded         Too many requests
500   Server error         critical         Database connection failed
503   Service unavailable  critical         Maintenance or crash

🔄 Response Examples

All Services Healthy

{
  "services": {
    "auth": { "status": "healthy", "code": 200 },
    "payment": { "status": "healthy", "code": 200 },
    "notification": { "status": "healthy", "code": 200 }
  },
  "aiAnalysis": "HEALTHY: All systems operational",
  "lastUpdated": "2026-02-01T10:35:00Z"
}

Partial Outage

{
  "services": {
    "auth": { "status": "healthy", "code": 200 },
    "payment": { "status": "critical", "code": 500 },
    "notification": { "status": "healthy", "code": 200 }
  },
  "aiAnalysis": "DEGRADED: Payment service down. Initiating auto-healing...",
  "lastUpdated": "2026-02-01T10:36:00Z"
}
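To turn a status response into a dashboard summary, a client can count services per status and list the ones that need attention. The sketch below assumes the `services` shape shown in these examples. `countByStatus` and `unhealthyServices` are hypothetical names. For the partial outage above, it yields two healthy services and one critical service, and lists `["payment"]`.

```typescript
type Health = "healthy" | "degraded" | "critical" | "unknown";

interface ServiceSnapshot {
  status: Health;
  code: number;
}

// Counts services per status value; statuses with no services stay at 0.
function countByStatus(services: Record<string, ServiceSnapshot>): Record<Health, number> {
  const counts: Record<Health, number> = { healthy: 0, degraded: 0, critical: 0, unknown: 0 };
  for (const { status } of Object.values(services)) {
    counts[status] += 1;
  }
  return counts;
}

// Names of services whose status is anything other than "healthy", in response order.
function unhealthyServices(services: Record<string, ServiceSnapshot>): string[] {
  return Object.entries(services)
    .filter(([, s]) => s.status !== "healthy")
    .map(([name]) => name);
}
```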

🧪 Testing with curl

Check system status

curl http://localhost:4000/api/status

Get activity log

curl http://localhost:4000/api/activity

Get AI insights

curl http://localhost:4000/api/insights

Test webhook (simulate Kestra)

curl -X POST http://localhost:4000/api/kestra-webhook \
  -H "Content-Type: application/json" \
  -d '{
    "aiReport": "TEST: Webhook integration working",
    "metrics": {"auth": {"code": 200}}
  }'
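The same webhook test can be scripted from TypeScript. This sketch builds only the URL and the `fetch` options, so it can be checked without a running server. `buildWebhookRequest` and `WebhookRequestInit` are hypothetical names.

```typescript
interface WebhookRequestInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the URL and fetch() options for POST /api/kestra-webhook, mirroring the curl test.
function buildWebhookRequest(
  baseUrl: string,
  aiReport: string,
  metrics: Record<string, { code: number }>,
): { url: string; init: WebhookRequestInit } {
  return {
    url: new URL("/api/kestra-webhook", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ aiReport, metrics }),
    },
  };
}
```

Passing the result to `fetch(url, init)` against `http://localhost:4000` should return the documented `{ "success": true }` body.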

🔐 Security Notes

Current Implementation:

  • CORS enabled (all origins)
  • No authentication required
  • No rate limiting

Production Recommendations:

  • Implement JWT authentication
  • Add rate limiting (100 req/min per IP)
  • Restrict CORS to known origins
  • Use HTTPS only
  • Add request signing
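The API lists request signing only as a recommendation and does not define a scheme. One common approach is an HMAC-SHA256 signature over the raw request body, computed with a secret shared between Kestra and the backend. The sketch below assumes a hex-encoded signature (for example, sent in a hypothetical `X-Sentinel-Signature` header) and a secret loaded from configuration. `signBody` and `verifySignature` are illustrative names.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes a hex-encoded HMAC-SHA256 of the raw request body with a shared secret.
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Recomputes the signature and compares it in constant time. A malformed or
// wrong-length signature is rejected before timingSafeEqual, which throws on length mismatch.
function verifySignature(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```

Verify against the raw body before JSON parsing, because re-serializing parsed JSON may not reproduce the exact string that was signed.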

📈 Rate Limiting (Future)

Tier: Free
- 100 requests/minute
- 10,000 requests/day
- 30-day data retention

Tier: Pro
- 1,000 requests/minute
- 100,000 requests/day
- 1-year data retention
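The following is a minimal sketch of enforcing these tier limits per IP with fixed-window counters. Several parts are assumptions: the window strategy (each window starts at a client's first request), the in-memory `Map` storage, and the names `FixedWindowLimiter` and `createTierLimiter`. A multi-instance deployment would need shared storage instead.

```typescript
type Window = { start: number; count: number };

// Fixed-window counter: allows up to `limit` requests per key in each window of `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, Window>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns the key's current window, starting a fresh one once the old one has elapsed.
  private current(key: string, now: number): Window {
    let w = this.windows.get(key);
    if (w === undefined || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 };
      this.windows.set(key, w);
    }
    return w;
  }

  hasCapacity(key: string, now: number): boolean {
    return this.current(key, now).count < this.limit;
  }

  record(key: string, now: number): void {
    this.current(key, now).count += 1;
  }
}

// Combines a per-minute and a per-day limit; a request is counted only if both allow it.
function createTierLimiter(perMinute: number, perDay: number) {
  const minute = new FixedWindowLimiter(perMinute, 60_000);
  const day = new FixedWindowLimiter(perDay, 86_400_000);
  return (ip: string, now: number = Date.now()): boolean => {
    if (!minute.hasCapacity(ip, now) || !day.hasCapacity(ip, now)) return false;
    minute.record(ip, now);
    day.record(ip, now);
    return true;
  };
}

// Tier values as listed above
const freeTier = createTierLimiter(100, 10_000);
const proTier = createTierLimiter(1_000, 100_000);
```

This sketch never evicts entries, so a long-running process should periodically drop expired windows. Fixed windows also allow a burst of up to twice the per-minute limit across a window boundary. A sliding-window or token-bucket limiter avoids that.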

🚀 Usage Examples

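Find the worst-affected service

// Assumes an ES module (top-level await) and a runtime with global fetch (Node 18+ or a browser)
const response = await fetch('http://localhost:4000/api/status');
const data = await response.json();

// Highest HTTP code = worst (5xx > 4xx > 2xx); undefined when no services are reported
const worstService = Object.entries(data.services)
  .sort((a, b) => b[1].code - a[1].code)[0];
if (worstService) console.log(`Worst service: ${worstService[0]} (HTTP ${worstService[1].code})`);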

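Monitor for changes

let lastSnapshot = null;

setInterval(async () => {
  try {
    const response = await fetch('http://localhost:4000/api/status');
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const data = await response.json();

    // Compare only status and code per service: lastUpdated can change on every check
    // even when nothing else does, which would report a change on every poll
    const snapshot = JSON.stringify(
      Object.entries(data.services).map(([name, s]) => [name, s.status, s.code])
    );
    if (snapshot !== lastSnapshot) {
      console.log('Status changed:', data.services);
      lastSnapshot = snapshot;
    }
  } catch (err) {
    // Keep polling when the backend is temporarily unreachable
    console.error('Status poll failed:', err);
  }
}, 5000); // Poll every 5 seconds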

Trigger alert on critical

const response = await fetch('http://localhost:4000/api/status');
const data = await response.json();

const criticalServices = Object.entries(data.services)
  .filter(([, s]) => s.status === 'critical')
  .map(([name]) => name);

if (criticalServices.length > 0) {
  // Send alert (replace with a real notification channel)
  console.error(`CRITICAL: service down: ${criticalServices.join(', ')}`);
}

📚 Related Documentation


Last Updated: February 1, 2026
API Version: 1.0.0