Multi-Agent Incident Response — Semantic Kernel (C#)

Companion code for "Agents assemble. One agent is a hire. Many agents are a workforce.", an edition of The Pragmatic Architect newsletter. It is a reference implementation of six canonical multi-agent patterns, applied to autonomous incident response on a SaaS platform.

Why this use case?

Incident response is the highest-leverage agentic workload in tech right now. It is short-lived, tool-heavy, requires coordination across many specialists, and has a non-negotiable human-in-the-loop gate before anything writes to production. If your multi-agent framework can handle this, it can handle the rest.

What's in here

File	Pattern	Role
`Program.cs`	—	Entry point that wires the full pipeline
`Agents/TriageAgent.cs`	Sequential	Classifies the alert, sets severity
`Agents/LogAgent.cs`	Concurrent	Reads logs (Loki/CloudWatch tool)
`Agents/MetricsAgent.cs`	Concurrent	Reads Prometheus metrics
`Agents/KbSearchAgent.cs`	Concurrent	Searches runbooks + past incidents
`Agents/DiagnosticAgent.cs`	Group Chat	Argues for a root cause
`Agents/KnowledgeAgent.cs`	Group Chat	Counter-argues from prior art
`Agents/LeadAgent.cs`	Group Chat	Decides when debate is over
`Agents/RemediationAgents.cs`	Handoff	DB / Network / App specialists
`Agents/CommsAgent.cs`	Sequential (final)	Drafts status-page update
`Orchestration/IncidentOrchestrator.cs`	Magentic / Hierarchical	Top-level orchestrator
`Plugins/ObservabilityPlugin.cs`	—	Mock tools for logs/metrics/traces
`Plugins/RemediationPlugin.cs`	—	Mock tools for kubectl/db ops
`HumanGate/ApprovalGate.cs`	—	Human-in-the-loop approval

Patterns covered

Sequential — Triage → Investigate → Debate → Remediate → Announce
Concurrent — LogAgent ‖ MetricsAgent ‖ KbSearchAgent
Group Chat — DiagnosticAgent ⇌ KnowledgeAgent refereed by LeadAgent
Handoff — Remediation routes to DB / Network / App specialist by symptom
Magentic / Orchestrator-Worker — IncidentOrchestrator re-plans on tool failure
Hierarchical — Top orchestrator owns a team-of-teams; sub-managers expose summaries
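
As a rough illustration of how the Sequential and Concurrent patterns in the list above compose, the sketch below runs a triage step and then fans out three investigators in parallel. It uses plain .NET tasks rather than Semantic Kernel's orchestration types, and every name in it (`InvestigationSketch`, `InvestigateAsync`, `RunAsync`, the stub investigator delegates, the severity rule) is hypothetical and not taken from this repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sketch; the repository's agents are not shown here.
public static class InvestigationSketch
{
    // Each "agent" is modeled as an async function from alert text to a finding.
    public static async Task<string[]> InvestigateAsync(
        string alert,
        IReadOnlyList<Func<string, Task<string>>> investigators)
    {
        // Concurrent pattern: start every investigator before awaiting any of them.
        Task<string>[] running = investigators.Select(agent => agent(alert)).ToArray();

        // Task.WhenAll returns results in the same order as the input tasks.
        return await Task.WhenAll(running);
    }

    public static async Task<string> RunAsync(string alert)
    {
        // Sequential pattern: a (stubbed) triage step runs first.
        string severity = alert.Contains("cpu", StringComparison.OrdinalIgnoreCase)
            ? "SEV2"
            : "SEV3";

        // Stand-ins for LogAgent, MetricsAgent, and KbSearchAgent.
        var investigators = new List<Func<string, Task<string>>>
        {
            async a => { await Task.Delay(10); return "logs: stub finding"; },
            async a => { await Task.Delay(10); return "metrics: stub finding"; },
            async a => { await Task.Delay(10); return "kb: stub finding"; },
        };

        string[] findings = await InvestigateAsync(alert, investigators);

        // The merged findings would feed the next sequential stage (the debate).
        return $"{severity} | {string.Join("; ", findings)}";
    }
}
```

Starting all tasks before awaiting is what makes the stage concurrent; awaiting each investigator in turn would silently turn it back into a sequential chain.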

Prerequisites

dotnet --version    # 8.0 or later

appsettings.json:

{
  "AzureOpenAI": {
    "Endpoint": "https://YOUR-RESOURCE.openai.azure.com/",
    "DeploymentName": "gpt-4o",
    "ApiKey": "..."
  }
}
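
One way to fail fast on a misconfigured file is to validate the `AzureOpenAI` section before any agent is built. The sketch below parses the JSON shown above with `System.Text.Json`; the `AzureOpenAISettings` record and the `SettingsLoader` class are hypothetical names, and the repository may well load configuration differently.

```csharp
using System;
using System.Text.Json;

// Hypothetical settings type mirroring the three keys in appsettings.json.
public sealed record AzureOpenAISettings(string Endpoint, string DeploymentName, string ApiKey);

public static class SettingsLoader
{
    // Parses the JSON text and throws if a key is missing or blank.
    public static AzureOpenAISettings Parse(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        if (!doc.RootElement.TryGetProperty("AzureOpenAI", out JsonElement section))
            throw new InvalidOperationException("Missing 'AzureOpenAI' section.");

        string Required(string name)
        {
            if (section.TryGetProperty(name, out JsonElement value)
                && value.ValueKind == JsonValueKind.String)
            {
                var text = value.GetString();
                if (!string.IsNullOrWhiteSpace(text)) return text;
            }
            throw new InvalidOperationException($"Missing 'AzureOpenAI:{name}'.");
        }

        var settings = new AzureOpenAISettings(
            Required("Endpoint"), Required("DeploymentName"), Required("ApiKey"));

        // The endpoint should be an absolute https URL.
        if (!Uri.TryCreate(settings.Endpoint, UriKind.Absolute, out var uri)
            || uri.Scheme != Uri.UriSchemeHttps)
            throw new InvalidOperationException("Endpoint must be an absolute https URL.");

        return settings;
    }
}
```

A caller could pass it `File.ReadAllText("appsettings.json")` at startup, so a missing key surfaces as a clear error instead of a failed model call later in the pipeline.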

Run

dotnet restore
dotnet run -- ./samples/alert-db-cpu-spike.json

You'll see a streamed transcript of every agent's reasoning, the parallel investigation results, the root-cause debate, and an approval prompt before any remediation tool fires.
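
The approval prompt mentioned above can be pictured as a small gate that wraps every remediation call. This is only a sketch of the idea, not the contents of `HumanGate/ApprovalGate.cs`: the `RemediationProposal` record, the `ApprovalGateSketch` class, and the "type yes" convention are all assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical description of a write action an agent wants to perform.
public sealed record RemediationProposal(string Action, string Target);

public static class ApprovalGateSketch
{
    // Returns true only when the operator answers "yes" (case-insensitive).
    // Anything else, including end of input, counts as a refusal.
    public static bool RequestApproval(
        RemediationProposal proposal, TextReader input, TextWriter output)
    {
        output.WriteLine($"Proposed remediation: {proposal.Action} on {proposal.Target}");
        output.Write("Approve? Type 'yes' to proceed: ");
        var answer = input.ReadLine();
        return string.Equals(answer?.Trim(), "yes", StringComparison.OrdinalIgnoreCase);
    }

    // The tool delegate is invoked only after explicit approval.
    public static string ExecuteIfApproved(
        RemediationProposal proposal,
        Func<RemediationProposal, string> tool,
        TextReader input,
        TextWriter output)
    {
        if (!RequestApproval(proposal, input, output))
            return "skipped: not approved";

        return tool(proposal);
    }
}
```

In a console run the reader and writer would be `Console.In` and `Console.Out`; taking them as parameters keeps the gate testable without a terminal. Defaulting to refusal means a closed or empty input stream can never approve a production change.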

Production checklist

Token budget per stage (hard caps)
Tool allow-lists per agent (read-only by default)
OpenTelemetry tracing on every invocation
Nightly eval harness against historical incidents
Feature-flagged kill switch ("advise-only" mode)
Per-incident and per-day cost guardrails
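
Three of the checklist items (hard token caps, per-agent tool allow-lists, and the advise-only kill switch) can be sketched as small policy objects that are consulted before each model or tool call. The `TokenBudget` and `ToolPolicy` classes below are hypothetical illustrations and are not part of this repository; the tool names in the usage note are made up as well.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical hard cap on tokens for one pipeline stage.
public sealed class TokenBudget
{
    private readonly int _cap;
    private int _used;

    public TokenBudget(int cap) => _cap = cap;

    public int Remaining => _cap - _used;

    // Refuses the request outright instead of allowing a partial overrun.
    public bool TryConsume(int tokens)
    {
        if (tokens < 0 || tokens > Remaining) return false;
        _used += tokens;
        return true;
    }
}

// Hypothetical per-agent allow-list with an advise-only switch.
public sealed class ToolPolicy
{
    private readonly Dictionary<string, HashSet<string>> _allowed =
        new(StringComparer.Ordinal);

    // When true, every write tool is blocked regardless of the allow-list.
    public bool AdviseOnly { get; set; }

    public void Allow(string agent, params string[] tools)
    {
        if (!_allowed.TryGetValue(agent, out var set))
            _allowed[agent] = set = new HashSet<string>(StringComparer.Ordinal);
        foreach (var tool in tools) set.Add(tool);
    }

    // Deny by default: unknown agents and unlisted tools are rejected.
    public bool CanInvoke(string agent, string tool, bool isWrite)
    {
        if (isWrite && AdviseOnly) return false;
        return _allowed.TryGetValue(agent, out var set) && set.Contains(tool);
    }
}
```

For example, after `policy.Allow("LogAgent", "read_logs")`, a call to `policy.CanInvoke("LogAgent", "restart_pod", isWrite: true)` returns false because the tool was never listed, and setting `AdviseOnly = true` blocks write tools even for agents that are allowed to use them.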

License

MIT — see LICENSE. This is reference code; harden before production use.
