RLASAF12/agent-canary

AgentCanary 🐦

Behavioral smoke tests for deployed AI agents: a canary in the coal mine for your AI endpoints.

Live Demo Built with Supabase Edge Functions

What it does

AgentCanary runs scheduled behavioral probes against your AI agent endpoints. Every 15 minutes it sends probe questions to your deployed agents, checks each response for the expected keywords, and alerts you when behavior drifts, before your users notice.

┌─────────────┐    every 15 min    ┌───────────────────┐
│  pg_cron    │ ─────────────────► │  probe-runner     │
│ (scheduler) │                    │  (Deno Edge Fn)   │
└─────────────┘                    └─────────┬─────────┘
                                             │ POST probe questions
                                   ┌─────────▼─────────┐
                                   │  Your AI Agent    │
                                   │  endpoint_url     │
                                   └─────────┬─────────┘
                                             │ response
                                   ┌─────────▼─────────┐
                                   │  keyword check    │
                                   │  pass / drift /   │
                                   │  error            │
                                   └─────────┬─────────┘
                                             │
                                   ┌─────────▼─────────┐
                                   │  Supabase DB      │
                                   │  probe_runs +     │
                                   │  alerts table     │
                                   └───────────────────┘
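
The keyword check step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual probe-runner code: the function name `classifyResponse` is hypothetical, and it assumes a probe passes when the response contains any one of the baseline keywords, compared case-insensitively. The real runner may apply a stricter rule.

```typescript
// Possible outcomes of one probe, matching the statuses named in the diagram.
type ProbeStatus = "pass" | "drift" | "error";

// Assumption: ANY baseline keyword appearing in the response counts as a pass.
// Matching is a case-insensitive substring check.
function classifyResponse(
  responseText: string | null,
  baselineKeywords: string[],
): ProbeStatus {
  // A missing response (timeout, non-2xx, unparsable body) is treated as an error.
  if (responseText === null) return "error";

  const haystack = responseText.toLowerCase();
  const matched = baselineKeywords.some(
    (keyword) => haystack.indexOf(keyword.toLowerCase()) !== -1,
  );

  // A response that arrived but matched no keyword is reported as drift.
  return matched ? "pass" : "drift";
}
```

Because this is a plain substring match, a keyword like `4` also matches `14` or `2024`; longer or more distinctive keywords make for more reliable baselines.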

Why it exists

Silent AI agent failure is a real production problem:

  • 52% accuracy decline observed over 4 months in a published study of deployed LLMs
  • A fintech team lost 12% conversion before detecting drift in their chat agent
  • Standard uptime monitors (200 OK) miss behavioral regressions entirely

AgentCanary closes that gap.

Features

  • 🔄 Scheduled probing: pg_cron fires a Supabase Edge Function every 15 min
  • 🔍 Keyword baseline matching: define expected keywords per probe question
  • ⚠️ Drift detection: flags when responses stop containing expected patterns
  • 🌐 Live dashboard: single HTML file, deployable to GitHub Pages
  • 🔔 Webhook alerts: optional outbound webhook on drift/error
  • 📊 Pass rate tracking: per-canary health metrics over time
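
As an illustration of pass rate tracking, the sketch below computes a per-canary pass rate from a list of probe results. The `ProbeRun` shape and the `passRates` function are hypothetical; the real `probe_runs` columns and the dashboard's calculation may differ.

```typescript
type ProbeStatus = "pass" | "drift" | "error";

// Hypothetical row shape; the real probe_runs columns may differ.
interface ProbeRun {
  canaryId: string;
  status: ProbeStatus;
}

// Returns the fraction of passing runs per canary, in the range 0..1.
// Drift and error both count against the pass rate.
function passRates(runs: ProbeRun[]): { [canaryId: string]: number } {
  const totals: { [canaryId: string]: { passed: number; total: number } } = {};

  for (const run of runs) {
    const tally = totals[run.canaryId] ?? { passed: 0, total: 0 };
    tally.total += 1;
    if (run.status === "pass") tally.passed += 1;
    totals[run.canaryId] = tally;
  }

  const rates: { [canaryId: string]: number } = {};
  for (const canaryId of Object.keys(totals)) {
    rates[canaryId] = totals[canaryId].passed / totals[canaryId].total;
  }
  return rates;
}
```

A canary with no runs is simply absent from the result, which avoids a division by zero.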

File structure

agent-canary/
├── index.html                  # Single-file dashboard (Tailwind + Supabase JS CDN)
├── supabase/
│   ├── migrations/
│   │   └── 20260612_initial_schema.sql   # Full DB schema
│   └── functions/
│       └── probe-runner/
│           └── index.ts                   # Deno Edge Function
└── README.md

Quick start

1. Create a Supabase project

# Or use the Supabase dashboard at supabase.com

2. Apply the schema

Run supabase/migrations/20260612_initial_schema.sql in the SQL editor.

3. Deploy the Edge Function

supabase functions deploy probe-runner --no-verify-jwt
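
To check the deployment by hand before the scheduler runs, you could trigger the function once yourself. The sketch below only builds the request. It assumes the standard Supabase Edge Function URL pattern (`https://<project-ref>.supabase.co/functions/v1/<function-name>`) and that probe-runner needs no request body, which this README does not confirm. `buildProbeRunnerRequest` and `RequestSpec` are hypothetical names.

```typescript
// Describes an HTTP request without sending it, so it can be inspected first.
interface RequestSpec {
  url: string;
  method: "POST";
  headers: { [name: string]: string };
  body: string;
}

// projectRef is the project reference from your Supabase project URL.
function buildProbeRunnerRequest(projectRef: string): RequestSpec {
  return {
    url: `https://${projectRef}.supabase.co/functions/v1/probe-runner`,
    method: "POST",
    // No Authorization header: the function was deployed with --no-verify-jwt.
    headers: { "Content-Type": "application/json" },
    // Assumption: the runner takes no input and probes every canary it finds.
    body: "{}",
  };
}
```

The result can be passed to `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`. Note that `--no-verify-jwt` means anyone who knows the URL can trigger a probe run.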

4. Add a canary

INSERT INTO canaries (name, endpoint_url) VALUES (
  'My GPT-4 Agent',
  'https://api.openai.com/v1/chat/completions'
);

INSERT INTO probe_questions (canary_id, question, baseline_keywords)
VALUES (
  '<canary-id>',
  'What is 2+2?',
  ARRAY['4', 'four']
);
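
The example canary above points at the OpenAI Chat Completions endpoint, so a probe question has to be wrapped in a chat request and the reply pulled back out of the response. The sketch below shows one way to do that. It is not taken from the probe-runner source: the function names are hypothetical, the model name is left to the caller, and it does not cover the `Authorization: Bearer` header the OpenAI API requires.

```typescript
// Builds a Chat Completions request body for one probe question.
// Assumption: the caller picks the model; this README does not name one.
function buildChatBody(question: string, model: string): string {
  return JSON.stringify({
    model: model,
    messages: [{ role: "user", content: question }],
  });
}

// Pulls the assistant's reply out of a parsed Chat Completions response.
// Returns null when the body does not have the expected shape, for example
// when the API returned an error object instead of choices.
function extractReply(body: unknown): string | null {
  if (typeof body !== "object" || body === null) return null;

  const choices = (body as { choices?: unknown }).choices;
  if (!Array.isArray(choices) || choices.length === 0) return null;

  const first = choices[0] as { message?: { content?: unknown } } | null;
  const content = first && first.message ? first.message.content : undefined;
  return typeof content === "string" ? content : null;
}
```

A `null` from `extractReply` is a natural candidate for the `error` status, while a string reply goes on to the keyword check.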

5. Open the dashboard

Update the Supabase URL and anon key in index.html, then open it in a browser or deploy it to GitHub Pages.

Schema

| Table | Purpose |
|---|---|
| canaries | Agent endpoints to monitor |
| probe_questions | Questions + expected keywords per canary |
| probe_runs | Every probe result (pass/drift/error) |
| alerts | Drift/error events with optional webhook |
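
To illustrate how a drift or error result might become an alert, the sketch below builds a webhook payload from a probe result. The field names, the `buildAlert` function, and the 200-character excerpt limit are all assumptions for illustration; the real `alerts` columns and webhook body may look different.

```typescript
type ProbeStatus = "pass" | "drift" | "error";

// Hypothetical shapes; the real schema and webhook format may differ.
interface ProbeResult {
  canaryName: string;
  question: string;
  status: ProbeStatus;
  responseText: string | null;
}

interface AlertPayload {
  canary: string;
  question: string;
  status: "drift" | "error";
  excerpt: string;
}

// Returns a webhook payload for drift/error results and null for passes,
// so only failing probes produce alerts.
function buildAlert(result: ProbeResult): AlertPayload | null {
  if (result.status === "pass") return null;

  return {
    canary: result.canaryName,
    question: result.question,
    status: result.status,
    // Truncate so a long model response cannot bloat the webhook body.
    excerpt: (result.responseText ?? "").slice(0, 200),
  };
}
```

The payload would then be serialized with `JSON.stringify` and POSTed to the configured webhook URL, if one is set.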

Built by

RLASAF12 · Part of the ABC-TOM builder system.
