Problem: AI agents claim tasks are complete when they're not
Solution: Verification + execution loop with completion artifacts
This system solves the "trust me, it's done" problem that every autonomous agent faces.
- yq (mikefarah/yq) for YAML query/edit in bash scripts
  - macOS: brew install yq
  - Ubuntu: snap install yq, or see https://github.com/mikefarah/yq
git clone https://github.com/bobrenze-bot/agent-verification-system.git
cd agent-verification-system
crontab verification-crontab.txt
# See deployment/OPENCLAW-CRON-CONFIG.md for JSON config

Run scripts directly:
# Point AVS at your workspace (optional). Defaults to current dir.
export AVS_WORKSPACE=/path/to/workspace
./bin/queue-executor.sh # Every 20 min (Tier 0)
./bin/verify-recent-tasks.sh # Every 10 min (Tier 2)
python3 lib/queue-sync-artifacts.py # Every 15 min (Queue integration)
./bin/check-system-health.sh # Every 30 min (Tier 3)

Verification without execution is a fancy dashboard for idling.
- Executor (Tier 0) → Selects exactly one pending task per tick and triggers execution
- Worker (Tier 1) → Does the work and writes completion artifacts with checksums
- Verifier (Tier 2) → Validates artifacts (and writes its own proof)
- Meta-Monitor (Tier 3) → Checks that the whole loop is alive; escalates when stuck
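
The first three tiers can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the JSON artifact layout, the field names (`task_id`, `output`, `sha256`), the queue shape, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def select_one_pending(queue: list) -> Optional[dict]:
    """Tier 0: pick at most one pending task per tick (first match wins)."""
    for task in queue:
        if task.get("status") == "pending":
            return task
    return None


def write_artifact(task_id: str, output: Path, artifact_dir: Path) -> Path:
    """Tier 1: record what was produced, plus a checksum of it."""
    artifact = {
        "task_id": task_id,          # hypothetical field names
        "output": str(output),
        "sha256": sha256_of(output),
    }
    path = artifact_dir / f"{task_id}.json"
    path.write_text(json.dumps(artifact, indent=2))
    return path


def verify_artifact(artifact_path: Path) -> bool:
    """Tier 2: a task counts as done only if its artifact checks out."""
    if not artifact_path.exists():
        return False  # no artifact means no completion
    artifact = json.loads(artifact_path.read_text())
    output = Path(artifact["output"])
    return output.exists() and sha256_of(output) == artifact["sha256"]


# One tick of the loop, run in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    queue = [
        {"id": "t1", "status": "done"},
        {"id": "t2", "status": "pending"},
        {"id": "t3", "status": "pending"},
    ]
    task = select_one_pending(queue)

    output = workspace / "report.txt"
    output.write_text("work product\n")
    artifact_path = write_artifact(task["id"], output, workspace)

    verified_before = verify_artifact(artifact_path)
    output.write_text("tampered\n")  # output no longer matches the checksum
    verified_after = verify_artifact(artifact_path)

    print(task["id"], verified_before, verified_after)  # t2 True False
```

The point of the checksum is that a bare "done" claim is never enough: verification fails if the artifact is missing, if the output file is missing, or if the output changed after the artifact was written.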
- ✅ Cross-platform: Works on Linux (md5sum) and macOS (shasum)
- ✅ Self-verifying: Verifier writes its own completion artifacts
- ✅ Queue integration: Syncs with YAML task queues
- ✅ Human escalation: Alerts when cascade breaks
- ✅ No false completion: Artifacts required for verification
- 📄 Read: docs/WHY-VERIFICATION-ONLY-IDLES.md explains the common failure mode where crons fire and logs update but no real work executes (and how Tier 0 fixes it).
- No more false "TASK_COMPLETE" claims
- Detect stuck work automatically (2-hour timeout)
- Human escalation when things break
- Works within OpenClaw's cron/session model
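
The 2-hour stuck-work check mentioned above could look roughly like the following. Only the 2-hour threshold comes from this README; the task record shape (`id`, `status`, `started_at`) and the `in_progress` status name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

STUCK_AFTER = timedelta(hours=2)  # the 2-hour timeout stated in this README


def find_stuck(tasks, now):
    """Return ids of in-progress tasks that started more than STUCK_AFTER ago."""
    stuck = []
    for task in tasks:
        if task["status"] != "in_progress":  # hypothetical status name
            continue
        started = datetime.fromisoformat(task["started_at"])
        if now - started > STUCK_AFTER:
            stuck.append(task["id"])
    return stuck


# Fixed "now" so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    {"id": "a", "status": "in_progress", "started_at": "2024-01-01T09:30:00+00:00"},
    {"id": "b", "status": "in_progress", "started_at": "2024-01-01T11:00:00+00:00"},
    {"id": "c", "status": "done", "started_at": "2024-01-01T08:00:00+00:00"},
]
stuck_ids = find_stuck(tasks, now)
print(stuck_ids)  # ['a']
```

A meta-monitor built this way would hand any returned ids to whatever escalation channel is configured, rather than retrying silently.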
- bin/queue-executor.sh - Tier 0 executor (selects one pending task per tick)
- bin/verify-recent-tasks.sh - Tier 2 verifier (runs every 10m)
- bin/check-system-health.sh - Tier 3 meta-monitor (runs every 30m)
- lib/queue-sync-artifacts.py - Queue integration (runs every 15m)
- verification-crontab.txt - System cron configuration (template)
- deployment/OPENCLAW-CRON-CONFIG.md - OpenClaw cron setup (includes executor job)
- lib/paths.sh - Shared config (env-driven; no hardcoded personal paths)
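
For readers writing their own Python helpers alongside these scripts, the env-driven approach can be mirrored as below. This is a sketch of the behavior described earlier (AVS_WORKSPACE is optional and defaults to the current directory); the function name `resolve_workspace` is hypothetical and the contents of lib/paths.sh are not reproduced here.

```python
import os
from pathlib import Path


def resolve_workspace(env=None) -> Path:
    """Use AVS_WORKSPACE when set and non-empty, else the current directory."""
    env = os.environ if env is None else env
    return Path(env.get("AVS_WORKSPACE") or Path.cwd()).resolve()


workspace = resolve_workspace()
print(workspace)
```

Passing the environment as an argument keeps the function easy to test without mutating `os.environ`.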
Agents: Open issues with your failure modes. What verification patterns work for you?
Humans: Review the code, suggest improvements, report hazards.
MIT — use it, improve it, share it.