Ship a customer-facing trace-validator skill that learns correct deployment behavior from a small number of golden execution traces and structurally validates all future runs against that model. Based on dominator analysis from arXiv:2605.03159.
Today this is enforced purely by agent instructions and post-hoc integration tests. If the agent skips a step (or a prompt regression causes it to), there's no structural safety net.
The Research
The paper demonstrates that dominator analysis — borrowed from compiler control-flow theory — cleanly separates essential states from optional variations:
"A state d dominates state s if every path from the initial state to s must pass through d. [...] Loading screens do not dominate anything because they are optional."
Applied to git-ape: security_gate_evaluated dominates deployment_executed (you can't deploy without passing the gate), but cost_estimated does not (cost estimation is advisory).
The algorithm needs only 2–10 passing traces to learn this structure automatically:
"Using a model built from only three passing traces, the system achieved 100% accuracy in detecting product bugs and successfully identified false successes while tolerating valid UI variations."
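As a minimal sketch of the dominance relation quoted above, the following Python computes dominator sets on a small merged graph using the standard iterative set-intersection method. The graph shape, the `start` and `template_validated` states, and the function name are assumptions made for illustration; only `security_gate_evaluated`, `cost_estimated`, and `deployment_executed` come from this issue, and the paper's actual extraction procedure may differ.

```python
from typing import Dict, List, Set


def compute_dominators(graph: Dict[str, List[str]], start: str) -> Dict[str, Set[str]]:
    """Return, for every state, the set of states that dominate it."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            preds[dst].add(src)

    # Begin with the most pessimistic guess (everything dominates everything)
    # and shrink the sets until nothing changes.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            if not preds[n]:
                continue  # unreachable state: leave its set untouched
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# Hypothetical merged graph: cost estimation is an optional detour.
graph = {
    "start": ["template_validated"],
    "template_validated": ["cost_estimated", "security_gate_evaluated"],
    "cost_estimated": ["security_gate_evaluated"],
    "security_gate_evaluated": ["deployment_executed"],
    "deployment_executed": [],
}
dom = compute_dominators(graph, "start")
print(sorted(dom["deployment_executed"]))
```

In this toy graph the security gate appears in the dominator set of `deployment_executed`, while `cost_estimated` does not, because one path reaches the gate without it.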
Value to Customers
Regulated industries: Machine-verifiable proof that compliance gates were executed
Platform teams: Structural guardrails on what Copilot agents can skip
CI/CD pipelines: PR comments showing "5/5 essential states hit" or "⚠️ missing: security_gate_evaluated"
Multi-team orgs: Golden baselines per environment (prod requires all gates; dev allows shortcuts)
Design
Skill Interface
Name: trace-validator
Triggers: "validate deployment trace", "check if run was complete", post-deployment
Input: deployment-id (or path to trace.jsonl)
Output: PASS/FAIL with coverage metrics, matched/missing states, explanation
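One way the output described above might be produced is sketched below in Python. The `ValidationResult` fields and the `validate` function are hypothetical names, the `template_validated` state is invented for the example, and the greedy in-order match is only one plausible reading of the subsequence matching named in Phase 3, not the skill's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValidationResult:
    verdict: str        # "PASS" or "FAIL"
    coverage: float     # matched essential states / all essential states
    matched: List[str]
    missing: List[str]
    explanation: str


def validate(trace_states: List[str], essential: List[str]) -> ValidationResult:
    """Check that the essential states occur in order within the trace."""
    matched: List[str] = []
    missing: List[str] = []
    pos = 0
    for state in essential:
        try:
            idx = trace_states.index(state, pos)  # search only after the last match
        except ValueError:
            missing.append(state)
            continue
        matched.append(state)
        pos = idx + 1

    coverage = len(matched) / len(essential) if essential else 1.0
    explanation = f"{len(matched)}/{len(essential)} essential states hit"
    if missing:
        explanation += "; missing: " + ", ".join(missing)
    return ValidationResult("FAIL" if missing else "PASS", coverage, matched, missing, explanation)


# Hypothetical run in which the agent skipped the security gate.
essential = ["template_validated", "security_gate_evaluated", "deployment_executed"]
trace = ["template_validated", "cost_estimated", "deployment_executed"]
result = validate(trace, essential)
print(result.verdict, result.explanation)
```

Extra states such as `cost_estimated` are tolerated because only the essential states are searched for; a state that appears out of order is reported as missing.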
Algorithm (3 Phases from the Paper)
Phase 1: Build Prefix Tree Acceptors
Each golden trace (trace.jsonl) becomes a PTA — a directed graph where nodes = states, edges = transitions.
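A rough Python sketch of this phase follows. It assumes each line of trace.jsonl is a JSON object carrying a `state` field; the real trace schema is defined by #148 and may differ. The function names, the `ts` field, and the `template_validated` state are hypothetical.

```python
import json
from typing import Iterable, List, Set, Tuple


def read_states(jsonl_lines: Iterable[str]) -> List[str]:
    """Extract the ordered state names from the lines of one trace.jsonl."""
    return [json.loads(line)["state"] for line in jsonl_lines if line.strip()]


def build_pta(states: List[str]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Turn one ordered trace into a graph: nodes are states, edges are transitions."""
    nodes = set(states)
    edges = set(zip(states, states[1:]))  # consecutive pairs of states
    return nodes, edges


# Hypothetical golden trace, inlined instead of read from disk.
lines = [
    '{"state": "template_validated", "ts": 1}',
    '{"state": "security_gate_evaluated", "ts": 2}',
    '{"state": "deployment_executed", "ts": 3}',
]
nodes, edges = build_pta(read_states(lines))
print(sorted(edges))
```

For a single trace the result is a simple chain; the branching structure only appears once several traces are merged.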
.github/skills/trace-validator/
├── SKILL.md # Skill definition (frontmatter + instructions)
├── scripts/
│ ├── build-model.sh # Ingest golden traces → generate dominator-model.json
│ └── validate-trace.sh # Validate a trace.jsonl against the model
└── references/
└── algorithm.md # Dominator extraction explained for the agent
The core algorithm (build-model and validate-trace) should be implemented as a Node.js script (matching the repo's existing scripts/ tooling) or as a standalone shell-based implementation using jq for JSON processing.
Integration with Deploy Workflow
In git-ape-deploy.yml (or .exampleyml), add a post-deployment step:
Summary
Ship a customer-facing trace-validator skill that learns correct deployment behavior from a small number of golden execution traces and structurally validates all future runs against that model. Based on dominator analysis from arXiv:2605.03159.
Depends on: #148 (structured trace capture)
Why This Matters
The Problem
Git-ape orchestrates complex multi-stage deployment workflows. Customers need confidence that:
Today this is enforced purely by agent instructions and post-hoc integration tests. If the agent skips a step (or a prompt regression causes it to), there's no structural safety net.
The Research
The paper demonstrates that dominator analysis — borrowed from compiler control-flow theory — cleanly separates essential states from optional variations:
Applied to git-ape: security_gate_evaluated dominates deployment_executed (you can't deploy without passing the gate), but cost_estimated does not (cost estimation is advisory).
The algorithm needs only 2–10 passing traces to learn this structure automatically:
Value to Customers
Design
Skill Interface
Algorithm (3 Phases from the Paper)
Phase 1: Build Prefix Tree Acceptors
Each golden trace (trace.jsonl) becomes a PTA: a directed graph where nodes = states, edges = transitions.
Phase 2: Merge + Dominator Extraction
state field for structured traces, no LLM needed for JSON states)
Result for the example above:
Phase 3: Validate New Traces
Topological subsequence matching:
trace.jsonl
State Equivalence Strategy
For git-ape traces (structured JSON, not screenshots), we use a simplified tiered approach:
state field
This is much simpler than the paper's visual equivalence (perceptual hash → SSIM → LLM) because our traces are structured.
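To illustrate why structured traces make equivalence cheap, here is a small Python sketch of comparison by the state field alone. The event shape, the `ts` and `duration_ms` fields, and the function names are assumptions for illustration; any further tiers of the strategy are not covered here.

```python
from typing import Any, Dict


def state_key(event: Dict[str, Any]) -> str:
    """Identity of a trace event: only the `state` field counts."""
    return str(event["state"])


def equivalent(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Two events are the same state if their keys match, whatever else differs."""
    return state_key(a) == state_key(b)


# Hypothetical events from two different runs: timestamps and durations differ.
run_a = {"state": "security_gate_evaluated", "ts": 1717000000, "duration_ms": 420}
run_b = {"state": "security_gate_evaluated", "ts": 1717090000, "duration_ms": 388}
other = {"state": "cost_estimated", "ts": 1717090001}

print(equivalent(run_a, run_b), equivalent(run_a, other))
```

No image hashing or model call is involved; a string comparison is enough when every event names its own state.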
File Layout
Dominator Model Schema
Skill Implementation
The core algorithm (build-model and validate-trace) should be implemented as a Node.js script (matching the repo's existing scripts/ tooling) or as a standalone shell-based implementation using jq for JSON processing.
Integration with Deploy Workflow
In git-ape-deploy.yml (or .exampleyml), add a post-deployment step:
PR Comment Output
Or on failure:
Customer Workflow
trace-validator --build --traces .azure/baselines/default/golden-traces/
Acceptance Criteria
SKILL.md created with correct frontmatter, triggers, and instructions
build-model script: ingests N trace files → produces dominator-model.json
validate-trace script: validates a trace against a model → outputs PASS/FAIL + coverage
dominator-model.json in .github/schemas/
.exampleyml)
.github/evals/trace-validator/
website/docs/skills/trace-validator.md
Non-Goals (This Issue)
References
.github/agents/git-ape.agent.md
.github/skills/prereq-check/SKILL.md