feat: Dominator-tree validation skill for deployment workflows

## Summary

Ship a customer-facing **`trace-validator`** skill that learns correct deployment behavior from a small number of golden execution traces and structurally validates all future runs against that model. Based on dominator analysis from [arXiv:2605.03159](https://arxiv.org/abs/2605.03159).

**Depends on:** #148 (structured trace capture)

## Why This Matters

### The Problem

Git-ape orchestrates complex multi-stage deployment workflows. Customers need confidence that:
1. Every mandatory checkpoint was actually executed (not just claimed by the agent)
2. The workflow followed a valid path through the required stages
3. Acceptable variations (optional cost estimation, architecture review) don't trigger false alarms

Today this is enforced purely by agent instructions and post-hoc integration tests. If the agent skips a step (or a prompt regression causes it to), there's no structural safety net.

### The Research

The paper demonstrates that **dominator analysis** — borrowed from compiler control-flow theory — cleanly separates essential states from optional variations:

> *"A state d dominates state s if every path from the initial state to s must pass through d. [...] Loading screens do not dominate anything because they are optional."*

Applied to git-ape: `security_gate_evaluated` dominates `deployment_executed` (you can't deploy without passing the gate), but `cost_estimated` does not (cost estimation is advisory).
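
The definition can be checked directly on a toy graph: `d` dominates `s` when `s` is reachable from the start state, but stops being reachable once `d` is removed. The sketch below is only an illustration of that definition; the graph, the `reachable` helper, and the `dominates` helper are hypothetical and not part of the paper or of git-ape.

```javascript
// Hypothetical toy graph of transitions seen in passing runs (illustrative only).
const graph = {
  template_generated: ["security_gate_evaluated"],
  security_gate_evaluated: ["cost_estimated", "deployment_executed"],
  cost_estimated: ["deployment_executed"],
  deployment_executed: [],
};

// Collect every node reachable from `start` without stepping onto `blocked`.
function reachable(g, start, blocked) {
  const seen = new Set();
  const stack = start === blocked ? [] : [start];
  while (stack.length > 0) {
    const node = stack.pop();
    if (seen.has(node)) continue;
    seen.add(node);
    for (const next of g[node] ?? []) {
      if (next !== blocked) stack.push(next);
    }
  }
  return seen;
}

// d dominates s when s is reachable normally but not once d is removed.
function dominates(g, start, d, s) {
  if (d === s) return true; // every state dominates itself
  return reachable(g, start, null).has(s) && !reachable(g, start, d).has(s);
}

const start = "template_generated";
console.log(dominates(graph, start, "security_gate_evaluated", "deployment_executed")); // true
console.log(dominates(graph, start, "cost_estimated", "deployment_executed")); // false
```

Removing one node at a time is far too slow for large graphs, but it makes the gate-versus-advisory distinction easy to see.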

The algorithm needs only **2–10 passing traces** to learn this structure automatically:

> *"Using a model built from only three passing traces, the system achieved 100% accuracy in detecting product bugs and successfully identified false successes while tolerating valid UI variations."*

### Value to Customers

- **Regulated industries:** Machine-verifiable proof that compliance gates were executed
- **Platform teams:** Structural guardrails on what Copilot agents can skip
- **CI/CD pipelines:** PR comments showing "5/5 essential states hit" or "⚠️ missing: security_gate_evaluated"
- **Multi-team orgs:** Golden baselines per environment (prod requires all gates; dev allows shortcuts)

## Design

### Skill Interface

```
Name: trace-validator
Triggers: "validate deployment trace", "check if run was complete", post-deployment
Input: deployment-id (or path to trace.jsonl)
Output: PASS/FAIL with coverage metrics, matched/missing states, explanation
```
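
For the `trace.jsonl` input, a minimal parsing sketch, assuming one JSON object per line with a string `state` field (the real record shape is defined by #148, so treat the field names as assumptions). `extractStates` is a hypothetical helper name.

```javascript
// Parse JSONL text into an ordered list of state names.
// Assumes each non-empty line is a JSON object with a string `state` field.
function extractStates(jsonl) {
  return jsonl
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line, i) => {
      const event = JSON.parse(line);
      if (typeof event.state !== "string") {
        throw new Error(`event ${i + 1} has no string "state" field`);
      }
      return event.state;
    });
}

// Illustrative sample; the timestamps are made up.
const sample = [
  '{"state":"requirements_gathered","timestamp":"2025-06-01T08:30:00Z"}',
  "",
  '{"state":"template_generated","timestamp":"2025-06-01T08:32:45Z"}',
].join("\n");

console.log(extractStates(sample)); // [ 'requirements_gathered', 'template_generated' ]
```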

### Algorithm (3 Phases from the Paper)

#### Phase 1: Build Prefix Tree Acceptors

Each golden trace (`trace.jsonl`) becomes a PTA — a directed graph where nodes = states, edges = transitions.

```
Golden trace 1: requirements → api_ref → template → security_gate → preflight → confirm → deploy → integration_tests
Golden trace 2: requirements → api_ref → template → security_gate → cost_est → preflight → confirm → deploy → integration_tests
Golden trace 3: requirements → api_ref → template → security_gate → policy → preflight → confirm → deploy → integration_tests
```
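
Since each golden trace is a single run, its PTA is a simple chain. A minimal sketch, assuming a trace has already been reduced to an ordered list of state names; the `{ start, terminal, nodes, edges }` shape and the `buildPta` name are assumptions made for illustration.

```javascript
// Turn one ordered list of state names into a chain-shaped PTA.
function buildPta(states) {
  if (states.length === 0) throw new Error("empty trace");
  const edges = [];
  for (let i = 0; i + 1 < states.length; i++) {
    edges.push([states[i], states[i + 1]]); // transition: states[i] -> states[i + 1]
  }
  return {
    start: states[0],
    terminal: states[states.length - 1],
    nodes: [...new Set(states)],
    edges,
  };
}

// Golden trace 2 from the example above.
const trace2 =
  "requirements → api_ref → template → security_gate → cost_est → preflight → confirm → deploy → integration_tests".split(" → ");
const pta = buildPta(trace2);
console.log(pta.nodes.length, pta.edges.length); // 9 8
```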

#### Phase 2: Merge + Dominator Extraction

1. **Merge PTAs** into a unified graph using state equivalence (exact match on `state` field for structured traces — no LLM needed for JSON states)
2. **Compute dominators** using the Lengauer-Tarjan algorithm
3. **Extract dominator tree** by tracing back from terminal states

Result for the example above, with the abbreviated state names expanded to their full IDs:
```
Dominator tree (essential states):
  requirements_gathered
    → api_reference_lookup
      → template_generated
        → security_gate_evaluated
          → preflight_validated
            → user_confirmation
              → deployment_executed
                → integration_tests_passed

Optional (not in dominator tree):
  cost_estimated, policy_assessed, architecture_reviewed
```
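
The merge and dominator steps can be sketched on the three golden traces as follows. This is not Lengauer-Tarjan: for brevity it uses the simpler iterative data-flow formulation (a node's dominators are itself plus the intersection of its predecessors' dominators), which gives the same dominator sets and is adequate for graphs of this size. It assumes every trace starts in the same state; `mergeTraces` and `dominatorSets` are hypothetical names.

```javascript
// Merge traces into one graph keyed by exact state name (Tier 1 equivalence).
function mergeTraces(traces) {
  const preds = new Map(); // state -> Set of predecessor states
  for (const trace of traces) {
    trace.forEach((state, i) => {
      if (!preds.has(state)) preds.set(state, new Set());
      if (i > 0) preds.get(state).add(trace[i - 1]);
    });
  }
  return preds;
}

// Iterative dominator computation: start from "everything dominates everything"
// and shrink each set until nothing changes.
function dominatorSets(preds, start) {
  const all = [...preds.keys()];
  const dom = new Map(all.map((n) => [n, new Set(n === start ? [start] : all)]));
  let changed = true;
  while (changed) {
    changed = false;
    for (const n of all) {
      if (n === start) continue;
      let next = null;
      for (const p of preds.get(n)) {
        const domP = dom.get(p);
        next = next === null ? new Set(domP) : new Set([...next].filter((x) => domP.has(x)));
      }
      next = next ?? new Set();
      next.add(n);
      if (next.size !== dom.get(n).size) { // sets only ever shrink, so size is enough
        dom.set(n, next);
        changed = true;
      }
    }
  }
  return dom;
}

const head = ["requirements", "api_ref", "template", "security_gate"];
const tail = ["preflight", "confirm", "deploy", "integration_tests"];
const golden = [[...head, ...tail], [...head, "cost_est", ...tail], [...head, "policy", ...tail]];

const dom = dominatorSets(mergeTraces(golden), "requirements");
const essential = dom.get("integration_tests"); // states every run must pass through
console.log(essential.size, essential.has("cost_est"), essential.has("policy")); // 8 false false
```

The dominator set of the terminal state is exactly the list of essential states; `cost_est` and `policy` drop out because golden trace 1 reaches `preflight` without them.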

#### Phase 3: Validate New Traces

Topological subsequence matching:
- Extract state sequence from the new `trace.jsonl`
- Check that all dominator-tree states appear in the correct topological order
- Extra states (optional) are allowed between essential states
- Compute coverage = matched / total_essential × 100%
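
A minimal sketch of this check, assuming the essential states are supplied as an ordered list and the new trace has already been reduced to a list of state names. `validateTrace` is a hypothetical name. The greedy scan is exact for the pass/fail decision at a 100% threshold; for partially matching traces the reported coverage is a lower bound, because an essential state that appears out of order is counted as missing.

```javascript
// Walk the observed trace once, matching essential states in order.
// Extra (optional) states between essential ones are simply skipped.
function validateTrace(essential, observed, threshold = 100) {
  const matched = [];
  let cursor = 0; // index in `observed` where the next search starts
  for (const state of essential) {
    const at = observed.indexOf(state, cursor);
    if (at !== -1) {
      matched.push(state);
      cursor = at + 1;
    }
  }
  const missing = essential.filter((s) => !matched.includes(s));
  const coverage = essential.length === 0 ? 100 : (matched.length / essential.length) * 100;
  return { status: coverage >= threshold ? "PASS" : "FAIL", coverage, matched, missing };
}

const essentialStates = [
  "requirements_gathered", "api_reference_lookup", "template_generated",
  "security_gate_evaluated", "preflight_validated", "user_confirmation",
  "deployment_executed", "integration_tests_passed",
];

// A run that skipped the security gate and preflight but did run cost estimation.
const badRun = [
  "requirements_gathered", "api_reference_lookup", "template_generated",
  "cost_estimated", "user_confirmation", "deployment_executed", "integration_tests_passed",
];

const verdict = validateTrace(essentialStates, badRun);
console.log(verdict.status, verdict.coverage, verdict.missing);
// FAIL 75 [ 'security_gate_evaluated', 'preflight_validated' ]
```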

### State Equivalence Strategy

For git-ape traces (structured JSON, not screenshots), we use a simplified tiered approach:

| Tier | Method | When |
|------|--------|------|
| 1 | Exact match on `state` field | Default — handles 95% of cases |
| 2 | Normalized match (ignore timestamp, meta variations) | Same state ID with different metadata |
| 3 | LLM semantic equivalence | Future: for free-form trace states from other agent systems |

This is much simpler than the paper's visual equivalence (perceptual hash → SSIM → LLM) because our traces are structured.

### File Layout

```
.azure/baselines/
├── default/                          # Default baseline (all environments)
│   ├── golden-traces/
│   │   ├── trace-001.jsonl
│   │   ├── trace-002.jsonl
│   │   └── trace-003.jsonl
│   └── dominator-model.json          # Auto-generated from golden traces
├── prod/                             # Stricter baseline for prod
│   ├── golden-traces/
│   │   └── ...
│   └── dominator-model.json
```

### Dominator Model Schema

```jsonc
// .azure/baselines/default/dominator-model.json
{
  "version": "1.0",
  "generated_from": ["trace-001.jsonl", "trace-002.jsonl", "trace-003.jsonl"],
  "generated_at": "2025-06-01T09:00:00Z",
  "essential_states": [
    { "id": "requirements_gathered", "stage": 1, "dominates": ["api_reference_lookup"] },
    { "id": "api_reference_lookup", "stage": 2, "dominates": ["template_generated"] },
    { "id": "template_generated", "stage": 2, "dominates": ["security_gate_evaluated"] },
    { "id": "security_gate_evaluated", "stage": 2.5, "dominates": ["preflight_validated"] },
    { "id": "preflight_validated", "stage": 2, "dominates": ["user_confirmation"] },
    { "id": "user_confirmation", "stage": 3, "dominates": ["deployment_executed"] },
    { "id": "deployment_executed", "stage": 3, "dominates": ["integration_tests_passed"] },
    { "id": "integration_tests_passed", "stage": 4, "dominates": [] }
  ],
  "optional_states": ["cost_estimated", "policy_assessed", "architecture_reviewed", "drift_checked"],
  "coverage_threshold": 100
}
```
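
Before validating against a model, it may be worth checking that the `dominates` links form a single connected tree and deriving the expected state order from them. A minimal sketch under that assumption; `essentialOrder` is a hypothetical helper and the trimmed model below is illustrative only.

```javascript
// Walk the `dominates` links from the single root and return the states in order.
// Throws if the model has no unique root, dangling references, or unreachable states.
function essentialOrder(model) {
  const byId = new Map(model.essential_states.map((s) => [s.id, s]));
  const dominated = new Set(model.essential_states.flatMap((s) => s.dominates));
  const roots = model.essential_states.filter((s) => !dominated.has(s.id));
  if (roots.length !== 1) throw new Error(`expected one root, found ${roots.length}`);

  const order = [];
  const visit = (id) => {
    if (!byId.has(id)) throw new Error(`unknown state: ${id}`);
    if (order.includes(id)) throw new Error(`state visited twice: ${id}`);
    order.push(id);
    byId.get(id).dominates.forEach(visit);
  };
  visit(roots[0].id);

  if (order.length !== byId.size) throw new Error("unreachable essential states");
  return order;
}

// Trimmed-down model with the same shape as the schema above.
const trimmedModel = {
  essential_states: [
    { id: "security_gate_evaluated", dominates: ["deployment_executed"] },
    { id: "template_generated", dominates: ["security_gate_evaluated"] },
    { id: "deployment_executed", dominates: [] },
  ],
};

console.log(essentialOrder(trimmedModel));
// [ 'template_generated', 'security_gate_evaluated', 'deployment_executed' ]
```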

### Skill Implementation

```
.github/skills/trace-validator/
├── SKILL.md                    # Skill definition (frontmatter + instructions)
├── scripts/
│   ├── build-model.sh          # Ingest golden traces → generate dominator-model.json
│   └── validate-trace.sh       # Validate a trace.jsonl against the model
└── references/
    └── algorithm.md            # Dominator extraction explained for the agent
```

`build-model` and `validate-trace` should be implemented either as **Node.js scripts** (matching the repo's existing `scripts/` tooling) or as standalone shell scripts that use `jq` for JSON processing. The file extensions shown in the layout above (`.sh`) and in the workflow step below (`.js`) depend on which option is chosen.

### Integration with Deploy Workflow

In `git-ape-deploy.yml` (or `.example.yml`), add a post-deployment step:

```yaml
- name: Validate execution trace
  if: hashFiles(format('.azure/deployments/{0}/trace.jsonl', env.DEPLOYMENT_ID)) != ''
  run: |
    node .github/skills/trace-validator/scripts/validate-trace.js \
      --trace ".azure/deployments/${{ env.DEPLOYMENT_ID }}/trace.jsonl" \
      --model ".azure/baselines/default/dominator-model.json" \
      --threshold 100
```

### PR Comment Output

```markdown
## 🔍 Trace Validation

**Status:** 🟢 PASSED (8/8 essential states matched)

| # | Essential State | Status | Timestamp |
|---|----------------|:------:|-----------|
| 1 | requirements_gathered | ✅ | 08:30:00 |
| 2 | api_reference_lookup | ✅ | 08:31:12 |
| 3 | template_generated | ✅ | 08:32:45 |
| 4 | security_gate_evaluated | ✅ | 08:33:10 |
| 5 | preflight_validated | ✅ | 08:34:00 |
| 6 | user_confirmation | ✅ | 08:35:10 |
| 7 | deployment_executed | ✅ | 08:36:30 |
| 8 | integration_tests_passed | ✅ | 08:37:15 |

**Coverage:** 100% | **Optional states observed:** cost_estimated, policy_assessed
**Model:** `.azure/baselines/default/dominator-model.json` (built from 3 traces)
```

Or on failure:

```markdown
## 🔍 Trace Validation

**Status:** 🔴 FAILED (6/8 essential states matched)

| # | Essential State | Status |
|---|----------------|:------:|
| 1 | requirements_gathered | ✅ |
| 2 | api_reference_lookup | ✅ |
| 3 | template_generated | ✅ |
| 4 | security_gate_evaluated | ❌ MISSING |
| 5 | preflight_validated | ❌ MISSING |
| 6 | user_confirmation | ✅ |
| 7 | deployment_executed | ✅ |
| 8 | integration_tests_passed | ✅ |

⚠️ **The agent skipped the security gate and preflight validation.**
This deployment may not meet security requirements.

**Coverage:** 75% (below 100% threshold)
```
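
A rough sketch of how the comment body could be rendered from a validation result. The `{ essential, matched, threshold }` input shape and the `renderComment` name are assumptions; the percentage is rounded down for display so that a near-miss never shows as 100%.

```javascript
// Render the Markdown PR comment for a validation result.
// Assumes `matched` is a subset of `essential` and `essential` is non-empty.
function renderComment({ essential, matched, threshold }) {
  const hit = new Set(matched);
  const raw = (hit.size / essential.length) * 100;
  const passed = raw >= threshold;
  const rows = essential.map(
    (state, i) => `| ${i + 1} | ${state} | ${hit.has(state) ? "✅" : "❌ MISSING"} |`
  );
  return [
    "## 🔍 Trace Validation",
    "",
    `**Status:** ${passed ? "🟢 PASSED" : "🔴 FAILED"} (${hit.size}/${essential.length} essential states matched)`,
    "",
    "| # | Essential State | Status |",
    "|---|----------------|:------:|",
    ...rows,
    "",
    `**Coverage:** ${Math.floor(raw)}%${passed ? "" : ` (below ${threshold}% threshold)`}`,
  ].join("\n");
}

const comment = renderComment({
  essential: ["template_generated", "security_gate_evaluated", "deployment_executed"],
  matched: ["template_generated", "deployment_executed"],
  threshold: 100,
});
console.log(comment);
```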

### Customer Workflow

1. **Bootstrap:** Run 3–5 deployments successfully → traces auto-captured (#148)
2. **Build model:** `trace-validator --build --traces .azure/baselines/default/golden-traces/`
3. **Validate ongoing:** Every deployment auto-validated in CI, or invoke interactively with "validate my last deployment trace"

## Acceptance Criteria

- [ ] `SKILL.md` created with correct frontmatter, triggers, and instructions
- [ ] `build-model` script: ingests N trace files → produces `dominator-model.json`
- [ ] `validate-trace` script: validates a trace against a model → outputs PASS/FAIL + coverage
- [ ] Schema file for `dominator-model.json` in `.github/schemas/`
- [ ] Integration example in deploy workflow (`.example.yml`)
- [ ] Fixture: 3 golden traces + expected model in `.github/evals/trace-validator/`
- [ ] Docs page: `website/docs/skills/trace-validator.md`
- [ ] Works in both interactive (skill invocation) and CI (workflow step) modes

## Non-Goals (This Issue)

- LLM-based semantic state equivalence (Tier 3 — future, for non-git-ape trace formats)
- Automatic trace capture (#148 handles this)
- Eval grader integration (separate issue)
- Multi-environment baseline management UI

## References

- Paper: [arXiv:2605.03159](https://arxiv.org/abs/2605.03159) — Sections 3.2 (Merge + Dominator Extraction) and 3.3 (Validation)
- Dominator algorithm: Lengauer & Tarjan (1979) — "A fast algorithm for finding dominators in a flowgraph"
- Dependency: #148 (trace capture)
- Agent workflow stages: `.github/agents/git-ape.agent.md`
- Existing skill pattern: `.github/skills/prereq-check/SKILL.md`

