test (layer 1): /e2e staging-repo dispatcher for nightly + on-demand end-to-end runs #89

## Problem

Real end-to-end testing of the SDD pipeline today requires running the full workflow against a consumer repository: open an issue, watch `sdd-spec` run, merge a PR, watch `sdd-triage` run, etc. Done manually on a real repo, this is expensive (LLM tokens + Actions minutes) and slow (a single feature spans hours). There is no automated cadence that exercises the pipeline end-to-end before changes ship.

## Desired behavior

A dedicated `spectacles-staging` repository, owned by the same org, serves as the E2E target. A new workflow in spectacles, `.github/workflows/e2e-dispatch.md`, triggers on:

- An `/e2e` comment on a tracking issue or PR in the spectacles repo, from a write-access author. An optional scenario-name argument selects which fixture to run.
- A nightly `schedule:` trigger at a low-traffic hour, running the default scenario.
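A minimal sketch of how the comment trigger could be parsed, assuming the command sits on a line of its own and that scenario names are lowercase and hyphenated like the fixture names in this issue. `parse_e2e_command` and `DEFAULT_SCENARIO` are hypothetical names, and the default chosen here is an assumption (the issue does not name the default scenario). The write-access check is not shown.

```python
import re
from typing import Optional

# Assumption: the issue says "default scenario" without naming it.
DEFAULT_SCENARIO = "happy-path-feature"

# "/e2e" alone on a line, optionally followed by one scenario name.
# The trailing \r? tolerates CRLF line endings in comment bodies.
_E2E_RE = re.compile(
    r"^/e2e(?:[ \t]+([a-z0-9][a-z0-9-]*))?[ \t]*\r?$",
    re.MULTILINE,
)


def parse_e2e_command(comment_body: str) -> Optional[str]:
    """Return the requested scenario, the default when no argument is given,
    or None when the comment contains no /e2e command."""
    match = _E2E_RE.search(comment_body)
    if match is None:
        return None
    return match.group(1) or DEFAULT_SCENARIO
```

With this sketch, `/e2e revise-loop-spec` selects that fixture, a bare `/e2e` falls back to the default, and a comment that merely mentions `/e2e` mid-sentence is ignored.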

The workflow:

1. Resets the staging repo to a known commit (the lock files from the spectacles PR being tested or `main`).
2. Compiles and pushes the current spectacles workflows to the staging repo via `gh aw deploy` (or equivalent).
3. Opens a synthetic tracking issue in the staging repo from a fixture file under `tests/fixtures/e2e/<scenario>/issue.md`. Bodies are deliberately tiny (~30 tokens) to bound cost.
4. Polls the staging repo for lifecycle progress with a configurable timeout (default 30 minutes). Lifecycle transitions are observed via the `sdd:*` labels.
5. Asserts on terminal state: the tracking issue reaches `sdd:done`, the expected artifact files exist (`docs/specs/.../*.md`, `decisions/*.md` if the scenario expects an ADR), and one implementation PR was opened and merged.
6. Posts a summary comment back on the originating spectacles PR (or on a nightly issue for scheduled runs): scenario, duration, token cost (from the lock file metadata), final state, any assertion failures.
7. Tags the staging repo state for post-mortem if assertions fail; cleans up on success.
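The polling in step 4 could look roughly like the sketch below. It is deliberately decoupled from the GitHub API: `fetch_labels` is a hypothetical callable that returns the tracking issue's current labels, and the terminal label set (`sdd:done` plus `needs-human`) is an assumption drawn from the scenarios in this issue. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, FrozenSet, Iterable, Optional


def wait_for_label(
    fetch_labels: Callable[[], Iterable[str]],
    terminal_labels: FrozenSet[str] = frozenset({"sdd:done", "needs-human"}),
    timeout_s: float = 30 * 60,  # default 30-minute timeout from step 4
    interval_s: float = 30.0,    # assumption: the issue sets no poll interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a terminal label appears; return it, or None on timeout."""
    deadline = clock() + timeout_s
    while True:
        reached = sorted(set(fetch_labels()) & terminal_labels)
        if reached:
            return reached[0]
        if clock() >= deadline:
            return None
        sleep(interval_s)
```

Returning `None` on timeout rather than raising lets the caller treat a timeout as one more assertion failure in the summary instead of a crashed run.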

## Scenarios

Initial fixture set, one fixture per scenario:

- `happy-path-feature` — the canonical flow: open feature → spec → architecture → plan → execute → merge → done. The smallest possible feature body that still produces meaningful artifacts.
- `happy-path-bug` — same flow from the `kind:bug` template.
- `revise-loop-spec` — open feature → spec PR → `/revise <note>` → spec PR updated → merge → continues.
- `needs-human-handoff` — feature body deliberately ambiguous, asserts the agent escalates `needs-human` and the workflow halts without erroring.

Each scenario is one LLM run end-to-end and costs a few dollars. Total nightly cost is bounded by the scenario count.

## Implementation

- `tests/fixtures/e2e/<scenario>/{issue.md,expectations.yml}`. Issue body and the assertion contract.
- `.github/workflows/e2e-dispatch.md` (+ lock). A non-agentic workflow (pure GitHub Actions, no gh-aw agent step) that orchestrates the scenario.
- A small `scripts/e2e-assert.py` that walks the staging repo's state via the GitHub API and checks `expectations.yml`.
- A one-time setup script, `scripts/e2e-setup-staging.sh`, that provisions the staging repo with the required secrets, disabled branch protections, etc. Documented but not run by CI.
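As a rough sketch of the core of `scripts/e2e-assert.py`: the function below compares an already-parsed `expectations.yml` against an already-fetched snapshot of the staging repo. The schema keys (`final_label`, `required_paths`, `merged_prs`) and the snapshot shape are hypothetical, since this issue does not define them. YAML loading and the GitHub API calls are left out so the sketch stays stdlib-only.

```python
from fnmatch import fnmatch
from typing import Any, Dict, List


def check_expectations(expected: Dict[str, Any], observed: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable assertion failures (empty means pass)."""
    failures: List[str] = []

    # Terminal lifecycle label, e.g. "sdd:done" or "needs-human".
    final_label = expected.get("final_label")
    if final_label is not None and final_label not in observed.get("labels", []):
        failures.append(f"missing terminal label {final_label!r}")

    # Each glob must match at least one file path in the staging repo.
    # Note: fnmatch's "*" also matches "/" characters.
    paths = observed.get("paths", [])
    for pattern in expected.get("required_paths", []):
        if not any(fnmatch(path, pattern) for path in paths):
            failures.append(f"no file matches {pattern!r}")

    # Number of implementation PRs that were opened and merged.
    want_prs = expected.get("merged_prs")
    got_prs = observed.get("merged_prs", 0)
    if want_prs is not None and got_prs != want_prs:
        failures.append(f"expected {want_prs} merged PR(s), saw {got_prs}")

    return failures


# Usage with a hypothetical snapshot of a scenario that expects an ADR:
expected = {"final_label": "sdd:done", "required_paths": ["decisions/*.md"], "merged_prs": 1}
observed = {"labels": ["sdd:done"], "paths": ["decisions/0001-example.md"], "merged_prs": 1}
print(check_expectations(expected, observed))
```

Collecting every failure instead of stopping at the first one keeps the summary comment useful when several assertions break at once.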

## Cost ceilings

- Default scenario timeout: 30 minutes per scenario.
- Default `max-parallel` across scenarios: 2 (a nightly run executes scenarios serially with no fan-out unless explicitly raised).
- Token budget: each fixture body is ≤ 50 tokens; the agent's own context is bounded by its existing limits. A typical scenario costs less than $5 in LLM tokens.
- Off switch: a `SPECTACLES_E2E_DISABLED` repo variable shuts down the nightly schedule without a workflow edit.
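The ceilings above can be combined into a back-of-the-envelope bound for one nightly run. The sketch below treats the "less than $5" typical cost as a per-scenario upper bound, which is an assumption rather than a guarantee, and treats any value of `SPECTACLES_E2E_DISABLED` other than empty, `0`, or `false` as "disabled" (this issue does not specify the variable's values). Both function names are hypothetical.

```python
import math
from typing import Dict, Mapping


def nightly_enabled(variables: Mapping[str, str]) -> bool:
    """Off switch: the nightly run is skipped when the variable is set truthy."""
    value = variables.get("SPECTACLES_E2E_DISABLED", "").strip().lower()
    return value in ("", "0", "false")


def nightly_bounds(
    n_scenarios: int,
    per_scenario_usd: float = 5.0,  # "less than $5" treated as a ceiling
    timeout_min: int = 30,          # default scenario timeout
    max_parallel: int = 2,          # default max-parallel
) -> Dict[str, float]:
    """Worst-case token spend and wall-clock time for one run."""
    waves = math.ceil(n_scenarios / max_parallel)
    return {
        "max_token_usd": n_scenarios * per_scenario_usd,
        "max_wall_minutes": waves * timeout_min,
    }


# The four initial scenarios, at the default parallelism and fully serial:
print(nightly_bounds(4))                  # {'max_token_usd': 20.0, 'max_wall_minutes': 60}
print(nightly_bounds(4, max_parallel=1))  # {'max_token_usd': 20.0, 'max_wall_minutes': 120}
```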

## Acceptance

- `/e2e happy-path-feature` from a write-access author on a spectacles PR posts back a summary comment within 30 minutes naming the staging-repo run, the final state (`sdd:done`), and a token-cost estimate.
- A spectacles PR that breaks `sdd-spec`'s spec-file-write step fails the nightly run; the failure comment names the broken phase and links the staging-repo logs.
- The needs-human scenario succeeds when the agent applies the label and stops; it fails if the agent powers through without escalating.
- Cost telemetry: each run's cost (Actions minutes + estimated tokens) is posted in the summary; nightly budget is enforceable by reading the recent runs.
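One possible shape for the summary comment named in these criteria, as a sketch only. `render_summary` and its parameters are hypothetical; the fields mirror those listed for the summary (scenario, duration, token cost, final state, assertion failures) plus a link to the staging-repo run.

```python
from typing import List, Optional


def render_summary(
    scenario: str,
    duration_s: float,
    final_state: str,
    token_cost_usd: Optional[float],
    failures: List[str],
    run_url: str,
) -> str:
    """Build the markdown body of the summary comment."""
    status = "passed" if not failures else "failed"
    cost = "unknown" if token_cost_usd is None else f"${token_cost_usd:.2f}"
    minutes, seconds = divmod(int(duration_s), 60)
    lines = [
        f"### E2E `{scenario}`: {status}",
        "",
        f"- Duration: {minutes}m {seconds:02d}s",
        f"- Final state: `{final_state}`",
        f"- Estimated token cost: {cost}",
        f"- Staging run: {run_url}",
    ]
    if failures:
        lines += ["", "Assertion failures:"]
        lines += [f"- {failure}" for failure in failures]
    return "\n".join(lines)
```

Keeping the renderer a pure function means the same body can be posted on a PR for `/e2e` runs and on the nightly issue for scheduled runs.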

## Out of scope

- Cross-repo scenarios (the `repo:` seam is unexercised here).
- Performance / load testing.
- Scenarios that require human merge decisions (every scenario is autonomous from open through done; revise loops are simulated by a workflow step that comments `/revise` on schedule).

## References

- Issues #79 (plan-comment-before-tree), #81 (dispatch cascade), and #82 (fast-path): these introduce scenarios that should join the fixture set as they land.
- `mcp-smoke.lock.yml` — pattern for a workflow that talks to external services from CI.
- The Layer-0 issues filed alongside this one are this Layer 1's prerequisites; the cheap checks should be green before the expensive checks run.
