Problem
The repo ships two reference specs: docs/specs/01-spec-issue-native-sdd/ and docs/specs/02-spec-chore-suite/. They are the canonical fixtures, large enough to exercise every agent in the pipeline and every command in the vocabulary. Today they are reviewed by humans on every change but never re-run against a fresh repo as a release gate. A regression in a less-trafficked path (cross-repo task routing, ADR creation, needs-human escalation under load) can therefore ship in a tag and go unnoticed until a consumer hits it.
Desired behavior
On every git tag matching v*.*.*, run both reference specs end-to-end, in parallel, against fresh staging repos and report their outcomes on the release. The runs gate release quality, not tag creation: a failure is surfaced but does not block the tag. Specifically:
- Trigger: push: on tag pattern v*.*.*.
- For each of 01-spec-issue-native-sdd and 02-spec-chore-suite, spin up a fresh staging repo (named spectacles-release-<tag>-<spec>), deploy the tagged spectacles workflows into it, file the spec's tracking issue body, and let the pipeline run from sdd:spec to sdd:done.
- Generous timeout: 4 hours per scenario, since the full pipeline includes architecture phase + plan phase + N task executions + validation + review.
- Assert on artifact tree: spec file matches the source, architecture file exists, every task sub-issue is closed with a merged implementation PR, and the tracking issue lands on sdd:done.
- On failure: tag the staging repo for post-mortem, open an issue in spectacles with the failure summary, do not block the tag itself (the release artifact is the lock files; the test is a quality signal that lands as a release-notes comment).
- On success: post a release note linking the two staging repos as the receipts.
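The trigger, naming, and assertion rules above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow itself: ArtifactTree and its fields are hypothetical names for whatever the test harness collects from a staging repo, and fnmatch only approximates the tag filter.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

REFERENCE_SPECS = ("01-spec-issue-native-sdd", "02-spec-chore-suite")
TAG_PATTERN = "v*.*.*"


def is_release_tag(tag: str) -> bool:
    """Approximate the tag filter; fnmatch's '*' would also match '/', so reject it."""
    return "/" not in tag and fnmatchcase(tag, TAG_PATTERN)


def staging_repo_name(tag: str, spec: str) -> str:
    """Name the fresh staging repo as spectacles-release-<tag>-<spec>."""
    return f"spectacles-release-{tag}-{spec}"


@dataclass(frozen=True)
class ArtifactTree:
    """Hypothetical snapshot of one staging repo after the pipeline stops."""
    spec_matches_source: bool
    architecture_file_exists: bool
    task_issues_total: int
    task_issues_closed_with_merged_pr: int
    tracking_issue_label: str


def artifact_failures(tree: ArtifactTree) -> list[str]:
    """Return one message per failed assertion; an empty list means the run passed."""
    failures = []
    if not tree.spec_matches_source:
        failures.append("spec file does not match the source")
    if not tree.architecture_file_exists:
        failures.append("architecture file is missing")
    unfinished = tree.task_issues_total - tree.task_issues_closed_with_merged_pr
    if unfinished > 0:
        failures.append(f"{unfinished} task sub-issue(s) lack a merged implementation PR")
    if tree.tracking_issue_label != "sdd:done":
        failures.append(f"tracking issue is at {tree.tracking_issue_label}, not sdd:done")
    return failures
```

Note that the glob also accepts tags such as v1.2.3-rc1; whether those should trigger a full run is left to the workflow author.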
Implementation
- Builds on the Layer-1 /e2e dispatcher. Layer 2 is "Layer 1 with longer timeouts, larger scenarios, and a tag trigger."
- New workflow .github/workflows/e2e-release.md (+ lock).
- The two reference specs already exist as deliverables; the test reads them as fixtures rather than as documentation.
- Token cost: ~$50–$100 per full run, executed once per tag. Acceptable for a release gate.
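As a rough sketch of "two scenarios in parallel, each with its own 4-hour budget", the following uses asyncio. The run_scenario coroutine is a hypothetical stand-in for the real Layer-1 dispatcher call, which this issue does not specify; only the timeout and fan-out logic is the point here.

```python
import asyncio

SCENARIO_TIMEOUT_SECONDS = 4 * 60 * 60  # 4 hours per scenario


async def run_scenario(spec: str) -> str:
    """Hypothetical stand-in: drive one staging repo and return its final label."""
    await asyncio.sleep(0)
    return "sdd:done"


async def run_with_timeout(spec, runner, timeout):
    """Run one scenario; a timeout is reported as an outcome, not raised."""
    try:
        label = await asyncio.wait_for(runner(spec), timeout=timeout)
    except asyncio.TimeoutError:
        return spec, "timeout"
    outcome = "passed" if label == "sdd:done" else f"ended at {label}"
    return spec, outcome


async def run_release_e2e(specs, runner=run_scenario, timeout=SCENARIO_TIMEOUT_SECONDS):
    """Run all scenarios concurrently and collect one outcome per spec."""
    pairs = await asyncio.gather(*(run_with_timeout(s, runner, timeout) for s in specs))
    return dict(pairs)
```

Calling asyncio.run(run_release_e2e(["01-spec-issue-native-sdd", "02-spec-chore-suite"])) returns a dict of outcomes keyed by spec. Because each scenario has its own timeout, one stalled run does not hide the result of the other.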
Acceptance
- Tagging v1.2.3 fires two parallel E2E runs; within 4 hours both complete or report a failure.
- A regression that breaks the sdd-triage phase B fixture is caught at tag time and surfaces an automatic issue.
- Successful runs leave two staging repos tagged with the release for human inspection.
- The release tag is created regardless of E2E outcome (release isn't blocked, but signal lands).
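The non-blocking behavior in these criteria can be made concrete with a small decision function. This is a sketch under assumptions: the action strings are placeholders for real API calls, and treating a mixed result (one pass, one failure) as "open an issue, skip the success note" is an interpretation rather than something the issue states.

```python
def follow_up_actions(tag: str, outcomes: dict[str, str]) -> list[str]:
    """Map per-spec outcomes ('passed' or a failure summary) to follow-up steps."""
    repos = {spec: f"spectacles-release-{tag}-{spec}" for spec in outcomes}
    # The release is listed first and unconditionally: E2E never blocks the tag.
    actions = [f"publish release {tag}"]
    # Both passing and failing staging repos are tagged and kept for inspection.
    actions += [f"tag staging repo {repo} with {tag}" for repo in repos.values()]
    failed = {spec: result for spec, result in outcomes.items() if result != "passed"}
    for spec, summary in failed.items():
        actions.append(f"open issue in spectacles: {spec} failed at {tag} ({summary})")
    if not failed:
        actions.append("post release note linking " + " and ".join(repos.values()))
    return actions
```

The key property is that "publish release" appears in every result, while the issue and the release note are mutually exclusive.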
Out of scope
- Pre-tag preview runs on release-candidate branches (use Layer 1 /e2e for those).
- Automated rollback if E2E fails (manual decision; the release exists as the artifact).
- Fixtures other than the existing two reference specs (a third or fourth scenario is a future change).
References
- docs/specs/01-spec-issue-native-sdd/01-spec-issue-native-sdd.md
- docs/specs/02-spec-chore-suite/02-spec-chore-suite.md
- Layer 1 issue (the /e2e dispatcher this builds on); must land first