
test (layer 2): release-tag full E2E fixture run against both spec fixtures #90

@norrietaylor

Problem

The repo ships two reference specs: docs/specs/01-spec-issue-native-sdd/ and docs/specs/02-spec-chore-suite/. They are the canonical fixtures, large enough to exercise every agent in the pipeline and every command in the vocabulary. Today humans review them on every change, but they are never re-run against a fresh repo as a release gate. As a result, a regression in a less-trafficked path (cross-repo task routing, ADR creation, needs-human escalation under load) can ship in a tag and go unnoticed until a consumer hits it.

Desired behavior

On every git tag matching v*.*.*, run both reference specs end-to-end against fresh staging repos in parallel and report their outcomes on the release as a quality signal (the tag itself is never blocked). Specifically:

  • Trigger: push: on tag pattern v*.*.*.
  • For each of 01-spec-issue-native-sdd and 02-spec-chore-suite, spin up a fresh staging repo (named spectacles-release-<tag>-<spec>), deploy the tagged spectacles workflows into it, file the spec's tracking issue body, and let the pipeline run from sdd:spec to sdd:done.
  • Generous timeout: 4 hours per scenario, since the full pipeline includes architecture phase + plan phase + N task executions + validation + review.
  • Assert on the artifact tree: the spec file matches the source, the architecture file exists, every task sub-issue is closed with a merged implementation PR, and the tracking issue reaches sdd:done.
  • On failure: tag the staging repo for post-mortem, open an issue in spectacles with the failure summary, and do not block the tag itself (the lock files are the release artifact; the test is a quality signal that lands as a release-notes comment).
  • On success: post a release note linking the two staging repos as the receipts.
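The artifact-tree assertions above could be expressed as a pure check over whatever the runner collects from a staging repo. A minimal sketch, assuming a hypothetical RunArtifacts record (none of these names exist in spectacles today); it returns human-readable failure messages so the same list could feed the failure-summary issue:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskSubIssue:
    # Hypothetical shape: one task sub-issue created by the plan phase.
    number: int
    closed: bool
    implementation_pr_merged: bool


@dataclass
class RunArtifacts:
    # Hypothetical snapshot of one staging repo after the pipeline stops.
    source_spec: str            # fixture text read from docs/specs/<spec>/
    staged_spec: str | None     # spec file found in the staging repo, if any
    architecture_exists: bool
    tasks: list[TaskSubIssue] = field(default_factory=list)
    tracking_labels: set[str] = field(default_factory=set)


def check_artifacts(run: RunArtifacts) -> list[str]:
    """Return one message per failed assertion; an empty list means pass."""
    failures: list[str] = []
    if run.staged_spec is None:
        failures.append("spec file missing from staging repo")
    elif run.staged_spec != run.source_spec:
        failures.append("spec file does not match the source fixture")
    if not run.architecture_exists:
        failures.append("architecture file missing")
    # Assumption: zero task sub-issues counts as a failure, because an empty
    # list would otherwise pass "every task sub-issue is closed" vacuously.
    if not run.tasks:
        failures.append("no task sub-issues were created")
    for task in run.tasks:
        if not task.closed:
            failures.append(f"task #{task.number} is still open")
        elif not task.implementation_pr_merged:
            failures.append(f"task #{task.number} closed without a merged PR")
    if "sdd:done" not in run.tracking_labels:
        failures.append("tracking issue did not reach sdd:done")
    return failures
```

Keeping the check free of API calls means it can be unit-tested against recorded snapshots, with the GitHub queries living in a separate collection step.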

Implementation

  • Builds on the Layer-1 /e2e dispatcher. Layer 2 is "Layer 1 with longer timeouts, larger scenarios, and a tag trigger."
  • New workflow .github/workflows/e2e-release.md (+ lock).
  • The two reference specs already exist as deliverables; the test reads them as fixtures rather than as documentation.
  • Token cost: ~$50–$100 per full run, executed once per tag. Acceptable for a release gate.
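Since Layer 2 is described as Layer 1 with a tag trigger, longer timeouts, and larger scenarios, the per-tag fan-out can be sketched as a small planning step. This is illustrative only: plan_release_runs and Scenario are hypothetical names, and the regex approximates GitHub's filter-pattern glob, where * matches any run of characters except /.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# The two reference specs used as fixtures (paths from the References list).
SPECS = ("01-spec-issue-native-sdd", "02-spec-chore-suite")

# Hypothetical translation of the tag filter v*.*.*: each * becomes
# "any characters except /", mirroring GitHub's filter-pattern rules.
TAG_PATTERN = re.compile(r"^v[^/]*\.[^/]*\.[^/]*$")

TIMEOUT_MINUTES = 4 * 60  # 4 hours per scenario


@dataclass(frozen=True)
class Scenario:
    spec: str
    fixture_path: str
    staging_repo: str
    timeout_minutes: int


def plan_release_runs(tag: str) -> list[Scenario]:
    """Return one scenario per reference spec, or [] for a non-release tag."""
    if not TAG_PATTERN.match(tag):
        return []
    return [
        Scenario(
            spec=spec,
            fixture_path=f"docs/specs/{spec}/{spec}.md",
            staging_repo=f"spectacles-release-{tag}-{spec}",
            timeout_minutes=TIMEOUT_MINUTES,
        )
        for spec in SPECS
    ]
```

For example, plan_release_runs("v1.2.3") yields two scenarios, the first targeting spectacles-release-v1.2.3-01-spec-issue-native-sdd; each could then be handed to the existing /e2e dispatcher as a parallel job.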

Acceptance

  • Tagging v1.2.3 fires two parallel E2E runs; within 4 hours both complete or report a failure.
  • A regression that breaks the sdd-triage phase B fixture is caught at tag time and automatically opens an issue.
  • Successful runs leave two staging repos tagged with the release for human inspection.
  • The release tag is created regardless of E2E outcome (release isn't blocked, but signal lands).
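The acceptance criteria imply that each run resolves to pass, fail, or timed out at 4 hours, and that none of these blocks the tag. A minimal sketch of that routing, assuming hypothetical action strings rather than any existing spectacles API, and assuming a mixed result (one pass, one fail) is reported as a failure:

```python
from __future__ import annotations

from dataclasses import dataclass

TIMEOUT_SECONDS = 4 * 60 * 60  # 4-hour budget per scenario


@dataclass
class RunResult:
    staging_repo: str
    elapsed_seconds: int
    failures: list[str]  # e.g. messages from an artifact-tree check


def route_outcomes(tag: str, results: list[RunResult]) -> list[str]:
    """Map run results to follow-up actions. There is deliberately no
    'block' action: the tag ships regardless and E2E is only a signal."""
    actions: list[str] = []
    failed: list[str] = []
    for r in results:
        # A run past the 4-hour budget counts as a failure even if it
        # reported nothing, so a hung pipeline still surfaces an issue.
        if r.elapsed_seconds > TIMEOUT_SECONDS:
            reasons = ["exceeded 4-hour timeout"] + r.failures
        else:
            reasons = r.failures
        if reasons:
            failed.append(r.staging_repo)
            actions.append(f"tag-for-postmortem {r.staging_repo}")
            actions.append(f"open-issue {tag}: {r.staging_repo}: " + "; ".join(reasons))
    # Exactly one release-note entry is produced either way.
    if failed:
        actions.append(f"release-note {tag}: E2E failed for {', '.join(failed)}")
    else:
        receipts = ", ".join(r.staging_repo for r in results)
        actions.append(f"release-note {tag}: receipts {receipts}")
    return actions
```

Returning a list of actions instead of performing them keeps the policy testable; a thin executor would translate each entry into the corresponding GitHub call.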

Out of scope

  • Pre-tag preview runs on release-candidate branches (use Layer 1 /e2e for those).
  • Automated rollback if E2E fails (manual decision; the release exists as the artifact).
  • Fixtures other than the existing two reference specs (a third or fourth scenario is a future change).

References

  • docs/specs/01-spec-issue-native-sdd/01-spec-issue-native-sdd.md
  • docs/specs/02-spec-chore-suite/02-spec-chore-suite.md
  • Layer 1 issue (the /e2e dispatcher this builds on) — must land first
