ci: two-tier test lanes, hermetic always-on for forks; live agent-smoke label-gated on self-hosted (INV-75, closes #238) #256
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #238 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
…n -ffdx no longer deletes it ([P1] review, INV-75)

PR #256 review [P1]: the `live-smoke` self-hosted job's `actions/checkout` defaults to `clean: true` (`git clean -ffdx`), which deletes the gitignored, machine-local `tests/e2e/e2e.conf` on the persistent self-hosted workspace before the run step. A labeled live run would die on `FATAL: matrix not found/readable` instead of proving the matrix.

Fix: resolve the live matrix config OUTSIDE the checkout. The `live-smoke` job now reads it from the `RUNNER_SMOKE_CONF` repo variable, or a stable per-box default `$HOME/.config/autonomous-dev-team/e2e.conf` (resolved in shell, where $HOME expands; `vars.*` only expands in the Actions template layer). The resolved path is exported as `SMOKE_CONF` to `$GITHUB_ENV` (the harness honors the SMOKE_CONF override), and a preflight step checks readability, emitting a loud `::error::` + provisioning pointer rather than the opaque harness FATAL when the operator has not provisioned it.

- tests/unit/test-ci-two-tier-lanes.sh: TC-CI-TIERS-021/022/023 pin the out-of-checkout resolution, the $GITHUB_ENV export, and the readability preflight (023 scoped to the live-smoke job region).
- docs/pipeline/invariants.md: INV-75 sub-point 4 (config-outside-checkout contract) + Tested-by update.
- CONTRIBUTING.md: maintainer one-time-setup note; tests/e2e/e2e.conf.example: CI provisioning note.

No behavior change to the hermetic tier or the gate logic.
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #238 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
…n -ffdx no longer deletes it ([P1] review, INV-76)

PR #256 review [P1]: the `live-smoke` self-hosted job's `actions/checkout` defaults to `clean: true` (`git clean -ffdx`), which deletes the gitignored, machine-local `tests/e2e/e2e.conf` on the persistent self-hosted workspace before the run step. A labeled live run would die on `FATAL: matrix not found/readable` instead of proving the matrix.

Fix: resolve the live matrix config OUTSIDE the checkout. The `live-smoke` job now reads it from the `RUNNER_SMOKE_CONF` repo variable, or a stable per-box default `$HOME/.config/autonomous-dev-team/e2e.conf` (resolved in shell, where $HOME expands; `vars.*` only expands in the Actions template layer). The resolved path is exported as `SMOKE_CONF` to `$GITHUB_ENV` (the harness honors the SMOKE_CONF override), and a preflight step checks readability, emitting a loud `::error::` + provisioning pointer rather than the opaque harness FATAL when the operator has not provisioned it.

- tests/unit/test-ci-two-tier-lanes.sh: TC-CI-TIERS-021/022/023 pin the out-of-checkout resolution, the $GITHUB_ENV export, and the readability preflight (023 scoped to the live-smoke job region).
- docs/pipeline/invariants.md: INV-76 sub-point 4 (config-outside-checkout contract) + Tested-by update.
- CONTRIBUTING.md: maintainer one-time-setup note; tests/e2e/e2e.conf.example: CI provisioning note.

No behavior change to the hermetic tier or the gate logic.
Force-pushed from d76c4ef to bff56d2.
…oke label-gated on self-hosted (INV-77, closes #238)

Split ci.yml into two explicit tiers:
- Tier 1 (hermetic): hermetic-unit + hermetic-shellcheck jobs on ubuntu-latest, credential-free: unit tests, adapter conformance (INV-74), and the stub-mode smoke/metrics/error-envelope self-tests. A fork PR gets a fully green, fully meaningful CI with no agent-CLI auth. These are the merge-required checks.
- Tier 2 (live): the #222 live agent-smoke matrix (real CLIs) in a new `live-smoke` job. Gated by `github.event.label.name == 'run-live-smoke'` (pull_request labeled) OR push to main, targeting the self-hosted pool via the RUNNER_LABEL ternary. Advisory (non-required): UNAVAILABLE (quota) is non-blocking per #222's rc contract; SMOKE evidence goes to the job summary.

Security: a fork PR with no label NEVER schedules live-smoke (the `if:` has no unconditional branch); applying the maintainer-only label is the authorization act. Uses plain pull_request, NEVER pull_request_target (would run untrusted head with base secrets). Threat-model note inline.

- setup-labels.sh: add the `run-live-smoke` gate label (day-one bootstrap).
- hermetic-shellcheck adds an actionlint step (pull_request_target / syntax foot-gun lint) alongside the existing shellcheck.
- tests/unit/test-ci-two-tier-lanes.sh: pyyaml structural truth-table test (TC-CI-TIERS-010..051): hermetic=ubuntu-latest+credential-free, the label-OR-push gate, no pull_request_target, labeled type, self-hosted runs-on, job summary, and the label entry.
- docs/pipeline/invariants.md: INV-77 (two-tier CI contract).
- CONTRIBUTING.md "What CI runs on your PR"; conformance README cross-link.

Pipeline-docs-gate: setup-labels.sh is a watched script → invariants.md updated in the same PR (INV-77).
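
The label-OR-push gate described in this commit can be restated as a small truth-table helper. This is a minimal sketch, not the workflow's actual `if:` expression: the function names `should_run_live_smoke` and `gate_decision`, the argument order, and the `refs/heads/main` ref comparison are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper that mirrors the live-smoke gate in plain shell.
# Arguments (illustrative names): event name, event action, label name, git ref.
should_run_live_smoke() {
  local event_name="$1" action="$2" label="$3" ref="$4"

  # Branch 1: the maintainer-only label was just applied to a pull request.
  if [ "$event_name" = "pull_request" ] && [ "$action" = "labeled" ] &&
     [ "$label" = "run-live-smoke" ]; then
    return 0
  fi

  # Branch 2: a push to main.
  if [ "$event_name" = "push" ] && [ "$ref" = "refs/heads/main" ]; then
    return 0
  fi

  # No unconditional branch: every other event leaves the live tier unscheduled.
  return 1
}

# Print the decision for one row of the truth table.
gate_decision() {
  if should_run_live_smoke "$@"; then echo "scheduled"; else echo "skipped"; fi
}

gate_decision pull_request opened  ""             refs/pull/1/merge
gate_decision pull_request labeled run-live-smoke refs/pull/1/merge
gate_decision pull_request labeled other-label    refs/pull/1/merge
gate_decision push         ""      ""             refs/heads/main
```

The four calls print `skipped`, `scheduled`, `skipped`, `scheduled`, in that order. Only the two named branches schedule the live tier, which is the property the structural test is described as pinning.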
…n -ffdx no longer deletes it ([P1] review, INV-77)

PR #256 review [P1]: the `live-smoke` self-hosted job's `actions/checkout` defaults to `clean: true` (`git clean -ffdx`), which deletes the gitignored, machine-local `tests/e2e/e2e.conf` on the persistent self-hosted workspace before the run step. A labeled live run would die on `FATAL: matrix not found/readable` instead of proving the matrix.

Fix: resolve the live matrix config OUTSIDE the checkout. The `live-smoke` job now reads it from the `RUNNER_SMOKE_CONF` repo variable, or a stable per-box default `$HOME/.config/autonomous-dev-team/e2e.conf` (resolved in shell, where $HOME expands; `vars.*` only expands in the Actions template layer). The resolved path is exported as `SMOKE_CONF` to `$GITHUB_ENV` (the harness honors the SMOKE_CONF override), and a preflight step checks readability, emitting a loud `::error::` + provisioning pointer rather than the opaque harness FATAL when the operator has not provisioned it.

- tests/unit/test-ci-two-tier-lanes.sh: TC-CI-TIERS-021/022/023 pin the out-of-checkout resolution, the $GITHUB_ENV export, and the readability preflight (023 scoped to the live-smoke job region).
- docs/pipeline/invariants.md: INV-77 sub-point 4 (config-outside-checkout contract) + Tested-by update.
- CONTRIBUTING.md: maintainer one-time-setup note; tests/e2e/e2e.conf.example: CI provisioning note.

No behavior change to the hermetic tier or the gate logic.
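
The out-of-checkout resolution and readability preflight from this commit could look roughly like the following. It is a sketch under stated assumptions, not the step in `ci.yml`: the function names `resolve_smoke_conf` and `preflight_smoke_conf` and the error wording are hypothetical, and `RUNNER_SMOKE_CONF` is assumed to reach the shell through the step's `env:` mapping.

```shell
#!/usr/bin/env bash
# Sketch of the config-outside-checkout contract (function names are hypothetical).

# Pick the repo-variable path when set, else the stable per-box default.
# $HOME expands here, in the shell, not in the Actions template layer.
resolve_smoke_conf() {
  printf '%s\n' "${RUNNER_SMOKE_CONF:-$HOME/.config/autonomous-dev-team/e2e.conf}"
}

# Fail loudly when the matrix is missing; otherwise export its path.
preflight_smoke_conf() {
  local conf
  conf="$(resolve_smoke_conf)"

  if [ ! -r "$conf" ]; then
    # `::error::` is the workflow command that surfaces an error annotation.
    echo "::error::live-smoke matrix not readable at $conf; provision it outside the checkout"
    return 1
  fi

  # Lines appended to the $GITHUB_ENV file become env vars for later steps.
  # The /dev/null fallback only keeps the sketch runnable outside Actions.
  echo "SMOKE_CONF=$conf" >> "${GITHUB_ENV:-/dev/null}"
  echo "$conf"
}
```

Because the resolved file lives outside the workspace, the checkout's clean step has nothing to delete, and a missing file is reported by the preflight rather than by the harness.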
Force-pushed from bff56d2 to 33da9fb.
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #238 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
…repo variable, works on the ephemeral autoscaling pool (INV-77, #238)

PR #256 review [P1] (cycle 11): the labeled live-smoke dry run failed the preflight on the shared self-hosted pool because the matrix config did not exist on the runner. Root cause: the pool is an EPHEMERAL autoscaling spot fleet (the dry run landed on runner i-004c…, a different box than the dispatcher's SSM target), so a per-box file at $HOME/.config/autonomous-dev-team/e2e.conf does NOT survive pool churn: a labeled run lands on a fresh runner with no matrix.

Fix: make the lane self-provisioning. The preflight now resolves the matrix from the first of three sources:
1. RUNNER_SMOKE_CONF repo variable: a PATH to a runner-local file (existing).
2. SMOKE_MATRIX repo variable: the matrix CONTENT, materialized at job time to a runner temp file (mktemp under $RUNNER_TEMP, outside the checkout). A repo variable travels with the repo, so ANY pool runner materializes the same matrix, with no per-box provisioning. (NEW: the self-provisioning fix.)
3. $HOME/.config/autonomous-dev-team/e2e.conf: per-box default (existing).

If none resolve, the preflight FAILs loud with a pointer naming all three.

Injection-safe: SMOKE_MATRIX is wired into `env:` via ${{ vars.SMOKE_MATRIX }} and consumed in the run block only as the quoted shell var "$SMOKE_MATRIX", never ${{ }}-inlined into a run: command. Maintainer-only (a repo variable needs write access); must not carry secrets (Bedrock entries use the runner instance role).

Verified end-to-end on the self-hosted box: materialize → run-agent-smoke.sh → SMOKE claude PASS / kiro PASS / agy UNAVAILABLE (quota-exhausted), SUMMARY pass=2 fail=0 unavailable=1, exit 0.

- tests/unit/test-ci-two-tier-lanes.sh: TC-CI-TIERS-024/025 pin the SMOKE_MATRIX self-provisioning branch + mktemp materialization outside the checkout (18/18).
- docs/pipeline/invariants.md: INV-77 sub-point 4 rewritten: three-source precedence; the pool-churn rationale; the injection-safe/secret-free contract.
- CONTRIBUTING.md / tests/e2e/e2e.conf.example / docs/designs: provisioning guidance updated to the SMOKE_MATRIX-first precedence.

No change to the gate logic or the hermetic tier.
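
The three-source precedence listed in this commit can be sketched as a single resolver. This is an illustration under assumptions rather than the preflight itself: the function name `resolve_matrix` and the temp-file name are hypothetical, an unreadable `RUNNER_SMOKE_CONF` path is assumed to fall through to the next source, and the `/tmp` fallback exists only so the sketch runs outside Actions.

```shell
#!/usr/bin/env bash
# Sketch of the three-source matrix resolution (function name is hypothetical).
# RUNNER_SMOKE_CONF and SMOKE_MATRIX are assumed to arrive via the step's `env:`
# mapping; RUNNER_TEMP is the per-job temp directory set by the Actions runner.
resolve_matrix() {
  local default_conf="$HOME/.config/autonomous-dev-team/e2e.conf"
  local tmp

  if [ -n "${RUNNER_SMOKE_CONF:-}" ] && [ -r "$RUNNER_SMOKE_CONF" ]; then
    # Source 1: a path to a runner-local file.
    printf '%s\n' "$RUNNER_SMOKE_CONF"
  elif [ -n "${SMOKE_MATRIX:-}" ]; then
    # Source 2: matrix content, materialized outside the checkout.
    tmp="$(mktemp "${RUNNER_TEMP:-/tmp}/smoke-matrix.XXXXXX")" || return 1
    # Quoted expansion only: the content is written out, never evaluated here.
    printf '%s\n' "$SMOKE_MATRIX" > "$tmp"
    printf '%s\n' "$tmp"
  elif [ -r "$default_conf" ]; then
    # Source 3: the per-box default.
    printf '%s\n' "$default_conf"
  else
    # Name all three sources so the operator knows what to provision.
    echo "::error::no live-smoke matrix: set RUNNER_SMOKE_CONF or SMOKE_MATRIX, or provision $default_conf" >&2
    return 1
  fi
}
```

A caller would capture the printed path, for example `conf="$(resolve_matrix)" || exit 1`, and hand it to the harness as `SMOKE_CONF`. Writing the content with a quoted `"$SMOKE_MATRIX"` keeps the value as data, in line with the injection-safe contract above.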
Review reached a blocking FAIL verdict — see the Review findings: comment on issue #238 for the full list of blocking findings and remediation steps. This PR is sent back to development; reviewDecision is set to CHANGES_REQUESTED until the findings are addressed and a new review passes (INV-52).
… summary (INV-77, #238)

Two PR #256 review [P1]s (cycle 12):

1. Fork supply-chain: the provisioning guidance told maintainers to seed SMOKE_MATRIX / the per-box matrix from the CHECKED-OUT tests/e2e/e2e.conf.example. On a labeled fork PR that file is attacker head content, and run-agent-smoke.sh `eval`s each entry's env-setup on the self-hosted runner, so following the docs could persist arbitrary shell on the runner. Fix: all CI-bootstrap pointers (the preflight job-summary, CONTRIBUTING.md, e2e.conf.example) now seed from a TRUSTED `main` template (`gh api …/contents/…?ref=main | base64 -d`), never the PR checkout, with a review-before-use warning. (The local-dev `cp …example` for one's own machine is unaffected: no fork/runner trust boundary there.)
2. Requirement drift (Keesan12): an unlabeled PR emitted NO live-tier summary because summaries lived only inside the label-gated live-smoke job. Fix: a new always-on `live-smoke-status` job (hermetic: ubuntu-latest, credential-free; no label gate; never fails) writes a non-failing $GITHUB_STEP_SUMMARY stating whether the live tier was scheduled or intentionally skipped pending a maintainer `run-live-smoke` label. Reads event context via env vars (never ${{ }}-inlined into run:).

- tests/unit/test-ci-two-tier-lanes.sh: TC-CI-TIERS-026/027 (always-on status job exists + is hermetic) and TC-CI-TIERS-028 (bootstrap pointer is fork-safe: ref=main, no checkout cp). 21/21.
- docs/pipeline/invariants.md: INV-77 sub-points 5 (trusted-template bootstrap) and 6 (always-on status summary); Producer + Tested-by updated.
- CONTRIBUTING.md / tests/e2e/e2e.conf.example: fork-safety warning + ref=main seeding.

No change to the gate logic or the hermetic merge-required set.
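
The always-on status summary from point 2 might be written along these lines. It is a sketch only: the function name `write_live_tier_status` and the variable names `EVENT_NAME`, `LABEL_NAME`, and `GIT_REF` are hypothetical stand-ins for whatever the job maps in its `env:` block, and the summary wording is invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch of an always-on, never-failing status step (names are hypothetical).
# Event data is read from shell variables mapped via `env:`, not inlined into
# the script text by the Actions template layer.
write_live_tier_status() {
  # GITHUB_STEP_SUMMARY is the file the runner renders as the job summary;
  # the stdout fallback only keeps the sketch runnable outside Actions.
  local summary="${GITHUB_STEP_SUMMARY:-/dev/stdout}"
  local state="intentionally skipped pending a maintainer run-live-smoke label"

  if [ "${EVENT_NAME:-}" = "pull_request" ] && [ "${LABEL_NAME:-}" = "run-live-smoke" ]; then
    state="scheduled (run-live-smoke label applied)"
  elif [ "${EVENT_NAME:-}" = "push" ] && [ "${GIT_REF:-}" = "refs/heads/main" ]; then
    state="scheduled (push to main)"
  fi

  # Append to the job summary; ignore write errors so the step never fails.
  printf 'Live tier: %s\n' "$state" >> "$summary" || true
  return 0
}
```

Since the function always returns 0 and touches no credentials, it fits a job that runs on every PR, so an unlabeled PR still gets an explicit record that the live tier was skipped on purpose.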
Summary
Split CI into two explicit tiers (issue #238, INV-75):
- Tier 1, hermetic (`hermetic-unit`, `hermetic-shellcheck`): runs on every PR/push on `ubuntu-latest` with zero credentials: unit tests, adapter conformance (INV-74), and the stub-mode smoke/metrics/error-envelope self-tests, plus ShellCheck and a new `actionlint` workflow lint. A fork PR or external contributor gets a fully green, fully meaningful CI without any agent-CLI auth. These are the merge-required checks.
- Tier 2, live (`live-smoke`): the live agent-smoke matrix (real CLIs) from #222 (Add agent-smoke E2E: three-state smoke_agent lib + PR-gating matrix harness) on the self-hosted runner, gated by the maintainer-applied `run-live-smoke` label (`pull_request` `labeled`) OR push to `main`. Advisory (non-required): a quota-walled CLI yields `UNAVAILABLE` without failing the job, per #222's rc contract; SMOKE evidence is posted to the job summary.

Security / threat model
- A fork PR with no label never schedules `live-smoke`: the `if:` has no unconditional branch, so untrusted PR code can't self-trigger the self-hosted tier. Applying the `run-live-smoke` label is the authorization act (label application requires write access, so it is maintainer-only).
- Uses plain `pull_request` (with the `labeled` type), never `pull_request_target` (which would run untrusted head code with the base repo's token/secrets, the classic foot-gun). The threat-model rationale is documented inline in the `on:` header.
- `runs-on: ${{ vars.RUNNER_LABEL && fromJSON(vars.RUNNER_LABEL) || 'self-hosted' }}`: the operator's lazy-ternary self-hosted-pool convention.
- … `run:` block.

Branch-protection note (operator action after merge)
Mark the two `hermetic-*` jobs (`Hermetic / Unit + conformance`, `Hermetic / ShellCheck + workflow lint`) as required status checks. Leave `live-smoke` NOT required: it is advisory and gated on hardware/credentials only maintainers have, so requiring it would block every fork PR.

Changes
- `.github/workflows/ci.yml`: tier split + the label-gated self-hosted `live-smoke` job + `actionlint` step.
- `skills/autonomous-dispatcher/scripts/setup-labels.sh`: adds the `run-live-smoke` gate label (day-one bootstrap).
- `tests/unit/test-ci-two-tier-lanes.sh`: pyyaml structural truth-table test (TC-CI-TIERS-010..051).
- `docs/pipeline/invariants.md`: INV-75 (two-tier CI contract). Satisfies the pipeline-docs-gate (`setup-labels.sh` is a watched script).
- `CONTRIBUTING.md` "What CI runs on your PR"; `tests/conformance/README.md` cross-link.
- `docs/designs/ci-two-tier-lanes.md`, `docs/test-cases/ci-two-tier-lanes.md`.

Gate truth table
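`live-smoke` is scheduled only when the `run-live-smoke` label is applied (`pull_request` `labeled`) or on a push to `main`; in every other case it is skipped.

Design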
- `docs/designs/ci-two-tier-lanes.md`

Test Plan
- `docs/test-cases/ci-two-tier-lanes.md`
- `tests/unit/test-ci-two-tier-lanes.sh`: 13/13 pass (structural truth-table)
- `actionlint` clean on both workflows
- … `setup-labels.sh` + the new test
- Gated dry run (maintainer applies `run-live-smoke`; live-smoke triggers on self-hosted, SMOKE summary present)

E2E (TC-CI-TIERS-040)
After CI is green, a maintainer applies the `run-live-smoke` label to this PR to trigger the live tier on the self-hosted runner and confirm the SMOKE summary artifact is present (UNAVAILABLE quota entries do not fail the job). Documented here as the gated dry run.

Closes #238