feat(execute): split setup/check scripts; auto-source per-task gate #105
Merged
- Discriminator code: 'check-failed' so OnError.catchIf can distinguish the post-task gate from spawn-level breakage.
- Carries the failing script's output for surfacing to logs / progress.
- Added to DomainError union.
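A minimal sketch of how a `code` discriminator lets a `catchIf` predicate tell the gate failure apart from other errors. Only `CheckFailedError`, the `'check-failed'` code, the `'aborted'` code, and the `DomainError` union are named in this PR; the field names, `SpawnError`, its `'spawn-failed'` code, and both helper functions are assumptions made for illustration.

```typescript
// Hypothetical shapes: the PR does not show the real field names.
interface CheckFailedError {
  readonly code: 'check-failed';
  readonly message: string;
  readonly output: string; // captured output of the failing script
}

interface SpawnError {
  readonly code: 'spawn-failed'; // assumed code for spawn-level breakage
  readonly message: string;
}

interface AbortedError {
  readonly code: 'aborted';
  readonly message: string;
}

type DomainError = CheckFailedError | SpawnError | AbortedError;

// A predicate of the kind a catchIf could use (hypothetical helper).
function isCheckFailed(error: DomainError): error is CheckFailedError {
  return error.code === 'check-failed';
}

// Narrowing on `code` gives typed access to `output` for logs / progress.
function describeError(error: DomainError): string {
  return isCheckFailed(error)
    ? `post-task check failed:\n${error.output}`
    : `${error.code}: ${error.message}`;
}
```

Because the union is discriminated on `code`, adding the new member does not require changes to call sites that only inspect `code` and `message`.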
PostTaskCheckUseCase now returns Result.error(CheckFailedError) when the
script exits non-zero, instead of Result.ok({ passed: false }). This
lets the per-task chain's OnError gate block the task.
The evaluate-and-fix loop tolerates the new error code (`check-failed`)
between rounds so a red post-fix check doesn't abort the multi-round
loop — the next evaluator round re-grades.
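The change in return shape can be sketched roughly as follows. The `Result` encoding, `ScriptExit`, and `abortsFixLoop` are assumptions for illustration; the PR only states that a non-zero exit now yields `Result.error(CheckFailedError)` and that the evaluate-and-fix loop tolerates `check-failed` between rounds.

```typescript
// Hypothetical Result encoding; the project's real Result type may differ.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface ScriptExit {
  exitCode: number;
  output: string;
}

interface CheckFailedError {
  code: 'check-failed';
  output: string;
}

// Previously a red script produced ok({ passed: false });
// now a non-zero exit is surfaced as an error value.
function postTaskCheck(exit: ScriptExit): Result<{ passed: true }, CheckFailedError> {
  if (exit.exitCode !== 0) {
    return { ok: false, error: { code: 'check-failed', output: exit.output } };
  }
  return { ok: true, value: { passed: true } };
}

// Evaluate-and-fix loop (assumed helper): a red check between rounds is
// tolerated so the next evaluator round can re-grade; other errors abort.
function abortsFixLoop(error: { code: string }): boolean {
  return error.code !== 'check-failed';
}
```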
Wrap post-task-check in two nested OnError decorators:
- Inner (hard gate): catches `code: 'check-failed'` and runs mark-blocked-check, transitioning the task to `'blocked'` with reason "post-task check failed". Sets `taskBlocked: true` so downstream leaves no-op.
- Outer (soft safety net): catches anything else except `aborted` / `check-failed`, swallowing spawn-level breakage so a flaky environment doesn't strand the task.

Adds two trace-fence tests (red script → mark-blocked-check, spawn error → post-task-check-noop) to lock the OnError nesting order.
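The nesting order described above can be sketched with a small synchronous decorator. Everything here is an illustrative assumption (the `Ctx` shape, the `onError` signature, the `trace` array); only the two predicates, the step names, and the `taskBlocked` flag come from the commit message.

```typescript
type Ctx = { taskBlocked: boolean; trace: string[] };
type StepError = { code: string };
type StepResult = { ok: true; ctx: Ctx } | { ok: false; error: StepError };
type Step = (ctx: Ctx) => StepResult;

// Hypothetical decorator: run `inner`; if it fails and `catchIf` matches,
// run `fallback` instead of propagating the error.
function onError(inner: Step, catchIf: (e: StepError) => boolean, fallback: Step): Step {
  return (ctx) => {
    const result = inner(ctx);
    if (result.ok || !catchIf(result.error)) return result;
    return fallback(ctx);
  };
}

const markBlockedCheck: Step = (ctx) => ({
  ok: true,
  ctx: { taskBlocked: true, trace: [...ctx.trace, 'mark-blocked-check'] },
});

const postTaskCheckNoop: Step = (ctx) => ({
  ok: true,
  ctx: { ...ctx, trace: [...ctx.trace, 'post-task-check-noop'] },
});

function gatedPostTaskCheck(postTaskCheck: Step): Step {
  // Inner (hard gate): only 'check-failed' blocks the task.
  const hardGate = onError(postTaskCheck, (e) => e.code === 'check-failed', markBlockedCheck);
  // Outer (soft safety net): swallow everything except 'aborted' / 'check-failed'.
  return onError(hardGate, (e) => e.code !== 'aborted' && e.code !== 'check-failed', postTaskCheckNoop);
}
```

With this ordering a red script is handled by the inner decorator before the outer one sees it, and an `aborted` error passes through both untouched.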
Walks sprint.affectedRepositories, looks up each path's Repository on the project, runs that repo's checkScript exactly once, and stamps Sprint.checkRanAt[repoPath] = now on success. Threads the updated sprint forward through ctx so the per-task bridge sees populated checkRanAt and the prompt builder can render "Pre-task environment check passed at <ISO>" instead of "Not run.".

The leaf is wrapped in OnError(catchIf: code !== 'aborted' && code !== 'invalid-state', fallback: noop): spawn-level errors degrade gracefully but a real red baseline propagates and hard-aborts the sprint.

Adds three trace-fence tests (happy path, red baseline, spawn error swallow) plus an empty-affected-repositories short-circuit case. Adds projectRepo to createExecuteFlow's deps Pick so the leaf can resolve repos by path.
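As a rough sketch of the leaf at this commit (before the later rename to setup scripts), assuming simplified `Sprint` / `Repository` shapes, an injected synchronous runner, and a hypothetical `'spawn-failed'` code; the real leaf is presumably asynchronous and goes through the project's ports:

```typescript
interface Repository { path: string; checkScript?: string }
interface Sprint { affectedRepositories: string[]; checkRanAt: Record<string, string> }
type RunOutcome = 'passed' | 'red' | 'spawn-error';
type LeafError = { code: 'invalid-state' | 'spawn-failed' | 'aborted'; repoPath?: string };
type LeafResult = { ok: true; sprint: Sprint } | { ok: false; error: LeafError };
type Runner = (script: string, cwd: string) => RunOutcome;

function checkScriptsSprintStart(sprint: Sprint, repos: Repository[], run: Runner, now: () => Date): LeafResult {
  let current = sprint;
  for (const repoPath of sprint.affectedRepositories) {
    const script = repos.find((r) => r.path === repoPath)?.checkScript;
    if (script === undefined) continue; // unknown repo or nothing configured
    const outcome = run(script, repoPath); // runs once per affected repo
    if (outcome !== 'passed') {
      const code = outcome === 'red' ? 'invalid-state' : 'spawn-failed';
      return { ok: false, error: { code, repoPath } };
    }
    // Stamp the audit timestamp only on success.
    current = { ...current, checkRanAt: { ...current.checkRanAt, [repoPath]: now().toISOString() } };
  }
  return { ok: true, sprint: current };
}

// Mirrors OnError(catchIf: code !== 'aborted' && code !== 'invalid-state', fallback: noop).
function softWrapped(sprint: Sprint, repos: Repository[], run: Runner, now: () => Date): LeafResult {
  const result = checkScriptsSprintStart(sprint, repos, run, now);
  if (result.ok) return result;
  const swallow = result.error.code !== 'aborted' && result.error.code !== 'invalid-state';
  return swallow ? { ok: true, sprint } : result; // noop keeps the sprint unchanged
}
```

An empty `affectedRepositories` list falls straight through the loop and returns the sprint untouched, which corresponds to the short-circuit test case.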
Locks the contract that {{ENVIRONMENT_STATUS}} renders
"Pre-task environment check passed at <ISO>" when sprint.checkRanAt
has an entry for task.projectPath, not the previous "Not run." regression.
- REQUIREMENTS.md: tick the three sprint-start / post-task gate checkboxes; they're now backed by code + tests.
- CHANGELOG.md: Unreleased Fixed bullets for sprint-start iteration, post-task gate blocking, and graceful spawn-error degradation.
…eckScript
- ExternalPort: add `runSetupScript`; drop `'sprint-start'` from CheckScriptPhase
(now `'post-task' | 'feedback'`). Internal CheckScriptRunner phase becomes
`'setup' | 'post-task' | 'feedback'` for the lifecycle env var.
- execute-flow: replace `check-scripts-sprint-start` leaf with
`setup-scripts-sprint-start` — iterates `sprint.affectedRepositories`,
runs each `Repository.setupScript` once, persists `Sprint.setupRanAt`
audit stamps. Real failures hard-abort with
`InvalidStateError({ currentState: 'setup-failed' })`; spawn-level
flake degrades to noop (existing OnError semantics preserved).
- Sprint entity: rename `checkRanAt` → `setupRanAt`,
`recordCheckRun` → `recordSetupRun`. Sprint schema field renamed on
read + write — no backward-compat shim, pre-existing sprint.json files
with `checkRanAt` will fail Zod validation (single-user pre-1.0).
- Prompt builder: per-task prompt now renders
`Setup script ran at <ISO>.` (was `Pre-task environment check passed at <ISO>.`)
so the AI doesn't conflate "env prepared" with "tests pass".
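The prompt-builder change can be illustrated with a short sketch. The function names and the plain-object `Sprint` shape are hypothetical, and keeping "Not run." as the fallback after the rename is an assumption; the `{{ENVIRONMENT_STATUS}}` slot, the `setupRanAt` field, and the rendered sentence come from the commit messages.

```typescript
// Simplified stand-in for the Sprint entity: repo path -> ISO timestamp.
interface Sprint {
  setupRanAt: Record<string, string>;
}

// Hypothetical helper: pick the status line for a task's project path.
function renderEnvironmentStatus(sprint: Sprint, projectPath: string): string {
  const ranAt = sprint.setupRanAt[projectPath];
  return ranAt === undefined ? 'Not run.' : `Setup script ran at ${ranAt}.`;
}

// Hypothetical helper: substitute every occurrence of the slot in a template.
function fillEnvironmentStatus(template: string, status: string): string {
  return template.split('{{ENVIRONMENT_STATUS}}').join(status);
}
```

The wording deliberately says only that setup ran, so the prompt makes no claim about whether the repo's tests pass.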
`Repository.setupScript` was previously dead at runtime — collected during
`project onboard`, persisted, exported in context.md, but never executed.
`Repository.checkScript` did double duty as both the sprint-start "baseline"
gate and the per-task verification gate. This split aligns the runtime
with the data model: setup runs once at sprint start, check is the
per-task and feedback gate.
Behaviour notes for downstream:
- User scripts that branch on `RALPHCTL_LIFECYCLE_EVENT === 'sprint-start'`
now see `'setup'`.
- `RALPHCTL_SETUP_TIMEOUT_MS` still applies to both setup and check
scripts (they share the runner) — name retained, scope unchanged.
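For a user script that must work both before and after this change, one option is to treat the old value as an alias. The sketch below assumes the script is a Node/TypeScript program; `commandsFor` and the commands it returns are purely illustrative, and only the `RALPHCTL_LIFECYCLE_EVENT` variable and its values come from this PR.

```typescript
// Takes the environment as a parameter so the logic is easy to test;
// a real script would pass process.env.
function commandsFor(env: Readonly<Record<string, string | undefined>>): string[] {
  switch (env['RALPHCTL_LIFECYCLE_EVENT']) {
    case 'setup':
    case 'sprint-start': // legacy value seen before this change
      return ['pnpm install'];
    case 'post-task':
      return ['pnpm typecheck', 'pnpm test'];
    case 'feedback':
      return ['pnpm typecheck', 'pnpm lint', 'pnpm test'];
    default:
      return []; // unknown or unset: do nothing
  }
}
```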
The post-task gate previously only fired when the user passed `sprint start --check-script <cmd>`. The repo's persisted `Repository.checkScript` was never read at execute time, so the "check after every task" REQUIREMENT was satisfied only by ad-hoc flag use.

Adds a `resolve-check-scripts` chain leaf to the initialize phase (runs before `setup-scripts-sprint-start`, after dirty-tree-preflight). The leaf walks `sprint.affectedRepositories` once, looks each repo up on the project, and stuffs `Repository.checkScript` into a `ReadonlyMap<AbsolutePath, string>` on `ctx.checkScripts`. The per-task bridge then seeds each per-task chain's `checkScript` from that map. The CLI `--check-script` global override wins when set.

Running resolve BEFORE the soft-wrapped setup leaf means even a flaky setup spawn that gets absorbed by the OnError still leaves the per-task gate with its check-script map intact.

Also fixes outstanding doc/comment drift from the prior rename:
- REQUIREMENTS.md outer trace + check-gate criterion wording
- ARCHITECTURE.md outer trace + Repository entity description
- CLAUDE.md post-task-gate bullet
- CHANGELOG Unreleased: reframe from old check-script names to current setup/check semantics + auto-source bullet
- execute-flow.ts stale comment naming the noop step
- `sprint start --check-script` flag help text now describes it as a global override
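A minimal sketch of the resolve step and the override rule, assuming `AbsolutePath` is modelled as a plain `string` and using hypothetical function names; the real leaf reads from the project repository and writes to `ctx.checkScripts`.

```typescript
interface Repository { path: string; checkScript?: string }

// Walk the affected repositories once and collect each repo's checkScript.
function resolveCheckScripts(
  affectedRepositories: readonly string[],
  projectRepos: readonly Repository[],
): ReadonlyMap<string, string> {
  const byPath = new Map(projectRepos.map((r) => [r.path, r] as const));
  const scripts = new Map<string, string>();
  for (const repoPath of affectedRepositories) {
    const script = byPath.get(repoPath)?.checkScript;
    if (script !== undefined) scripts.set(repoPath, script);
  }
  return scripts;
}

// Per-task seeding: the CLI --check-script value wins when set,
// otherwise fall back to the per-repo entry (which may be absent).
function checkScriptForTask(
  projectPath: string,
  checkScripts: ReadonlyMap<string, string>,
  cliOverride?: string,
): string | undefined {
  return cliOverride ?? checkScripts.get(projectPath);
}
```

Since the map is built before any setup script is spawned, it does not depend on the setup leaf's outcome.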
cf229f1 to c1d64f5
Summary
- `setupScript` (the deterministic baseline so Claude departs reliably) runs via the renamed `setup-scripts-sprint-start` chain leaf: it iterates `sprint.affectedRepositories`, stamps `Sprint.setupRanAt[repoPath]`, and hard-aborts on the first red exit, naming the failing repo. The per-task `{{ENVIRONMENT_STATUS}}` slot now renders "Setup script ran at <ISO>".
- `checkScript` is auto-sourced from each repo's config. A new `resolve-check-scripts` chain leaf (runs in the initialize phase before setup) walks the sprint's project once and populates `ctx.checkScripts`. The per-task bridge seeds the gate per-task without `--check-script`. The CLI flag stays as a global override (help text updated).
- `PostTaskCheckUseCase` returns `Result.error(CheckFailedError)` on red; `OnError(catchIf: code === 'check-failed')` transitions the task to `blocked` instead of letting `mark-done` proceed. Spawn-level errors (missing binary, EPERM) degrade to a noop via a soft outer `OnError` so a flaky env never strands a sprint or task.

Pipeline (post-merge)
Test plan
- `pnpm typecheck` clean
- `pnpm lint` clean
- `pnpm test`: 233 files, 2257 tests pass (incl. 2 new auto-source / override-wins tests)
- `ralphctl sprint start <id>` on a repo with `setupScript` + `checkScript` configured: confirm setup runs once, check runs after every task
- A red `setupScript` aborts the chain, naming the failing repo
- A red `checkScript` after a task transitions the task to `blocked` (not `done`); subsequent tasks still execute
- `--check-script <cmd>` global override fires for every task instead of the per-repo config