feat(execute): split setup/check scripts; auto-source per-task gate by lukas-grigis · Pull Request #105 · lukas-grigis/ralphctl

lukas-grigis · 2026-05-06T06:17:29Z

Summary

Setup ≠ check. Sprint start now runs each repo's configured setupScript (the deterministic baseline, so Claude starts from a reliable state) via the renamed setup-scripts-sprint-start chain leaf. The leaf iterates sprint.affectedRepositories, stamps Sprint.setupRanAt[repoPath], and hard-aborts on the first red exit, naming the failing repo. The per-task {{ENVIRONMENT_STATUS}} slot now renders "Setup script ran at <ISO>.".
Per-task checkScript is auto-sourced from each repo's config. A new resolve-check-scripts chain leaf (runs in the initialize phase, before setup) walks sprint.affectedRepositories once and populates ctx.checkScripts. The per-task bridge seeds each task's gate from that map, so --check-script is no longer required. The CLI flag stays as a global override (help text updated).
Post-task gate is a hard fence. PostTaskCheckUseCase returns Result.error(CheckFailedError) on red; OnError(catchIf: code === 'check-failed') transitions the task to blocked instead of letting mark-done proceed. Spawn-level errors (missing binary, EPERM) degrade to a noop via a soft outer OnError so a flaky env never strands a sprint or task.
Doc/changelog/comment sync. REQUIREMENTS, ARCHITECTURE, CLAUDE.md, CHANGELOG, top-of-file docstring, and CLI flag help text all updated to match.
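
To make the {{ENVIRONMENT_STATUS}} change from the first summary point concrete, here is a minimal sketch of how the slot could be filled. The `Sprint` and `Task` shapes, the function names, and the "Not run." fallback wording are assumptions for illustration; only `setupRanAt`, `projectPath`, the slot name, and the "Setup script ran at <ISO>." wording come from this PR.

```typescript
// Hypothetical shapes; the real entities carry more fields.
interface Sprint {
  setupRanAt: Record<string, string>; // repoPath -> ISO timestamp
}

interface Task {
  projectPath: string;
}

function environmentStatus(sprint: Sprint, task: Task): string {
  const ranAt = sprint.setupRanAt[task.projectPath];
  // Assumption: "Not run." is kept as the fallback when no stamp exists.
  return ranAt === undefined ? 'Not run.' : `Setup script ran at ${ranAt}.`;
}

function renderPrompt(template: string, sprint: Sprint, task: Task): string {
  // A replacer function avoids special handling of "$" in the replacement text.
  return template.replace('{{ENVIRONMENT_STATUS}}', () => environmentStatus(sprint, task));
}
```

The wording deliberately says the setup script ran, not that anything passed, which matches the PR's intent of keeping "env prepared" separate from "tests pass".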

Pipeline (post-merge)

sprint start
  resolve-branch
  dirty-tree-preflight
  resolve-check-scripts        ← NEW (pure read; populates ctx.checkScripts)
  setup-scripts-sprint-start   ← deterministic baseline (red exit aborts)
loop per task:
  branch-preflight
  execute-task
  post-task-check              ← auto-fires from repo.checkScript;
                                  red → mark blocked → downstream no-ops
  evaluate-task                ← skipped if blocked
  mark-done
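
The per-task loop above can be read as: a red post-task check blocks only the current task, downstream leaves no-op, and the next task still runs. The sketch below is a simplified model of that flow, not the project's chain implementation; `TaskRun`, `runTask`, `runSprint`, and the `'todo'` status are hypothetical names.

```typescript
type Status = 'todo' | 'done' | 'blocked'; // 'todo' is an assumed initial status

interface TaskRun {
  id: string;
  status: Status;
  taskBlocked: boolean;
  trace: string[];
}

function runTask(id: string, checkIsGreen: boolean): TaskRun {
  const run: TaskRun = { id, status: 'todo', taskBlocked: false, trace: [] };
  run.trace.push('branch-preflight', 'execute-task', 'post-task-check');
  if (!checkIsGreen) {
    // Red check: mark the task blocked instead of failing the sprint.
    run.taskBlocked = true;
    run.status = 'blocked';
    run.trace.push('mark-blocked-check');
  }
  for (const leaf of ['evaluate-task', 'mark-done']) {
    if (run.taskBlocked) continue; // downstream leaves no-op once blocked
    run.trace.push(leaf);
  }
  if (!run.taskBlocked) run.status = 'done';
  return run;
}

function runSprint(tasks: { id: string; checkIsGreen: boolean }[]): TaskRun[] {
  // Each task runs independently, so a blocked task does not stop later ones.
  return tasks.map((t) => runTask(t.id, t.checkIsGreen));
}
```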

Test plan

pnpm typecheck clean
pnpm lint clean
pnpm test — 233 files, 2257 tests pass (incl. 2 new auto-source / override-wins tests)
Step-trace fences updated for the new outer trace
Manual smoke: ralphctl sprint start <id> on a repo with setupScript + checkScript configured — confirm setup runs once, check runs after every task
Manual smoke: red setupScript aborts the chain naming the failing repo
Manual smoke: red checkScript after a task transitions the task to blocked (not done); subsequent tasks still execute
Manual smoke: --check-script <cmd> global override fires for every task instead of the per-repo config

- Discriminator code: 'check-failed' so OnError.catchIf can distinguish the post-task gate from spawn-level breakage.
- Carries the failing script's output for surfacing to logs / progress.
- Added to DomainError union.
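
As a rough illustration of why a discriminator code helps, the sketch below models the error as a member of a discriminated union. Only the `'check-failed'` code, the carried output, and the `DomainError` name come from the commit; `SpawnError`, its `'spawn-failed'` code, and the helper functions are assumptions, and the real error is likely a class rather than a plain object.

```typescript
interface CheckFailedError {
  readonly code: 'check-failed';
  readonly message: string;
  readonly output: string; // captured output of the failing script
}

// Hypothetical second member so the union has something to discriminate against.
interface SpawnError {
  readonly code: 'spawn-failed';
  readonly message: string;
}

type DomainError = CheckFailedError | SpawnError;

function checkFailed(script: string, output: string): CheckFailedError {
  return { code: 'check-failed', message: `check script failed: ${script}`, output };
}

function describeError(err: DomainError): string {
  // Switching on `code` narrows the union, so `output` is only reachable
  // in the 'check-failed' branch.
  switch (err.code) {
    case 'check-failed':
      return `gate red: ${err.output}`;
    case 'spawn-failed':
      return `spawn broke: ${err.message}`;
  }
}
```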

PostTaskCheckUseCase now returns Result.error(CheckFailedError) when the script exits non-zero, instead of Result.ok({ passed: false }). This lets the per-task chain's OnError gate block the task. The evaluate-and-fix loop tolerates the new error code (`check-failed`) between rounds so a red post-fix check doesn't abort the multi-round loop — the next evaluator round re-grades.
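
The change in return contract can be sketched as follows. This is a simplified, synchronous model under stated assumptions: the `Result` shape, `ScriptOutcome`, and the `runScript` callback are hypothetical stand-ins, and the real use case presumably spawns a process asynchronously.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface CheckFailedError {
  code: 'check-failed';
  output: string;
}

interface ScriptOutcome {
  exitCode: number;
  output: string;
}

// A non-zero exit now surfaces as an error result rather than
// an ok result carrying `passed: false`.
function postTaskCheck(runScript: () => ScriptOutcome): Result<{ passed: true }, CheckFailedError> {
  const outcome = runScript();
  if (outcome.exitCode !== 0) {
    return { ok: false, error: { code: 'check-failed', output: outcome.output } };
  }
  return { ok: true, value: { passed: true } };
}

// With the error channel in use, a caller cannot mistake a red check for success.
function nextStatus(result: Result<{ passed: true }, CheckFailedError>): 'done' | 'blocked' {
  return result.ok ? 'done' : 'blocked';
}
```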

Wrap post-task-check in two nested OnError decorators:
- Inner (hard gate): catches `code: 'check-failed'` and runs mark-blocked-check, transitioning the task to `'blocked'` with reason "post-task check failed". Sets `taskBlocked: true` so downstream leaves no-op.
- Outer (soft safety net): catches anything else except `aborted` / `check-failed`, swallowing spawn-level breakage so a flaky environment doesn't strand the task.

Adds two trace-fence tests (red script → mark-blocked-check, spawn error → post-task-check-noop) to lock the OnError nesting order.
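
The nesting order described in this commit can be sketched with a small decorator. The `onError` helper, the `Step` and `Ctx` types, and the `'spawn-failed'` code used in the usage note are assumptions for illustration; the catch predicates and the step names mark-blocked-check and post-task-check-noop follow the commit message.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface StepError {
  code: string;
  message: string;
}

interface Ctx {
  taskBlocked: boolean;
  trace: string[];
}

type Step = (ctx: Ctx) => Result<Ctx, StepError>;

// Hypothetical decorator: run `step`; if it fails with an error that
// `catchIf` accepts, run `fallback` on the original context instead.
function onError(step: Step, catchIf: (e: StepError) => boolean, fallback: Step): Step {
  return (ctx) => {
    const result = step(ctx);
    if (result.ok || !catchIf(result.error)) return result;
    return fallback(ctx);
  };
}

const markBlockedCheck: Step = (ctx) => ({
  ok: true,
  value: { taskBlocked: true, trace: [...ctx.trace, 'mark-blocked-check'] },
});

const noop: Step = (ctx) => ({
  ok: true,
  value: { ...ctx, trace: [...ctx.trace, 'post-task-check-noop'] },
});

function gated(postTaskCheck: Step): Step {
  // Inner hard gate: a red check blocks the task.
  const inner = onError(postTaskCheck, (e) => e.code === 'check-failed', markBlockedCheck);
  // Outer soft net: anything else except aborted / check-failed degrades to a noop.
  return onError(inner, (e) => e.code !== 'aborted' && e.code !== 'check-failed', noop);
}
```

In this model a `'check-failed'` error ends in mark-blocked-check, a spawn-level error ends in post-task-check-noop, and `'aborted'` passes through both wrappers untouched.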

Walks sprint.affectedRepositories, looks up each path's Repository on the project, runs that repo's checkScript exactly once, and stamps Sprint.checkRanAt[repoPath] = now on success. Threads the updated sprint forward through ctx so the per-task bridge sees populated checkRanAt and the prompt builder can render "Pre-task environment check passed at <ISO>" instead of "Not run.".

The leaf is wrapped in OnError(catchIf: code !== 'aborted' && code !== 'invalid-state', fallback: noop): spawn-level errors degrade gracefully, but a real red baseline propagates and hard-aborts the sprint.

Adds three trace-fence tests (happy path, red baseline, spawn error swallow) plus an empty-affected-repositories short-circuit case. Adds projectRepo to createExecuteFlow's deps Pick so the leaf can resolve repos by path.

Locks the contract that {{ENVIRONMENT_STATUS}} renders "Pre-task environment check passed at <ISO>" when sprint.checkRanAt has an entry for task.projectPath, not the previous "Not run." regression.

- REQUIREMENTS.md: tick the three sprint-start / post-task gate checkboxes; they're now backed by code + tests.
- CHANGELOG.md: Unreleased Fixed bullets for sprint-start iteration, post-task gate blocking, and graceful spawn-error degradation.

…eckScript

- ExternalPort: add `runSetupScript`; drop `'sprint-start'` from CheckScriptPhase (now `'post-task' | 'feedback'`). Internal CheckScriptRunner phase becomes `'setup' | 'post-task' | 'feedback'` for the lifecycle env var.
- execute-flow: replace `check-scripts-sprint-start` leaf with `setup-scripts-sprint-start`, which iterates `sprint.affectedRepositories`, runs each `Repository.setupScript` once, and persists `Sprint.setupRanAt` audit stamps. Real failures hard-abort with `InvalidStateError({ currentState: 'setup-failed' })`; spawn-level flake degrades to noop (existing OnError semantics preserved).
- Sprint entity: rename `checkRanAt` → `setupRanAt`, `recordCheckRun` → `recordSetupRun`. Sprint schema field renamed on read + write. There is no backward-compat shim, so pre-existing sprint.json files with `checkRanAt` will fail Zod validation (single-user pre-1.0).
- Prompt builder: per-task prompt now renders `Setup script ran at <ISO>.` (was `Pre-task environment check passed at <ISO>.`) so the AI doesn't conflate "env prepared" with "tests pass".

`Repository.setupScript` was previously dead at runtime: collected during `project onboard`, persisted, exported in context.md, but never executed. `Repository.checkScript` did double duty as both the sprint-start "baseline" gate and the per-task verification gate. This split aligns the runtime with the data model: setup runs once at sprint start, check is the per-task and feedback gate.

Behaviour notes for downstream:
- User scripts that branch on `RALPHCTL_LIFECYCLE_EVENT === 'sprint-start'` now see `'setup'`.
- `RALPHCTL_SETUP_TIMEOUT_MS` still applies to both setup and check scripts (they share the runner); name retained, scope unchanged.
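
A minimal sketch of the sprint-start setup pass described in this commit, assuming a synchronous runner that returns an exit code. The `RunScript` signature, the simplified entity shapes, the plain-object error, and the choice to skip repos without a `setupScript` are all assumptions; the field names, the `'setup'` phase, and the `'setup-failed'` state come from the commit message.

```typescript
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
type Phase = 'setup' | 'post-task' | 'feedback';

interface Repository {
  path: string;
  setupScript?: string;
}

interface Sprint {
  affectedRepositories: string[];
  setupRanAt: Record<string, string>; // repoPath -> ISO timestamp
}

interface InvalidStateError {
  code: 'invalid-state';
  currentState: 'setup-failed';
  message: string;
}

// Assumed runner shape: returns the script's exit code.
type RunScript = (repoPath: string, script: string, phase: Phase) => number;

function setupScriptsSprintStart(
  sprint: Sprint,
  repos: Repository[],
  run: RunScript,
  now: () => string,
): Result<Sprint, InvalidStateError> {
  let current = sprint;
  for (const repoPath of sprint.affectedRepositories) {
    const script = repos.find((r) => r.path === repoPath)?.setupScript;
    if (script === undefined) continue; // assumption: nothing configured, nothing to run
    if (run(repoPath, script, 'setup') !== 0) {
      // First red exit aborts and names the failing repo.
      return {
        ok: false,
        error: {
          code: 'invalid-state',
          currentState: 'setup-failed',
          message: `setup script failed in ${repoPath}`,
        },
      };
    }
    // Stamp immutably so the updated sprint can be threaded forward through ctx.
    current = { ...current, setupRanAt: { ...current.setupRanAt, [repoPath]: now() } };
  }
  return { ok: true, value: current };
}
```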

The post-task gate previously only fired when the user passed `sprint start --check-script <cmd>`. The repo's persisted `Repository.checkScript` was never read at execute time, so the "check after every task" REQUIREMENT was satisfied only by ad-hoc flag use.

Adds a `resolve-check-scripts` chain leaf to the initialize phase (runs before `setup-scripts-sprint-start`, after dirty-tree-preflight). The leaf walks `sprint.affectedRepositories` once, looks each repo up on the project, and stuffs `Repository.checkScript` into a `ReadonlyMap<AbsolutePath, string>` on `ctx.checkScripts`. The per-task bridge then seeds each per-task chain's `checkScript` from that map. The CLI `--check-script` global override wins when set.

Running resolve BEFORE the soft-wrapped setup leaf means even a flaky setup spawn that gets absorbed by the OnError still leaves the per-task gate with its check-script map intact.

Also fixes outstanding doc/comment drift from the prior rename:
- REQUIREMENTS.md outer trace + check-gate criterion wording
- ARCHITECTURE.md outer trace + Repository entity description
- CLAUDE.md post-task-gate bullet
- CHANGELOG Unreleased: reframe from old check-script names to current setup/check semantics + auto-source bullet
- execute-flow.ts stale comment naming the noop step
- `sprint start --check-script` flag help text now describes it as a global override
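
The resolve-then-seed behaviour can be sketched as two small functions. Function names and the simplified `Repository` shape are hypothetical, and `AbsolutePath` is modelled as a plain string alias here; the map type, the per-repo `checkScript`, and the "override wins" rule come from the commit message.

```typescript
type AbsolutePath = string; // assumption: the real type may be branded

interface Repository {
  path: AbsolutePath;
  checkScript?: string;
}

// Pure read: collect each affected repo's configured checkScript once.
function resolveCheckScripts(
  affected: readonly AbsolutePath[],
  repos: readonly Repository[],
): ReadonlyMap<AbsolutePath, string> {
  const scripts = new Map<AbsolutePath, string>();
  for (const repoPath of affected) {
    const script = repos.find((r) => r.path === repoPath)?.checkScript;
    if (script !== undefined) scripts.set(repoPath, script);
  }
  return scripts;
}

// Per-task seeding: the global CLI override wins; otherwise use the repo's script.
function checkScriptForTask(
  projectPath: AbsolutePath,
  scripts: ReadonlyMap<AbsolutePath, string>,
  override?: string,
): string | undefined {
  return override ?? scripts.get(projectPath);
}
```

Because the map is built before setup runs and is never written afterwards, a swallowed setup failure cannot leave the per-task gate without its scripts.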


lukas-grigis added 9 commits May 6, 2026 06:25

feat(domain): add CheckFailedError for post-task gate

88ad4a0


test(prompts): assert checkRanAt renders in execute prompt

4736e03


docs(execute-flow): refresh top-of-file step trace for resolve-check-…

c1d64f5

…scripts

lukas-grigis force-pushed the refactor/setup-vs-check-scripts branch from cf229f1 to c1d64f5 May 6, 2026 06:21

lukas-grigis merged commit 204f4de into main May 6, 2026
1 check passed

lukas-grigis deleted the refactor/setup-vs-check-scripts branch May 6, 2026 06:23
