feat(execute): split setup/check scripts; auto-source per-task gate #105

Merged
lukas-grigis merged 9 commits into main from refactor/setup-vs-check-scripts on May 6, 2026

@lukas-grigis
Owner

Summary

  • Setup ≠ check. Sprint start now runs each repo's configured setupScript (the deterministic baseline, so Claude starts from a reliable state) via the renamed setup-scripts-sprint-start chain leaf. The leaf iterates sprint.affectedRepositories, stamps Sprint.setupRanAt[repoPath], and hard-aborts on the first red exit, naming the failing repo. The per-task {{ENVIRONMENT_STATUS}} slot now renders "Setup script ran at <ISO>."
  • Per-task checkScript is auto-sourced from each repo's config. A new resolve-check-scripts chain leaf (runs in the initialize phase, before setup) walks the sprint's affected repositories once and populates ctx.checkScripts from the project's repo config. The per-task bridge seeds each task's gate from that map, so --check-script is no longer needed. The CLI flag stays as a global override (help text updated).
  • Post-task gate is a hard fence. PostTaskCheckUseCase returns Result.error(CheckFailedError) on red; OnError(catchIf: code === 'check-failed') transitions the task to blocked instead of letting mark-done proceed. Spawn-level errors (missing binary, EPERM) degrade to a noop via a soft outer OnError so a flaky env never strands a sprint or task.
  • Doc/changelog/comment sync. REQUIREMENTS, ARCHITECTURE, CLAUDE.md, CHANGELOG, top-of-file docstring, and CLI flag help text all updated to match.

Pipeline (post-merge)

sprint start
  resolve-branch
  dirty-tree-preflight
  resolve-check-scripts        ← NEW (pure read; populates ctx.checkScripts)
  setup-scripts-sprint-start   ← deterministic baseline (red exit aborts)
loop per task:
  branch-preflight
  execute-task
  post-task-check              ← auto-fires from repo.checkScript;
                                  red → mark blocked → downstream no-ops
  evaluate-task                ← skipped if blocked
  mark-done
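The per-task half of that pipeline can be sketched as a chain where a red gate flips `taskBlocked` and every later leaf no-ops. This is a simplified model assuming synchronous leaves; `TaskCtx`, `Leaf`, `buildTaskLeaves`, and `runTaskChain` are hypothetical names, not ralphctl's chain API.

```typescript
// Simplified, hypothetical model of the per-task loop: only the blocked/skip
// control flow is modelled, not real git, agent, or script work.
type TaskCtx = {
  readonly taskId: string;
  readonly taskBlocked: boolean;
  readonly trace: readonly string[];
};

type Leaf = { readonly name: string; run: (ctx: TaskCtx) => TaskCtx };

// Pass-through leaf: leaves the context unchanged (the runner records the name).
const passthrough = (leafName: string): Leaf => ({ name: leafName, run: (ctx) => ctx });

function buildTaskLeaves(checkPasses: boolean): Leaf[] {
  return [
    passthrough('branch-preflight'),
    passthrough('execute-task'),
    {
      name: 'post-task-check',
      // A red check marks the task blocked instead of failing the whole chain.
      run: (ctx) => (checkPasses ? ctx : { ...ctx, taskBlocked: true }),
    },
    passthrough('evaluate-task'),
    passthrough('mark-done'),
  ];
}

function runTaskChain(initial: TaskCtx, leaves: readonly Leaf[]): TaskCtx {
  let ctx = initial;
  for (const leaf of leaves) {
    if (ctx.taskBlocked) {
      // Downstream leaves no-op once the gate has blocked the task.
      ctx = { ...ctx, trace: [...ctx.trace, `${leaf.name}:skipped`] };
      continue;
    }
    ctx = leaf.run({ ...ctx, trace: [...ctx.trace, leaf.name] });
  }
  return ctx;
}
```

With `buildTaskLeaves(false)` the trace ends in `evaluate-task:skipped` and `mark-done:skipped`, which is the "red → mark blocked → downstream no-ops" path; with `true` all five leaves run.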

Test plan

  • pnpm typecheck clean
  • pnpm lint clean
  • pnpm test — 233 files, 2257 tests pass (incl. 2 new auto-source / override-wins tests)
  • Step-trace fences updated for the new outer trace
  • Manual smoke: ralphctl sprint start <id> on a repo with setupScript + checkScript configured — confirm setup runs once, check runs after every task
  • Manual smoke: red setupScript aborts the chain naming the failing repo
  • Manual smoke: red checkScript after a task transitions the task to blocked (not done); subsequent tasks still execute
  • Manual smoke: --check-script <cmd> global override fires for every task instead of the per-repo config

- New CheckFailedError with discriminator code 'check-failed', so OnError.catchIf can
  distinguish the post-task gate from spawn-level breakage.
- Carries the failing script's output for surfacing to logs / progress.
- Added to DomainError union.
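A sketch of what such an error type could look like, assuming class-based domain errors. Only `CheckFailedError`, the `'check-failed'` code, the carried output, and the `DomainError` union come from the commit; `AbortedError`, `SpawnFailedError`, `'spawn-failed'`, and `isCheckFailed` are illustrative stand-ins.

```typescript
// Sketch only: `code` and the carried output mirror the commit message; the
// other error classes and their codes are hypothetical stand-ins.
class CheckFailedError extends Error {
  readonly code = 'check-failed' as const;
  readonly output: string; // failing script's output, for logs / progress

  constructor(output: string, message = 'post-task check failed') {
    super(message);
    this.name = 'CheckFailedError';
    this.output = output;
  }
}

class AbortedError extends Error {
  readonly code = 'aborted' as const;
}

class SpawnFailedError extends Error {
  readonly code = 'spawn-failed' as const; // hypothetical code for spawn-level breakage
}

type DomainError = CheckFailedError | AbortedError | SpawnFailedError;

// The discriminator lets a catchIf predicate tell the gate apart from breakage.
const isCheckFailed = (err: DomainError): err is CheckFailedError => err.code === 'check-failed';
```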
PostTaskCheckUseCase now returns Result.error(CheckFailedError) when the
script exits non-zero, instead of Result.ok({ passed: false }). This
lets the per-task chain's OnError gate block the task.
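The behavioural change, sketched with a minimal hand-rolled `Result` type and a hypothetical synchronous `ScriptRunner`. The real use case spawns a process, and its success payload may differ from the `{ output }` assumed here.

```typescript
// Minimal Result type; the project's own Result helper may differ.
type Result<T, E> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: E };
const ok = <T,>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E,>(error: E): Result<never, E> => ({ ok: false, error });

class CheckFailedError extends Error {
  readonly code = 'check-failed' as const;
  readonly output: string;
  constructor(output: string) {
    super('post-task check failed');
    this.output = output;
  }
}

// Hypothetical, synchronous runner port: the real one spawns the script in `cwd`.
type ScriptRunner = (script: string, cwd: string) => { exitCode: number; output: string };

function postTaskCheck(
  run: ScriptRunner,
  script: string,
  cwd: string,
): Result<{ output: string }, CheckFailedError> {
  const { exitCode, output } = run(script, cwd);
  // Previously a red exit was Result.ok({ passed: false }); now it is an
  // error value the per-task chain can gate on.
  return exitCode === 0 ? ok({ output }) : err(new CheckFailedError(output));
}
```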

The evaluate-and-fix loop tolerates the new error code (`check-failed`)
between rounds so a red post-fix check doesn't abort the multi-round
loop — the next evaluator round re-grades.
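One way the tolerance could be expressed, assuming each round evaluates first and then applies a fix followed by a re-check. `evaluateAndFix`, `CheckOutcome`, and the `'spawn-failed'` code are hypothetical, not the loop's real shape.

```typescript
// Hypothetical outcome of re-running the check after a fix round.
type CheckOutcome =
  | { readonly ok: true }
  | { readonly ok: false; readonly code: 'check-failed' | 'aborted' | 'spawn-failed' };

type LoopResult = { passed: boolean; rounds: number; redChecks: number };

function evaluateAndFix(
  maxRounds: number,
  evaluate: (round: number) => boolean, // evaluator verdict for this round
  fixAndCheck: (round: number) => CheckOutcome, // apply a fix, then re-run the check
): LoopResult {
  let redChecks = 0;
  for (let round = 1; round <= maxRounds; round++) {
    if (evaluate(round)) return { passed: true, rounds: round, redChecks };
    const check = fixAndCheck(round);
    if (!check.ok) {
      // Only the gate's own code is tolerated; anything else still aborts the loop.
      if (check.code !== 'check-failed') throw new Error(`fix round ${round} failed: ${check.code}`);
      redChecks += 1; // the next round's evaluator re-grades
    }
  }
  return { passed: false, rounds: maxRounds, redChecks };
}
```

Only `'check-failed'` is absorbed here; an `'aborted'` outcome still throws out of the loop.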
Wrap post-task-check in two nested OnError decorators:
- Inner (hard gate): catches `code: 'check-failed'` and runs
  mark-blocked-check, transitioning the task to `'blocked'` with
  reason "post-task check failed". Sets `taskBlocked: true` so
  downstream leaves no-op.
- Outer (soft safety net): catches every error except `aborted` /
  `check-failed`, swallowing spawn-level breakage so a flaky
  environment doesn't strand the task.
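The nesting can be modelled with a small `onError` decorator. This sketch assumes steps are synchronous functions returning a result object; `onError`, `Step`, `Ctx`, and `'spawn-failed'` are hypothetical, while the step names and the blocked reason mirror the commit.

```typescript
type ErrorCode = 'check-failed' | 'aborted' | 'spawn-failed';
type StepError = { readonly code: ErrorCode; readonly message: string };
type Ctx = {
  readonly taskStatus: 'in-progress' | 'blocked';
  readonly blockedReason?: string;
  readonly taskBlocked: boolean;
  readonly trace: readonly string[];
};
type StepResult = { ok: true; ctx: Ctx } | { ok: false; error: StepError };
type Step = (ctx: Ctx) => StepResult;

const traced = (ctx: Ctx, stepName: string): Ctx => ({ ...ctx, trace: [...ctx.trace, stepName] });

// OnError-style decorator: run `inner`; if it fails and catchIf matches, run `fallback` instead.
function onError(inner: Step, catchIf: (e: StepError) => boolean, fallback: Step): Step {
  return (ctx) => {
    const result = inner(ctx);
    return !result.ok && catchIf(result.error) ? fallback(ctx) : result;
  };
}

// Stand-in for the real leaf: fails with the given code, or passes when none is given.
const postTaskCheck = (failWith?: ErrorCode): Step => (ctx) =>
  failWith
    ? { ok: false, error: { code: failWith, message: `post-task check: ${failWith}` } }
    : { ok: true, ctx: traced(ctx, 'post-task-check') };

const markBlockedCheck: Step = (ctx) => ({
  ok: true,
  ctx: {
    ...traced(ctx, 'mark-blocked-check'),
    taskStatus: 'blocked',
    blockedReason: 'post-task check failed',
    taskBlocked: true, // downstream leaves no-op
  },
});

const postTaskCheckNoop: Step = (ctx) => ({ ok: true, ctx: traced(ctx, 'post-task-check-noop') });

// Inner hard gate wrapped by the outer soft net.
const gatedPostTaskCheck = (failWith?: ErrorCode): Step =>
  onError(
    onError(postTaskCheck(failWith), (e) => e.code === 'check-failed', markBlockedCheck),
    (e) => e.code !== 'aborted' && e.code !== 'check-failed',
    postTaskCheckNoop,
  );
```

In this sketch a red script ends at `mark-blocked-check`, a spawn failure at `post-task-check-noop`, and `aborted` passes through both layers unchanged.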

Adds two trace-fence tests (red script → mark-blocked-check, spawn
error → post-task-check-noop) to lock the OnError nesting order.

Walks sprint.affectedRepositories, looks up each path's Repository on
the project, runs that repo's checkScript exactly once, and stamps
Sprint.checkRanAt[repoPath] = now on success. Threads the updated
sprint forward through ctx so the per-task bridge sees populated
checkRanAt and the prompt builder can render "Pre-task environment
check passed at <ISO>" instead of "Not run.".

The leaf is wrapped in OnError(catchIf: code !== 'aborted' && code
!== 'invalid-state', fallback: noop) — spawn-level errors degrade
gracefully but a real red baseline propagates and hard-aborts the
sprint. Adds three trace-fence tests (happy path, red baseline,
spawn error swallow) plus an empty-affected-repositories short-circuit
case.
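A synchronous sketch of the leaf as this commit describes it (it predates the setup/check split), assuming a plain-string `AbsolutePath`, a `findRepo` lookup, and a hypothetical `RunScript` port. It also assumes the red baseline surfaces as `'invalid-state'`, which is what the quoted `catchIf` implies; `checkScriptsSprintStart` and `swallowAtSprintStart` are illustrative names.

```typescript
type AbsolutePath = string; // assumption: a plain string here
interface Repository { readonly path: AbsolutePath; readonly checkScript?: string }
interface Sprint {
  readonly affectedRepositories: readonly AbsolutePath[];
  readonly checkRanAt: Readonly<Record<AbsolutePath, string>>;
}
type LeafError = { readonly code: 'aborted' | 'invalid-state' | 'spawn-failed'; readonly message: string };
type LeafResult = { ok: true; sprint: Sprint } | { ok: false; error: LeafError };

// Hypothetical synchronous runner; the real one spawns the script in `cwd`.
type RunScript = (script: string, cwd: AbsolutePath) => { exitCode: number };

function checkScriptsSprintStart(
  sprint: Sprint,
  findRepo: (path: AbsolutePath) => Repository | undefined, // e.g. backed by projectRepo
  run: RunScript,
  nowIso: () => string,
): LeafResult {
  if (sprint.affectedRepositories.length === 0) return { ok: true, sprint }; // short-circuit
  const checkRanAt: Record<AbsolutePath, string> = { ...sprint.checkRanAt };
  for (const repoPath of sprint.affectedRepositories) {
    const script = findRepo(repoPath)?.checkScript;
    if (!script) continue; // assumption: repos without a script are skipped
    if (run(script, repoPath).exitCode !== 0) {
      // A real red baseline: propagate so the sprint hard-aborts, naming the repo.
      return { ok: false, error: { code: 'invalid-state', message: `baseline check failed in ${repoPath}` } };
    }
    checkRanAt[repoPath] = nowIso(); // stamp only on success
  }
  // Thread the updated sprint forward so later leaves see populated stamps.
  return { ok: true, sprint: { ...sprint, checkRanAt } };
}

// The OnError predicate: swallow spawn-level errors, let aborts and red baselines through.
const swallowAtSprintStart = (e: LeafError): boolean =>
  e.code !== 'aborted' && e.code !== 'invalid-state';
```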

Adds projectRepo to createExecuteFlow's deps Pick so the leaf can
resolve repos by path.
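For readers unfamiliar with the pattern, `Pick` narrows a wide deps interface to the members a flow actually uses. A hypothetical sketch in which only `projectRepo` comes from the commit; `AppDeps`, `Clock`, `Logger`, `findByPath`, and `createExecuteFlowSketch` are made up for illustration.

```typescript
type AbsolutePath = string;
interface Repository { readonly path: AbsolutePath; readonly checkScript?: string }

// Hypothetical ports; only `projectRepo` is named in the commit.
interface ProjectRepo { findByPath(path: AbsolutePath): Repository | undefined }
interface Clock { now(): Date }
interface Logger { info(message: string): void }

interface AppDeps {
  readonly projectRepo: ProjectRepo;
  readonly clock: Clock;
  readonly logger: Logger;
}

// The flow declares only the deps it needs; including 'projectRepo' in the
// Pick is what lets a leaf resolve repositories by path.
type ExecuteFlowDeps = Pick<AppDeps, 'projectRepo' | 'clock'>;

function createExecuteFlowSketch(deps: ExecuteFlowDeps) {
  return {
    resolveRepo: (path: AbsolutePath): Repository | undefined => deps.projectRepo.findByPath(path),
  };
}
```

Passing a full `AppDeps` object still type-checks, since TypeScript only requires the picked members to be present.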
Locks the contract that {{ENVIRONMENT_STATUS}} renders
"Pre-task environment check passed at <ISO>" when sprint.checkRanAt
has an entry for task.projectPath, guarding against the earlier
regression where it rendered "Not run." instead.

- REQUIREMENTS.md: tick the three sprint-start / post-task gate
  checkboxes — they're now backed by code + tests.
- CHANGELOG.md: Unreleased Fixed bullets for sprint-start iteration,
  post-task gate blocking, and graceful spawn-error degradation.

…eckScript

- ExternalPort: add `runSetupScript`; drop `'sprint-start'` from CheckScriptPhase
  (now `'post-task' | 'feedback'`). Internal CheckScriptRunner phase becomes
  `'setup' | 'post-task' | 'feedback'` for the lifecycle env var.
- execute-flow: replace `check-scripts-sprint-start` leaf with
  `setup-scripts-sprint-start` — iterates `sprint.affectedRepositories`,
  runs each `Repository.setupScript` once, persists `Sprint.setupRanAt`
  audit stamps. Real failures hard-abort with
  `InvalidStateError({ currentState: 'setup-failed' })`; spawn-level
  flake degrades to noop (existing OnError semantics preserved).
- Sprint entity: rename `checkRanAt` → `setupRanAt`,
  `recordCheckRun` → `recordSetupRun`. Sprint schema field renamed on
  read + write — no backward-compat shim, pre-existing sprint.json files
  with `checkRanAt` will fail Zod validation (single-user pre-1.0).
- Prompt builder: per-task prompt now renders
  `Setup script ran at <ISO>.` (was `Pre-task environment check passed at <ISO>.`)
  so the AI doesn't conflate "env prepared" with "tests pass".
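The renamed stamp and the new prompt wording, sketched as pure functions under the assumption that `Sprint` is an immutable record keyed by repo path. The real `recordSetupRun` is an entity method; these signatures, and the `'Not run.'` fallback taken from the earlier commits, are assumptions.

```typescript
type AbsolutePath = string;

// Sketch of the renamed audit field (formerly `checkRanAt`).
interface Sprint {
  readonly setupRanAt: Readonly<Record<AbsolutePath, string>>;
}

// Mirrors `recordSetupRun` (formerly `recordCheckRun`) as a pure function.
function recordSetupRun(sprint: Sprint, repoPath: AbsolutePath, at: Date): Sprint {
  return { ...sprint, setupRanAt: { ...sprint.setupRanAt, [repoPath]: at.toISOString() } };
}

// What the {{ENVIRONMENT_STATUS}} slot might render for a task's project path.
function renderEnvironmentStatus(sprint: Sprint, taskProjectPath: AbsolutePath): string {
  const ranAt = sprint.setupRanAt[taskProjectPath];
  return ranAt ? `Setup script ran at ${ranAt}.` : 'Not run.';
}
```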

`Repository.setupScript` was previously dead at runtime — collected during
`project onboard`, persisted, exported in context.md, but never executed.
`Repository.checkScript` did double duty as both the sprint-start "baseline"
gate and the per-task verification gate. This split aligns the runtime
with the data model: setup runs once at sprint start, check is the
per-task and feedback gate.

Behaviour notes for downstream:
- User scripts that branch on `RALPHCTL_LIFECYCLE_EVENT === 'sprint-start'`
  now see `'setup'`.
- `RALPHCTL_SETUP_TIMEOUT_MS` still applies to both setup and check
  scripts (they share the runner) — name retained, scope unchanged.
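How the shared runner's environment might be assembled, as a hypothetical sketch: `scriptEnv`, `scriptTimeoutMs`, and `isSetupPhase` are illustrative, and treating a missing, non-numeric, or non-positive `RALPHCTL_SETUP_TIMEOUT_MS` as "use the fallback" is an assumption.

```typescript
// Phases the shared runner reports to user scripts.
type RunnerPhase = 'setup' | 'post-task' | 'feedback';
type Env = Readonly<Record<string, string | undefined>>;

// Hypothetical helper: the environment a setup or check script would be spawned with.
function scriptEnv(phase: RunnerPhase, base: Env): Env {
  return { ...base, RALPHCTL_LIFECYCLE_EVENT: phase };
}

// One timeout knob for both script kinds, since they share the runner.
// `fallbackMs` is a parameter because the real default is not stated here.
function scriptTimeoutMs(env: Env, fallbackMs: number): number {
  const parsed = Number(env['RALPHCTL_SETUP_TIMEOUT_MS']);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

// A user-side guard that previously matched 'sprint-start' now has to match 'setup'.
const isSetupPhase = (env: Env): boolean => env['RALPHCTL_LIFECYCLE_EVENT'] === 'setup';
```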
The post-task gate previously only fired when the user passed
`sprint start --check-script <cmd>`. The repo's persisted
`Repository.checkScript` was never read at execute time, so the
"check after every task" REQUIREMENT was satisfied only by ad-hoc
flag use.

Adds a `resolve-check-scripts` chain leaf to the initialize phase
(runs before `setup-scripts-sprint-start`, after dirty-tree-preflight).
The leaf walks `sprint.affectedRepositories` once, looks each repo
up on the project, and stuffs `Repository.checkScript` into a
`ReadonlyMap<AbsolutePath, string>` on `ctx.checkScripts`. The
per-task bridge then seeds each per-task chain's `checkScript` from
that map. The CLI `--check-script` global override wins when set.
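The resolve step and the bridge's precedence rule, sketched as two pure functions under the assumption of a plain-string `AbsolutePath` and a `Project` that lists its repositories. `resolveCheckScripts` and `seedCheckScript` are hypothetical names, and leaving repos without a `checkScript` out of the map is this sketch's choice.

```typescript
type AbsolutePath = string;
interface Repository { readonly path: AbsolutePath; readonly checkScript?: string }
interface Project { readonly repositories: readonly Repository[] }

// Initialize-phase leaf: a pure read mapping each affected repo to its checkScript.
function resolveCheckScripts(
  affectedRepositories: readonly AbsolutePath[],
  project: Project,
): ReadonlyMap<AbsolutePath, string> {
  const byPath = new Map(project.repositories.map((repo) => [repo.path, repo] as const));
  const scripts = new Map<AbsolutePath, string>();
  for (const repoPath of affectedRepositories) {
    const script = byPath.get(repoPath)?.checkScript;
    if (script) scripts.set(repoPath, script); // repos without a checkScript are left out
  }
  return scripts;
}

// Per-task bridge: the CLI --check-script override wins when set,
// otherwise the task's repo config is used.
function seedCheckScript(
  taskProjectPath: AbsolutePath,
  checkScripts: ReadonlyMap<AbsolutePath, string>,
  cliOverride?: string,
): string | undefined {
  return cliOverride ?? checkScripts.get(taskProjectPath);
}
```

Using `??` means any provided override wins, including for tasks whose repo has no `checkScript` configured.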

Running resolve BEFORE the soft-wrapped setup leaf means even a
flaky setup spawn that gets absorbed by the OnError still leaves
the per-task gate with its check-script map intact.

Also fixes outstanding doc/comment drift from the prior rename:
- REQUIREMENTS.md outer trace + check-gate criterion wording
- ARCHITECTURE.md outer trace + Repository entity description
- CLAUDE.md post-task-gate bullet
- CHANGELOG Unreleased: reframe from old check-script names to
  current setup/check semantics + auto-source bullet
- execute-flow.ts stale comment naming the noop step
- `sprint start --check-script` flag help text now describes
  it as a global override
lukas-grigis force-pushed the refactor/setup-vs-check-scripts branch from cf229f1 to c1d64f5 on May 6, 2026 06:21
lukas-grigis merged commit 204f4de into main on May 6, 2026
1 check passed
lukas-grigis deleted the refactor/setup-vs-check-scripts branch on May 6, 2026 06:23