Build full Launchplane merge train #601

## Objective

Build Launchplane's full merge train as a batch-validating, PR-native landing system. The current implementation is a sequential ordered merge queue runner: it selects one eligible PR and applies one transition per pass. The full target is different: collect multiple eligible PRs, build one combined candidate from `main + queued PRs in order`, run CI once on that combined candidate, then land the original PRs in order so GitHub preserves the expected PR UI/UX.

This plan optimizes for the workflow we actually want: many unrelated queued PRs can be proven compatible together and then merged from one Launchplane action, while still failing closed when the combined tree does not pass.
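As a rough sketch of the candidate-planning step described above, the snippet below orders queued PRs and records the head SHA each one is expected to have when the combined candidate is built. The names `QueuedPR`, `CandidatePlan`, and `plan_candidate` are hypothetical and are not Launchplane's actual record types; the real model is DB-backed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedPR:
    """Hypothetical queue entry; real Launchplane records carry more fields."""
    number: int
    head_sha: str
    position: int  # lower position lands first


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical plan for one combined candidate: base + PRs in queue order."""
    base_sha: str
    merge_order: tuple          # PR numbers in queue order
    expected_heads: tuple       # (PR number, head SHA) pairs the candidate was built from


def plan_candidate(base_sha, queued):
    """Order the queued PRs by position and capture the SHAs to be validated together."""
    ordered = sorted(queued, key=lambda pr: pr.position)
    return CandidatePlan(
        base_sha=base_sha,
        merge_order=tuple(pr.number for pr in ordered),
        expected_heads=tuple((pr.number, pr.head_sha) for pr in ordered),
    )


plan = plan_candidate("base000", [QueuedPR(12, "bbb", 2), QueuedPR(10, "aaa", 1)])
print(plan.merge_order)  # (10, 12)
```

CI would then run once against a tree built from `plan.base_sha` plus the PRs in `plan.merge_order`, and the stored `expected_heads` become the guard checked before each original PR is merged.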

## Finish Line

Launchplane can collect multiple `ready-to-merge` PRs for one repository/base branch, build and test a combined batch candidate, and after the candidate passes, merge the original PRs in queue order with clear records, PR feedback, and rollback/stop behavior if GitHub state changes before landing completes.
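A minimal sketch of the in-order landing loop with stop behavior, assuming two injected callables that stand in for GitHub calls: `current_head(pr_number)` returns the PR's current head SHA and `merge(pr_number, sha)` performs the merge. Both callables and the function name `land_batch` are illustrative assumptions, not existing Launchplane routes.

```python
def land_batch(tested_heads, current_head, merge):
    """Merge PRs in queue order, stopping at the first PR whose head moved.

    tested_heads: list of (pr_number, head_sha) pairs from the validated candidate.
    Returns (landed_pr_numbers, outcome_message).
    """
    landed = []
    for number, tested_sha in tested_heads:
        # Fail closed: never merge a head that differs from what CI validated.
        if current_head(number) != tested_sha:
            return landed, f"stopped: PR #{number} head changed after validation"
        merge(number, tested_sha)
        landed.append(number)
    return landed, "landed"


# Usage with in-memory stand-ins for GitHub state.
heads = {1: "aaa", 2: "changed", 3: "ccc"}
merged = []
result = land_batch(
    [(1, "aaa"), (2, "bbb"), (3, "ccc")],
    current_head=heads.get,
    merge=lambda number, sha: merged.append(number),
)
print(result)  # ([1], 'stopped: PR #2 head changed after validation')
```

GitHub's REST endpoint for merging a pull request accepts a `sha` parameter that the PR head must match, which could serve as a second guard behind the explicit comparison. What happens to PRs that already landed when a later one stops is a policy decision this sketch leaves open.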

## Current Status

State: Flat batch merge-train primitives are merged, deployed, and now live-proven with a real stacked codex-skills PR set. Launchplane collapsed the stack #61 -> #60 -> #59 into root PR #59, required fresh checks on the root, built and observed a flat candidate for #59, then landed the original PR through the service-backed workflow. This proves the chosen design: stack handling is pre-train normalization, and the batch train remains flat/PR-native.
Next action: Update the codex-skills Launchplane skill, then continue #605 scheduler/controller automation so these phases can run unattended instead of by manual record IDs.
Blocked by: Not blocked for manual stack-collapse + flat landing. Still blocked from the full product finish line by scheduler/controller automation, batch splitting/isolation, and operator feedback polish.
Last verified: 2026-05-14. codex-skills PRs #59/#60/#61 are merged, `main` is `c704e4d7a8e3975eef38fdce5e0446103b5ce335`, CodeQL-on-main run `25889619001` passed, and Launchplane health is `ok` on Postgres.

## Scope

- **In**: persisted train queue records, train position state, speculative validation model, scheduler/admission updates, GitHub ref/update/merge orchestration, PR/operator feedback, records/read models, and rollout validation.
- **Out**: replacing GitHub branch protection, bypassing required checks, direct DB mutation for live operations, hardcoded repository defaults, and native GitHub merge queue integration unless chosen later as a separate provider adapter.

## Acceptance Criteria

- [ ] Queue entries are explicit Launchplane records, not only derived from current labels.
- [ ] A batch candidate can be built from `base + eligible queued PRs in queue order`.
- [ ] CI runs against the combined batch candidate before any batch landing begins.
- [ ] If the batch candidate passes, Launchplane merges the original PRs in order so GitHub preserves normal PR UI/UX.
- [ ] Before each original PR merge, Launchplane verifies the PR still matches the tested candidate sequence or fails closed and rebuilds/requeues.
- [ ] If candidate build or CI fails, Launchplane splits/bisects or falls back to smaller batches to isolate blocker PRs.
- [ ] Earlier failures either pause the train or reflow later entries according to stored policy.
- [ ] Operators and PR authors can see batch position, candidate status, blocker, and next-action feedback.
- [ ] Every mutation has an audit record with policy digest, batch id, queue entries, observed SHAs, validation state, and GitHub API result.
- [ ] Dry-run mode shows the whole queued batch, candidate plan, and landing plan without mutating GitHub.
- [ ] Live rollout proves a multi-PR batch dry-run and then a low-risk mutate smoke.
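One possible shape for the per-mutation audit record named in the criteria above, assuming the policy is serializable as JSON and that the digest is a SHA-256 over a canonical encoding. The field names and the helper functions are illustrative assumptions; the stored schema may differ.

```python
import hashlib
import json


def policy_digest(policy):
    """Digest a policy dict over a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(policy, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_audit_record(*, policy, batch_id, queue_entries, observed_shas,
                       validation_state, github_result):
    """Assemble one audit record for a single GitHub mutation (hypothetical schema)."""
    return {
        "policy_digest": policy_digest(policy),
        "batch_id": batch_id,
        "queue_entries": list(queue_entries),
        "observed_shas": dict(observed_shas),
        "validation_state": validation_state,
        "github_result": github_result,
    }


record = build_audit_record(
    policy={"max_batch_size": 4, "on_failure": "pause"},  # example values only
    batch_id="batch-example",
    queue_entries=[10, 12],
    observed_shas={10: "aaa", 12: "bbb"},
    validation_state="passed",
    github_result={"merged": True},
)
print(len(record["policy_digest"]))  # 64
```

Hashing a canonical encoding means two records written under the same policy carry the same digest even if the policy was serialized with different key order.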

## Relationships

- Builds on #410, the Level 1 sequential merge queue baseline.
- Related to #413 and #414 for self-hosted runner lane inventory/control.
- Related to #514 for trusted Every Code feedback actor authorization.
- related: cbusillo/launchplane#410 - https://github.com/cbusillo/launchplane/issues/410

## Validation

- Unit tests for queue state transitions, position assignment, stale head detection, reflow, pause, and cleanup.
- Contract/storage tests for train queue records and migrations.
- Service tests for queue read/write, dry-run train projection, scheduler admission, and authorization.
- GitHub API adapter tests with mocked refs, statuses/check runs, branch updates, labels, comments, and merges.
- Live dry-run against a repository with at least two queued PRs.
- Live mutate smoke on low-risk PRs, with post-merge records and PR feedback inspected.
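The queue state-transition unit tests could be table-driven. The sketch below assumes a hypothetical set of entry states and allowed transitions; the state names are placeholders for whatever the train queue records actually store.

```python
# Hypothetical queue entry states and the transitions allowed between them.
ALLOWED_TRANSITIONS = {
    "queued": {"candidate_building", "removed"},
    "candidate_building": {"validating", "blocked"},
    "validating": {"ready_to_land", "blocked"},
    "ready_to_land": {"landed", "queued"},  # back to queued on a stale head
    "blocked": {"queued", "removed"},
    "landed": set(),
    "removed": set(),
}


def transition(state, new_state):
    """Return new_state if the move is allowed, otherwise raise (fail closed)."""
    if new_state not in ALLOWED_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


def is_stale(tested_sha, observed_sha):
    """A queue entry is stale when the PR head no longer matches the tested SHA."""
    return tested_sha != observed_sha


# Example checks a unit test might make.
assert transition("queued", "candidate_building") == "candidate_building"
assert is_stale("aaa", "bbb")
try:
    transition("landed", "queued")
except ValueError as exc:
    print(exc)  # illegal transition: landed -> queued
```

Keeping the table as data lets one parametrized test enumerate every (state, new_state) pair and assert that anything outside the table is rejected.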

## Decisions

- Keep fail-closed behavior from the sequential baseline.
- Do not use file/env-backed train policy; train authority remains DB-backed records and service routes.
- Do not hardcode real repositories into production defaults or fallback behavior.
- Treat the current implementation as an ordered merge queue baseline, not the final full merge train.
- First full-train target is batch candidate validation followed by PR-native sequential landing.
- Defer per-position speculative train refs until there is evidence batch validation is insufficient.
- Treat stacked PR handling as pre-train normalization: collapse a same-repository linear stack into the root PR, require fresh green checks on the root against the base branch, then admit only the root PR into the existing flat batch train.
- Do not add native arbitrary stacked-PR landing to the batch train; use the same flat train primitives after stack collapse.
- Require explicit stored intent and fresh SHA guards before Launchplane mutates feature branches during stack collapse.
- Keep codex-skills as an operator-selected live test bed only; do not hardcode it or any other real repository in production logic.
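The stack-normalization decision above depends on recognizing a same-repository linear stack and its root. A minimal sketch, assuming each PR is described only by its head and base branch names; `StackPR`, `linear_stack`, and the branch names in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackPR:
    number: int
    head_ref: str  # branch the PR comes from
    base_ref: str  # branch the PR targets


def linear_stack(prs, base_branch):
    """Return PR numbers from tip to root, or raise if the PRs are not one linear stack."""
    by_head = {pr.head_ref: pr for pr in prs}
    if len(by_head) != len(prs):
        raise ValueError("duplicate head refs")
    roots = [pr for pr in prs if pr.base_ref == base_branch]
    if len(roots) != 1:
        raise ValueError("expected exactly one root PR targeting the base branch")
    children = {}
    for pr in prs:
        if pr.base_ref == base_branch:
            continue
        # Each non-root PR must target another PR's head, with no forks.
        if pr.base_ref not in by_head or pr.base_ref in children:
            raise ValueError("not a linear stack")
        children[pr.base_ref] = pr
    order = [roots[0]]
    while order[-1].head_ref in children:
        order.append(children[order[-1].head_ref])
    if len(order) != len(prs):
        raise ValueError("stack is disconnected")
    return [pr.number for pr in reversed(order)]


stack = [
    StackPR(60, "feat-b", "feat-a"),
    StackPR(59, "feat-a", "main"),
    StackPR(61, "feat-c", "feat-b"),
]
print(linear_stack(stack, "main"))  # [61, 60, 59]
```

The last number in the result is the root, which is the only PR admitted to the flat train after collapse. The explicit stored intent and fresh SHA guards required before mutating feature branches are separate checks not shown here.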

## Open Questions

- Should the batch candidate be a temporary branch under `refs/heads/launchplane/train/...`, a hidden ref if GitHub Actions can validate it, or a draft PR branch?
- What is the first blocking-isolation strategy: binary split, one-at-a-time fallback, or both?
- How strict should PR-native landing verification be when GitHub reports the PR mergeable but the tested candidate sequence was constructed independently?
- What is the minimum runner capacity needed before scheduled mutate mode is safe?
- Which codex-skills PRs should be queued for the first two-PR live batch dry-run/smoke?
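To make the blocking-isolation question concrete, here is a hedged sketch of both strategies. It assumes a `passes(sub_batch)` callable that builds and validates `base + sub_batch` in order and returns a bool; that callable and both function names are hypothetical. Binary split validates each half against the base alone, so it falls back to the one-at-a-time pass when the halves pass separately but fail together.

```python
def one_at_a_time(batch, passes):
    """Grow the accepted list one PR at a time; a PR that breaks it is a blocker."""
    accepted, blockers = [], []
    for pr in batch:
        if passes(accepted + [pr]):
            accepted.append(pr)
        else:
            blockers.append(pr)
    return blockers


def binary_split(batch, passes):
    """Recursively halve a failing batch to find blockers with fewer validations."""
    batch = list(batch)
    if not batch or passes(batch):
        return []
    if len(batch) == 1:
        return batch
    mid = len(batch) // 2
    blockers = binary_split(batch[:mid], passes) + binary_split(batch[mid:], passes)
    if not blockers:
        # Both halves pass alone but fail together: an interaction between PRs.
        return one_at_a_time(batch, passes)
    return blockers


# One independently broken PR.
bad = {3}
print(binary_split([1, 2, 3, 4], lambda b: not (set(b) & bad)))  # [3]

# PRs 2 and 4 only fail when combined; the later PR is reported as the blocker.
print(binary_split([1, 2, 3, 4], lambda b: not ({2, 4} <= set(b))))  # [4]
```

Every `passes` call is a full CI run, so the trade-off is runner capacity: one-at-a-time costs one run per PR, while binary split is cheaper when blockers are rare and more expensive when many PRs in the batch fail.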

