Eliminate the write/read race CLASS: apply the #1095 causal-outbox pattern to ALL remaining forward state-trigger legs (reviewing/fixing/merge_ready/open_pr/reconcile) — behavioral, ungated by the pin bump #1228

@loning

Re-entry of #1103 after the ready-budget stopgap (45→120 minutes, PR #1205). #1103 was force-terminated to blocked (budget_minutes=45, from ready/implementing) while the implement codex was still within its 60-min run. blocked-from-ready has no operator re-entry (#1098 gap), so the only way to re-trigger is to close and re-file. This instance re-enters with the 120-min budget.


The race is a CLASS, not a single leg

#1095 root-caused and fixed the write/read marker-visibility race at the ready leg. A department wrote a state-marker comment asynchronously AND synchronously direct-raised the downstream devloop_ready with the same dedup_key. The proofless direct-raise won reliable-delivery dedup and suppressed the causal comment_handoff (write-confirmed) raise, so the consumer read a not-yet-visible marker → strand/churn.
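The race shape described above can be reproduced in a toy model. The sketch below is in Python, not the repository's Lua, and every name in it (`DedupBus`, `run_leg`, the key and marker strings) is hypothetical. It assumes only what the paragraph states: the first raise for a given dedup_key wins, and the consumer can advance only if the marker is visible when the trigger is delivered.

```python
class DedupBus:
    """Toy reliable-delivery bus: the first raise for a dedup_key wins."""

    def __init__(self, consumer):
        self.seen = set()
        self.consumer = consumer

    def raise_trigger(self, dedup_key, marker_id):
        if dedup_key in self.seen:
            return False  # a later raise with the same key is dropped
        self.seen.add(dedup_key)
        self.consumer(marker_id)  # deliver to the consumer immediately
        return True


def run_leg(direct_raise):
    """Simulate one forward leg; return what the consumer observed."""
    visible = set()   # markers whose write has been confirmed
    outcomes = []

    def consumer(marker_id):
        ok = marker_id in visible
        outcomes.append("advanced" if ok else "marker not visible")

    bus = DedupBus(consumer)
    marker_id, key = "marker-1", "issue-1:devloop_ready"

    # The department starts the asynchronous marker write here;
    # the marker is not visible yet.
    if direct_raise:
        # Proofless synchronous direct-raise, same dedup_key.
        bus.raise_trigger(key, marker_id)

    # The write is confirmed; only now is the marker visible.
    visible.add(marker_id)

    # Causal (write-confirmed) raise, the comment_handoff analogue.
    bus.raise_trigger(key, marker_id)
    return outcomes


if __name__ == "__main__":
    print(run_leg(direct_raise=True))   # ['marker not visible']
    print(run_leg(direct_raise=False))  # ['advanced']
```

With the direct-raise present, the causal raise is deduplicated away and the consumer only ever sees the early, proofless delivery. Removing the direct-raise leaves the write-confirmed raise as the sole producer.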

That exact shape exists at every forward state-trigger leg. migration/forward-direct-raise.allowlist currently has 15 entries, each a department that direct-raises a downstream trigger without a write-confirmed hand_off:

  • requests/review.lua|raise_fix_reviewing|devloop_reviewing (fixing→reviewing)
  • merge/main.lua|raise_reviewing_for_current_head|devloop_reviewing (merge→reviewing)
  • observe_pr/main.lua|maybe_apply_rereview_command|devloop_reviewing
  • fix/main.lua|raise_stale_speculation_refix|devloop_fixing
  • merge/main.lua|raise_fixing|devloop_fixing
  • review_carry_over.lua|raise_review_carry_over|devloop_merge_ready
  • implement/main.lua|raise_implementing|devloop_open_pr and raise_open_pr_from_fact|devloop_open_pr
  • loop/main.lua|pipeline|devloop_reconcile
  • (the ready_split.lua|replay_ready_state|devloop_ready redrive is already legal "REDRIVE-from-visible" after #1095, "fix(github-devloop): comment_handoff sole forward devloop_ready producer; verifiable hand_off on every ready producer")
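To see which triggers still have proofless producers, the entries can be grouped by downstream trigger. This is a minimal sketch that assumes the allowlist file holds one `path|function|trigger` entry per line, as the bullets above suggest, and that blank lines and `#` comments may appear; the real file format is not confirmed here, and `parse_allowlist` and `by_trigger` are hypothetical helpers. The sample holds only a few of the entries listed above.

```python
from collections import defaultdict

SAMPLE = """\
requests/review.lua|raise_fix_reviewing|devloop_reviewing
merge/main.lua|raise_reviewing_for_current_head|devloop_reviewing
fix/main.lua|raise_stale_speculation_refix|devloop_fixing
merge/main.lua|raise_fixing|devloop_fixing
loop/main.lua|pipeline|devloop_reconcile
"""


def parse_allowlist(text):
    """Parse 'path|function|trigger' lines into 3-tuples."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments are allowed
        parts = line.split("|")
        if len(parts) != 3:
            raise ValueError(f"line {number}: expected 3 fields: {line!r}")
        entries.append(tuple(parts))
    return entries


def by_trigger(entries):
    """Group 'path:function' producers under their downstream trigger."""
    groups = defaultdict(list)
    for path, func, trigger in entries:
        groups[trigger].append(f"{path}:{func}")
    return dict(groups)


if __name__ == "__main__":
    for trigger, producers in sorted(by_trigger(parse_allowlist(SAMPLE)).items()):
        print(trigger, len(producers))
```

Each trigger that still appears in the grouped output is a leg that has not yet been converted.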

Live evidence: as soon as #1095 let issues reach implementing, the implementing leg's variant of the same race surfaced (#1101 — implement re-spawns codex during the implement-attempt marker's visibility lag). The reviewing/fixing/merge_ready legs will exhibit it the moment they carry real load.

Scope clarification — this is BEHAVIORAL and is NOT gated by the pin bump

#1075 conflates two separable things and gates BOTH behind the substrate pin bump (#1070/#1074):

  1. Behavioral: route the trigger through the write-confirmed causal outbox (github_comment_written → comment_handoff) with a verifiable hand_off; remove the proofless direct-raise. This is the race fix and needs NO engine change: #1095 ("fix(github-devloop): comment_handoff sole forward devloop_ready producer; verifiable hand_off on every ready producer") proved it by landing the ready leg behaviorally on the current engine.
  2. Capability: declare produces per dept and have the engine enforce raise ⊆ produces (the substrate#131 enforcement). This DOES need the pin bump.
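For contrast with item 1, the capability check in item 2 reduces to a subset test per department. The sketch below is only an illustration of raise ⊆ produces; how the engine actually declares and enforces produces is not described in this issue, and `undeclared_raises` and the department names in the example are hypothetical.

```python
def undeclared_raises(produces, raises):
    """Return, per department, the raised triggers it did not declare.

    produces: dict of department -> iterable of declared triggers
    raises:   dict of department -> iterable of triggers it raises
    An empty result means raise ⊆ produces holds for every department.
    """
    violations = {}
    for dept, raised in raises.items():
        extra = set(raised) - set(produces.get(dept, ()))
        if extra:
            violations[dept] = sorted(extra)
    return violations


if __name__ == "__main__":
    produces = {"fix": {"devloop_reviewing"}}
    raises = {"fix": ["devloop_reviewing", "devloop_fixing"]}
    print(undeclared_raises(produces, raises))  # {'fix': ['devloop_fixing']}
```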

This issue covers item (1) only: apply the #1095 pattern (sole causal forward producer + verifiable hand_off bound to the visible marker + monotonic redrive generation + handoff only on the matching canonical state + canonical marker effects) to each remaining leg, shrinking forward-direct-raise.allowlist toward 0 behaviorally, independent of and ahead of the pin bump. When the allowlist hits 0 the race class is structurally gone; #1075's capability enforcement then makes any regression CI-red.
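The consumer-side half of the pattern can be sketched as a single verification step. This is a toy model under stated assumptions, not the #1095 implementation: the `HandOff` fields, the `visible_markers` mapping (marker id to the state recorded in the visible marker), and the rejection strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class HandOff:
    marker_id: str    # the marker comment this hand_off is bound to
    state: str        # canonical state the marker announces
    generation: int   # redrive generation, expected to only increase


def verify_hand_off(
    hand_off: Optional[HandOff],
    visible_markers: Mapping[str, str],
    expected_state: str,
    last_generation: int,
) -> Tuple[bool, str]:
    """Accept a trigger only if its hand_off is provable right now."""
    if hand_off is None:
        return False, "proofless raise"
    marker_state = visible_markers.get(hand_off.marker_id)
    if marker_state is None:
        return False, "marker not visible"
    if marker_state != expected_state or hand_off.state != expected_state:
        return False, "state mismatch"
    if hand_off.generation <= last_generation:
        return False, "stale generation"
    return True, "ok"


if __name__ == "__main__":
    visible = {"marker-7": "reviewing"}
    good = HandOff("marker-7", "reviewing", generation=2)
    print(verify_hand_off(good, visible, "reviewing", last_generation=1))
    print(verify_hand_off(None, visible, "reviewing", last_generation=1))
```

A consumer built this way rejects a proofless raise outright instead of reading a marker that may not be visible yet.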

Done when

Each remaining forward-direct-raise leg is converted to the causal outbox + verifiable hand_off (mirroring #1095), with a per-leg consumer test proving the consumer verifies the visible marker and advances without "not visible" churn; forward-direct-raise.allowlist shrinks toward 0; #1101 (implementing leg) is one child of this. Mechanical invariant: exactly one canonical producer per forward state-trigger queue = the write-confirmed causal handoff; any proofless direct-raise is CI-forbidden (the existing ratchet, driven to 0).
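The ratchet in the mechanical invariant can be pictured as a set comparison between the committed allowlist and the one on the branch under test. This is a hedged sketch: the existing CI ratchet is not shown in this issue, and `ratchet_violations` and `race_class_closed` are hypothetical names. Entries are treated as opaque `path|function|trigger` strings.

```python
def ratchet_violations(baseline, current):
    """Entries present now that were not in the baseline.

    The ratchet only allows the allowlist to shrink, so any entry
    returned here is a new proofless direct-raise and fails CI.
    """
    return sorted(set(current) - set(baseline))


def race_class_closed(current):
    """True once no forward direct-raise remains on the allowlist."""
    return len(set(current)) == 0


if __name__ == "__main__":
    baseline = [
        "merge/main.lua|raise_fixing|devloop_fixing",
        "loop/main.lua|pipeline|devloop_reconcile",
    ]
    after_fix = ["loop/main.lua|pipeline|devloop_reconcile"]
    regression = after_fix + ["fix/main.lua|raise_stale_speculation_refix|devloop_fixing"]

    print(ratchet_violations(baseline, after_fix))   # []
    print(ratchet_violations(baseline, regression))
    print(race_class_closed([]))                     # True
```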

Relates to #1046 (design, closed), #1075 (capability enforcement, gated), #1095 (ready leg, done), #1101 (implementing leg).

⟦AI:FKST⟧


Re-filed from #1212 — this instance was force-terminated by the implement-liveness elapsed-time proxy bug (now fixed in #1213); a blocked issue has no operator re-entry, so it is re-activated by re-filing. Original scope unchanged. ⟦AI:FKST⟧
