feat(sdlc): FM-11 lane supervisor — dead lanes always auto-restart by ryanklee · Pull Request #3803 · hapax-systems/hapax-council

ryanklee · 2026-05-31T20:16:30Z

FM-11 lane supervisor — dead lanes always auto-restart

Task: reform-fix-lane-supervisor-20260531
AuthorityCase: CASE-CROSS-RUNTIME-COMMS-001
Parent spec: coordination-reform-master-design-2026-05-30 (§FM-11, Phase 6)

Problem

There was no production lane supervisor. hapax-claude-lane@.service had Restart=no. The only dead-lane respawn logic lived in hapax-lane-rate-limit-watchdog, which was (a) never attached to any timer (dead code), (b) gated restart on task-presence — a dead lane with no active task hit "DEAD with no active task — not restarting" and was left dead (violating the operator mandate that dead lanes must always auto-restart), and (c) covered only beta gamma delta epsilon zeta (no cx-*, no antigrav). A test even locked in the non-restart behaviour.

Change

A dedicated hapax-lane-supervisor that decouples process-liveness from task-presence. Clean split: the supervisor guarantees the process exists; the launcher/dispatcher decides what it does — so respawning a quota-walled or task-less lane into idle-await is correct, not spam.

scripts/hapax-lane-supervisor — one_for_one liveness across all runtimes:
- claude (greek): pidfile-alive or tmux session; codex (cx-*) / antigrav: tmux session.
- Respawn (always, when worktree exists + past cooldown + under a StartLimit-style burst cap): claude+task → hapax-claude-headless resume; claude+no-task → hapax-claude --readonly idle-await (the headless launcher is default-deny when task-less); codex/antigrav → tmux launcher --no-claim.
systemd/units/hapax-lane-supervisor.{service,timer} — a wired 60s oneshot (auto-enabled by the install-units.sh timer sweep).
systemd/units/hapax-claude-lane@.service — Restart=no → always + RestartSec + StartLimitIntervalSec/StartLimitBurst (FM-11 FORMALIZE; per-lane task binding preserved).
scripts/hapax-lane-rate-limit-watchdog — drop the unwired/task-gated/greek-only dead-lane block (superseded); add HAPAX_HEADLESS_PIPE_DIR override so tests don't read the host's real /run/user lane runtime.
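The respawn rules listed above can be read as a small per-lane decision function. What follows is a minimal Python sketch under stated assumptions, not the supervisor's actual code (which is a shell script). `LaneState`, `decide`, the action strings, and the default `cooldown_s` and `burst_limit` values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LaneState:
    """Snapshot of one lane at supervisor tick time (all fields hypothetical)."""
    kind: str                               # "claude", "codex", or "antigrav"
    alive: bool                             # pidfile-alive or tmux session present
    worktree_exists: bool
    claimed_task: Optional[str]             # None when the lane holds no task
    seconds_since_restart: Optional[float]  # None if never restarted
    restarts_in_window: int                 # restarts inside the burst window


def decide(lane: LaneState, cooldown_s: float = 120.0, burst_limit: int = 5) -> str:
    """Return the action for one lane; liveness never depends on task presence."""
    if lane.alive:
        return "skip:alive"
    if not lane.worktree_exists:
        return "skip:no-worktree"
    if lane.seconds_since_restart is not None and lane.seconds_since_restart < cooldown_s:
        return "skip:cooldown"
    if lane.restarts_in_window >= burst_limit:
        return "skip:burst-limit"
    if lane.kind == "claude":
        # A task-less claude lane is still respawned, only into read-only idle-await.
        if lane.claimed_task:
            return "respawn:headless-resume"
        return "respawn:readonly-idle-await"
    # codex and antigrav lanes come back as tmux sessions without claiming a task.
    return "respawn:tmux-no-claim"
```

The point of the split is visible in the last branch: task presence only selects which launcher runs, it never decides whether a dead lane is restarted.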

Acceptance criteria

A supervisor guarantees lane-process liveness regardless of task presence (Restart=always+StartLimit and a one_for_one supervisor).
cx-* and antigrav lanes are covered (not greek-only).
A dead-no-task lane is respawned (test updated to assert this, replacing the locked-in non-restart assertion).
Ruff + tests pass.

Tests

tests/scripts/test_lane_supervisor.py (12): respawn dead-no-task (idle-await) / dead-with-task (resume) claude; cx-*/antigrav respawn; live-lane skip (pidfile + tmux); cooldown; burst/StartLimit back-off; worktree-absent skip; dry-run; shell syntax.
tests/systemd/test_lane_supervisor_units.py (4): supervisor oneshot+60s timer; Restart=always+StartLimit; per-lane task env preserved.
tests/scripts/test_lane_watchdog_methodology_dispatch.py: replaced the locked-in non-restart assertion with the clean-split delegation assertion.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Automated periodic lane health checks running every 60 seconds
- Automatic lane recovery upon failure detection
Improvements
- Added restart rate limiting to prevent cascading failures (maximum 5 restarts per 5-minute window)
- Configured automatic service restart with 10-second intervals between attempts

Build the FM-11 lane supervisor (CASE-CROSS-RUNTIME-COMMS-001, reform Phase 6): a dead lane is ALWAYS respawned regardless of task presence, across the claude/codex/antigrav runtimes (operator standing mandate: idle-await is fine, dead is not).

- scripts/hapax-lane-supervisor: dedicated one_for_one liveness supervisor. Liveness = claude pidfile-alive OR tmux session; codex/antigrav tmux session. Respawn (always, when the worktree exists + past cooldown + under a StartLimit-style burst cap): claude+task -> headless resume; claude+no-task -> read-only idle-await (the headless launcher is default-deny when task-less); codex/antigrav -> tmux launcher --no-claim. Clean split: the supervisor guarantees the process exists; the launcher/dispatcher decides what it does (so respawning a quota-walled or task-less lane into idle-await is correct, not spam).
- systemd/units/hapax-lane-supervisor.{service,timer}: a wired 60s oneshot. The old dead-lane block lived in hapax-lane-rate-limit-watchdog, which was never attached to any timer.
- systemd/units/hapax-claude-lane@.service: Restart=no -> always + RestartSec + StartLimitIntervalSec/StartLimitBurst (FM-11 FORMALIZE).
- scripts/hapax-lane-rate-limit-watchdog: drop the unwired, task-gated, greek-only dead-lane block (superseded by the supervisor) and add a HAPAX_HEADLESS_PIPE_DIR override so tests do not read the host's real /run/user lane runtime.
- tests: add test_lane_supervisor.py (12) + test_lane_supervisor_units.py (4); replace the locked-in non-restart assertion in test_lane_watchdog_methodology_dispatch.py with the clean-split delegation assertion.

Acceptance criteria met: liveness guaranteed regardless of task presence, cx-*/antigrav covered (not greek-only), dead-no-task lane respawned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-05-31T20:16:45Z

📝 Walkthrough

Walkthrough

This PR introduces a dedicated lane supervisor system that periodically ensures FM-11 lanes remain alive. It refactors the rate-limit watchdog to remove dead-lane restart logic, delegates that responsibility to a new periodic supervisor script driven by systemd, and updates the lane service template to enable automatic restarts with rate limiting.

Changes

FM-11 Lane Supervisor System

Layer / File(s)	Summary
Rate-limit watchdog refactoring `scripts/hapax-lane-rate-limit-watchdog`, `tests/scripts/test_lane_watchdog_methodology_dispatch.py`	Removes dead-lane auto-restart logic and adds `HEADLESS_PIPE_DIR` configuration for test sandboxing. Watchdog now delegates dead-lane restart responsibility to the supervisor and focuses solely on quota-aware re-kicking of live idle lanes.
Lane supervisor core `scripts/hapax-lane-supervisor`	New script implementing liveness checks (Claude via PIDfile/tmux, Codex/Antigrav via tmux), per-lane restart spam control (cooldown + burst-limit), claimed-task resolution from vault, and kind-specific respawn commands (Claude headless/idle-await depending on task, Codex/Antigrav tmux idle-await).
Lane service unit restart policy `systemd/units/hapax-claude-lane@.service`	Enables `Restart=always` (replacing `no`), adds `RestartSec=10` inter-restart delay, and configures `StartLimitIntervalSec=300` and `StartLimitBurst=5` to bound restart frequency.
Supervisor systemd units `systemd/units/hapax-lane-supervisor.service`, `systemd/units/hapax-lane-supervisor.timer`	New service unit (oneshot, 120s timeout, journald logging) and timer unit (60-second intervals after boot) that run the supervisor to guarantee lane liveness.
Supervisor behavior tests `tests/scripts/test_lane_supervisor.py`	Comprehensive pytest suite: dead-lane respawn paths, liveness skipping, guardrails (cooldown, burst-limit, dry-run, missing worktree), test harness, and shell-syntax validation.
Systemd unit validation `tests/systemd/test_lane_supervisor_units.py`	Integration tests confirming supervisor service/timer configuration, Claude lane template restart policy and rate limits, and per-lane task environment wiring.

Sequence Diagram

sequenceDiagram
  participant Timer as systemd Timer
  participant Supervisor as hapax-lane-supervisor
  participant Alive as Lane Liveness Check
  participant Task as Task Vault Lookup
  participant Respawn as Lane Respawn
  participant Lane as Claude/Codex/Antigrav Lane

  Timer->>Supervisor: Trigger every 60s
  Supervisor->>Alive: Check if lane alive (PIDfile/tmux)
  alt Lane is dead
    Supervisor->>Task: Resolve claimed task (if Claude)
    Task-->>Supervisor: task_id, title or empty
    Supervisor->>Respawn: Execute kind-specific respawn
    alt Claude with claimed task
      Respawn->>Lane: Launch headless (setsid/disown)
    else Claude without task
      Respawn->>Lane: Launch read-only idle-await
    else Codex or Antigrav
      Respawn->>Lane: Launch tmux idle-await session
    end
    Supervisor->>Supervisor: Record restart timestamp (spam control)
  else Lane is alive
    Supervisor-->>Supervisor: Skip this lane
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

hapax-systems/hapax-council#3047: Lane watchdog's quota-blocked receipt filtering (both PRs modify scripts/hapax-lane-rate-limit-watchdog around receipt handling).
hapax-systems/hapax-council#3387: Dead-lane restart behavior changes in the watchdog (both PRs modify the same script with conflicting dead-lane restart logic).

Poem

🐰 A supervisor hops through the lanes each minute,
Checking if Claude and Codex are still in it.
Dead lanes respawn with careful control,
No spam, no flood—just keeping them whole. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.93% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and clearly describes the main change: introducing a lane supervisor that ensures dead lanes always auto-restart, matching the primary objective of the PR.
Description check	✅ Passed	The description includes all required template sections: a clear summary of the problem and solution, AuthorityCase/Slice metadata, comprehensive test plan with specific test counts, and CLAUDE.md hygiene checklist.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96085b0385


chatgpt-codex-connector · 2026-05-31T20:21:45Z

+    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
+      return 0


Track the headless wrapper instead of its child pid

When a Claude lane is already running under hapax-claude-headless, this pidfile only contains the current inner claude child PID; the wrapper writes that PID before wait and then sleeps before restarting it (scripts/hapax-claude-headless lines 165-188). If the child exits or hits a quota/error and the 60s supervisor timer runs during that backoff window, kill -0 fails even though the wrapper is alive, so this new supervisor starts a second headless process on the same lane/task/worktree. That can create concurrent mutating agents for one claim; the liveness check needs to distinguish the supervisor wrapper (or another durable marker like a live FIFO owned by the wrapper) from the transient child PID.
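One way to picture the fix this comment asks for is a liveness probe that consults a durable wrapper marker before falling back to the transient child PID. The sketch below is illustrative only and assumes a POSIX host. The wrapper pidfile is hypothetical (the comment only says the existing pidfile holds the child PID), and `pid_alive`, `read_pid`, and `lane_alive` are names invented for this example.

```python
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Equivalent of `kill -0 pid`: probe for existence without sending a signal."""
    if pid <= 0:
        # 0 and negative values address process groups, never a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(pidfile: Path) -> int:
    """Return the PID stored in pidfile, or 0 if it is missing or malformed."""
    try:
        return int(pidfile.read_text().strip())
    except (OSError, ValueError):
        return 0


def lane_alive(wrapper_pidfile: Path, child_pidfile: Path) -> bool:
    """Alive if the durable wrapper OR its current child is running.

    wrapper_pidfile is an assumed marker written once by the wrapper itself, so
    the lane still counts as alive while the wrapper sleeps between children.
    """
    return pid_alive(read_pid(wrapper_pidfile)) or pid_alive(read_pid(child_pidfile))
```

With a check of this shape, a supervisor tick that lands inside the wrapper's backoff window sees the wrapper and skips the lane instead of starting a second headless process.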


chatgpt-codex-connector · 2026-05-31T20:21:45Z

+# the operator-attended integrator, never auto-respawned). cx-*/antigrav default
+# to the operator-designated always-on set; all overridable for ops + tests.
+CLAUDE_LANES="${HAPAX_SUPERVISOR_CLAUDE_LANES:-beta gamma delta epsilon zeta eta theta}"
+CODEX_LANES="${HAPAX_SUPERVISOR_CODEX_LANES:-cx-cpu-bridge cx-effect-drift-current cx-preset-storm cx-visual-bandwidth}"


Include required Codex lanes in the default sweep

With the new timer service using this default environment, HAPAX_SUPERVISOR_CODEX_LANES is not set anywhere by the unit, so only these four task-named cx-* sessions are supervised. The repo instructions identify cx-red and protected cx-violet as lanes that require screen/session continuity (AGENTS.md lines 10-13), but a dead hapax-codex-cx-red or hapax-codex-cx-violet will never be checked unless an operator separately overrides the env, which defeats the advertised always-on cx coverage.


chatgpt-codex-connector · 2026-05-31T20:21:45Z

 EnvironmentFile=-%h/.cache/hapax/claude-headless/%i.env
 ExecStart=/usr/bin/env bash -lc 'exec "$HOME/.local/bin/hapax-methodology-dispatch" --lane "%i" --platform claude --mode headless --task "${HAPAX_DISPATCH_TASK:?HAPAX_DISPATCH_TASK required in per-lane env}" --launch'
-Restart=no
+Restart=always


Avoid restarting a dispatch path without MQ binding

For the systemd-managed Claude lanes, changing this to Restart=always repeatedly invokes the same ExecStart, but that command only passes --task from HAPAX_DISPATCH_TASK and never provides the durable MQ message id required by scripts/hapax-methodology-dispatch for mutable --launch calls (it blocks with strict_mq_message_id_required at lines 1284-1299 unless HAPAX_METHODOLOGY_DISPATCH_MESSAGE_ID is present). In the normal per-lane env this turns a failed launch into a StartLimit crash loop rather than restoring the lane, so the restart path needs to carry the dispatch message id or avoid the strict launch route.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

tests/scripts/test_lane_supervisor.py (2)

251-259: ⚡ Quick win

Consider adding a live-skip test for antigrav lanes.

The test suite includes test_supervisor_skips_live_codex_lane (line 237) but lacks a parallel test_supervisor_skips_live_antigrav_lane. Since antigrav uses the same tmux-based liveness detection as codex, adding this test would improve coverage symmetry.

📋 Proposed additional test

Add after line 259:

def test_supervisor_skips_live_antigrav_lane(tmp_path: Path) -> None:
    env, calls = _base(
        tmp_path,
        HAPAX_SUPERVISOR_ANTIGRAV_LANES="antigrav",
        TMUX_LIVE="hapax-antigrav-antigrav",
    )
    _make_worktree(env, "antigrav")

    result = _run(env)

    assert result.returncode == 0, result.stderr
    assert _reads(calls, "antigrav.txt") == ""

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/scripts/test_lane_supervisor.py` around lines 251 - 259, Add a new test
mirroring test_supervisor_skips_live_codex_lane named
test_supervisor_skips_live_antigrav_lane: call _base(...) with
HAPAX_SUPERVISOR_ANTIGRAV_LANES="antigrav" and
TMUX_LIVE="hapax-antigrav-antigrav", create the worktree with
_make_worktree(env, "antigrav"), run the supervisor with _run(env), then assert
result.returncode == 0 and assert _reads(calls, "antigrav.txt") == "" to verify
the supervisor skips a live antigrav lane.

275-288: 💤 Low value

Consider asserting on log messages in guardrail tests.

Both test_supervisor_respects_restart_cooldown and test_supervisor_burst_limit_backs_off verify the absence of extra launches but do not assert that the supervisor logged why it skipped (cooldown window vs. burst limit hit). Adding assertions on result.stdout would improve observability and strengthen confidence that the right guardrail triggered.

📋 Example enhancements

In test_supervisor_respects_restart_cooldown:

     # Second pass within cooldown must NOT respawn again.
-    _run(env)
+    result = _run(env)
     assert _reads(calls, "claude.txt") == first
+    assert "cooldown" in result.stdout.lower()

In test_supervisor_burst_limit_backs_off:

     for _ in range(4):
-        _run(env)
+        result = _run(env)
 
     launches = [ln for ln in _reads(calls, "claude.txt").splitlines() if ln.strip()]
     assert len(launches) == 2  # capped at burst limit
+    assert "burst" in result.stdout.lower() or "limit" in result.stdout.lower()

Also applies to: 306-321

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/scripts/test_lane_supervisor.py` around lines 275 - 288, The tests
currently check that no new launch occurred but don't assert the supervisor
logged why; update test_supervisor_respects_restart_cooldown and
test_supervisor_burst_limit_backs_off to capture the result from the second
_run() call (e.g., result = _run(env)) and add an assertion on result.stdout
that the supervisor emitted a clear explanatory message—for the cooldown test
assert the stdout contains keywords like "skipping" and "cooldown" (referencing
test_supervisor_respects_restart_cooldown, _run, and _reads/claude.txt), and for
the burst-limit test assert stdout contains keywords like "skipping" and "burst
limit" (referencing test_supervisor_burst_limit_backs_off and _run) so the
guardrail reason is verifiably logged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e13a022d-e1e7-43d5-9592-555cb4956f9b

📥 Commits

Reviewing files that changed from the base of the PR and between 1dd37ed and 96085b0.

📒 Files selected for processing (8)

scripts/hapax-lane-rate-limit-watchdog
scripts/hapax-lane-supervisor
systemd/units/hapax-claude-lane@.service
systemd/units/hapax-lane-supervisor.service
systemd/units/hapax-lane-supervisor.timer
tests/scripts/test_lane_supervisor.py
tests/scripts/test_lane_watchdog_methodology_dispatch.py
tests/systemd/test_lane_supervisor_units.py

coderabbitai · 2026-05-31T20:23:54Z

+  local logf="$STATE_DIR/$1.restart-log" cutoff count=0 ts
+  [ -f "$logf" ] || return 1
+  cutoff=$(( now - BURST_WINDOW_S ))
+  while read -r ts; do
+    [ -z "$ts" ] && continue
+    [ "$ts" -ge "$cutoff" ] && count=$(( count + 1 ))
+  done < "$logf"
+  [ "$count" -ge "$BURST_LIMIT" ]
+}
+
+record_restart() {
+  echo "$now" > "$STATE_DIR/$1.last-restart"
+  echo "$now" >> "$STATE_DIR/$1.restart-log"
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound restart-log growth to keep supervisor runtime stable.

Lines 82–89 scan the full *.restart-log file on every tick, while line 94 only appends. For a lane that keeps dying, the log grows without bound, and the scan can eventually threaten the oneshot timeout budget.

💡 Suggested patch

 burst_exhausted() {
-  local logf="$STATE_DIR/$1.restart-log" cutoff count=0 ts
+  local logf="$STATE_DIR/$1.restart-log" cutoff count
   [ -f "$logf" ] || return 1
   cutoff=$(( now - BURST_WINDOW_S ))
-  while read -r ts; do
-    [ -z "$ts" ] && continue
-    [ "$ts" -ge "$cutoff" ] && count=$(( count + 1 ))
-  done < "$logf"
+  count="$(awk -v cutoff="$cutoff" '$1 >= cutoff {c++} END {print c+0}' "$logf" 2>/dev/null || echo 0)"
   [ "$count" -ge "$BURST_LIMIT" ]
 }
 
 record_restart() {
-  echo "$now" > "$STATE_DIR/$1.last-restart"
-  echo "$now" >> "$STATE_DIR/$1.restart-log"
+  local lane="$1" logf="$STATE_DIR/$1.restart-log" cutoff
+  echo "$now" > "$STATE_DIR/$lane.last-restart"
+  cutoff=$(( now - BURST_WINDOW_S ))
+  if [ -f "$logf" ]; then
+    awk -v cutoff="$cutoff" '$1 >= cutoff' "$logf" > "$logf.tmp" 2>/dev/null || true
+    mv "$logf.tmp" "$logf"
+  fi
+  echo "$now" >> "$logf"
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/hapax-lane-supervisor` around lines 82 - 95, The restart-log is allowed to grow unbounded because the checker (the while read -r ts loop that reads "$logf" and compares timestamps to cutoff using BURST_WINDOW_S and increments count towards BURST_LIMIT) scans the whole file each tick while record_restart() only appends; fix by truncating/rotating the log when recording a restart: update record_restart() to append the timestamp then trim the file to a bounded size (either remove entries older than now - BURST_WINDOW_S or keep only the last N lines) so the subsequent loop only ever reads a small, recent set of timestamps; ensure you still write "$STATE_DIR/$1.last-restart" and use the same variables now, BURST_WINDOW_S, and BURST_LIMIT.
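The prune-on-write idea in this comment is a sliding-window counter. The sketch below models it in Python on an in-memory list rather than a log file, to show why the stored history stays bounded. The function names mirror the shell functions in the diff, but the list-based signatures are an assumption made for illustration.

```python
from typing import List


def record_restart(timestamps: List[int], now: int, window_s: int) -> List[int]:
    """Drop entries older than the burst window, then append the new restart."""
    cutoff = now - window_s
    kept = [ts for ts in timestamps if ts >= cutoff]
    kept.append(now)
    return kept


def burst_exhausted(timestamps: List[int], now: int, window_s: int, limit: int) -> bool:
    """True when at least `limit` restarts fall inside the last `window_s` seconds."""
    cutoff = now - window_s
    return sum(1 for ts in timestamps if ts >= cutoff) >= limit


# Usage: four restarts recorded with a 300-second window.
log: List[int] = []
for t in (0, 100, 200, 1000):
    log = record_restart(log, t, 300)
# Only the restart at t=1000 is still inside the window, so the log holds one entry.
```

Because every write discards entries older than the window, the reader never has more to scan than the restarts that happened inside one window, however long the lane has been flapping.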

coderabbitai · 2026-05-31T20:23:54Z

+def _write_recorder(path: Path, log: Path) -> None:
+    """A fake launcher that records its argv to ``log`` and exits 0."""
+    _write_executable(
+        path,
+        f"""
+        #!/usr/bin/env bash
+        printf '%s\\n' "$*" >> "{log}"
+        """,
+    )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote the log path to prevent shell injection.

The log path is directly interpolated into the bash script without proper quoting. If log contains spaces or special characters, the script will break.

🛡️ Proposed fix with proper path quoting

 def _write_recorder(path: Path, log: Path) -> None:
     """A fake launcher that records its argv to ``log`` and exits 0."""
+    # Escape the log path for safe shell interpolation
+    log_escaped = str(log).replace("'", "'\"'\"'")
     _write_executable(
         path,
         f"""
         #!/usr/bin/env bash
-        printf '%s\\n' "$*" >> "{log}"
+        printf '%s\\n' "$*" >> '{log_escaped}'
         """,
     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/scripts/test_lane_supervisor.py` around lines 31 - 39, The bash snippet in _write_recorder interpolates the Path log directly into the script which can allow shell injection or break on spaces; escape or shell-quote the path before embedding it (e.g., use shlex.quote(str(log))) when calling _write_executable so the redirect target is safe, and keep the reference to the log target inside the generated script (the change is local to _write_recorder and affects how the "{log}" token is produced for _write_executable).
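The prompt above mentions `shlex.quote` as an alternative to hand-rolled escaping. A minimal sketch of that variant follows. `recorder_script` is a hypothetical helper that only builds the script text; it is not the test suite's `_write_recorder`, and the paths are made up for the example.

```python
import shlex
from pathlib import Path


def recorder_script(log: Path) -> str:
    """Build a bash recorder whose redirect target is shell-quoted.

    shlex.quote returns the string unchanged when it contains only safe
    characters, and otherwise wraps it in single quotes (escaping any
    embedded single quote), so no extra quotes are added around it here.
    """
    return f"#!/usr/bin/env bash\nprintf '%s\\n' \"$*\" >> {shlex.quote(str(log))}\n"


# Usage: a path containing a space is emitted as one single-quoted word.
script = recorder_script(Path("/tmp/a b/calls.log"))
```

Using the standard-library helper keeps the quoting rules in one place and also covers characters such as `$` and backticks, which would still be expanded inside a double-quoted redirect target.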

chatgpt-codex-connector Bot reviewed May 31, 2026

coderabbitai Bot reviewed May 31, 2026

ryanklee added this pull request to the merge queue Jun 1, 2026

Merged via the queue into main with commit a42e777 Jun 1, 2026
38 checks passed

ryanklee deleted the delta/reform-fix-lane-supervisor-20260531 branch June 1, 2026 00:31

coderabbitai Bot mentioned this pull request Jun 1, 2026

fix(sdlc): deploy auto-enables marked timers/services + activate FM-11 lane supervisor #3817

Merged

4 tasks
