
fix(executor): make the SF the sole daemon-start authority (rm daemon.timer)#194

Merged
cipher813 merged 1 commit into main from fix/daemon-sf-only-no-timer
May 19, 2026

@cipher813
Owner

The bug (2026-05-19)

The weekday SF FAILED at MorningEnrich, yet the trading daemon still came online at 06:29 PT (idling, "no order book at market open"). You asked that all non-SF daemon triggers be disabled.

Root cause — structural, not drift

boot-pull.sh force-enables every *.timer in infrastructure/systemd/ on every boot (the "every shipped timer MUST be enabled" reconciliation, added for the 2026-04-21 eod.timer incident). alpha-engine-daemon.timer shipped in the repo, so boot-pull re-enabled it on every boot and it fired on its wall clock (Mon..Fri 09:29 ET = 06:29 PT) independent of SF state or success. setup-trading-ec2.sh's "copied but NOT enabled / break-glass" comment was therefore false — the timer could never stay disabled. (Same mechanism behind the 2026-05-05 stale-predictions incident.)
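The force-enable behaviour described above can be sketched roughly as follows. This is an illustration of the mechanism, not the actual contents of boot-pull.sh: the function name and the `REPO_UNIT_DIR` and `SYSTEMCTL` variables are assumptions introduced here (the latter so the loop can be dry-run with `SYSTEMCTL=echo`).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a "force-enable every shipped timer" pass.
# REPO_UNIT_DIR, SYSTEMCTL and the function name are illustrative only;
# they are not taken from boot-pull.sh.
REPO_UNIT_DIR="${REPO_UNIT_DIR:-infrastructure/systemd}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

force_enable_shipped_timers() {
  local unit
  for unit in "$REPO_UNIT_DIR"/*.timer; do
    [ -e "$unit" ] || continue   # glob matched nothing
    # Every timer present in the repo gets enabled, regardless of whether
    # an operator disabled it earlier. A "copied but not enabled" timer
    # therefore cannot survive a reboot in the disabled state.
    $SYSTEMCTL enable --now "$(basename "$unit")"
  done
}
```

Under this model, the only way to keep a timer off is to keep its unit file out of the repo directory, which is the approach the fix takes.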

Fix

  • Delete infrastructure/systemd/alpha-engine-daemon.timer. boot-pull's orphan reconciliation (/etc/systemd/system/alpha-engine-*.timer with no repo source → systemctl disable --now + rm) retires it from every instance on the next boot, and it can never be re-enabled.
  • The .service unit stays — SF RunDaemon does systemctl restart on it; services (unlike timers) are not force-enabled by boot-pull. So the only path that ever starts the daemon is the SF RunDaemon step, which is only reached after RunMorningPlanner wrote today's order book.
  • setup-trading-ec2.sh updated: timer dropped from the copy list; the boot-pull force-enable invariant is documented so a daemon timer is never reintroduced.
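For illustration, the orphan reconciliation that retires the deleted timer might look something like the sketch below. It follows the rule stated in the first bullet (installed `alpha-engine-*.timer` with no repo source gets `systemctl disable --now` plus `rm`), but the function name and the `REPO_UNIT_DIR`, `INSTALLED_UNIT_DIR` and `SYSTEMCTL` variables are assumptions, not names from boot-pull.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of orphan-timer reconciliation.
# All variable and function names here are illustrative assumptions.
REPO_UNIT_DIR="${REPO_UNIT_DIR:-infrastructure/systemd}"
INSTALLED_UNIT_DIR="${INSTALLED_UNIT_DIR:-/etc/systemd/system}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

retire_orphan_timers() {
  local installed name
  for installed in "$INSTALLED_UNIT_DIR"/alpha-engine-*.timer; do
    [ -e "$installed" ] || continue   # glob matched nothing
    name="$(basename "$installed")"
    # A timer installed on the instance but absent from the repo is an
    # orphan: stop and disable it, then remove the unit file.
    if [ ! -e "$REPO_UNIT_DIR/$name" ]; then
      $SYSTEMCTL disable --now "$name"
      rm -f "$installed"
    fi
  done
}
```

Running it with `SYSTEMCTL=echo` against scratch directories gives a dry run of the systemctl calls, although the `rm` still executes.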

Deliberately unchanged

  • xvfb/ibgateway boot-autostart: the instance only boots because the SF's StartExecutorEC2 starts it, so IB Gateway is already transitively SF-gated and must be ready before RunMorningPlanner (wait-for-ibgateway.sh). This satisfies "SF should be booting the trading instance and IB Gateway".
  • No daemon.py change: its existing wait-loop has a deliberate, documented late-start recovery affordance (late predictor inference / morning batch). The structural fix alone fully meets the requirement; adding a hard order-book gate would regress that affordance.

Live instance

alpha-engine-daemon.timer is being disabled and stopped on i-018eb3307a21329bf now (out of band) so it does not fire again at 06:29 PT before this merges and deploys. Today's idling daemon is left to self-exit normally at market close (EOD is EventBridge-scheduled for 13:05 PT; an early stop is unnecessary and avoided).
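A minimal sketch of what the out-of-band step could look like when run on the instance. The wrapper function and the `SYSTEMCTL` override are assumptions added for illustration; only the `systemctl disable --now` and `systemctl is-enabled` subcommands are standard systemd.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the manual, out-of-band disable.
# The function name and SYSTEMCTL override are illustrative assumptions.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
TIMER="alpha-engine-daemon.timer"

stop_timer_now() {
  # "disable" removes the enablement symlinks; "--now" also stops the
  # timer unit so it cannot fire on its next wall-clock trigger.
  $SYSTEMCTL disable --now "$TIMER"
  # is-enabled exits non-zero for a disabled unit, so tolerate that.
  $SYSTEMCTL is-enabled "$TIMER" || true
}
```

On a real host this would typically need root (for example `SYSTEMCTL="sudo systemctl"`). Until the timer file is removed from the repo, the next boot would re-enable it, which is why this is only a stopgap.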

🤖 Generated with Claude Code

….timer)

2026-05-19: the weekday SF FAILED at MorningEnrich, yet the trading
daemon still came online at 06:29 PT with no order book. Root cause is
NOT random drift — it is structural:

  boot-pull.sh force-enables EVERY *.timer in infrastructure/systemd/ on
  every boot ("every shipped timer MUST be enabled" reconciliation,
  added for the 2026-04-21 eod.timer incident). alpha-engine-daemon.timer
  shipped in the repo, so boot-pull re-enabled it every boot and it fired
  on its wall clock (Mon..Fri 09:29 ET) independent of SF state/success.
  setup-trading-ec2.sh's "copied but NOT enabled / break-glass" comment
  was therefore false — the timer could never stay disabled.

Fix: delete infrastructure/systemd/alpha-engine-daemon.timer. boot-pull's
orphan reconciliation (no repo source → `disable --now` + rm) retires it
from every instance on the next boot, and it can never be re-enabled.
The .service unit stays — SF `RunDaemon` does `systemctl restart` on it,
and services (unlike timers) are not force-enabled by boot-pull, so the
ONLY path that ever starts the daemon is the SF RunDaemon step, which is
only reached after RunMorningPlanner has written today's order book.
setup-trading-ec2.sh updated to drop the timer from the copy list and
record the boot-pull-force-enable invariant so a daemon timer is never
reintroduced.

xvfb/ibgateway boot-autostart retained: the instance only boots because
the SF's StartExecutorEC2 starts it, so IB Gateway is already
transitively SF-gated and must be ready before RunMorningPlanner
(wait-for-ibgateway.sh). No daemon.py change — the existing wait-loop's
deliberate late-start recovery affordance is preserved; the structural
fix alone fully satisfies "SF is the sole daemon-start authority".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 8e2eea2 into main May 19, 2026
1 check passed
@cipher813 cipher813 deleted the fix/daemon-sf-only-no-timer branch May 19, 2026 15:07
