fix(executor): make the SF the sole daemon-start authority (rm daemon.timer) #194
Merged
2026-05-19: the weekday SF FAILED at MorningEnrich, yet the trading
daemon still came online at 06:29 PT with no order book. Root cause is
NOT random drift — it is structural:
boot-pull.sh force-enables EVERY *.timer in infrastructure/systemd/ on
every boot ("every shipped timer MUST be enabled" reconciliation,
added for the 2026-04-21 eod.timer incident). alpha-engine-daemon.timer
shipped in the repo, so boot-pull re-enabled it every boot and it fired
on its wall clock (Mon..Fri 09:29 ET) independent of SF state/success.
setup-trading-ec2.sh's "copied but NOT enabled / break-glass" comment
was therefore false — the timer could never stay disabled.
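The force-enable pass described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the real boot-pull.sh: the function name, the `SYSTEMCTL` override, and the omission of any unit-file copy step are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the force-enable pass; NOT the real boot-pull.sh.
set -euo pipefail

# Assumption: SYSTEMCTL is overridable so the loop can be dry-run
# (for example SYSTEMCTL=echo prints the commands instead of running them).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

force_enable_shipped_timers() {
    local repo_dir="$1"   # e.g. infrastructure/systemd
    local unit
    for unit in "$repo_dir"/*.timer; do
        [ -e "$unit" ] || continue   # glob matched nothing
        # Enabled unconditionally: any earlier manual `systemctl disable`
        # on the instance is undone here on every boot.
        "$SYSTEMCTL" enable --now "$(basename "$unit")"
    done
}
```

With `SYSTEMCTL=echo` and a directory containing `alpha-engine-daemon.timer`, the function prints `enable --now alpha-engine-daemon.timer`, which is why a shipped timer cannot stay disabled across reboots.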
Fix: delete infrastructure/systemd/alpha-engine-daemon.timer. boot-pull's
orphan reconciliation (no repo source → `disable --now` + rm) retires it
from every instance on the next boot, and it can never be re-enabled.
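For reference, the orphan reconciliation that retires the deleted timer could look something like the sketch below. It assumes installed units are matched with the `alpha-engine-*.timer` glob and compared by file name against the repo directory; the function name and the `SYSTEMCTL` dry-run override are hypothetical and not taken from boot-pull.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the orphan reconciliation; NOT the real boot-pull.sh.
set -euo pipefail

# Assumption: SYSTEMCTL is overridable for dry runs (e.g. SYSTEMCTL=echo).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

retire_orphan_timers() {
    local repo_dir="$1"        # e.g. infrastructure/systemd
    local installed_dir="$2"   # e.g. /etc/systemd/system
    local unit name
    for unit in "$installed_dir"/alpha-engine-*.timer; do
        [ -e "$unit" ] || continue   # glob matched nothing
        name="$(basename "$unit")"
        if [ ! -e "$repo_dir/$name" ]; then
            # No repo source: stop and disable the timer, then remove
            # the installed unit file so it cannot fire again.
            "$SYSTEMCTL" disable --now "$name"
            rm -f "$unit"
        fi
    done
}
```

Because the decision is driven purely by what exists in the repo directory, deleting the unit file from the repo is enough; no per-instance cleanup step is needed beyond the next boot.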
The .service unit stays — SF `RunDaemon` does `systemctl restart` on it,
and services (unlike timers) are not force-enabled by boot-pull, so the
ONLY path that ever starts the daemon is the SF RunDaemon step, which is
only reached after RunMorningPlanner has written today's order book.
setup-trading-ec2.sh updated to drop the timer from the copy list and
record the boot-pull-force-enable invariant so a daemon timer is never
reintroduced.
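One way to back the recorded invariant with a check is a small guard that fails when the timer reappears in the repo. This is only an illustrative sketch: the function name is hypothetical, and this PR does not state that such a guard exists in setup-trading-ec2.sh or in CI.

```shell
#!/usr/bin/env bash
# Hypothetical guard; not part of this PR as far as the description states.
set -euo pipefail

assert_no_daemon_timer() {
    local repo_dir="$1"   # e.g. infrastructure/systemd
    if [ -e "$repo_dir/alpha-engine-daemon.timer" ]; then
        # boot-pull force-enables every shipped *.timer, so shipping this
        # file would reintroduce a non-SF daemon trigger.
        echo "ERROR: alpha-engine-daemon.timer is present in $repo_dir;" \
             "boot-pull would force-enable it on every boot." >&2
        return 1
    fi
}
```

Called as `assert_no_daemon_timer infrastructure/systemd`, it returns non-zero as soon as the file is added back, so the regression surfaces before an instance boots with it.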
xvfb/ibgateway boot-autostart retained: the instance only boots because
the SF's StartExecutorEC2 starts it, so IB Gateway is already
transitively SF-gated and must be ready before RunMorningPlanner
(wait-for-ibgateway.sh). No daemon.py change — the existing wait-loop's
deliberate late-start recovery affordance is preserved; the structural
fix alone fully satisfies "SF is the sole daemon-start authority".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bug (2026-05-19)
The weekday SF FAILED at `MorningEnrich`, yet the trading daemon still came online at 06:29 PT (idling, "no order book at market open"). You asked that all non-SF daemon triggers be disabled.

Root cause: structural, not drift
`boot-pull.sh` force-enables every `*.timer` in `infrastructure/systemd/` on every boot (the "every shipped timer MUST be enabled" reconciliation, added for the 2026-04-21 `eod.timer` incident). `alpha-engine-daemon.timer` shipped in the repo, so boot-pull re-enabled it on every boot and it fired on its wall clock (`Mon..Fri 09:29 ET` = 06:29 PT) independent of SF state or success. `setup-trading-ec2.sh`'s "copied but NOT enabled / break-glass" comment was therefore false: the timer could never stay disabled. (Same mechanism behind the 2026-05-05 stale-predictions incident.)

Fix
- Delete `infrastructure/systemd/alpha-engine-daemon.timer`. boot-pull's orphan reconciliation (`/etc/systemd/system/alpha-engine-*.timer` with no repo source → `systemctl disable --now` + `rm`) retires it from every instance on the next boot, and it can never be re-enabled.
- The `.service` unit stays: SF `RunDaemon` does `systemctl restart` on it, and services (unlike timers) are not force-enabled by boot-pull. So the only path that ever starts the daemon is the SF `RunDaemon` step, which is only reached after `RunMorningPlanner` has written today's order book.
- `setup-trading-ec2.sh` updated: the timer is dropped from the copy list, and the boot-pull force-enable invariant is documented so a daemon timer is never reintroduced.

Deliberately unchanged
- xvfb/ibgateway boot-autostart retained: the instance only boots because the SF's `StartExecutorEC2` starts it, so IB Gateway is already transitively SF-gated and must be ready before `RunMorningPlanner` (`wait-for-ibgateway.sh`). This satisfies "SF should be booting the trading instance and IB Gateway".
- No `daemon.py` change: its existing wait-loop has a deliberate, documented late-start recovery affordance (late predictor inference / morning batch). The structural fix alone fully meets the requirement; adding a hard order-book gate would regress that affordance.

Live instance
`alpha-engine-daemon.timer` is being disabled and stopped on `i-018eb3307a21329bf` now (out of band) so it does not fire again at 06:29 PT before this merges and deploys. Today's idling daemon is left to self-exit normally at market close (EOD is EventBridge-scheduled at 13:05 PT; an early stop is unnecessary and avoided).

🤖 Generated with Claude Code