Skip to content

fix(sf): keystone ApplyShellRunDefaults — swap JsonMerge args so shellDefaults wins#291

Merged
cipher813 merged 1 commit into
mainfrom
fix/sf-keystone-shell-defaults-win
May 22, 2026
Merged

fix(sf): keystone ApplyShellRunDefaults — swap JsonMerge args so shellDefaults wins#291
cipher813 merged 1 commit into
mainfrom
fix/sf-keystone-shell-defaults-win

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Fixes the keystone bug surfaced immediately after the 2026-05-22 trap-escape fix landed (PRs #289/#290/lib #57/dashboard #118/predictor #188/backtester #243). The post-fix verification execution postfix-verify-20260522T224450Z passed MorningEnrich cleanly (trap fix worked) and proceeded into DataPhase1 — where the spot booted and started collecting short interest / universe returns / fundamentals at full power instead of running --preflight-only. Aborted at that point.

Root cause

ApplyShellRunDefaults was:

States.JsonMerge(shellDefaults, $, false)

Per AWS docs: "second arg wins" → $ wins.

The original comment justified this as "user input ($) wins for per-flag overrides." But $ at ApplyShellRunDefaults entry is NOT raw user input — InitializeInput has already merged its defaults blob {preflight_args: "", research_dry: false, data_phase2_dry: false, regime_action: "produce"} into $. So "$ wins" meant the NON-DRY identity defaults won over the dry shellDefaults. Every shell_run=true execution silently fell back to a full-fat Saturday run.

The existing test test_user_input_wins_over_shell_defaults was hard-asserting the broken arg order — the bug had test coverage, but the test was wrong about what it was asserting (confused $ with raw user input).

Why this never fired before

Memory hint: "shipped disabled" → first execution of the mechanism today caught both bugs in sequence.

Fix

Swap arg order: States.JsonMerge($, shellDefaults, false) → shellDefaults wins.

Skip flag overrides UNAFFECTED. shellDefaults contains no skip_* keys, so {shell_run: true, skip_backtester: true} still passes skip_backtester through $ unchanged to the downstream Choice gate. Only the 4 keystone control vars (preflight_args / research_dry / data_phase2_dry / regime_action) are forced to dry when shell_run=true.

The previously-claimed user-override case for keystone control vars (e.g., {shell_run: true, preflight_args: ""}) is now intentionally lost — passing shell_run=true means you want dry mode for ALL keystone control vars; partial override within the dry-control-var set was a footgun, not a feature.

Test changes

  • test_user_input_wins_over_shell_defaults flipped + renamed to test_shell_defaults_win_over_current_state with a long docstring explaining why the prior order broke and what semantic the new order enforces.
  • Full SF suite 374/374 green.
  • Full repo suite 1430/1430 green.

Test plan

  • Suite green locally (1430/1430)
  • Merge
  • Run bash infrastructure/deploy_step_function.sh to push to AWS
  • Fire fresh dry-pass with {"shell_run": true, ...} input
  • Confirm DataPhase1 spot launches with bash spot_data_weekly.sh --phase1-only --preflight-only (NOT just --phase1-only)
  • Full dry-pass completes in ~15-25 min (not 3-4 hours)
  • Saturday 09:00 UTC firing runs cleanly

🤖 Generated with Claude Code

…lDefaults wins

The prior merge order silently downgraded every shell_run=true execution
to a full-fat real Saturday run instead of --preflight-only mode.

Caught 2026-05-22 evening, immediately after the Saturday-SF trap-escape
fix landed (lib v0.25.0 + 5 cross-repo pin bumps + SF JSON conversion).
The post-fix verification execution `postfix-verify-20260522T224450Z`
passed MorningEnrich cleanly (trap fix worked) and proceeded into
DataPhase1 — where the spot booted and started collecting short
interest, universe returns, fundamentals at FULL POWER. Aborted at
that point to fix this; otherwise the dry-pass would have spent ~$50
LLM + ~3h compute on what was supposed to be a 15-25min preflight
exercise.

**Root cause:**
ApplyShellRunDefaults was `States.JsonMerge(shellDefaults, $, false)`
— "second arg wins" per AWS docs, so `$` wins over shellDefaults.

The original comment justified this as "user input ($) wins for
per-flag overrides." But `$` at ApplyShellRunDefaults entry is NOT raw
user input — InitializeInput has already merged its defaults blob
`{preflight_args: "", research_dry: false, data_phase2_dry: false,
regime_action: "produce"}` into `$`. So "$ wins" meant the NON-DRY
identity defaults won over the dry shellDefaults.

The mechanism produced byte-identical SSM commands to the absent path
(real Saturday) regardless of shell_run=true. Test
`test_user_input_wins_over_shell_defaults` was hard-asserting the
broken order — the bug had test coverage, but the test was wrong
about what it was asserting.

**Why this never fired before:**
The Friday-PM dry-pass EventBridge rule shipped DISABLED (#258), and
the prior trap-escape bug masked any execution past MorningEnrich. The
2026-05-22 Friday-PM dry-pass was the first execution to reach
ApplyShellRunDefaults' actual semantic effect — and immediately
surfaced this gap once the trap fix unblocked MorningEnrich.

**Fix:**
Swap arg order to `States.JsonMerge($, shellDefaults, false)` so
shellDefaults wins over $. The keystone control vars (preflight_args /
research_dry / data_phase2_dry / regime_action) are now forced to
their dry values when shell_run=true.

Skip-flag overrides are UNAFFECTED — shellDefaults contains no skip_*
keys, so `{shell_run: true, skip_backtester: true}` still passes
skip_backtester through $ unchanged to the downstream Choice gate.
The previously-claimed user-override-from-input case for the keystone
control vars (e.g., `{shell_run: true, preflight_args: ""}`) is now
intentionally lost — passing shell_run=true means you want dry mode
for ALL keystone control vars; partial override within the dry-control
set was a footgun, not a feature.

**Tests:**
- `test_user_input_wins_over_shell_defaults` flipped + renamed to
  `test_shell_defaults_win_over_current_state` with a long docstring
  explaining why the prior order broke and what semantic the new
  order enforces.
- Full SF suite 374/374 green.
- Full repo suite 1430/1430 green.

Composes with [[feedback_no_silent_fails]] (the bug silently downgraded
to a less-safe execution mode rather than failing loud) +
[[feedback_test_must_pin_production_behavior_not_just_current_code]]
(the test was asserting code shape, not semantic intent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 3b88684 into main May 22, 2026
1 check passed
@cipher813 cipher813 deleted the fix/sf-keystone-shell-defaults-win branch May 22, 2026 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant