feat(spot-data-weekly): SSH/SCP → SSM transport via lib chokepoint (L342 PR 2)#330

Merged
cipher813 merged 1 commit into main from feat-l342-pr2-spot-data-weekly-ssm-260527 on May 27, 2026

@cipher813

Summary

Migrates infrastructure/spot_data_weekly.sh off SSH+SCP onto the alpha-engine-lib v0.35.0+ ssm_dispatcher chokepoint (python -m alpha_engine_lib.ssm_dispatcher run). Closes the (i) alive-SSH-path finding from the 2026-05-24 audit; PR 2 of the 5-PR ROADMAP L342 arc.

Transport changes

| Surface | Before (SSH/SCP) | After (SSM via lib) |
|---|---|---|
| Connectivity wait | SSH wait loop (`ssh ... "echo ok"`, 30 × 5s) | `aws ssm describe-instance-information` PingStatus poll (36 × 5s) |
| Remote dispatch | `run_remote "..."` (`ssh -i $KEY_FILE`) | `run_ssm "<desc>" <timeout> <<HEREDOC` (lib CLI via `--script-stdin`) |
| Config upload | `scp -i $KEY_FILE config.yaml ec2-user@$IP:...` | Dispatcher `aws s3 cp` to `tmp/spot_data_weekly/<run_id>/config.yaml`; spot pulls via existing IAM `s3:GetObject` |
| Python detection | `REMOTE_PYTHON=$(ssh ... "command -v python3.12")` captured to dispatcher var | `PYTHON_BIN=$(command -v python3.12 …` |
| SSH key | `KEY_FILE=$HOME/.ssh/alpha-engine-key.pem` | Removed; `KEY_NAME` kept ONLY for ec2_spot `--key-name` (break-glass operator SSH only) |
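
The connectivity-wait row can be sketched roughly as follows. This is an illustration rather than the script's actual code: the function name `wait_for_ssm_online` and its argument order are assumptions, while the `aws ssm describe-instance-information` call and the 36 × 5s budget come from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll SSM until the instance reports PingStatus=Online.
# Defaults of 36 attempts x 5s give the 180s budget described in this PR.
wait_for_ssm_online() {
  local instance_id="$1"
  local attempts="${2:-36}"
  local interval="${3:-5}"
  local status i
  for ((i = 1; i <= attempts; i++)); do
    # --output text prints "None" until the agent has registered with SSM.
    status=$(aws ssm describe-instance-information \
      --filters "Key=InstanceIds,Values=${instance_id}" \
      --query 'InstanceInformationList[0].PingStatus' \
      --output text 2>/dev/null) || status=""
    if [[ "$status" == "Online" ]]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "instance ${instance_id} not SSM-Online after $((attempts * interval))s" >&2
  return 1
}
```

A caller would typically abort the run when the function returns non-zero, for example `wait_for_ssm_online "$INSTANCE_ID" || exit 1`, where `INSTANCE_ID` is whatever variable holds the launched spot's ID.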

Why pipe heredocs via --script-stdin instead of mirroring predictor's inline "$(cat <<HEREDOC ... HEREDOC)" pattern: the data path's RAG-secrets block contains aws ssm get-parameter --query 'Parameter.Value' ... inside $(...). The outer command-substitution scanner sees the inner single quotes and breaks. The lib CLI's --script-stdin flag reads the body verbatim, so the dispatcher's bash parser never scans the inner script for quote/paren balance. Future PRs adopting ssm_dispatcher should prefer --script-stdin for any non-trivial spot-side script body.
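
A minimal sketch of the stdin-piping shape, under stated assumptions: the `run_ssm "<desc>" <timeout>` signature and the `python -m alpha_engine_lib.ssm_dispatcher run --script-stdin` invocation come from this PR, but the `DISPATCH_CMD` array is a hypothetical indirection added here, and the lib flags that carry the instance ID and timeout are not shown in the PR, so they are left out.

```bash
#!/usr/bin/env bash
# DISPATCH_CMD is a hypothetical name; the real wrapper presumably passes
# more flags (instance id, timeout) that this PR page does not show.
DISPATCH_CMD=(python -m alpha_engine_lib.ssm_dispatcher run --script-stdin)

run_ssm() {
  local desc="$1" timeout="$2"
  echo "==> ${desc} (timeout ${timeout}s)" >&2
  # The heredoc body arrives on stdin and is forwarded byte-for-byte;
  # it never appears inside a "$(...)" on the dispatcher side.
  "${DISPATCH_CMD[@]}"
}

# Example call (commented out so the sketch has no side effects). The quoted
# delimiter is a choice made for this sketch: it stops the dispatcher's shell
# from expanding $(...) in the body, so the lookup runs on the spot instead.
# The parameter name is a placeholder.
#
# run_ssm "fetch rag secret" 60 <<'REMOTE'
# VALUE=$(aws ssm get-parameter --name "/hypothetical/param" \
#   --with-decryption --query 'Parameter.Value' --output text)
# REMOTE
```

Because the body travels as a stream rather than as a command-substitution argument, single quotes and parentheses inside it need no balancing from the dispatcher's point of view.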

CI guards (8 new tests)

tests/test_spot_data_weekly_ssm_transport.py:

  • test_spot_data_weekly_script_exists: script presence
  • test_no_top_level_ssh_invocation: no `ssh -X` / `ssh ` outside comments
  • test_no_top_level_scp_invocation: no `scp -X` / `scp ` outside comments
  • test_no_ssh_keyscan_invocation: `ssh-keyscan github.com` must not be reintroduced
  • test_uses_lib_ssm_dispatcher_chokepoint: `alpha_engine_lib.ssm_dispatcher` present
  • test_no_inline_aws_ssm_send_command: no direct `aws ssm send-command` (the predictor #168 pre-lift pattern that L342 explicitly lifts to lib)
  • test_stages_config_via_s3: `aws s3 cp ... config.yaml` present
  • test_no_residual_key_file_dispatch_use: no $KEY_FILE / $SSH_OPTS in non-comment lines
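
The "outside comments" guards amount to scanning non-comment lines for a forbidden command word. The real guards are pytest tests. The block below is only a shell approximation of the same idea, with hypothetical function names and a hypothetical regex.

```bash
#!/usr/bin/env bash
# Shell approximation of the "no SSH/SCP outside comments" guards.
# Function names and the regex are illustrative, not taken from the test file.

# Print the lines of a script that are not whole-line comments.
non_comment_lines() {
  grep -Ev '^[[:space:]]*#' "$1"
}

# Return 0 when no non-comment line invokes the given command word.
assert_no_invocation() {
  local script="$1" word="$2"
  # The word must start a command position (line start, or after whitespace
  # or ; | & ( ) and be followed by whitespace, so "ssh -i key host" is
  # caught while "ssh-keyscan" and "my_ssh_var" are not.
  if non_comment_lines "$script" |
    grep -Eq "(^|[[:space:];|&(])${word}[[:space:]]"; then
    echo "forbidden '${word}' invocation in ${script}" >&2
    return 1
  fi
}
```

This check is deliberately conservative: it would also flag the word inside a string literal or a trailing comment, which is usually acceptable for a regression guard.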

Test-fixture updates (no behavior change)

  • test_spot_env_source_aws_region.py — recognize the new multi-line read -r -d '' ENV_SOURCE <<'ENV_EOF' ... ENV_EOF shape in addition to the single-line ENV_SOURCE="...". The semantic invariant (ENV_SOURCE exports AWS_REGION + AWS_DEFAULT_REGION) is unchanged.
  • test_preflight_only_dry_path.py — accept run_ssm "workloads" as the workloads opener in addition to the pre-migration run_remote bash -s <<WORKLOADS.
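
For reference, the two `ENV_SOURCE` shapes the fixture now accepts look roughly like this. The region value is a placeholder, not taken from the script.

```bash
#!/usr/bin/env bash
# Two shapes for the same ENV_SOURCE content. The region is a placeholder.

# Pre-migration shape: single-line assignment.
ENV_SOURCE_SINGLE="export AWS_REGION=us-east-1; export AWS_DEFAULT_REGION=us-east-1"

# Post-migration shape: read the whole heredoc into one variable.
# With -d '' read consumes input up to EOF and therefore returns non-zero;
# "|| true" keeps that from aborting a script running under "set -e".
read -r -d '' ENV_SOURCE <<'ENV_EOF' || true
export AWS_REGION=us-east-1
export AWS_DEFAULT_REGION=us-east-1
ENV_EOF

# The invariant is the same for either shape: both variables are exported.
grep -q 'export AWS_REGION=' <<<"$ENV_SOURCE"
grep -q 'export AWS_DEFAULT_REGION=' <<<"$ENV_SOURCE"
```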

Operator notes

  • The spot's IAM profile (alpha-engine-executor-profile) already grants AmazonSSMManagedInstanceCore via the predictor #168 migration; no IAM changes needed here.
  • Saturday SF first exercise: next alpha-engine-saturday cron firing runs MorningEnrich / DataPhase1 / RAGIngestion through the new transport. If any step fails for transport-shape reasons, recover via operator-launched SF redrive (NOT cron) — invokes the same script.
  • Port-22 inbound on sg-03cd3c4bd91e610b0 stays open until PR 5 (post-soak revoke). Manual operator SSH via key file remains as break-glass only.
  • PRs 3-4 will follow this pattern: alpha-engine-backtester spot_backtest.sh + alpha-engine-predictor spot_train.sh (predictor's existing inline run_ssm bash helper is what the arc exists to replace at the chokepoint level).

Test plan

  • Full ae-data suite passes (1626 passed, 1 skipped)
  • 8 new chokepoint tests pass (zero SSH/SCP/keyscan/inline-send-command surfaces in non-comment lines)
  • 2 updated fixture tests still pass (env_source + preflight_only_dry_path)
  • bash -n infrastructure/spot_data_weekly.sh syntax-clean
  • On merge: deploy.yml does NOT auto-rebuild Lambda (this is an EC2-side script, not Lambda); next Saturday SF cron firing (Sat 2026-05-30 ~02:00 PT) is the first real exercise
  • First clean Saturday SF on new transport → trigger PR 3 (backtester) → PR 4 (predictor) → PR 5 (port-22 SG revoke)

Composes with morning-signal #34 (lib chokepoint adoption precedent), alpha-engine-lib v0.35.0 (ssm_dispatcher module), and [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]] (this is the second adopter — predictor was first, backtester will be third).

🤖 Generated with Claude Code

…342 PR 2)

Migrates infrastructure/spot_data_weekly.sh off SSH+SCP onto the
alpha-engine-lib v0.35.0+ `ssm_dispatcher` chokepoint
(`python -m alpha_engine_lib.ssm_dispatcher run`). Closes the (i)
alive-SSH-path finding from the 2026-05-24 audit; PR 2 of the 5-PR
ROADMAP L342 arc.

Transport changes:
- Wait-for-SSH loop → wait-for-SSM-Online (`aws ssm describe-instance-
  information` polling, 180s budget, mirrors predictor #168 pattern)
- `run_remote "..."` (ssh-based) → `run_ssm "<desc>" <timeout> <<HEREDOC`
  (lib CLI via --script-stdin)
- SCP config upload → S3 staging: dispatcher uploads
  alpha-engine-config/data/config.yaml to
  tmp/spot_data_weekly/<run_id>/config.yaml; spot pulls via existing
  alpha-engine-executor-profile IAM role's s3:GetObject grant
- REMOTE_PYTHON captured via SSH → PYTHON_BIN resolved inline per SSM
  step (`command -v python3.12 || command -v python3`)
- KEY_FILE / SSH_OPTS removed; KEY_NAME kept ONLY as launch attribute
  for alpha_engine_lib.ec2_spot's --key-name flag (break-glass operator
  SSH only — port-22 SG revoke is PR 5 of the arc)
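
The inline interpreter lookup from the list above can be wrapped as a small function. `resolve_python` is a hypothetical name; the `command -v python3.12 || command -v python3` fallback is the one quoted in this commit message.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the fallback quoted in the commit message.
# Prints the first interpreter found; returns non-zero when neither exists.
resolve_python() {
  command -v python3.12 || command -v python3
}

# Inside an SSM step body this would run on the spot itself, e.g.:
#   PYTHON_BIN=$(resolve_python) || { echo "no python3 found" >&2; exit 1; }
#   "$PYTHON_BIN" --version
```

Resolving per step means no value has to be captured on the dispatcher and interpolated into later remote commands.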

Why pipe heredocs via --script-stdin rather than mirror predictor's inline
`"$(cat <<HEREDOC ... HEREDOC)"` pattern: the data path's RAG-secrets
block contains `aws ssm get-parameter --query 'Parameter.Value' ...`
inside `$(...)`. The outer command-substitution scanner sees the inner
single quotes and breaks. The lib CLI's --script-stdin flag reads the
body verbatim, so the dispatcher's bash parser never scans the inner
script for quote/paren balance. Future PRs adopting the lib chokepoint
(ssm_dispatcher) should prefer --script-stdin for any non-trivial
spot-side script body.

Cleanup: dispatcher trap also removes the S3 staging prefix on EXIT
(belt-and-suspenders — S3 lifecycle on tmp/ is the authoritative
purger).
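
A rough sketch of the staging and cleanup flow, assuming hypothetical names throughout (`STAGING_BUCKET`, `RUN_ID`, `stage_config`, `cleanup_staging`). Only the `tmp/spot_data_weekly/<run_id>/config.yaml` key layout and the EXIT trap come from this commit message.

```bash
#!/usr/bin/env bash
# All variable and function names here are illustrative. The bucket default
# is a placeholder; the real bucket is not named in this PR.
STAGING_BUCKET=${STAGING_BUCKET:-example-bucket}
RUN_ID=${RUN_ID:-$(date -u +%Y%m%dT%H%M%SZ)-$$}
STAGING_PREFIX="s3://${STAGING_BUCKET}/tmp/spot_data_weekly/${RUN_ID}"

# Dispatcher side: upload the config where the spot can fetch it.
stage_config() {
  aws s3 cp "$1" "${STAGING_PREFIX}/config.yaml" --only-show-errors
}

# Best-effort removal of the staging prefix. A failure here is swallowed
# because the S3 lifecycle rule on tmp/ is the authoritative purger.
cleanup_staging() {
  aws s3 rm "${STAGING_PREFIX}/" --recursive --only-show-errors || true
}
trap cleanup_staging EXIT
```

On the spot, the matching pull would be an `aws s3 cp` from `${STAGING_PREFIX}/config.yaml` to a local path, which relies on the `s3:GetObject` grant mentioned above.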

CI guards (tests/test_spot_data_weekly_ssm_transport.py, 8 new tests):
- test_spot_data_weekly_script_exists — script presence
- test_no_top_level_ssh_invocation — no `ssh -X` / `ssh ` outside comments
- test_no_top_level_scp_invocation — no `scp -X` / `scp ` outside comments
- test_no_ssh_keyscan_invocation — no `ssh-keyscan github.com` re-introduce
- test_uses_lib_ssm_dispatcher_chokepoint — `alpha_engine_lib.ssm_dispatcher`
  present (catches a regression that replaces it with `aws ssm send-command`)
- test_no_inline_aws_ssm_send_command — no direct `aws ssm send-command`
  (the predictor #168 pre-lift pattern L342 explicitly lifts to lib)
- test_stages_config_via_s3 — `aws s3 cp ... config.yaml` present
- test_no_residual_key_file_dispatch_use — no $KEY_FILE / $SSH_OPTS in
  non-comment lines (KEY_NAME stays as launch attribute, allow-listed)

Test fixture updates (no behavior change):
- test_spot_env_source_aws_region.py — multi-line `read -r -d ''
  ENV_SOURCE <<'ENV_EOF' ... ENV_EOF` recognized in addition to the
  single-line `ENV_SOURCE="..."` shape. The semantic invariant
  (ENV_SOURCE exports AWS_REGION + AWS_DEFAULT_REGION) is unchanged.
- test_preflight_only_dry_path.py — accept `run_ssm "workloads"` as
  the workloads opener in addition to the pre-migration
  `run_remote bash -s <<WORKLOADS`.

Suite: 1618 → 1626 passed (+8).

Operator notes:
- The spot's IAM profile (alpha-engine-executor-profile) already
  grants AmazonSSMManagedInstanceCore via the predictor #168 migration
  (lib pin v0.35.0+ ships in same profile); no IAM changes needed
  here.
- Saturday SF first exercise: next Saturday SF firing
  (alpha-engine-saturday) runs MorningEnrich / DataPhase1 / RAGIngestion
  through the new transport. If any step fails for transport-shape
  reasons, recover via SF redrive (operator-launched, NOT cron) which
  invokes the same script.
- Port-22 inbound on sg-03cd3c4bd91e610b0 stays open until PR 5
  (post-soak revoke). Manual operator SSH via key file remains as
  break-glass only.
- PRs 3-4 will follow this pattern: alpha-engine-backtester
  spot_backtest.sh + alpha-engine-predictor spot_train.sh
  (predictor's existing inline run_ssm bash helper is what the arc
  exists to replace at the chokepoint level).

Composes with morning-signal #34 (lib chokepoint adoption precedent),
alpha-engine-lib v0.35.0 (ssm_dispatcher module), and
[[feedback_lift_invariants_to_chokepoint_after_second_recurrence]]
(this is the second adopter — predictor was first, backtester will be
third).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 43e304b into main May 27, 2026
1 check passed
@cipher813 cipher813 deleted the feat-l342-pr2-spot-data-weekly-ssm-260527 branch May 27, 2026 14:51