feat(spot-data-weekly): SSH/SCP → SSM transport via lib chokepoint (L342 PR 2) #330
Merged
…342 PR 2)

Migrates infrastructure/spot_data_weekly.sh off SSH+SCP onto the alpha-engine-lib v0.35.0+ `ssm_dispatcher` chokepoint (`python -m alpha_engine_lib.ssm_dispatcher run`). Closes the (i) alive-SSH-path finding from the 2026-05-24 audit; PR 2 of the 5-PR ROADMAP L342 arc.

Transport changes:
- Wait-for-SSH loop → wait-for-SSM-Online (`aws ssm describe-instance-information` polling, 180s budget, mirrors predictor #168 pattern)
- `run_remote "..."` (ssh-based) → `run_ssm "<desc>" <timeout> <<HEREDOC` (lib CLI via --script-stdin)
- SCP config upload → S3 staging: dispatcher uploads alpha-engine-config/data/config.yaml to tmp/spot_data_weekly/<run_id>/config.yaml; spot pulls via existing alpha-engine-executor-profile IAM role's s3:GetObject grant
- REMOTE_PYTHON captured via SSH → PYTHON_BIN resolved inline per SSM step (`command -v python3.12 || command -v python3`)
- KEY_FILE / SSH_OPTS removed; KEY_NAME kept ONLY as launch attribute for alpha_engine_lib.ec2_spot's --key-name flag (break-glass operator SSH only; port-22 SG revoke is PR 5 of the arc)

Why pipe the heredoc via --script-stdin instead of mirroring predictor's inline `"$(cat <<HEREDOC ... HEREDOC)"` pattern: the data path's RAG-secrets block contains `aws ssm get-parameter --query 'Parameter.Value' ...` inside `$(...)`. The outer command-substitution scanner sees the inner single quotes and breaks. The lib CLI's --script-stdin flag reads the body verbatim, so the dispatcher's bash parser never scans the inner script for quote/paren balance. Future PRs adopting the lib chokepoint (ssm_dispatcher) should prefer --script-stdin for any non-trivial spot-side script body.

Cleanup: the dispatcher trap also removes the S3 staging prefix on EXIT (belt-and-suspenders; the S3 lifecycle on tmp/ is the authoritative purger).
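The --script-stdin reasoning above can be illustrated with a minimal sketch. It assumes a hypothetical `dispatch_stub` function standing in for the lib CLI (whose remaining arguments are not shown in this PR) and a hypothetical parameter name `/example/param`. The `run_ssm` outline here is a guess at the helper's shape, not its actual implementation, and the heredoc delimiter is quoted so the dispatcher shell expands nothing locally, which may differ from the real script.

```shell
#!/usr/bin/env bash
set -euo pipefail

SENT_BODY=""

# Hypothetical stand-in for the lib CLI. The real call would be
# `python -m alpha_engine_lib.ssm_dispatcher run ... --script-stdin`;
# this stub only records whatever arrives on stdin.
dispatch_stub() {
  SENT_BODY="$(cat)"
}

# Assumed helper outline: description and timeout as arguments,
# script body on stdin, forwarded without being parsed here.
run_ssm() {
  local desc="$1" timeout="$2"
  echo "[run_ssm] ${desc} (timeout ${timeout}s)" >&2
  dispatch_stub
}

# The heredoc is attached to the function call as plain stdin, so the
# inner $(...) and single quotes are never scanned as part of an outer
# command substitution.
run_ssm "rag-secrets" 60 <<'REMOTE'
VALUE=$(aws ssm get-parameter --name /example/param --query 'Parameter.Value' --output text)
echo "fetched ${#VALUE} bytes"
REMOTE

printf '%s\n' "$SENT_BODY"
```

Because the body travels over stdin, the dispatcher only has to find the closing delimiter; quoting inside the remote script is the remote shell's concern alone.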
CI guards (tests/test_spot_data_weekly_ssm_transport.py, 8 new tests):
- test_spot_data_weekly_script_exists: script presence
- test_no_top_level_ssh_invocation: no `ssh -X` / `ssh ` outside comments
- test_no_top_level_scp_invocation: no `scp -X` / `scp ` outside comments
- test_no_ssh_keyscan_invocation: no `ssh-keyscan github.com` re-introduction
- test_uses_lib_ssm_dispatcher_chokepoint: `alpha_engine_lib.ssm_dispatcher` present (catches a regression that replaces it with `aws ssm send-command`)
- test_no_inline_aws_ssm_send_command: no direct `aws ssm send-command` (the predictor #168 pre-lift pattern L342 explicitly lifts to lib)
- test_stages_config_via_s3: `aws s3 cp ... config.yaml` present
- test_no_residual_key_file_dispatch_use: no $KEY_FILE / $SSH_OPTS in non-comment lines (KEY_NAME stays as launch attribute, allow-listed)

Test fixture updates (no behavior change):
- test_spot_env_source_aws_region.py: multi-line `read -r -d '' ENV_SOURCE <<'ENV_EOF' ... ENV_EOF` recognized in addition to the single-line `ENV_SOURCE="..."` shape. The semantic invariant (ENV_SOURCE exports AWS_REGION + AWS_DEFAULT_REGION) is unchanged.
- test_preflight_only_dry_path.py: accept `run_ssm "workloads"` as the workloads opener in addition to the pre-migration `run_remote bash -s <<WORKLOADS`.

Suite: 1618 → 1626 passed (+8).

Operator notes:
- The spot's IAM profile (alpha-engine-executor-profile) already grants AmazonSSMManagedInstanceCore via the predictor #168 migration (lib pin v0.35.0+ ships in same profile); no IAM changes needed here.
- Saturday SF first exercise: the next Saturday SF firing (alpha-engine-saturday) runs MorningEnrich / DataPhase1 / RAGIngestion through the new transport. If any step fails for transport-shape reasons, recover via SF redrive (operator-launched, NOT cron), which invokes the same script.
- Port-22 inbound on sg-03cd3c4bd91e610b0 stays open until PR 5 (post-soak revoke). Manual operator SSH via key file remains as break-glass only.
- PRs 3-4 will follow this pattern: alpha-engine-backtester spot_backtest.sh + alpha-engine-predictor spot_train.sh (predictor's existing inline run_ssm bash helper is what the arc exists to replace at the chokepoint level).

Composes with morning-signal #34 (lib chokepoint adoption precedent), alpha-engine-lib v0.35.0 (ssm_dispatcher module), and [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]] (this is the second adopter; predictor was first, backtester will be third).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
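The CI guards listed above live in a Python test file; as a rough shell approximation of the "no `ssh ` / `scp ` outside comments" checks, the following sketch may help when auditing a script by hand. The function name `find_ssh_calls` and the sample file contents are hypothetical, and the regex is an assumption about what counts as an invocation, not the pattern the PR's tests use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print non-comment lines that invoke ssh, scp, or ssh-keyscan.
# Returns 0 when the file is clean, 1 when at least one hit is found.
# Limitation: a trailing inline comment that mentions "ssh " is still
# flagged, because only whole-line comments are filtered out.
find_ssh_calls() {
  local file="$1" hits
  hits="$(grep -nE '(^|[^[:alnum:]_-])(ssh|scp|ssh-keyscan)[[:space:]]' "$file" \
            | grep -vE '^[0-9]+:[[:space:]]*#' || true)"
  if [[ -n "$hits" ]]; then
    printf '%s\n' "$hits"
    return 1
  fi
}

# Hypothetical sample: one comment, one SSM call, one leftover scp.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
# ssh is gone; transport is SSM now
aws ssm describe-instance-information --output text
scp -i "$KEY_FILE" config.yaml ec2-user@"$IP":/tmp/
EOF

if flagged="$(find_ssh_calls "$sample")"; then
  echo "clean"
else
  echo "transport regression:"
  printf '%s\n' "$flagged"
fi
```

Only the `scp` line is reported: the comment on line 1 is filtered out, and `aws ssm` does not match because the pattern looks for `ssh`, not `ssm`.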
Summary

Migrates `infrastructure/spot_data_weekly.sh` off SSH+SCP onto the alpha-engine-lib v0.35.0+ `ssm_dispatcher` chokepoint (`python -m alpha_engine_lib.ssm_dispatcher run`). Closes the (i) alive-SSH-path finding from the 2026-05-24 audit; PR 2 of the 5-PR ROADMAP L342 arc.

Transport changes
| Before | After |
| --- | --- |
| Wait-for-SSH loop (`ssh ... "echo ok"`, 30 × 5s) | `aws ssm describe-instance-information` PingStatus poll (36 × 5s) |
| `run_remote "..."` (`ssh -i $KEY_FILE`) | `run_ssm "<desc>" <timeout> <<HEREDOC` (lib CLI via `--script-stdin`) |
| `scp -i $KEY_FILE config.yaml ec2-user@$IP:...` | `aws s3 cp` to `tmp/spot_data_weekly/<run_id>/config.yaml`; spot pulls via existing IAM s3:GetObject |
| `REMOTE_PYTHON=$(ssh ... "command -v python3.12")` captured to dispatcher var | `PYTHON_BIN` resolved inline per SSM step |
| `KEY_FILE=$HOME/.ssh/alpha-engine-key.pem` | `KEY_NAME` kept ONLY for `ec2_spot --key-name` (break-glass operator SSH only) |

Why pipe heredocs via
`--script-stdin` instead of mirroring predictor's inline `"$(cat <<HEREDOC ... HEREDOC)"` pattern: the data path's RAG-secrets block contains `aws ssm get-parameter --query 'Parameter.Value' ...` inside `$(...)`. The outer command-substitution scanner sees the inner single quotes and breaks. The lib CLI's `--script-stdin` flag reads the body verbatim, so the dispatcher's bash parser never scans the inner script for quote/paren balance. Future PRs adopting `ssm_dispatcher` should prefer `--script-stdin` for any non-trivial spot-side script body.

CI guards (8 new tests)
`tests/test_spot_data_weekly_ssm_transport.py`:
- `test_spot_data_weekly_script_exists`: script presence
- `test_no_top_level_ssh_invocation`: no `ssh -X` / `ssh ` outside comments
- `test_no_top_level_scp_invocation`: no `scp -X` / `scp ` outside comments
- `test_no_ssh_keyscan_invocation`: no `ssh-keyscan github.com` re-introduction
- `test_uses_lib_ssm_dispatcher_chokepoint`: `alpha_engine_lib.ssm_dispatcher` present
- `test_no_inline_aws_ssm_send_command`: no direct `aws ssm send-command` (the predictor #168 pre-lift pattern L342 explicitly lifts to lib)
- `test_stages_config_via_s3`: `aws s3 cp ... config.yaml` present
- `test_no_residual_key_file_dispatch_use`: no `$KEY_FILE` / `$SSH_OPTS` in non-comment lines

Test-fixture updates (no behavior change)
- `test_spot_env_source_aws_region.py`: recognize the new multi-line `read -r -d '' ENV_SOURCE <<'ENV_EOF' ... ENV_EOF` shape in addition to the single-line `ENV_SOURCE="..."`. The semantic invariant (ENV_SOURCE exports `AWS_REGION` + `AWS_DEFAULT_REGION`) is unchanged.
- `test_preflight_only_dry_path.py`: accept `run_ssm "workloads"` as the workloads opener in addition to the pre-migration `run_remote bash -s <<WORKLOADS`.

Operator notes
- The spot's IAM profile (`alpha-engine-executor-profile`) already grants `AmazonSSMManagedInstanceCore` via the predictor #168 migration; no IAM changes needed here.
- The next `alpha-engine-saturday` cron firing runs MorningEnrich / DataPhase1 / RAGIngestion through the new transport. If any step fails for transport-shape reasons, recover via operator-launched SF redrive (NOT cron), which invokes the same script.
- Port-22 inbound on `sg-03cd3c4bd91e610b0` stays open until PR 5 (post-soak revoke). Manual operator SSH via key file remains as break-glass only.
- PRs 3-4 will follow this pattern: alpha-engine-backtester `spot_backtest.sh` + alpha-engine-predictor `spot_train.sh` (predictor's existing inline `run_ssm` bash helper is what the arc exists to replace at the chokepoint level).

Test plan
- `bash -n infrastructure/spot_data_weekly.sh` syntax-clean

Composes with morning-signal #34 (lib chokepoint adoption precedent), alpha-engine-lib v0.35.0 (`ssm_dispatcher` module), and [[feedback_lift_invariants_to_chokepoint_after_second_recurrence]] (this is the second adopter; predictor was first, backtester will be third).

🤖 Generated with Claude Code
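The wait-for-SSM-Online step described in the transport changes (a PingStatus poll, 36 × 5s) could look roughly like the sketch below. The function names `ssm_ping_status` and `wait_for_ssm_online` are hypothetical, and the filter and query expressions are one plausible way to read PingStatus with the AWS CLI; the PR's actual loop may be shaped differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Probe the SSM agent's PingStatus for one instance (hypothetical helper).
# With --output text, a not-yet-registered instance yields "None";
# a failed CLI call is reported here as "Unknown".
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text 2>/dev/null || echo "Unknown"
}

# Poll until the instance reports Online or the attempt budget runs out.
# The defaults (36 attempts, 5s apart) match the 180s budget in this PR.
wait_for_ssm_online() {
  local instance_id="$1" attempts="${2:-36}" interval="${3:-5}"
  local i status="Unknown"
  for ((i = 1; i <= attempts; i++)); do
    status="$(ssm_ping_status "$instance_id")"
    if [[ "$status" == "Online" ]]; then
      echo "SSM Online after ${i} attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "SSM not Online after $((attempts * interval))s (last status: ${status})" >&2
  return 1
}
```

A dispatcher would call `wait_for_ssm_online "$INSTANCE_ID"` after launch and abort on a non-zero return. Keeping the probe in its own function makes the loop easy to exercise with a stub in place of the AWS CLI.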