feat(spot-train): wire L394 diagnostics-write flags + bump lib v0.33.0 → v0.39.0 #203
Merged
L394 cascade C of 3 (sibling to alpha-engine-data #334 cascade A
+ alpha-engine-backtester #254 cascade B). Activates the lib
v0.39.0 ssm_dispatcher diagnostics-write substrate for ae-
predictor's spot SSM dispatch — on terminal non-Success the lib
CLI writes a JSON failure record (status + command_id + 4KB
stdout/stderr tails + instance_id) to s3://${S3_BUCKET}/
_spot_diagnostics/ae-predictor/{YYYY-MM-DD}.json. Best-effort
posture inside the lib — S3 failure swallowed; inner SSM exit
always preserved. Substrate is failure-only (no-op on Success).
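The failure-only, best-effort posture described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the lib's actual code: the function names, the record field names, and the injected put_object callable are all hypothetical, and the exact tail cutoff is assumed to be 4096 bytes.

```python
import json
from datetime import date
from typing import Callable

TAIL_BYTES = 4096  # "4KB" per the description; the exact cutoff is an assumption


def tail(text: str, limit: int = TAIL_BYTES) -> str:
    """Return at most the last `limit` UTF-8 bytes of `text`."""
    return text.encode("utf-8")[-limit:].decode("utf-8", errors="ignore")


def diagnostics_key(prefix: str, day: date) -> str:
    """Build the {prefix}/{YYYY-MM-DD}.json object key."""
    return f"{prefix.rstrip('/')}/{day.isoformat()}.json"


def build_failure_record(status: str, command_id: str, instance_id: str,
                         stdout: str, stderr: str) -> dict:
    # Field names are hypothetical; the description only lists the contents.
    return {
        "status": status,
        "command_id": command_id,
        "instance_id": instance_id,
        "stdout_tail": tail(stdout),
        "stderr_tail": tail(stderr),
    }


def finish_dispatch(status: str, inner_exit_code: int, record: dict,
                    put_object: Callable[..., object],
                    bucket: str, prefix: str, day: date) -> int:
    """Write a failure record on non-Success, then return the inner exit code."""
    if status != "Success":
        try:
            put_object(
                Bucket=bucket,
                Key=diagnostics_key(prefix, day),
                Body=json.dumps(record).encode("utf-8"),
            )
        except Exception:
            pass  # best-effort: an S3 failure must not mask the SSM result
    return inner_exit_code
```

The keyword names passed to put_object mirror those of a boto3 S3 client's put_object, so a real client method could be injected; injecting a stub keeps the sketch runnable without AWS credentials.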
Changes:
- requirements.txt + requirements-lambda.txt: lib pin v0.33.0 →
v0.39.0 in lockstep. Carries forward 6 intervening lib substrate
bumps (v0.34 LLMJudgeReranker deletion, v0.35 ssm_dispatcher
lift, v0.36 Option-D execution-picker, v0.37 anthropic_payload
chokepoint, v0.38 universe_writer_lock + PyPI summary guard,
v0.39 ssm_dispatcher diagnostics-write).
- spot_train.sh::run_ssm: append --diagnostics-bucket $S3_BUCKET
+ --diagnostics-prefix _spot_diagnostics/ae-predictor to the
lib CLI invocation. Per-repo subprefix discriminates cascade A
(ae-data) + cascade B (ae-backtester) sibling writes — lib's
{date}.json key shape would otherwise clobber within a shared
prefix.
- tests/test_spot_train_ssm_lib_chokepoint.py: new
test_run_ssm_passes_diagnostics_flags pins both flags + per-
repo subprefix.
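To illustrate why the per-repo subprefix matters and what pinning the flags could look like, here is a hedged sketch. The sibling subprefix names (ae-data, ae-backtester), the sample script excerpt, the CLI name in it, and the helper names are assumptions for illustration; the real test body and the real spot_train.sh are not shown in this PR text.

```python
import re
from typing import Optional

SHARED_PREFIX = "_spot_diagnostics"
# Sibling subprefix names other than ae-predictor are assumed for illustration.
REPOS = ["ae-data", "ae-backtester", "ae-predictor"]


def key_for(prefix: str, day: str) -> str:
    """Mirror the {prefix}/{date}.json key shape from the description."""
    return f"{prefix}/{day}.json"


def flag_value(script: str, flag: str) -> Optional[str]:
    """Return the value following `flag` in a shell script, or None if absent."""
    joined = script.replace("\\\n", " ")  # fold backslash line continuations
    match = re.search(rf"{re.escape(flag)}[ =]+(\S+)", joined)
    return match.group(1).strip("\"'") if match else None


# Hypothetical excerpt, not the real spot_train.sh.
SAMPLE_SCRIPT = """
run_ssm() {
  some-lib-cli dispatch \\
    --diagnostics-bucket "$S3_BUCKET" \\
    --diagnostics-prefix _spot_diagnostics/ae-predictor
}
"""

# With a shared prefix, all three repos map to the same key on a given day.
shared_keys = {key_for(SHARED_PREFIX, "2025-01-02") for _ in REPOS}
# With a per-repo subprefix, each repo gets its own key.
per_repo_keys = {key_for(f"{SHARED_PREFIX}/{repo}", "2025-01-02") for repo in REPOS}
```

A test in this style would assert that flag_value returns $S3_BUCKET for --diagnostics-bucket and _spot_diagnostics/ae-predictor for --diagnostics-prefix, so that dropping either flag or the subprefix fails the suite.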
Composes with the existing predictor manifest.peak_rss_mb
instrumentation (PR #193): peak_rss covers the OOM class on
EVERY run (success or failure); diagnostics-write is failure-only
and covers the broader class of non-OOM terminal failures
(missing S3 input, IAM drift, ArcticDB connect timeout).
Suite 1210 → 1211.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
L394 cascade C of 3 (sibling to alpha-engine-data#334 cascade A + alpha-engine-backtester#254 cascade B). Activates the lib v0.39.0 ssm_dispatcher diagnostics-write substrate for ae-predictor: on terminal non-Success the lib CLI writes a JSON failure record to s3://${S3_BUCKET}/_spot_diagnostics/ae-predictor/{YYYY-MM-DD}.json.

Changes
- requirements.txt + requirements-lambda.txt: lib pin v0.33.0 → v0.39.0 in lockstep. Carries forward 6 intervening lib substrate bumps.
- infrastructure/spot_train.sh::run_ssm: append --diagnostics-bucket $S3_BUCKET + --diagnostics-prefix _spot_diagnostics/ae-predictor.
- tests/test_spot_train_ssm_lib_chokepoint.py: new test_run_ssm_passes_diagnostics_flags pins both flags + per-repo subprefix.

Composes with existing predictor instrumentation
PR #193's manifest.peak_rss_mb covers the OOM-class on EVERY run (success + failure); this PR's diagnostics-write covers the broader failure-only class (non-OOM terminal failures: missing S3 input, IAM drift, ArcticDB connect timeout, BranchBFailed cascade).

Test plan
- test_run_ssm_passes_diagnostics_flags: pins both flags + per-repo prefix.
- test_run_ssm_uses_lib_dispatcher + test_no_inline_aws_ssm_send_command: chokepoint preserved.
- _spot_diagnostics/ae-predictor/{date}.json.

🤖 Generated with Claude Code