
feat(spot-train): wire L394 diagnostics-write flags + bump lib v0.33.0 → v0.39.0 #203

Merged
cipher813 merged 1 commit into main from feat/spot-train-diagnostics-cascade-l394
May 27, 2026
@cipher813
Owner

Summary

L394 cascade C of 3 (sibling to alpha-engine-data#334 cascade A + alpha-engine-backtester#254 cascade B). Activates the lib v0.39.0 ssm_dispatcher diagnostics-write substrate for ae-predictor — on terminal non-Success the lib CLI writes a JSON failure record to s3://${S3_BUCKET}/_spot_diagnostics/ae-predictor/{YYYY-MM-DD}.json.

Changes

  • requirements.txt + requirements-lambda.txt — lib pin v0.33.0 → v0.39.0 in lockstep. Carries forward 6 intervening lib substrate bumps.
  • infrastructure/spot_train.sh::run_ssm — append --diagnostics-bucket $S3_BUCKET + --diagnostics-prefix _spot_diagnostics/ae-predictor.
  • tests/test_spot_train_ssm_lib_chokepoint.py — new test_run_ssm_passes_diagnostics_flags pins both flags + per-repo subprefix.
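
A minimal sketch of how a flag-pinning test like test_run_ssm_passes_diagnostics_flags could work, assuming it inspects the text of run_ssm in the shell script. The sample script body, the `$LIB_SSM_CLI` placeholder, and both helper functions are hypothetical; the real script and test are not shown on this page.

```python
import re

# Hypothetical stand-in for infrastructure/spot_train.sh. The real script and
# the lib CLI's invocation name are not shown here; $LIB_SSM_CLI is a placeholder.
SAMPLE_SCRIPT = '''
run_ssm() {
  "$LIB_SSM_CLI" \\
    --diagnostics-bucket "$S3_BUCKET" \\
    --diagnostics-prefix _spot_diagnostics/ae-predictor \\
    "$@"
}
'''


def function_body(script: str, name: str) -> str:
    """Return the text between `name() {` and the first line starting with `}`."""
    pattern = rf"^{re.escape(name)}\(\)\s*\{{\n(.*?)^\}}"
    match = re.search(pattern, script, re.S | re.M)
    if match is None:
        raise AssertionError(f"{name}() not found in script")
    return match.group(1)


def check_diagnostics_flags(script: str) -> None:
    """Fail unless run_ssm passes both flags and the per-repo subprefix."""
    body = function_body(script, "run_ssm")
    assert "--diagnostics-bucket" in body
    assert "$S3_BUCKET" in body
    assert "--diagnostics-prefix _spot_diagnostics/ae-predictor" in body


check_diagnostics_flags(SAMPLE_SCRIPT)
```

Scoping the assertions to the run_ssm body, rather than the whole file, keeps the check from passing on a flag that appears only in a comment or an unrelated function elsewhere in the script.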

Composes with existing predictor instrumentation

PR #193's manifest.peak_rss_mb covers the OOM-class on EVERY run (success + failure); this PR's diagnostics-write covers the broader failure-only class (non-OOM terminal failures — missing S3 input, IAM drift, ArcticDB connect timeout, BranchBFailed cascade).

Test plan

  • test_run_ssm_passes_diagnostics_flags — pins both flags + per-repo prefix.
  • test_run_ssm_uses_lib_dispatcher + test_no_inline_aws_ssm_send_command — chokepoint preserved.
  • Full suite: 1210 → 1211.
  • Cascade A (ae-data #334) + cascade B (ae-backtester #254) ship the same shape.
  • Next terminal non-Success in PredictorTraining writes _spot_diagnostics/ae-predictor/{date}.json.

🤖 Generated with Claude Code

…0 → v0.39.0

L394 cascade C of 3 (sibling to alpha-engine-data #334 cascade A
+ alpha-engine-backtester #254 cascade B). Activates the lib
v0.39.0 ssm_dispatcher diagnostics-write substrate for ae-
predictor's spot SSM dispatch — on terminal non-Success the lib
CLI writes a JSON failure record (status + command_id + 4KB
stdout/stderr tails + instance_id) to s3://${S3_BUCKET}/
_spot_diagnostics/ae-predictor/{YYYY-MM-DD}.json. Best-effort
posture inside the lib — S3 failure swallowed; inner SSM exit
always preserved. Substrate is failure-only (no-op on Success).
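
The failure-only, best-effort behaviour described above can be sketched roughly as follows. This is not the lib's code: the function names, the record field names, the injected `put_object` callable, and the use of the UTC date for the key are all assumptions made for illustration.

```python
import json
from datetime import date, datetime, timezone
from typing import Callable, Optional

TAIL_BYTES = 4096  # "4KB stdout/stderr tails" per the commit message


def tail(text: str, limit: int = TAIL_BYTES) -> str:
    """Keep at most the last `limit` UTF-8 bytes, dropping any split character."""
    return text.encode("utf-8")[-limit:].decode("utf-8", errors="ignore")


def write_diagnostics(
    status: str,
    command_id: str,
    instance_id: str,
    stdout: str,
    stderr: str,
    prefix: str,
    put_object: Callable[[str, bytes], None],  # hypothetical uploader(key, body)
    today: Optional[date] = None,
) -> Optional[str]:
    """Write a failure record; return its key, or None if nothing was written."""
    if status == "Success":
        return None  # failure-only: no-op on Success
    day = today or datetime.now(timezone.utc).date()  # UTC is an assumption
    key = f"{prefix.rstrip('/')}/{day.isoformat()}.json"
    record = {  # field names are assumed, not taken from the lib
        "status": status,
        "command_id": command_id,
        "instance_id": instance_id,
        "stdout_tail": tail(stdout),
        "stderr_tail": tail(stderr),
    }
    try:
        put_object(key, json.dumps(record).encode("utf-8"))
    except Exception:
        return None  # best-effort: an upload failure is swallowed
    return key
```

In a sketch like this the caller would still exit with the inner SSM command's exit code whether or not a key comes back, which is what keeps the diagnostics write from masking the original failure.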

Changes:
- requirements.txt + requirements-lambda.txt: lib pin v0.33.0 →
  v0.39.0 in lockstep. Carries forward 6 intervening lib substrate
  bumps (v0.34 LLMJudgeReranker deletion, v0.35 ssm_dispatcher
  lift, v0.36 Option-D execution-picker, v0.37 anthropic_payload
  chokepoint, v0.38 universe_writer_lock + PyPI summary guard,
  v0.39 ssm_dispatcher diagnostics-write).
- spot_train.sh::run_ssm: append --diagnostics-bucket $S3_BUCKET
  + --diagnostics-prefix _spot_diagnostics/ae-predictor to the
  lib CLI invocation. Per-repo subprefix discriminates cascade A
  (ae-data) + cascade B (ae-backtester) sibling writes — lib's
  {date}.json key shape would otherwise clobber within a shared
  prefix.
- tests/test_spot_train_ssm_lib_chokepoint.py: new
  test_run_ssm_passes_diagnostics_flags pins both flags + per-
  repo subprefix.
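
To illustrate why the per-repo subprefix matters, here is a small sketch that assumes the key is simply `<prefix>/<YYYY-MM-DD>.json`. The `diagnostics_key` helper is hypothetical, and the subprefixes for the two sibling repos are assumed by analogy with `_spot_diagnostics/ae-predictor`; only the ae-predictor prefix is confirmed by this PR.

```python
from datetime import date

REPOS = ("ae-data", "ae-backtester", "ae-predictor")


def diagnostics_key(prefix: str, day: date) -> str:
    """Assumed key shape: <prefix>/<YYYY-MM-DD>.json."""
    return f"{prefix.rstrip('/')}/{day.isoformat()}.json"


day = date(2026, 5, 27)

# Shared prefix: every repo maps to the same key on a given day,
# so a later write would overwrite an earlier one.
shared = {repo: diagnostics_key("_spot_diagnostics", day) for repo in REPOS}

# Per-repo subprefix: each repo gets its own key for that day.
scoped = {repo: diagnostics_key(f"_spot_diagnostics/{repo}", day) for repo in REPOS}

print(len(set(shared.values())), "distinct key(s) with a shared prefix")
print(len(set(scoped.values())), "distinct key(s) with per-repo subprefixes")
```

Under this key shape the subprefix separates repos but not runs: two failures from the same repo on the same day would still resolve to the same key.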

Composes with the existing predictor manifest.peak_rss_mb
instrumentation (PR #193): peak_rss covers the OOM class on
EVERY run (success or failure); diagnostics-write is failure-only
and covers the broader class of non-OOM terminal failures
(missing S3 input, IAM drift, ArcticDB connect timeout).

Suite 1210 → 1211.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit 605128e into main on May 27, 2026
1 check passed
cipher813 deleted the feat/spot-train-diagnostics-cascade-l394 branch on May 27, 2026 18:59