From 4006e220daaa794da8cde1311fa32bc9d9aae8a8 Mon Sep 17 00:00:00 2001 From: Brian McMahon Date: Mon, 18 May 2026 15:23:34 -0700 Subject: [PATCH] =?UTF-8?q?feat(sf):=20rewire=20last=205=20skip-exceptions?= =?UTF-8?q?=20=E2=86=92=20dry=20(DriftDetection=20preflight=20+=20EvalJudg?= =?UTF-8?q?e/Rationale/Replay/CF=20dry=5Frun=5Fllm)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes the keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run EVERY substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos. Per-state mechanism: | State | Type | Mechanism (under shell_run) | |-----------------------------|--------|------------------------------------------------------| | DriftDetection | spot | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) | | EvalJudgeSubmitFirstSaturday| Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgeSubmitWeekly | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgePoll | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgeProcess | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | RationaleClustering | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | ReplayConcordance | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) | | Counterfactual | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) | Exact canonical dry var: $.research_dry. It is THE canonical shell-run LLM-dry signal — InitializeInput seeds it false on every run (so the absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented — research #202 / backtester #225 PR bodies specify dry_run_llm, and reusing $.research_dry keeps the absent-path guarantee automatic (no extra seeding needed; the seed already exists). Changes: - ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob. It now force-sets ZERO skip_*. Per-flag user overrides still win (merge order unchanged). The Choice-gated CheckSkip gates are LEFT INTACT (still valid for targeted operator skips — verified by test_skip_gates_still_intact). - DriftDetection: literal `commands` array → `commands.$` States.Array whose final entry is States.Format('bash infrastructure/ spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args). {} sits immediately after the script token with no literal space; preflight_args carries its leading space inside the var, so preflight_args="" reproduces the origin/main command char-for-char and " --preflight-only" yields exactly one separating space. - 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched — it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope). Byte-identical proof approach: - shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich (unchanged); InitializeInput seeds preflight_args="", research_dry=false. Every spot States.Format resolves char-for-char to the frozen origin/main literal; every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire). - The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands .json now INCLUDES DriftDetection's pre-rewire origin/main literal command (regenerated via the established generator at preflight_args=""; the existing 7 entries are unchanged). The byte-identical test asserts DriftDetection's resolved command at preflight_args="" equals that frozen baseline and carries --preflight-only (single space) under shell_run. - CI-safe: tests read only the committed fixture (no `git show origin/main` shell-out — that was the #260 CI failure). Tests: - _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set. - test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob force-sets no skip_* and none of the 16 workload skips (incl. the 5 ex-exceptions) appear. - TestHappyPathTraversal: under shell_run nothing is skipped (skipped == set()); DriftDetection is VISITED (runs dry), not jumped past. - Module + class docstrings updated to the rewire semantics. JSON valid (58 top-level states, 91 incl. parallel branches). Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed. Zero skip-exceptions remain — every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm). Co-Authored-By: Claude Opus 4.7 (1M context) --- infrastructure/step_function.json | 35 ++-- .../sf_prekeystone_spot_commands.json | 9 + tests/test_sf_friday_shell_run_wiring.py | 164 +++++++++++------- 3 files changed, 131 insertions(+), 77 deletions(-) diff --git a/infrastructure/step_function.json b/infrastructure/step_function.json index a2c7364..ba51c91 100644 --- a/infrastructure/step_function.json +++ b/infrastructure/step_function.json @@ -33,9 +33,9 @@ }, "ApplyShellRunDefaults": { "Type": "Pass", - "Comment": "shell_run=true ONLY (keystone — replaces #258's pure-skip with dry EXECUTION). Merges the dry-path control blob UNDER the current state (States.JsonMerge(shellDefaults, $, false) — second arg wins) so an explicit per-flag override in the execution input still takes effect (e.g. {\"shell_run\": true, \"skip_backtester\": true} still skips Backtester; {\"shell_run\": true, \"preflight_args\": \"\"} would run the spots full-fat). Mirrors InitializeInput's defaults-under-input JsonMerge pattern exactly. SPOT states (MorningEnrich, DataPhase1, RAGIngestion, PredictorTraining, Backtester, Parity, Evaluator) boot + run dry via preflight_args=\" --preflight-only\" (LEADING space inside the var; the spot states' final command is a States.Format whose {} is placed immediately after the mode token with NO literal space, so preflight_args=\"\" on the real run yields a byte-identical command — data #259 + predictor #175 + backtester #224 all expose --preflight-only verbatim, an orthogonal MODIFIER). LAMBDA states with a verified clean no-write dry path are routed dry, NOT skipped: Research (dry_run_llm=true — post-#195 install_dry_run_stubs no-ops archive_writer/email_sender/upload_db/write_signals_json/save_sector_team_run/save_agent_run), DataPhase2 (dry_run=true — alternative.collect returns ok_dry_run BEFORE any fetch/S3 write), RegimeSubstrate + RegimeRetrospectiveEval (action=dry_run — produce_*(write=False) returns payload before any put_object). DOCUMENTED EXCEPTIONS still hard-skipped via the #258 mechanism (no verified clean no-write dry path): skip_drift_detection (spot_drift_detection.sh has NO --preflight-only flag), skip_eval_judge (submit handler always _persist_client_side_skips + Anthropic Batch create — no handler-level dry param), skip_rationale_clustering (_persist_analysis S3 put_object is NOT gated by dry_run — only the CloudWatch metric is), skip_replay_concordance + skip_counterfactual (alpha-engine-replay-concordance/counterfactual handler source not present in any cloned repo — cannot verify a clean dry path; routing to an unverified one is forbidden). The #258 skip_* gates are LEFT INTACT and remain valid for targeted operator skips. The two health-check states have NO skip gate by design and run under shell_run (the bootstrap smoke) — their non-blocking Catch absorbs a stale-data sys.exit(1) so a missing-Friday-bar produces only a clearly-Friday-timestamped alert email, NOT a SF-fatal failure (ROADMAP owed-work item 5: a --shell-run-aware staleness tolerance in alpha-engine-dashboard/health_checker.py is a scoped cross-repo follow-on; not a SF-fatal spurious-fail, so deliberately out of this single-file SF PR).", + "Comment": "shell_run=true ONLY (keystone + skip-exception rewire — every substantive workload now boots + runs DRY; ZERO skip-exceptions remain). Merges the dry-path control blob UNDER the current state (States.JsonMerge(shellDefaults, $, false) — second arg wins) so an explicit per-flag override in the execution input still takes effect (e.g. {\"shell_run\": true, \"skip_backtester\": true} still skips Backtester; {\"shell_run\": true, \"preflight_args\": \"\"} would run the spots full-fat). Mirrors InitializeInput's defaults-under-input JsonMerge pattern exactly. SPOT states (MorningEnrich, DataPhase1, RAGIngestion, PredictorTraining, Backtester, Parity, Evaluator, AND DriftDetection) boot + run dry via preflight_args=\" --preflight-only\" (LEADING space inside the var; the spot states' final command is a States.Format whose {} is placed immediately after the mode token with NO literal space, so preflight_args=\"\" on the real run yields a byte-identical command — data #259 + #261 + predictor #175 + backtester #224 all expose --preflight-only verbatim, an orthogonal MODIFIER). LAMBDA states with a verified clean no-write dry path are routed dry, NOT skipped, via $.research_dry (the canonical shell-run LLM-dry signal, true here / false on the real run): Research + the EvalJudge chain (EvalJudgeSubmit{FirstSaturday,Weekly}/Poll/Process — dry_run_llm via research #202) + RationaleClustering (research #202) + ReplayConcordance + Counterfactual (backtester #225) all take dry_run_llm.$=$.research_dry; DataPhase2 (dry_run=true — alternative.collect returns ok_dry_run BEFORE any fetch/S3 write), RegimeSubstrate + RegimeRetrospectiveEval (action=dry_run — produce_*(write=False) returns payload before any put_object). The skip-exception rewire (this PR) flipped the prior 5 skip→dry: DriftDetection now uses the same commands.$/States.Format($.preflight_args) Option-C mechanism as the other spot states (data #261 exposed --preflight-only on spot_drift_detection.sh); the eval-judge / rationale-clustering / replay-concordance / counterfactual Lambdas now route dry via dry_run_llm.$=$.research_dry instead of being hard-skipped. ZERO skip-exceptions are force-set here. The #258 Choice-gated skip_* gates are LEFT INTACT and remain valid for targeted operator skips. The two health-check states have NO skip gate by design and run under shell_run (the bootstrap smoke) — their non-blocking Catch absorbs a stale-data sys.exit(1) so a missing-Friday-bar produces only a clearly-Friday-timestamped alert email, NOT a SF-fatal failure (ROADMAP owed-work item 5: a --shell-run-aware staleness tolerance in alpha-engine-dashboard/health_checker.py is a scoped cross-repo follow-on; not a SF-fatal spurious-fail, so deliberately out of this single-file SF PR).", "Parameters": { - "merged.$": "States.JsonMerge(States.StringToJson('{\"preflight_args\":\" --preflight-only\",\"research_dry\":true,\"data_phase2_dry\":true,\"regime_action\":\"dry_run\",\"skip_drift_detection\":true,\"skip_eval_judge\":true,\"skip_rationale_clustering\":true,\"skip_replay_concordance\":true,\"skip_counterfactual\":true}'),$,false)" + "merged.$": "States.JsonMerge(States.StringToJson('{\"preflight_args\":\" --preflight-only\",\"research_dry\":true,\"data_phase2_dry\":true,\"regime_action\":\"dry_run\"}'),$,false)" }, "OutputPath": "$.merged", "Next": "CheckSkipMorningEnrich" @@ -676,7 +676,8 @@ "FunctionName": "alpha-engine-research-eval-judge-submit:live", "Payload": { "force_sonnet_pass": true, - "date.$": "$.eval_cadence.eval_date" + "date.$": "$.eval_cadence.eval_date", + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 300, @@ -712,7 +713,8 @@ "FunctionName": "alpha-engine-research-eval-judge-submit:live", "Payload": { "force_sonnet_pass": false, - "date.$": "$.eval_cadence.eval_date" + "date.$": "$.eval_cadence.eval_date", + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 300, @@ -772,7 +774,8 @@ "Payload": { "batch_id.$": "$.eval_judge_submit.Payload.batch_id", "submit_iso.$": "$.eval_cadence.submit_iso", - "max_wait_seconds": 21600 + "max_wait_seconds": 21600, + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 60, @@ -833,7 +836,8 @@ "FunctionName": "alpha-engine-research-eval-judge-process:live", "Payload": { "batch_id.$": "$.eval_judge_submit.Payload.batch_id", - "plan_s3_key.$": "$.eval_judge_submit.Payload.plan_s3_key" + "plan_s3_key.$": "$.eval_judge_submit.Payload.plan_s3_key", + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 900, @@ -923,7 +927,8 @@ "Parameters": { "FunctionName": "alpha-engine-research-rationale-clustering:live", "Payload": { - "end_time_iso.$": "$$.Execution.StartTime" + "end_time_iso.$": "$$.Execution.StartTime", + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 600, @@ -983,7 +988,8 @@ "claude-haiku-4-5" ], "window_days": 56, - "max_artifacts": 150 + "max_artifacts": 150, + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 900, @@ -1040,7 +1046,8 @@ "Payload": { "end_time_iso.$": "$$.Execution.StartTime", "window_days": 56, - "max_depth": 3 + "max_depth": 3, + "dry_run_llm.$": "$.research_dry" } }, "TimeoutSeconds": 600, @@ -1346,15 +1353,7 @@ "DocumentName": "AWS-RunShellScript", "InstanceIds.$": "$.ec2_instance_id", "Parameters": { - "commands": [ - "set -eo pipefail", - "sudo -u ec2-user git -C /home/ec2-user/alpha-engine-data pull --ff-only origin main", - "cd /home/ec2-user/alpha-engine-data", - "export HOME=/home/ec2-user", - "set -a && source /home/ec2-user/.alpha-engine.env && set +a", - "trap 'aws s3 cp /var/log/drift-detection.log \"s3://alpha-engine-research/_ssm_logs/drift-detection/$(date -u +%Y-%m-%d)/$(hostname)-$(date -u +%H%M%SZ).log\" --only-show-errors || true' EXIT", - "bash infrastructure/spot_drift_detection.sh 2>&1 | tee /var/log/drift-detection.log" - ], + "commands.$": "States.Array('set -eo pipefail','sudo -u ec2-user git -C /home/ec2-user/alpha-engine-data pull --ff-only origin main','cd /home/ec2-user/alpha-engine-data','export HOME=/home/ec2-user','set -a && source /home/ec2-user/.alpha-engine.env && set +a','trap \\'aws s3 cp /var/log/drift-detection.log \"s3://alpha-engine-research/_ssm_logs/drift-detection/$(date -u +%Y-%m-%d)/$(hostname)-$(date -u +%H%M%SZ).log\" --only-show-errors || true\\' EXIT',States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log',$.preflight_args))", "executionTimeout": [ "1200" ] diff --git a/tests/fixtures/sf_prekeystone_spot_commands.json b/tests/fixtures/sf_prekeystone_spot_commands.json index e139ad8..ab153da 100644 --- a/tests/fixtures/sf_prekeystone_spot_commands.json +++ b/tests/fixtures/sf_prekeystone_spot_commands.json @@ -19,6 +19,15 @@ "trap 'aws s3 cp /var/log/data-weekly.log \"s3://alpha-engine-research/_ssm_logs/data-weekly/$(date -u +%Y-%m-%d)/$(hostname)-$(date -u +%H%M%SZ).log\" --only-show-errors || true' EXIT", "bash infrastructure/spot_data_weekly.sh --phase1-only 2>&1 | tee /var/log/data-weekly.log" ], + "DriftDetection": [ + "set -eo pipefail", + "sudo -u ec2-user git -C /home/ec2-user/alpha-engine-data pull --ff-only origin main", + "cd /home/ec2-user/alpha-engine-data", + "export HOME=/home/ec2-user", + "set -a && source /home/ec2-user/.alpha-engine.env && set +a", + "trap 'aws s3 cp /var/log/drift-detection.log \"s3://alpha-engine-research/_ssm_logs/drift-detection/$(date -u +%Y-%m-%d)/$(hostname)-$(date -u +%H%M%SZ).log\" --only-show-errors || true' EXIT", + "bash infrastructure/spot_drift_detection.sh 2>&1 | tee /var/log/drift-detection.log" + ], "Evaluator": [ "set -eo pipefail", "sudo -u ec2-user git -C /home/ec2-user/alpha-engine-backtester pull --ff-only origin main", diff --git a/tests/test_sf_friday_shell_run_wiring.py b/tests/test_sf_friday_shell_run_wiring.py index 9b84ddb..b927bed 100644 --- a/tests/test_sf_friday_shell_run_wiring.py +++ b/tests/test_sf_friday_shell_run_wiring.py @@ -10,8 +10,11 @@ #258 shipped the SPINE (pure-skip every workload via the existing Choice-gated skip mechanism). The KEYSTONE (feat/sf-shell-run-keystone) -replaces pure-skip with dry EXECUTION so the Friday run actually boots + -exercises the real bootstrap/import/lib-pin/transport paths: +replaced pure-skip with dry EXECUTION for all but 5 documented exceptions. +The SKIP-EXCEPTION REWIRE (feat/sf-rewire-close-skip-exceptions) then +flipped those last 5 skip→dry, so the Friday run boots + exercises the +real bootstrap/import/lib-pin/transport paths for EVERY substantive +workload — ZERO skip-exceptions remain: The SF is a STRICT SUPERSET of the pre-spine Saturday SF: - A `CheckShellRun` Choice after `InitializeInput`. `shell_run` absent OR @@ -23,13 +26,20 @@ char-for-char unchanged and every Lambda Payload behaviourally identical. - `shell_run=true` → `ApplyShellRunDefaults`, a Pass that merges the dry-path control blob UNDER the execution input (user per-flag overrides - still win), then → `CheckSkipMorningEnrich`. The 7 SPOT states boot + run - dry via `preflight_args=" --preflight-only"`; the 4 verified-clean LAMBDA - states (Research, DataPhase2, RegimeSubstrate, RegimeRetrospectiveEval) - run dry via their handler's no-write dry flag; the 5 DOCUMENTED EXCEPTION - states with NO verified clean dry path (DriftDetection, EvalJudge, - RationaleClustering, ReplayConcordance, Counterfactual) are still hard - skipped via the #258 skip_* mechanism (which is LEFT INTACT). + still win), then → `CheckSkipMorningEnrich`. The 8 SPOT states boot + run + dry via `preflight_args=" --preflight-only"` (the rewire added + DriftDetection — data #261 exposed `--preflight-only` on + spot_drift_detection.sh, converting it from a literal `commands` array to + the same `commands.$`/`States.Format($.preflight_args)` Option-C + mechanism); the LAMBDA states run dry via their handler's no-write dry + flag: Research/DataPhase2/RegimeSubstrate/RegimeRetrospectiveEval from + the keystone, PLUS the eval-judge chain + (EvalJudgeSubmit{FirstSaturday,Weekly}/Poll/Process), RationaleClustering + (research #202), ReplayConcordance + Counterfactual (backtester #225) — + all reusing the canonical `$.research_dry` shell-run-dry signal via + `dry_run_llm.$`. ZERO skip-exceptions are force-set. The #258 + Choice-gated skip_* mechanism is LEFT INTACT (still valid for targeted + operator skips). - A `CheckShellRunNotify` Choice before the success notify. shell_run absent/false → the unchanged `NotifyComplete`; shell_run=true → `NotifyShellRunComplete` (shell-run-tagged Subject, same SNS substrate). @@ -88,9 +98,15 @@ "skip_evaluator", } -# KEYSTONE: the 7 SPOT workload states. Under shell_run they BOOT + run dry -# via preflight_args=" --preflight-only" (a States.Format suffix); with -# preflight_args="" (the real Saturday run) the command is byte-identical. +# KEYSTONE + skip-exception rewire: the 8 SPOT workload states. Under +# shell_run they BOOT + run dry via preflight_args=" --preflight-only" (a +# States.Format suffix); with preflight_args="" (the real Saturday run) the +# command is byte-identical. DriftDetection joined this set in the +# skip-exception rewire — data #261 added --preflight-only to +# spot_drift_detection.sh, so it was converted from a literal `commands` +# array (hard-skipped under shell_run via skip_drift_detection) to the same +# commands.$/States.Format($.preflight_args) Option-C mechanism the keystone +# used for the other 7 spots. # Maps state name → (mode token the {} immediately follows, log file). _SPOT_STATES = { "MorningEnrich": ( @@ -121,29 +137,45 @@ "bash infrastructure/spot_backtest.sh --skip-stages=backtest,parity", "/var/log/evaluator.log", ), + "DriftDetection": ( + "bash infrastructure/spot_drift_detection.sh", + "/var/log/drift-detection.log", + ), } -# KEYSTONE: the 4 LAMBDA states with a VERIFIED clean no-write dry path. -# state name → (Payload key carrying the dry flag, input var it references, -# the dry value ApplyShellRunDefaults sets, the non-dry identity default). +# KEYSTONE + skip-exception rewire: the LAMBDA states routed dry (NOT +# skipped) via an input-var ref so the absent path is behaviourally +# identical. Research/DataPhase2/Regime* were dry from the keystone; the +# skip-exception rewire ADDED the eval-judge chain + rationale-clustering +# (research #202 added dry_run_llm) + replay-concordance + counterfactual +# (backtester #225 added dry_run_llm) — all reusing the canonical +# $.research_dry shell-run-dry signal (already true under shell_run / false +# on the real run). DriftDetection is NOT here — it is a SPOT state (see +# _SPOT_STATES) routed dry via --preflight-only, not a Lambda dry flag. +# state name → (Payload key carrying the dry flag, input var it references). _DRY_LAMBDA_STATES = { "Research": ("dry_run_llm.$", "$.research_dry"), "DataPhase2": ("dry_run.$", "$.data_phase2_dry"), "RegimeSubstrate": ("action.$", "$.regime_action"), "RegimeRetrospectiveEval": ("action.$", "$.regime_action"), + # Skip-exception rewire — eval-judge chain + agent-justification triple. + "EvalJudgeSubmitFirstSaturday": ("dry_run_llm.$", "$.research_dry"), + "EvalJudgeSubmitWeekly": ("dry_run_llm.$", "$.research_dry"), + "EvalJudgePoll": ("dry_run_llm.$", "$.research_dry"), + "EvalJudgeProcess": ("dry_run_llm.$", "$.research_dry"), + "RationaleClustering": ("dry_run_llm.$", "$.research_dry"), + "ReplayConcordance": ("dry_run_llm.$", "$.research_dry"), + "Counterfactual": ("dry_run_llm.$", "$.research_dry"), } -# KEYSTONE: the 5 documented-exception states still HARD-skipped under -# shell_run (no verified clean no-write dry path — see PR body / the -# ApplyShellRunDefaults Comment for the per-state reason). These are the -# ONLY skip_* flags ApplyShellRunDefaults force-sets. -_KEYSTONE_SKIP_EXCEPTIONS = { - "skip_drift_detection", - "skip_eval_judge", - "skip_rationale_clustering", - "skip_replay_concordance", - "skip_counterfactual", -} +# Skip-exception rewire (this PR): ZERO skip-exceptions remain. The +# keystone's 5 documented hard-skips (skip_drift_detection / skip_eval_judge +# / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual) +# were ALL flipped skip→dry — DriftDetection via the spot +# commands.$/States.Format(--preflight-only) mechanism (data #261), the 6 +# eval Lambdas via dry_run_llm.$=$.research_dry (research #202 + +# backtester #225). ApplyShellRunDefaults force-sets NO skip_* flag now. +_KEYSTONE_SKIP_EXCEPTIONS: set[str] = set() # Dry-path control vars + their NON-DRY identity values seeded by # InitializeInput (so the absent path is byte-identical / behaviourally @@ -412,22 +444,27 @@ def test_shell_defaults_set_dry_control_vars(self, states): 'the absent-path "" yields a byte-identical spot command' ) - def test_shell_defaults_skip_ONLY_documented_exceptions(self, states): - """The keystone must NOT force all 16 skip_* true (that would pure- - skip the workload, defeating the point). It force-sets ONLY the 5 - documented-exception skips (states with no verified clean dry - path).""" + def test_shell_defaults_force_set_ZERO_skip_exceptions(self, states): + """Skip-exception rewire: ApplyShellRunDefaults must force-set NO + skip_* flag at all. The keystone's 5 documented hard-skips were ALL + flipped skip→dry (DriftDetection → spot --preflight-only; the 6 eval + Lambdas → dry_run_llm.$=$.research_dry). A regression re-introducing + any force-set skip_* would silently demote a substantive workload + back to pure-skip — defeating the whole point of the shell run.""" blob = self._blob(states) skip_keys = {k for k in blob if k.startswith("skip_")} assert skip_keys == _KEYSTONE_SKIP_EXCEPTIONS, ( - "ApplyShellRunDefaults skip_* set drifted from the documented " - f"exceptions: {skip_keys ^ _KEYSTONE_SKIP_EXCEPTIONS}" + "ApplyShellRunDefaults must force-set ZERO skip_* (the " + "skip-exception rewire flipped all 5 keystone exceptions to " + f"dry); leaked skip_* keys: {skip_keys}" ) - for k in _KEYSTONE_SKIP_EXCEPTIONS: - assert blob[k] is True, f"{k} must be force-true'd" - # The verified-clean workload states must NOT be skipped (they run - # dry) — a regression re-adding e.g. skip_research would silently - # demote Research from dry-execution back to pure-skip. + assert skip_keys == set(), ( + f"ZERO skip-exceptions must remain; got {skip_keys}" + ) + # NO workload state may be force-skipped — every substantive task + # now runs DRY under shell_run (spots via --preflight-only, Lambdas + # via their no-write dry flag). This explicitly includes the 5 + # previously-excepted states (the rewire's whole purpose). for forbidden in ( "skip_morning_enrich", "skip_data_phase1", @@ -440,10 +477,16 @@ def test_shell_defaults_skip_ONLY_documented_exceptions(self, states): "skip_data_phase2", "skip_regime_substrate", "skip_regime_retrospective_eval", + # The 5 ex-keystone skip-exceptions — now run DRY, not skipped. + "skip_drift_detection", + "skip_eval_judge", + "skip_rationale_clustering", + "skip_replay_concordance", + "skip_counterfactual", ): assert forbidden not in blob, ( - f"{forbidden} must NOT be force-skipped under the keystone " - "— that state runs DRY, not skipped" + f"{forbidden} must NOT be force-skipped — the skip-exception " + "rewire runs that state DRY under shell_run, not skipped" ) def test_skip_gates_still_intact(self, states): @@ -563,11 +606,11 @@ class TestHappyPathTraversal: """End-to-end traversal of the deterministic gates (CheckShellRun + every CheckSkip). Models a Task/Wait/status-Choice as "the workload RUNS, then control proceeds past it" (status checks resolve to - their success edge on a green run). Asserts the keystone semantics: - under shell_run the 7 spot + 4 dry-Lambda workload gates do NOT skip - (their state RUNS dry), the 5 documented-exception gates DO skip, and - the run still reaches NotifyShellRunComplete; absent shell_run it is the - pre-keystone path.""" + their success edge on a green run). Asserts the skip-exception-rewire + semantics: under shell_run EVERY workload gate falls through (its state + RUNS dry — including DriftDetection, flipped skip→dry), ZERO + skip-exceptions remain, and the run still reaches + NotifyShellRunComplete; absent shell_run it is the pre-keystone path.""" def _trace_main(self, sf, states, shell_run: bool) -> tuple: """Returns (visited_state_order, skipped_workload_set). A workload @@ -642,7 +685,7 @@ def _trace_main(self, sf, states, shell_run: bool) -> tuple: cur = st.get("Next") return order, skipped_workloads - def test_shell_run_true_runs_workloads_dry_skips_only_exceptions( + def test_shell_run_true_runs_every_workload_dry_zero_skips( self, sf, states ): order, skipped = self._trace_main(sf, states, shell_run=True) @@ -650,16 +693,19 @@ def test_shell_run_true_runs_workloads_dry_skips_only_exceptions( assert "ApplyShellRunDefaults" in order assert "NotifyShellRunComplete" in order assert "NotifyComplete" not in order - # The 7 spot + 2 dry-Lambda main-thread workload states are NOT - # skipped — their CheckSkip gate falls through so the (dry) state is - # VISITED. (Research/DataPhase2/PredictorTraining live inside the - # Parallel and aren't on this main-thread trace; their dry-routing - # is asserted by TestByteIdenticalAbsentPath + - # test_shell_defaults_skip_ONLY_documented_exceptions.) + # Skip-exception rewire: ZERO main-thread workload states are + # skipped — every CheckSkip gate falls through so the (dry) state is + # VISITED. This now INCLUDES DriftDetection (flipped skip→dry via + # the spot --preflight-only mechanism). (Research/DataPhase2/eval + # chain/PredictorTraining live inside the Parallel and aren't on + # this main-thread trace; their dry-routing is asserted by + # TestByteIdenticalAbsentPath + + # test_shell_defaults_force_set_ZERO_skip_exceptions.) for ran_dry in ( "MorningEnrich", "DataPhase1", "RAGIngestion", + "DriftDetection", "Backtester", "Parity", "Evaluator", @@ -667,17 +713,17 @@ def test_shell_run_true_runs_workloads_dry_skips_only_exceptions( "RegimeRetrospectiveEval", ): assert ran_dry in order, ( - f"{ran_dry} was NOT visited under shell_run — the keystone " + f"{ran_dry} was NOT visited under shell_run — the rewire " "runs it DRY (visited), not skip (jumped past)" ) assert ran_dry not in skipped, ( - f"{ran_dry} was skipped under shell_run — keystone runs it " - "DRY" + f"{ran_dry} was skipped under shell_run — the rewire runs " + "it DRY" ) - # The documented-exception workload states ARE skipped (jumped - # past) — no verified clean dry path. - assert "DriftDetection" in skipped - assert "DriftDetection" not in order + # ZERO skip-exceptions remain — nothing is jumped past. + assert skipped == set(), ( + f"skip-exception rewire leaves nothing skipped; got {skipped}" + ) # Health/substrate checks DO still run (the bootstrap smoke). assert "SaturdayHealthCheck" in order assert "WeeklySubstrateHealthCheck" in order