feat(sf): rewire last 5 skip-exceptions → dry (DriftDetection preflight + EvalJudge/Rationale/Replay/CF dry_run_llm)#263
Merged
Conversation
…ht + EvalJudge/Rationale/Replay/CF dry_run_llm) Closes the keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run EVERY substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos. Per-state mechanism: | State | Type | Mechanism (under shell_run) | |-----------------------------|--------|------------------------------------------------------| | DriftDetection | spot | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) | | EvalJudgeSubmitFirstSaturday| Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgeSubmitWeekly | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgePoll | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | EvalJudgeProcess | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | RationaleClustering | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) | | ReplayConcordance | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) | | Counterfactual | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) | Exact canonical dry var: $.research_dry. It is THE canonical shell-run LLM-dry signal — InitializeInput seeds it false on every run (so the absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented — research #202 / backtester #225 PR bodies specify dry_run_llm, and reusing $.research_dry keeps the absent-path guarantee automatic (no extra seeding needed; the seed already exists). Changes: - ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob. It now force-sets ZERO skip_*. Per-flag user overrides still win (merge order unchanged). The Choice-gated CheckSkip<State> gates are LEFT INTACT (still valid for targeted operator skips — verified by test_skip_gates_still_intact). - DriftDetection: literal `commands` array → `commands.$` States.Array whose final entry is States.Format('bash infrastructure/ spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args). {} sits immediately after the script token with no literal space; preflight_args carries its leading space inside the var, so preflight_args="" reproduces the origin/main command char-for-char and " --preflight-only" yields exactly one separating space. - 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched — it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope). Byte-identical proof approach: - shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich (unchanged); InitializeInput seeds preflight_args="", research_dry=false. Every spot States.Format resolves char-for-char to the frozen origin/main literal; every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire). - The frozen baseline fixture tests/fixtures/sf_prekeystone_spot_commands .json now INCLUDES DriftDetection's pre-rewire origin/main literal command (regenerated via the established generator at preflight_args=""; the existing 7 entries are unchanged). The byte-identical test asserts DriftDetection's resolved command at preflight_args="" equals that frozen baseline and carries --preflight-only (single space) under shell_run. - CI-safe: tests read only the committed fixture (no `git show origin/main` shell-out — that was the #260 CI failure). Tests: - _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set. - test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob force-sets no skip_* and none of the 16 workload skips (incl. the 5 ex-exceptions) appear. - TestHappyPathTraversal: under shell_run nothing is skipped (skipped == set()); DriftDetection is VISITED (runs dry), not jumped past. - Module + class docstrings updated to the rewire semantics. JSON valid (58 top-level states, 91 incl. parallel branches). Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed. Zero skip-exceptions remain — every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes the last keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under
shell_runevery substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos (data #261, research #202, backtester #225). LIVE Saturday SF — single-file SF-wiring change, precision-validated.Per-state mechanism
shell_runcommands.$States.Format($.preflight_args)→--preflight-only"dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry""dry_run_llm.$": "$.research_dry"Exact canonical dry var + why
$.research_dry— THE canonical shell-run LLM-dry signal.InitializeInputseeds itfalseon every run (absent path / the real Sat 02:00 PT firing unchanged);ApplyShellRunDefaultsalready sets ittrueundershell_run(it backed Research from the keystone). No new var invented: research #202 / backtester #225 PR bodies specify thedry_run_llmevent flag, and reusing$.research_drymakes the absent-path guarantee automatic — the seed already exists, no extraInitializeInputplumbing required.The change
skip_drift_detection/skip_eval_judge/skip_rationale_clustering/skip_replay_concordance/skip_counterfactualfrom the force-setJsonMergeblob → it now force-sets ZEROskip_*. Per-flag user overrides still win (merge orderStates.JsonMerge(defaults,$,false)unchanged). The Choice-gatedCheckSkip<State>gates are LEFT INTACT (still valid for targeted operator skips — pinned bytest_skip_gates_still_intact, all 16 retained).commandsarray →commands.$States.Array(...,States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log',$.preflight_args)), mirroring DataPhase1 exactly.{}immediately after the script token, no literal space;preflight_argscarries its leading space inside the var →preflight_args=""reproduces the origin/main command char-for-char," --preflight-only"yields exactly one separating space."dry_run_llm.$": "$.research_dry". EvalRollingMean was NOT touched —alpha-engine-research-eval-rolling-mean, no skip gate, never a keystone exception, pure historical-metric reader (out of scope).Byte-identical proof (the non-negotiable invariant)
shell_runabsent OR false ⇒ Saturday SF behaviorally byte-identical to today's real run:CheckShellRun.Default=CheckSkipMorningEnrich(unchanged).InitializeInputseedspreflight_args="",research_dry=false.States.Formatresolves char-for-char to the frozen origin/main literal (verified via the test's own resolver for all 8 spot states, including DriftDetection now in the frozen fixture). Every eval Lambdadry_run_llm.$resolves tofalse(handlers default it false ⇒ behaviourally identical to pre-rewire).tests/fixtures/sf_prekeystone_spot_commands.jsonregenerated via the established generator atpreflight_args=""; now includes DriftDetection's pre-rewire origin/main literal command; the existing 7 entries are unchanged (git diffadds DriftDetection only).git show origin/mainshell-out (that was the feat(sf): shell-run keystone — spot --preflight-only + Lambda --dry-run instead of skip #260 CI failure).Test changes
_SPOT_STATES→ 8 (added DriftDetection);_DRY_LAMBDA_STATES→ 11 (added the 7 eval states);_KEYSTONE_SKIP_EXCEPTIONS = set().test_shell_defaults_force_set_ZERO_skip_exceptions: blob force-sets noskip_*; none of the 16 workload skips (incl. the 5 ex-exceptions) appear.TestHappyPathTraversal: undershell_runskipped == set(); DriftDetection is VISITED (runs dry), not jumped past; absentshell_run= unchanged pre-keystone path.Validation
alpha-engine-datasuite: 1351 passed, 1 skipped, 0 failed.Zero skip-exceptions remain — every substantive task runs dry under
shell_run(spots →--preflight-only, Lambdas →dry_run_llm).🤖 Generated with Claude Code