feat(sf): rewire last 5 skip-exceptions → dry (DriftDetection preflight + EvalJudge/Rationale/Replay/CF dry_run_llm)#263

Merged: cipher813 merged 1 commit into main from feat/sf-rewire-close-skip-exceptions on May 18, 2026
@cipher813
Owner

Closes the last keystone gap: the 5 documented shell-run skip-exceptions are flipped skip→dry. Under shell_run every substantive workload now boots + runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were already MERGED on origin/main of their repos (data #261, research #202, backtester #225). LIVE Saturday SF — single-file SF-wiring change, precision-validated.

Per-state mechanism

| State | Type | Mechanism under shell_run | Source PR |
|---|---|---|---|
| DriftDetection | spot | `commands.$` `States.Format($.preflight_args)` → ` --preflight-only` | data #261 |
| EvalJudgeSubmitFirstSaturday | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | research #202 |
| EvalJudgeSubmitWeekly | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | research #202 |
| EvalJudgePoll | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | research #202 |
| EvalJudgeProcess | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | research #202 |
| RationaleClustering | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | research #202 |
| ReplayConcordance | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | backtester #225 |
| Counterfactual | Lambda | Payload `"dry_run_llm.$": "$.research_dry"` | backtester #225 |

Exact canonical dry var + why

$.research_dry — THE canonical shell-run LLM-dry signal. InitializeInput seeds it false on every run (absent path / the real Sat 02:00 PT firing unchanged); ApplyShellRunDefaults already sets it true under shell_run (it backed Research from the keystone). No new var invented: research #202 / backtester #225 PR bodies specify the dry_run_llm event flag, and reusing $.research_dry makes the absent-path guarantee automatic — the seed already exists, no extra InitializeInput plumbing required.
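
As a rough illustration of how a `.$` Payload key picks up the seeded flag, the Python sketch below resolves `"dry_run_llm.$": "$.research_dry"` against a real-run input and a shell-run input. `resolve_payload` is a hypothetical helper that models only top-level `$.field` paths; it is not the Step Functions implementation, and the input dicts are trimmed to the one field of interest.

```python
def resolve_payload(template: dict, state_input: dict) -> dict:
    """Resolve keys ending in '.$' whose value is a simple '$.field' path."""
    resolved = {}
    for key, value in template.items():
        if key.endswith(".$"):
            # Only the '$.field' form is modelled; nested paths are out of scope.
            field = value.removeprefix("$.")
            resolved[key[:-2]] = state_input[field]
        else:
            resolved[key] = value
    return resolved


# Payload fragment added to the 7 eval Lambda states in this PR.
PAYLOAD_TEMPLATE = {"dry_run_llm.$": "$.research_dry"}

# Assumed minimal inputs: the seed is false; shell_run flips it to true.
real_run_input = {"research_dry": False}
shell_run_input = {"research_dry": True}

real_payload = resolve_payload(PAYLOAD_TEMPLATE, real_run_input)
shell_payload = resolve_payload(PAYLOAD_TEMPLATE, shell_run_input)
```

Because the seed always exists, the path lookup never fails on the absent-`shell_run` path, which is the guarantee the paragraph above relies on.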

The change

  • ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge / skip_rationale_clustering / skip_replay_concordance / skip_counterfactual from the force-set JsonMerge blob → it now force-sets ZERO skip_*. Per-flag user overrides still win (merge order States.JsonMerge(defaults,$,false) unchanged). The Choice-gated CheckSkip<State> gates are LEFT INTACT (still valid for targeted operator skips — pinned by test_skip_gates_still_intact, all 16 retained).
  • DriftDetection: literal commands array → commands.$ States.Array(...,States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log',$.preflight_args)), mirroring DataPhase1 exactly. {} immediately after the script token, no literal space; preflight_args carries its leading space inside the var → preflight_args="" reproduces the origin/main command char-for-char, " --preflight-only" yields exactly one separating space.
  • 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry". EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched: it has no skip gate, was never a keystone exception, and is a pure historical-metric reader (out of scope).
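
The spacing argument for DriftDetection can be checked with a small Python stand-in for `States.Format`. `states_format` is a hypothetical helper that substitutes `{}` placeholders in order; it is a sketch of the substitution only and does not model the intrinsic's escaping rules.

```python
TEMPLATE = (
    "bash infrastructure/spot_drift_detection.sh{} 2>&1 "
    "| tee /var/log/drift-detection.log"
)


def states_format(template: str, *args: str) -> str:
    """Substitute each '{}' in order, as a rough stand-in for States.Format."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder/argument count mismatch")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += arg + tail
    return out


# Seeded value on a real run: the command is unchanged.
real_cmd = states_format(TEMPLATE, "")
# Value under shell_run: the leading space lives inside the variable.
dry_cmd = states_format(TEMPLATE, " --preflight-only")
```

Keeping the separating space inside `preflight_args` rather than in the template is what lets the empty string reproduce the original command with no stray whitespace.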

Byte-identical proof (the non-negotiable invariant)

shell_run absent OR false ⇒ Saturday SF behaviorally byte-identical to today's real run:

  • CheckShellRun.Default = CheckSkipMorningEnrich (unchanged). InitializeInput seeds preflight_args="", research_dry=false.
  • Every spot States.Format resolves char-for-char to the frozen origin/main literal (verified via the test's own resolver for all 8 spot states, including DriftDetection now in the frozen fixture). Every eval Lambda dry_run_llm.$ resolves to false (handlers default it false ⇒ behaviourally identical to pre-rewire).
  • tests/fixtures/sf_prekeystone_spot_commands.json regenerated via the established generator at preflight_args=""; now includes DriftDetection's pre-rewire origin/main literal command; the existing 7 entries are unchanged (git diff adds DriftDetection only).
  • CI-safe: tests read only the committed fixture, with no git show origin/main shell-out (that shell-out was the CI failure in #260, the shell-run keystone PR).
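
A minimal Python sketch of the fixture-based check described above, assuming the fixture maps each spot state name to a single command string (the real fixture's shape is not shown in this PR). The demo writes a throwaway fixture to a temp directory instead of reading tests/fixtures/sf_prekeystone_spot_commands.json, and `resolve` and `mismatched_states` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path

TEMPLATE = (
    "bash infrastructure/spot_drift_detection.sh{} 2>&1 "
    "| tee /var/log/drift-detection.log"
)


def resolve(template: str, preflight_args: str) -> str:
    # Stand-in for States.Format with a single '{}' placeholder.
    return template.replace("{}", preflight_args, 1)


def mismatched_states(frozen: dict, resolved: dict) -> list:
    """Return the state names whose resolved command differs from the baseline."""
    return sorted(n for n, cmd in frozen.items() if resolved.get(n) != cmd)


# Throwaway fixture standing in for the committed baseline (assumed shape).
with tempfile.TemporaryDirectory() as tmp:
    fixture_path = Path(tmp) / "spot_commands.json"
    fixture_path.write_text(json.dumps({
        "DriftDetection": "bash infrastructure/spot_drift_detection.sh"
                          " 2>&1 | tee /var/log/drift-detection.log",
    }))
    frozen = json.loads(fixture_path.read_text())

real_mismatches = mismatched_states(
    frozen, {"DriftDetection": resolve(TEMPLATE, "")})
shell_mismatches = mismatched_states(
    frozen, {"DriftDetection": resolve(TEMPLATE, " --preflight-only")})
```

Reading a committed file keeps the comparison independent of the CI checkout's git history, which is the point of the CI-safe bullet.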

Test changes

  • _SPOT_STATES → 8 (added DriftDetection); _DRY_LAMBDA_STATES → 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = set().
  • test_shell_defaults_force_set_ZERO_skip_exceptions: blob force-sets no skip_*; none of the 16 workload skips (incl. the 5 ex-exceptions) appear.
  • TestHappyPathTraversal: under shell_run skipped == set(); DriftDetection is VISITED (runs dry), not jumped past; absent shell_run = unchanged pre-keystone path.
  • Module + class docstrings updated to the rewire semantics.
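
The zero-skip assertion and the unchanged merge order can be sketched in Python as follows. The contents of `SHELL_RUN_DEFAULTS` are hypothetical (the actual blob is not reproduced in this PR), and `json_merge_shallow` is a stand-in for `States.JsonMerge(defaults, $, false)`, where on a key collision the second object wins.

```python
def json_merge_shallow(first: dict, second: dict) -> dict:
    """Shallow merge; keys in `second` override keys in `first`."""
    return {**first, **second}


# Hypothetical defaults blob: after this PR it carries no skip_* keys.
SHELL_RUN_DEFAULTS = {
    "preflight_args": " --preflight-only",
    "research_dry": True,
}

# The check the renamed test is described as making.
forced_skips = [k for k in SHELL_RUN_DEFAULTS if k.startswith("skip_")]

# A targeted operator skip supplied in the execution input survives the merge.
operator_input = {"shell_run": True, "skip_drift_detection": True}
merged = json_merge_shallow(SHELL_RUN_DEFAULTS, operator_input)
```

With no `skip_*` key in the defaults, a skip can only come from the operator's own input, which is why the CheckSkip gates remain useful after the rewire.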

Validation

  • JSON valid: 58 top-level states (91 incl. parallel branches).
  • Full alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed.

Zero skip-exceptions remain — every substantive task runs dry under shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

🤖 Generated with Claude Code

…ht + EvalJudge/Rationale/Replay/CF dry_run_llm)

Closes the keystone gap: the 5 documented shell-run skip-exceptions are
flipped skip→dry. Under shell_run EVERY substantive workload now boots +
runs dry; ZERO skip-exceptions remain. All prerequisite dry flags were
already MERGED on origin/main of their repos.

Per-state mechanism:

| State                       | Type   | Mechanism (under shell_run)                          |
|-----------------------------|--------|------------------------------------------------------|
| DriftDetection              | spot   | commands.$ States.Format($.preflight_args) → ` --preflight-only` (data #261) |
| EvalJudgeSubmitFirstSaturday| Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeSubmitWeekly       | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgePoll               | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| EvalJudgeProcess            | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| RationaleClustering         | Lambda | Payload "dry_run_llm.$": "$.research_dry" (research #202) |
| ReplayConcordance           | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |
| Counterfactual              | Lambda | Payload "dry_run_llm.$": "$.research_dry" (backtester #225) |

Exact canonical dry var: $.research_dry. It is THE canonical shell-run
LLM-dry signal — InitializeInput seeds it false on every run (so the
absent path / real Sat 02:00 PT firing is unchanged); ApplyShellRunDefaults
already sets it true under shell_run (it backed Research from the
keystone). No new var invented — research #202 / backtester #225 PR bodies
specify dry_run_llm, and reusing $.research_dry keeps the absent-path
guarantee automatic (no extra seeding needed; the seed already exists).

Changes:
- ApplyShellRunDefaults: removed skip_drift_detection / skip_eval_judge /
  skip_rationale_clustering / skip_replay_concordance / skip_counterfactual
  from the force-set JsonMerge blob. It now force-sets ZERO skip_*.
  Per-flag user overrides still win (merge order unchanged). The
  Choice-gated CheckSkip<State> gates are LEFT INTACT (still valid for
  targeted operator skips — verified by test_skip_gates_still_intact).
- DriftDetection: literal `commands` array → `commands.$` States.Array
  whose final entry is
  States.Format('bash infrastructure/spot_drift_detection.sh{} 2>&1 | tee /var/log/drift-detection.log', $.preflight_args).
  {} sits immediately after the script token with no literal space;
  preflight_args carries its leading space inside the var, so
  preflight_args="" reproduces the origin/main command char-for-char
  and " --preflight-only" yields exactly one separating space.
- 7 eval Lambda Payloads: added "dry_run_llm.$": "$.research_dry".
  EvalRollingMean (alpha-engine-research-eval-rolling-mean) was NOT touched
  — it has no skip gate, was never a keystone exception, and is a pure
  historical-metric reader (out of scope).

Byte-identical proof approach:
- shell_run absent ⇒ CheckShellRun.Default = CheckSkipMorningEnrich
  (unchanged); InitializeInput seeds preflight_args="", research_dry=false.
  Every spot States.Format resolves char-for-char to the frozen
  origin/main literal; every eval Lambda dry_run_llm.$ resolves to false
  (handlers default it false ⇒ behaviourally identical to pre-rewire).
- The frozen baseline fixture
  tests/fixtures/sf_prekeystone_spot_commands.json now INCLUDES
  DriftDetection's pre-rewire origin/main literal command (regenerated
  via the established generator at preflight_args=""; the existing 7
  entries are unchanged). The byte-identical test asserts
  DriftDetection's resolved command at preflight_args="" equals that
  frozen baseline and carries --preflight-only (single space) under
  shell_run.
- CI-safe: tests read only the committed fixture (no
  `git show origin/main` shell-out; that was the #260 CI failure).

Tests:
- _SPOT_STATES grew to 8 (added DriftDetection); _DRY_LAMBDA_STATES grew
  to 11 (added the 7 eval states); _KEYSTONE_SKIP_EXCEPTIONS = empty set.
- test_shell_defaults_force_set_ZERO_skip_exceptions asserts the blob
  force-sets no skip_* and none of the 16 workload skips (incl. the 5
  ex-exceptions) appear.
- TestHappyPathTraversal: under shell_run nothing is skipped (skipped ==
  set()); DriftDetection is VISITED (runs dry), not jumped past.
- Module + class docstrings updated to the rewire semantics.

JSON valid (58 top-level states, 91 incl. parallel branches). Full
alpha-engine-data suite: 1351 passed, 1 skipped, 0 failed.

Zero skip-exceptions remain — every substantive task runs dry under
shell_run (spots → --preflight-only, Lambdas → dry_run_llm).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit c64d721 into main May 18, 2026
1 check passed
@cipher813 cipher813 deleted the feat/sf-rewire-close-skip-exceptions branch May 18, 2026 22:36