feat(tech-ablation): two-flag auto-apply path with 4-week reproduction gate (ROADMAP L2553) #229
Merged
Wires the auto-apply cutover follow-on to alpha-engine-backtester#174
(tech weight ablation recommendation-only). The compute step is
unchanged; this PR ships the gated write path.
Activation
- Both flags default false → bit-identical behavior to today.
- `tech_weight_ablation.use_tech_ablation_target=True` (Stage 1):
every "ok" compute result writes a shadow payload to
`config/scoring_weights_per_sector_shadow_history/{run_id}.json` +
`latest.json` sidecar. Live config untouched. Pure observability.
- `tech_weight_ablation.enforce_tech_ablation=True` on top of Stage 1
(Stage 2): the live write to `config/scoring_weights_per_sector.json`
fires ONLY when the reproduction gate passes, i.e. the same per-sector
payload must reproduce across the last `_MIN_CONSECUTIVE_WEEKS = 4`
shadow archives (the L2553 "4+ consecutive Saturdays" acceptance). One
drift week breaks the streak; the gate is explicitly NOT tolerant of
intermittent shadow drift.
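A minimal sketch of the reproduction gate described above, assuming an in-memory dict stands in for S3 and that shadow archive keys end in `{run_id}.json` with a YYMMDDHHMM `run_id`. `read_recent_shadow_archives` and `check_reproduction_gate` are hypothetical stand-ins for `_read_recent_shadow_archives()` / `_check_reproduction_gate()`, and comparing canonical (`sort_keys=True`) JSON is an assumed reading of "byte-equal"; the real module may compare raw bytes.

```python
import json
import logging
from typing import Any, Dict, List, Mapping

log = logging.getLogger(__name__)

MIN_CONSECUTIVE_WEEKS = 4  # mirrors _MIN_CONSECUTIVE_WEEKS in the PR


def read_recent_shadow_archives(
    store: Mapping[str, bytes], prefix: str, n: int
) -> List[Dict[str, Any]]:
    """Return up to n shadow payloads, newest first (hypothetical helper)."""
    # YYMMDDHHMM run_ids sort chronologically as strings; skip the sidecar.
    keys = sorted(
        (k for k in store if k.startswith(prefix) and not k.endswith("/latest.json")),
        reverse=True,
    )
    archives: List[Dict[str, Any]] = []
    for key in keys[:n]:
        try:
            doc = json.loads(store[key])
        except (KeyError, ValueError):
            doc = None
        if isinstance(doc, dict):
            archives.append(doc)
        else:
            # Missing/corrupt archive: warn and skip, so the gate sees fewer
            # archives ("not yet reproduced") instead of failing hard.
            log.warning("skipping unreadable shadow archive %s", key)
    return archives


def _canonical(payload: Dict[str, Any]) -> str:
    # Assumed notion of "byte-equal": identical canonical JSON encoding.
    return json.dumps(payload.get("per_sector"), sort_keys=True)


def check_reproduction_gate(
    archives: List[Dict[str, Any]], min_weeks: int = MIN_CONSECUTIVE_WEEKS
) -> Dict[str, Any]:
    """Pass only if the newest min_weeks archives share one per_sector payload."""
    n_consecutive = 0
    if archives:
        newest = _canonical(archives[0])
        for payload in archives[:min_weeks]:
            if _canonical(payload) != newest:
                break  # a single drift week ends the streak
            n_consecutive += 1
    if len(archives) < min_weeks:
        reason = "insufficient_history"
    elif n_consecutive < min_weeks:
        reason = "drift"
    else:
        reason = "reproduced"
    return {"passed": reason == "reproduced", "reason": reason, "n_consecutive": n_consecutive}
```

Excluding the `latest.json` sidecar matters in this sketch: "l" sorts after every digit, so a descending lexical sort would otherwise put the sidecar first.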
Implementation
- `optimizer/tech_weight_ablation.py`:
  - `init_config()` + module-level `_cfg` (mirrors executor_optimizer).
  - `_build_per_sector_payload()` translates `recommendations:
    {team_id -> config_name}` to `{team_id -> {weight_name -> value}}`
    via DEFAULT_GRID lookup; unknown config_name drops cleanly.
  - `_read_recent_shadow_archives()` lex-sorts YYMMDDHHMM keys
    descending; missing/corrupt archives skip with a warning (treated
    as "reproduction not yet reached", not a hard fail).
  - `_check_reproduction_gate()` returns {passed, reason,
    n_consecutive} with byte-equal per_sector match across the window.
  - `apply()` orchestrates the two stages; mirrors
    `executor_optimizer.apply()` so the evaluator wiring stays
    uniform.
- `evaluate.py`:
  - `tech_weight_ablation.init_config(config)` added alongside the
    other optimizers.
  - New `_run_tech_weight_ablation()` helper hangs compute + apply
    together (with `--freeze` short-circuit), replacing the prior
    inline lambda.
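The `DEFAULT_GRID` translation in `_build_per_sector_payload()` can be sketched as below. This assumes the grid is a mapping from `config_name` to a `{weight_name -> value}` dict; `build_per_sector_payload`, `EXAMPLE_GRID`, its config names, and its weight names are all hypothetical and invented for illustration, not the module's real grid.

```python
from typing import Dict, Mapping

Weights = Dict[str, float]

# Hypothetical grid: config_name -> weight vector. Invented values.
EXAMPLE_GRID: Dict[str, Weights] = {
    "all_on": {"w_momentum": 0.4, "w_trend": 0.4, "w_volatility": 0.2},
    "drop_trend": {"w_momentum": 0.6, "w_trend": 0.0, "w_volatility": 0.4},
}


def build_per_sector_payload(
    recommendations: Mapping[str, str], grid: Mapping[str, Mapping[str, float]]
) -> Dict[str, Weights]:
    """Translate {team_id -> config_name} into {team_id -> {weight_name -> value}}."""
    payload: Dict[str, Weights] = {}
    for team_id, config_name in recommendations.items():
        weights = grid.get(config_name)
        if weights is None:
            # Unknown config_name (e.g. a grid entry added later): drop the
            # team instead of raising, as a forward-compat guard.
            continue
        # Copy so callers can't mutate the grid through the payload.
        payload[team_id] = dict(weights)
    return payload
```

For example, `build_per_sector_payload({"tech": "drop_trend", "energy": "not_in_grid"}, EXAMPLE_GRID)` keeps only the `tech` entry, and empty recommendations yield an empty payload.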
Tests (+12 new, suite 1694 → 1706 green)
- `TestBuildPerSectorPayload` × 3 — name→weights mapping incl.
unknown-name fallthrough + empty-recommendations.
- `TestApplyShadowGating` × 4 — flag-off skips, status-not-ok skips,
empty recommendations skips, shadow-only mode writes archive but
NOT live key.
- `TestReproductionGate` × 3 — insufficient history (0 archives),
exact 4-in-a-row match passes, 1 drift breaks the streak.
- `TestApplyLiveGating` × 3 — enforce+insufficient history → shadow
only, enforce+full reproduction → live written, enforce+drift in
history → blocked + live untouched.
- One pre-existing test (`test_recommendation_only_no_apply`) updated:
the old `apply_note` string "recommendation-only — auto-apply
gated on parallel observation cutover (follow-up PR)" was the
artifact this PR closes; new note points readers at `apply()` +
the two flags.
S3 contract
- Both keys are NEW additions: no consumer reads them today, so
no backward-compat concern. The research-side consumer (the
per-sector `composite_weights` override read) is operator-owned,
since `alpha-engine-research/config.py` is gitignored, and will land
when Brian flips `use_tech_ablation_target` and decides to wire
the override. Until then, the writes are observation-only.
Data gate (not a code gate)
- The compute step still emits `insufficient_data` until ~30
sub-score-populated rows/team accumulate; `apply()` correctly
no-ops on that. First non-empty recommendation is expected
~2026-06-27 per the v15-migration accumulation timeline.
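Putting the two stages and the data gate together, the `apply()` control flow might look like the sketch below. It is a hypothetical stand-in, not the module's code: `apply_sketch` is an invented name, an in-memory dict replaces S3, the payload builder and reproduction gate are injected as callables, and the `{"run_id", "per_sector"}` payload shape and the return dicts are assumptions. Only the two key names and the flag names come from this PR.

```python
import json
from typing import Any, Callable, Dict, Mapping, MutableMapping

SHADOW_PREFIX = "config/scoring_weights_per_sector_shadow_history/"
LIVE_KEY = "config/scoring_weights_per_sector.json"


def apply_sketch(
    result: Mapping[str, Any],
    cfg: Mapping[str, bool],
    store: MutableMapping[str, bytes],
    run_id: str,
    build_payload: Callable[[Mapping[str, str]], Dict[str, Any]],
    gate: Callable[[MutableMapping[str, bytes]], Mapping[str, Any]],
) -> Dict[str, Any]:
    # Both flags default false: nothing is written (today's behavior).
    if not cfg.get("use_tech_ablation_target", False):
        return {"applied": False, "reason": "flag_off"}
    # Data gate: insufficient_data (or any non-"ok" status) is a no-op.
    if result.get("status") != "ok":
        return {"applied": False, "reason": "status_not_ok"}
    per_sector = build_payload(result.get("recommendations") or {})
    if not per_sector:
        return {"applied": False, "reason": "empty_recommendations"}

    # Stage 1: shadow archive + latest.json sidecar; live key untouched.
    body = json.dumps({"run_id": run_id, "per_sector": per_sector}, sort_keys=True).encode()
    store[f"{SHADOW_PREFIX}{run_id}.json"] = body
    store[f"{SHADOW_PREFIX}latest.json"] = body
    if not cfg.get("enforce_tech_ablation", False):
        return {"applied": False, "reason": "shadow_only"}

    # Stage 2: live write only if the reproduction gate passes.
    verdict = gate(store)
    if not verdict.get("passed", False):
        return {"applied": False, "reason": f"gate_blocked:{verdict.get('reason')}"}
    store[LIVE_KEY] = json.dumps({"per_sector": per_sector}, sort_keys=True).encode()
    return {"applied": True, "reason": "reproduced"}
```

Injecting the gate keeps the sketch's branches testable in isolation, which roughly mirrors how the shadow-gating and live-gating test classes above split the cases.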
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ROADMAP: L2553 — Tech weight ablation auto-apply — parallel-observation cutover (P1)
Cutover follow-on to alpha-engine-backtester#174 (tech weight ablation recommendation-only). The compute step is unchanged; this PR wires the gated write path that L2553 specifies.
Activation contract
Both flags default false → bit-identical behavior to today.
| Flag (`tech_weight_ablation:` in backtester `config.yaml`) | Writes |
|---|---|
| `use_tech_ablation_target=True` (Stage 1) | `config/scoring_weights_per_sector_shadow_history/{run_id}.json` + `latest.json` sidecar |
| `use_tech_ablation_target=True` AND `enforce_tech_ablation=True` (Stage 2) | `config/scoring_weights_per_sector.json` iff the reproduction gate passes |

Reproduction gate
`_MIN_CONSECUTIVE_WEEKS = 4`. A byte-equal `per_sector` match across the last 4 shadow archives is required (the L2553 "4+ consecutive Saturdays" acceptance). One drift week breaks the streak: the gate explicitly does NOT tolerate intermittent shadow drift, matching the "recommendation must reproduce" framing.

Implementation
- `optimizer/tech_weight_ablation.py`:
  - `init_config()` + module-level `_cfg` (mirrors `executor_optimizer`).
  - `_build_per_sector_payload()`: `recommendations: {team_id -> config_name}` → `{team_id -> {weight_name -> value}}` via `DEFAULT_GRID` lookup. Unknown config_name drops cleanly (forward-compat guard).
  - `_read_recent_shadow_archives()`: lex-sorts YYMMDDHHMM keys descending; missing/corrupt archives skip with a WARNING (treated as "reproduction not yet reached", not a hard fail).
  - `_check_reproduction_gate()`: returns `{passed, reason, n_consecutive}`.
  - `apply()`: orchestrates the two stages.
- `evaluate.py`:
  - `tech_weight_ablation.init_config(config)` added alongside the other optimizers.
  - `_run_tech_weight_ablation()` helper hangs compute + apply together (with `--freeze` short-circuit), replacing the prior inline lambda; mirrors `_run_executor_opt` / `_run_weight_opt`.

Tests (+12 new, suite 1694 → 1706 green)
- `TestBuildPerSectorPayload` × 3: name→weights mapping, including the unknown-name drop + empty recommendations.
- `TestApplyShadowGating` × 4: flag-off skips, status-not-ok skips, empty recommendations skips, shadow-only mode writes the archive but NOT live.
- `TestReproductionGate` × 3: insufficient history fails, 4-in-a-row match passes, 1 drift breaks the streak.
- `TestApplyLiveGating` × 3: enforce+insufficient history → shadow only; enforce+full reproduction → live written; enforce+drift in history → blocked + live untouched.
- One pre-existing test (`test_recommendation_only_no_apply`) reframed: the old `apply_note` string was the artifact this PR closes; the new note points readers at `apply()` + the two flags.

S3 contract
Both keys are new additions: `config/scoring_weights_per_sector.json` and `config/scoring_weights_per_sector_shadow_history/`. No consumer reads them today, so this PR is forward-compatible on its own.

The research-side consumer (a per-sector `composite_weights` override layer on top of `scoring.yaml`) is operator-owned: `alpha-engine-research/config.py` is gitignored per repo policy, and the per-sector schema is already in `scoring.yaml` (see `_load_live_composite_weights_per_sector` in this same module). The override read can land when Brian flips `use_tech_ablation_target=True` and decides to wire the consumer.

Until then, the writes are pure observation, exactly the L2553 design intent for the recommendation-stability soak.
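For orientation only, here is one shape the operator-owned override read could take once wired. Everything in it is hypothetical: the function name `effective_composite_weights`, the assumption that the live JSON nests overrides under a `per_sector` key, and the choice to overlay override values onto the `scoring.yaml` defaults key by key, falling back to the defaults when the key is absent or unreadable.

```python
import json
from typing import Dict, Mapping, Optional


def effective_composite_weights(
    defaults: Mapping[str, float], live_blob: Optional[bytes], team_id: str
) -> Dict[str, float]:
    """Overlay a per-sector override (if any) on default composite weights."""
    weights = dict(defaults)
    if live_blob is None:
        return weights  # live key not written yet: today's behavior
    try:
        doc = json.loads(live_blob)
    except ValueError:
        return weights  # unreadable override: fall back to defaults
    per_sector = doc.get("per_sector") if isinstance(doc, dict) else None
    override = per_sector.get(team_id) if isinstance(per_sector, dict) else None
    if isinstance(override, dict):
        # Assumed semantics: override keys replace defaults; others fall through.
        weights.update(override)
    return weights
```

Falling back to the defaults on a missing or malformed key means this hypothetical consumer degrades to today's behavior rather than failing the scoring run.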
Data gate (not a code gate)
The compute step still emits `insufficient_data` until ~30 sub-score-populated rows/team accumulate; `apply()` correctly no-ops on that path. First non-empty recommendation is expected ~2026-06-27 per the research v15-migration accumulation timeline.

Composes with `executor_optimizer.apply()` (the pattern this PR mirrors).

🤖 Generated with Claude Code