feat(tech-ablation): two-flag auto-apply path with 4-week reproduction gate (ROADMAP L2553) #229

Merged: cipher813 merged 1 commit into main from feat/tech-weight-ablation-apply on May 19, 2026
@cipher813
Owner

ROADMAP: L2553 — Tech weight ablation auto-apply — parallel-observation cutover (P1)

Cutover follow-on to alpha-engine-backtester#174 (tech weight ablation recommendation-only). The compute step is unchanged; this PR wires the gated write path that L2553 specifies.

Activation contract

Both flags default false → bit-identical behavior to today.

| Flag (under `tech_weight_ablation:` in backtester `config.yaml`) | What fires | Live config |
| --- | --- | --- |
| Both false | nothing | unchanged |
| `use_tech_ablation_target=True` (Stage 1) | shadow archive written every "ok" week to `config/scoring_weights_per_sector_shadow_history/{run_id}.json` + `latest.json` sidecar | unchanged |
| `use_tech_ablation_target=True` AND `enforce_tech_ablation=True` (Stage 2) | shadow archive + live write to `config/scoring_weights_per_sector.json` iff reproduction gate passes | written when same `per_sector` payload reproduces across last 4 shadow archives |
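
The activation contract above can be sketched as a small decision function. This is a minimal illustration, not the code in `optimizer/tech_weight_ablation.py`: the `AblationFlags` dataclass and `resolve_actions()` are hypothetical names, and the behavior for `enforce_tech_ablation=True` with `use_tech_ablation_target=False` (assumed here to fire nothing) is not stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationFlags:
    # Hypothetical container; both flags default to False as in the PR.
    use_tech_ablation_target: bool = False
    enforce_tech_ablation: bool = False


def resolve_actions(flags: AblationFlags, gate_passed: bool) -> tuple[bool, bool]:
    """Return (write_shadow, write_live) for one "ok" compute result."""
    # Stage 1: the shadow archive depends only on the target flag.
    write_shadow = flags.use_tech_ablation_target
    # Stage 2: the live write needs both flags AND a passing reproduction gate.
    # Assumption: enforce without the target flag behaves like "both false".
    write_live = write_shadow and flags.enforce_tech_ablation and gate_passed
    return write_shadow, write_live
```

With both flags left at their defaults, `resolve_actions(AblationFlags(), gate_passed=True)` returns `(False, False)`, which matches the "bit-identical behavior to today" row.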

Reproduction gate

_MIN_CONSECUTIVE_WEEKS = 4. Byte-equal per_sector match across the last 4 shadow archives required (the L2553 "4+ consecutive Saturdays" acceptance). One drift week breaks the streak — the gate explicitly does NOT tolerate intermittent shadow drift, matching the "recommendation must reproduce" framing.
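
A rough sketch of how such a gate could be checked, under stated assumptions: archives arrive newest-first as dicts with a `per_sector` key, "byte-equal" is approximated by comparing canonical JSON serializations, and the `reason` strings are invented for illustration. Only the `{passed, reason, n_consecutive}` return shape and the 4-week window come from the PR.

```python
import json

MIN_CONSECUTIVE_WEEKS = 4  # mirrors _MIN_CONSECUTIVE_WEEKS in the PR


def canonical_bytes(per_sector):
    # Assumption: byte equality is judged on a key-sorted, compact JSON dump.
    return json.dumps(per_sector, sort_keys=True, separators=(",", ":")).encode("utf-8")


def check_reproduction_gate(archives, min_weeks=MIN_CONSECUTIVE_WEEKS):
    """archives: newest-first list of shadow payloads, each with a 'per_sector' dict."""
    window = archives[:min_weeks]
    if not window:
        return {"passed": False, "reason": "no_shadow_history", "n_consecutive": 0}

    reference = canonical_bytes(window[0]["per_sector"])
    n_consecutive = 0
    for archive in window:
        if canonical_bytes(archive["per_sector"]) != reference:
            break  # a single drift week ends the streak
        n_consecutive += 1

    if n_consecutive < len(window):
        reason = "drift_in_window"
    elif len(window) < min_weeks:
        reason = "insufficient_history"
    else:
        reason = "reproduced"
    return {"passed": reason == "reproduced", "reason": reason, "n_consecutive": n_consecutive}
```

Comparing serialized bytes rather than parsed floats keeps the check strict: any change in a weight value, however small, counts as drift.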

Implementation

  • optimizer/tech_weight_ablation.py:

    • init_config() + module-level _cfg (mirrors executor_optimizer).
    • _build_per_sector_payload() translates recommendations: {team_id -> config_name} to {team_id -> {weight_name -> value}} via DEFAULT_GRID lookup. Unknown config_name drops cleanly (forward-compat guard).
    • _read_recent_shadow_archives() — lex-sorts YYMMDDHHMM keys descending; missing/corrupt archives skip with a WARNING (treated as "reproduction not yet reached", not a hard fail).
    • _check_reproduction_gate() — returns {passed, reason, n_consecutive}.
    • apply() — orchestrates the two stages.
  • evaluate.py:

    • tech_weight_ablation.init_config(config) added alongside the other optimizers.
    • New _run_tech_weight_ablation() helper hangs compute + apply together (with --freeze short-circuit), replacing the prior inline lambda — mirrors _run_executor_opt / _run_weight_opt.
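
Two of the helpers listed above are simple enough to sketch. The grid contents, weight names, and function names below are hypothetical stand-ins (the real `DEFAULT_GRID` is not shown in this PR); the sketch only illustrates the name-to-weights lookup with the unknown-name drop, and why a plain lexicographic sort is enough for fixed-width `YYMMDDHHMM` keys.

```python
# Hypothetical grid: config_name -> {weight_name -> value}.
DEFAULT_GRID = {
    "baseline": {"momentum": 0.5, "trend": 0.5},
    "momentum_heavy": {"momentum": 0.7, "trend": 0.3},
}


def build_per_sector_payload(recommendations, grid=DEFAULT_GRID):
    """Map {team_id -> config_name} to {team_id -> {weight_name -> value}}."""
    return {
        team_id: dict(grid[config_name])  # copy so callers cannot mutate the grid
        for team_id, config_name in recommendations.items()
        if config_name in grid  # unknown names are dropped, not raised
    }


def most_recent_run_ids(run_ids, n=4):
    """Newest-first run ids; valid because YYMMDDHHMM is fixed-width and zero-padded."""
    return sorted(run_ids, reverse=True)[:n]
```

For example, `build_per_sector_payload({"tech": "momentum_heavy", "energy": "not_in_grid"})` keeps only the `tech` entry, so a recommendation naming a config that a later grid version removed does not crash the write path.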

Tests (+12 new, suite 1694 → 1706 green)

  • TestBuildPerSectorPayload × 3 — name→weights mapping including the unknown-name drop + empty-recommendations.
  • TestApplyShadowGating × 4 — flag-off skips, status-not-ok skips, empty recommendations skips, shadow-only mode writes archive but NOT live.
  • TestReproductionGate × 3 — insufficient history fails, 4-in-a-row matches pass, 1 drift breaks the streak.
  • TestApplyLiveGating × 3 — enforce+insufficient history → shadow only; enforce+full reproduction → live written; enforce+drift in history → blocked + live untouched.
  • One pre-existing test (test_recommendation_only_no_apply) reframed: the old apply_note string was the artifact this PR closes; new note points readers at apply() + the two flags.

S3 contract

Both keys are new additions: config/scoring_weights_per_sector.json and config/scoring_weights_per_sector_shadow_history/. No consumer reads them today, so this PR is forward-compatible on its own.

The research-side consumer (per-sector composite_weights override layer on top of scoring.yaml) is operator-owned: alpha-engine-research/config.py is gitignored per repo policy, and the per-sector schema is already in scoring.yaml (see _load_live_composite_weights_per_sector in this same module). The override-read can land when Brian flips use_tech_ablation_target=True and decides to wire the consumer.

Until then, the writes are pure observation — exactly the L2553 design intent for the recommendation-stability soak.

Data gate (not a code gate)

The compute step still emits insufficient_data until ~30 sub-score-populated rows/team accumulate; apply() correctly no-ops on that path. First non-empty recommendation is expected ~2026-06-27 per the research v15-migration accumulation timeline.

Composes with

  • alpha-engine-backtester#174 (recommendation-only, this PR's prerequisite).
  • alpha-engine-research v15 sub-score persistence (data gate).
  • executor_optimizer.apply() (the pattern this PR mirrors).

🤖 Generated with Claude Code

…n gate (ROADMAP L2553)

Wires the auto-apply cutover follow-on to alpha-engine-backtester#174
(tech weight ablation recommendation-only). The compute step is
unchanged; this PR ships the gated write path.

Activation
- Both flags default false → bit-identical behavior to today.
- `tech_weight_ablation.use_tech_ablation_target=True` (Stage 1):
  every "ok" compute result writes a shadow payload to
  `config/scoring_weights_per_sector_shadow_history/{run_id}.json` +
  `latest.json` sidecar. Live config untouched. Pure observability.
- `tech_weight_ablation.enforce_tech_ablation=True` (Stage 2): live
  write to `config/scoring_weights_per_sector.json` fires ONLY when
  the reproduction gate passes — the same per-sector payload must
  reproduce across the last `_MIN_CONSECUTIVE_WEEKS = 4` shadow
  archives (the L2553 "4+ consecutive Saturdays" acceptance). One
  drift week breaks the streak; gate explicitly NOT tolerant of
  intermittent shadow drift.

Implementation
- `optimizer/tech_weight_ablation.py`:
  - `init_config()` + module-level `_cfg` (mirrors executor_optimizer).
  - `_build_per_sector_payload()` translates `recommendations:
    {team_id -> config_name}` to `{team_id -> {weight_name -> value}}`
    via DEFAULT_GRID lookup; unknown config_name drops cleanly.
  - `_read_recent_shadow_archives()` lex-sorts YYMMDDHHMM keys
    descending; missing/corrupt archives skip with a warning (treated
    as "reproduction not yet reached", not a hard fail).
  - `_check_reproduction_gate()` returns {passed, reason,
    n_consecutive} with byte-equal per_sector match across the window.
  - `apply()` orchestrates the two stages; mirrors
    `executor_optimizer.apply()` so the evaluator wiring stays
    uniform.
- `evaluate.py`:
  - `tech_weight_ablation.init_config(config)` added alongside the
    other optimizers.
  - New `_run_tech_weight_ablation()` helper hangs compute + apply
    together (with `--freeze` short-circuit), replacing the prior
    inline lambda.

Tests (+12 new, suite 1694 → 1706 green)
- `TestBuildPerSectorPayload` × 3 — name→weights mapping incl.
  unknown-name fallthrough + empty-recommendations.
- `TestApplyShadowGating` × 4 — flag-off skips, status-not-ok skips,
  empty recommendations skips, shadow-only mode writes archive but
  NOT live key.
- `TestReproductionGate` × 3 — insufficient history (0 archives),
  exact 4-in-a-row match passes, 1 drift breaks the streak.
- `TestApplyLiveGating` × 3 — enforce+insufficient history → shadow
  only, enforce+full reproduction → live written, enforce+drift in
  history → blocked + live untouched.
- One pre-existing test (`test_recommendation_only_no_apply`) updated:
  the old `apply_note` string "recommendation-only — auto-apply
  gated on parallel observation cutover (follow-up PR)" was the
  artifact this PR closes; new note points readers at `apply()` +
  the two flags.

S3 contract
- Both keys are NEW additions: no consumer reads them today, so
  no backward-compat concern. The research-side consumer
  (per-sector composite_weights override read) is operator-owned —
  `alpha-engine-research/config.py` is gitignored — and will land
  when Brian flips `use_tech_ablation_target` and decides to wire
  the override. Without that, the writes are observation-only.

Data gate (not a code gate)
- The compute step still emits `insufficient_data` until ~30
  sub-score-populated rows/team accumulate; `apply()` correctly
  no-ops on that. First non-empty recommendation is expected
  ~2026-06-27 per the v15-migration accumulation timeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 5238119 into main May 19, 2026
1 check passed
@cipher813 cipher813 deleted the feat/tech-weight-ablation-apply branch May 19, 2026 23:51