fix(research-gbm): tighten 21d-horizon overfitting + formalize observe rail #180

Merged
cipher813 merged 1 commit into main from feat/research-gbm-21d-overfitting-fix on May 19, 2026


Conversation

@cipher813
Owner

ROADMAP L1816 P1 (promoted from P2 2026-05-15; ship-now-anyway direction 2026-05-19 despite the entry's own ≥30-week-corpus advisory — explicit opt-in at 22 weeks). Bounded-risk fix: the Ridge regularization absorbs today's overfit (research_calibrator_prob std-coef only +0.05; meta_IC driven by sector_macro_modifier + research_composite_score), so this is an integrity tightening, NOT a bleeding-alpha fix.

Evidence

The 2026-05-09 promote-true retrain at the new 21d horizon showed:

train_ic=0.4328 / val_ic=0.0857 ⇒ ratio 5.05 on 496 rows

Threshold cited in PR #113's manifest spec is >2×; 5.05× is decisive overfit. Cause: the 21d horizon shrinks the labeled fit set vs the pre-cutover 5d horizon, and num_leaves=15 / max_depth=4 was sized for the 5d larger-corpus regime.

What ships

| Component | Change |
| --- | --- |
| `model/research_gbm.py::_default_params` | `num_leaves` 15→8, `max_depth` 4→3 |
| `training/meta_trainer.py` | `_compute_overfit_signal(train_ic, val_ic)` helper + `_emit_research_gbm_overfit_metrics(...)` CW dispatcher, both wired into Step 6c after the existing `train_ic` computation. Warns loud on ratio > 3.0; manifest `research_gbm` block now carries `train_val_ic_ratio` + `overfit_warn` alongside existing `train_ic` + `val_ic` |
| `tests/test_research_gbm_scorer.py` | `test_default_params_overfit_tightening_2026_05_19` pins 8/3 vs a future revert |
| `tests/test_research_gbm_overfit_signal.py` (new, 13 tests) | Ratio formula, threshold boundary (strict >), None contracts, negative-val_ic abs handling, CW emit contract incl. CloudWatch-failure defensiveness |
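The contract described for the overfit helper can be sketched as below. This is a minimal illustration, not the merged code: the function name (without the leading underscore), the `NOISE_FLOOR` constant name, the treatment of a missing `val_ic`, and the use of the raw (unsigned) `train_ic` in the numerator are assumptions. Only the `(ratio, warn)` return shape, the 3.0 default, the strict `>` comparison, the magnitude-based denominator, and the 1e-3 noise floor come from the commit message.

```python
from typing import Optional, Tuple

NOISE_FLOOR = 1e-3  # |val_ic| below this counts as "couldn't measure"


def compute_overfit_signal(
    train_ic: Optional[float],
    val_ic: Optional[float],
    warn_threshold: float = 3.0,
) -> Tuple[Optional[float], bool]:
    """Return (ratio, warn); ratio is None when it cannot be measured."""
    # Assumption: a missing val_ic is handled like a missing train_ic.
    if train_ic is None or val_ic is None:
        return None, False
    if abs(val_ic) < NOISE_FLOOR:
        return None, False
    # Magnitude-based denominator, so a negative val_ic still yields a ratio.
    ratio = train_ic / abs(val_ic)
    # Strict ">": a ratio exactly at the threshold does not warn.
    return ratio, ratio > warn_threshold


ratio, warn = compute_overfit_signal(0.4328, 0.0857)
print(round(ratio, 2), warn)  # 5.05 True
```

Returning `(None, False)` rather than `(0.0, False)` keeps "couldn't measure" distinguishable from "measured and healthy" for any downstream consumer of the manifest.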

2-cycle persistence alarm is satisfied by setting up the CW alarm on research_gbm_overfit_warn > 0 with DatapointsToAlarm=2 / EvaluationPeriods=2 (weekly cadence); no code change needed once the gauges exist.
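As a rough sketch of that alarm setup, the helper below builds keyword arguments for boto3's CloudWatch `put_metric_alarm`. The alarm name, the `Maximum` statistic, and the absence of dimensions are assumptions. `Period` is left to the caller because the PR does not state it, and CloudWatch constrains the total evaluation window, so it should be checked against the weekly cadence before use.

```python
from typing import Any, Dict


def overfit_alarm_kwargs(
    period_seconds: int,
    alarm_name: str = "research-gbm-overfit-persistent",  # hypothetical name
) -> Dict[str, Any]:
    """Keyword arguments for a 2-of-2 alarm on research_gbm_overfit_warn > 0."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AlphaEngine/Predictor",
        "MetricName": "research_gbm_overfit_warn",
        "Statistic": "Maximum",        # assumption: any warn in the period counts
        "Period": period_seconds,      # not stated in the PR; caller must choose
        "EvaluationPeriods": 2,
        "DatapointsToAlarm": 2,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Example wiring (requires boto3 and AWS credentials, so it is not run here):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**overfit_alarm_kwargs(period_seconds=...))
print(overfit_alarm_kwargs(period_seconds=3600)["DatapointsToAlarm"])  # 2
```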

What does NOT ship

  • Per-fold WF GBM training: the structurally-clean remediation. At 22 weeks of labeled history, per-fold rows fall below min_child_samples=30; the entry's own caveat at lines 1347-1353 documents this. Re-promoted to the entry's >=30 weeks (~2026-06-20) milestone.
  • Manifest schema break — additive only. Pre-existing consumers see identical fields.

Tests

pytest tests/ -q  →  1110 passed, 0 failed

Composes with: PR #113 (manifest train/val IC emit substrate), PR #114 (canonical-alpha cutover that surfaced the overfit by shrinking the fit set), [[feedback_component_baseline_validation]].

🤖 Generated with Claude Code

…e rail

ROADMAP L1816 P1 (promoted from P2 2026-05-15; "ship now anyway" direction
2026-05-19 despite the entry's own >=30-week-corpus advisory — Brian
explicitly opted in at 22 weeks). Bounded-risk fix: the Ridge regularization
absorbs today's overfit (`research_calibrator_prob` std-coef only +0.05;
meta_IC driven by `sector_macro_modifier` + `research_composite_score`)
so this is an integrity tightening, NOT a bleeding-alpha fix.

## Evidence the fix targets

The 2026-05-09 promote-true retrain at the new 21d horizon (post Track A
PR 4/5 canonical-alpha cutover) showed:

  train_ic=0.4328 / val_ic=0.0857 ⇒ ratio 5.05 on 496 rows

Threshold cited in alpha-engine-predictor PR #113's manifest spec is >2×;
5.05× is decisive overfit. Cause: the 21d horizon shrinks the labeled fit
set vs the pre-cutover 5d horizon (fewer rows have finite `actual_fwd`
labels) and `num_leaves=15 / max_depth=4` was sized for the 5d
larger-corpus regime.
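A back-of-envelope way to see why the old sizing was generous for this corpus: a tree's leaf count is bounded by `num_leaves`, by `2 ** max_depth`, and by how many leaves the rows can fill at `min_child_samples` each. The sketch below applies that bound. The helper name is hypothetical, and it assumes the 496 rows quoted above are the rows available to the fit, which the message does not state explicitly.

```python
def leaf_budget(num_leaves: int, max_depth: int, rows: int, min_child_samples: int) -> int:
    """Upper bound on leaves per tree under the three constraints."""
    by_depth = 2 ** max_depth            # a depth-limited binary tree has at most 2^d leaves
    by_rows = rows // min_child_samples  # every leaf needs at least min_child_samples rows
    return min(num_leaves, by_depth, by_rows)


old = leaf_budget(num_leaves=15, max_depth=4, rows=496, min_child_samples=30)
new = leaf_budget(num_leaves=8, max_depth=3, rows=496, min_child_samples=30)
print(old, new)                # 15 8
print(496 // old, 496 // new)  # 33 62 (average rows per leaf at the bound)
```

Under these assumptions the old settings allow trees whose leaves sit close to the `min_child_samples=30` floor, while the new settings roughly double the average rows per leaf.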

## What ships

1. **`model/research_gbm.py::_default_params`**: `num_leaves` 15→8 +
   `max_depth` 4→3. Caps the LightGBM hypothesis space below the row
   count of a 21d-horizon corpus. Docstring carries the ROADMAP
   citation + the bounded-risk rationale. Other params unchanged
   (lambda_l1/l2, min_child_samples, learning_rate kept — they
   weren't the overfit lever).

2. **`training/meta_trainer.py`** — formalize the observe rail:
   - `_compute_overfit_signal(train_ic, val_ic, warn_threshold=3.0)`
     returns `(ratio, warn)`. Magnitude-based denominator so a negative
     val_ic still measures train-fit dominance. Returns `(None, False)`
     when train_ic missing OR |val_ic| < 1e-3 noise floor — surfaces
     "couldn't measure" as a distinct state from "measured and healthy".
   - `_emit_research_gbm_overfit_metrics(...)` dispatches CloudWatch
     gauges (namespace `AlphaEngine/Predictor`):
     `research_gbm_train_ic`, `research_gbm_val_ic`,
     `research_gbm_train_val_ic_ratio`, `research_gbm_overfit_warn`.
     Best-effort emit (mirrors existing `_emit_research_join_coverage_metrics`
     pattern); a CW failure WARNs + continues training.
   - The Step 6c block now computes the ratio + warn after `train_ic`
     and: (a) WARN-logs when the flag fires (loud, immediate signal);
     (b) emits the CW gauges; (c) carries the ratio + warn into the
     manifest's `research_gbm` block alongside `train_ic` + `val_ic`.
   - Threshold 3.0 sits between the 2× watch level and the 5.05× we
     observed, so operator alerts fire only on a confirmed regression.
   - 2-cycle persistence alarm is satisfied by setting up the CW alarm
     on `research_gbm_overfit_warn > 0` with `DatapointsToAlarm=2 /
     EvaluationPeriods=2` (weekly cadence); no code change needed once
     the gauges exist.

3. **Tests** (+13 net):
   - `tests/test_research_gbm_scorer.py`:
     `test_default_params_overfit_tightening_2026_05_19` pins 8/3 vs
     a future revert.
   - `tests/test_research_gbm_overfit_signal.py` (new, 13 tests):
     ratio formula (incl. the historical 5.05 case), threshold boundary
     (strict >), None contracts (train_ic absent / val_ic noise-floor),
     negative-val_ic abs handling, custom threshold override, CW emit
     contract (full metric set when warn / 0-warn when healthy / no
     emit when can't-measure / defensive train_ic absent / best-effort
     on CloudWatch failure).
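The best-effort emit contract from item 2 can be sketched as follows. This is an illustration under assumptions, not the merged `_emit_research_gbm_overfit_metrics`: the function names, the injected client parameter, the boolean return value, and the `RecordingClient` stub are all hypothetical. The namespace, the four metric names, the skip on an unmeasurable ratio, and the warn-and-continue behavior on a CloudWatch failure come from the description above. `put_metric_data(Namespace=..., MetricData=[...])` is the boto3 CloudWatch call the stub imitates.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)
NAMESPACE = "AlphaEngine/Predictor"


def build_metric_data(
    train_ic: Optional[float],
    val_ic: Optional[float],
    ratio: Optional[float],
    warn: bool,
) -> List[Dict[str, Any]]:
    """Build the four gauges, or an empty list when the ratio is unmeasurable."""
    if train_ic is None or val_ic is None or ratio is None:
        return []
    values = {
        "research_gbm_train_ic": train_ic,
        "research_gbm_val_ic": val_ic,
        "research_gbm_train_val_ic_ratio": ratio,
        "research_gbm_overfit_warn": 1.0 if warn else 0.0,
    }
    return [
        {"MetricName": name, "Value": float(value), "Unit": "None"}
        for name, value in values.items()
    ]


def emit_overfit_metrics(cw_client: Any, train_ic, val_ic, ratio, warn) -> bool:
    """Best-effort emit: True on success, False on skip or failure."""
    metric_data = build_metric_data(train_ic, val_ic, ratio, warn)
    if not metric_data:
        return False
    try:
        cw_client.put_metric_data(Namespace=NAMESPACE, MetricData=metric_data)
    except Exception as exc:  # deliberately broad: metrics must never fail training
        logger.warning("research_gbm overfit metric emit failed: %s", exc)
        return False
    return True


class RecordingClient:
    """Hypothetical stand-in for boto3.client("cloudwatch"), for local runs."""

    def __init__(self, fail: bool = False) -> None:
        self.calls: List[Dict[str, Any]] = []
        self.fail = fail

    def put_metric_data(self, **kwargs: Any) -> None:
        if self.fail:
            raise RuntimeError("simulated CloudWatch outage")
        self.calls.append(kwargs)


client = RecordingClient()
print(emit_overfit_metrics(client, 0.4328, 0.0857, 5.05, True))                 # True
print(emit_overfit_metrics(RecordingClient(fail=True), 0.4, 0.2, 2.0, False))   # False
```

Passing the client in, rather than constructing it inside the function, is a choice made here so the failure path can be exercised without AWS access.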

## What does NOT ship

- **Per-fold WF GBM training**: the structurally-clean remediation
  (currently the Step 6c fit is a single 80/20 temporal split). At
  22 weeks of labeled history, per-fold rows fall below
  `min_child_samples=30`; the entry's own caveat at lines 1347-1353
  documents this. Re-promoted to the entry's "do at >=30 weeks
  (~2026-06-20)" milestone.
- **Manifest schema change beyond additive fields**: `train_val_ic_ratio`
  + `overfit_warn` are additive only. Pre-existing consumers see
  identical fields; downstream `feature_drift` / dashboard reads
  ignore unknown keys (no contract break).

## Tests

  pytest tests/ -q  ->  1110 passed, 0 failed

Composes with: alpha-engine-predictor PR #113 (manifest train/val IC
emit substrate), PR #114 (canonical-alpha cutover that surfaced the
overfit by shrinking the fit set), feedback_component_baseline_validation
(L1-component subsample gate already protects deploys from a degenerate
research_gbm — this entry adds observe-rail visibility on top).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit e99e0b5 into main May 19, 2026
1 check passed
@cipher813 cipher813 deleted the feat/research-gbm-21d-overfitting-fix branch May 19, 2026 22:12