fix(research-gbm): tighten 21d-horizon overfitting + formalize observe rail by cipher813 · Pull Request #180 · cipher813/alpha-engine-predictor

cipher813 · 2026-05-19T21:56:27Z

ROADMAP L1816 P1 (promoted from P2 2026-05-15; ship-now-anyway direction 2026-05-19 despite the entry's own ≥30-week-corpus advisory — explicit opt-in at 22 weeks). Bounded-risk fix: the Ridge regularization absorbs today's overfit (research_calibrator_prob std-coef only +0.05; meta_IC driven by sector_macro_modifier + research_composite_score), so this is an integrity tightening, NOT a bleeding-alpha fix.

Evidence

The 2026-05-09 promote-true retrain at the new 21d horizon showed:

train_ic=0.4328 / val_ic=0.0857 ⇒ ratio 5.05 on 496 rows

Threshold cited in PR #113's manifest spec is >2×; 5.05× is decisive overfit. Cause: the 21d horizon shrinks the labeled fit set vs the pre-cutover 5d horizon, and num_leaves=15 / max_depth=4 was sized for the 5d larger-corpus regime.
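The capacity argument can be made concrete with a rough sketch. This is not code from the PR: the helper names are hypothetical, and it assumes only the standard bound that a binary tree of depth d has at most 2^d leaves, plus the `min_child_samples=30` value quoted later in this description.

```python
def max_leaves(num_leaves: int, max_depth: int) -> int:
    """Upper bound on leaves per tree: the num_leaves cap, further limited by depth."""
    return min(num_leaves, 2 ** max_depth)


def rows_to_fill_all_leaves(
    num_leaves: int, max_depth: int, min_child_samples: int = 30
) -> int:
    """Rows one tree needs before every leaf can hold its minimum sample count."""
    return max_leaves(num_leaves, max_depth) * min_child_samples


# Pre-change vs post-change defaults from this PR.
old = rows_to_fill_all_leaves(num_leaves=15, max_depth=4)
new = rows_to_fill_all_leaves(num_leaves=8, max_depth=3)
print(f"old params: {old} rows, new params: {new} rows")
```

Under these assumptions the old settings need 450 rows to populate every leaf at its minimum, which leaves little slack against the 496 rows cited above, while the new settings need 240.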

What ships

| Component | Change |
| --- | --- |
| `model/research_gbm.py::_default_params` | `num_leaves` 15→8, `max_depth` 4→3 |
| `training/meta_trainer.py` | `_compute_overfit_signal(train_ic, val_ic)` helper + `_emit_research_gbm_overfit_metrics(...)` CW dispatcher, both wired into Step 6c after the existing train_ic computation. Warns loud on ratio > 3.0; manifest `research_gbm` block now carries `train_val_ic_ratio` + `overfit_warn` alongside existing `train_ic` + `val_ic` |
| `tests/test_research_gbm_scorer.py` | `test_default_params_overfit_tightening_2026_05_19` pins 8/3 vs a future revert |
| `tests/test_research_gbm_overfit_signal.py` (new, 13 tests) | Ratio formula, threshold boundary (strict >), None contracts, negative-val_ic abs handling, CW emit contract incl. CloudWatch-failure defensiveness |
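For readers who want the shape of the ratio helper, here is a minimal sketch reconstructed from the contract described in this PR (strict `>` threshold, magnitude-based denominator, `(None, False)` when the signal cannot be measured). It is not the shipped implementation: the function name drops the leading underscore, the `NOISE_FLOOR` constant name is hypothetical, and treating a missing `val_ic` the same as a missing `train_ic` is an assumption.

```python
from typing import Optional, Tuple

NOISE_FLOOR = 1e-3  # assumption: |val_ic| below this is treated as unmeasurable


def compute_overfit_signal(
    train_ic: Optional[float],
    val_ic: Optional[float],
    warn_threshold: float = 3.0,
) -> Tuple[Optional[float], bool]:
    """Return (ratio, warn); (None, False) means 'could not measure'."""
    if train_ic is None or val_ic is None or abs(val_ic) < NOISE_FLOOR:
        return None, False
    # Magnitude-based denominator: a negative val_ic still measures
    # how strongly the train fit dominates.
    ratio = train_ic / abs(val_ic)
    # Strict comparison: a ratio exactly at the threshold does not warn.
    return ratio, ratio > warn_threshold


# The 2026-05-09 retrain numbers quoted in the Evidence section.
ratio, warn = compute_overfit_signal(0.4328, 0.0857)
print(f"{ratio:.2f}", warn)  # prints "5.05 True"
```

Returning `None` instead of a sentinel number keeps "could not measure" distinguishable from "measured and healthy" in both the manifest and the metrics.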

2-cycle persistence alarm is satisfied by setting up the CW alarm on research_gbm_overfit_warn > 0 with DatapointsToAlarm=2 / EvaluationPeriods=2 (weekly cadence); no code change needed once the gauges exist.
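The persistence rule itself is simple to state in code. The sketch below is a simplified model of "M out of N datapoints" alarm evaluation, written only to illustrate what `DatapointsToAlarm=2 / EvaluationPeriods=2` means for a weekly `research_gbm_overfit_warn` gauge. The function name is hypothetical, and it ignores missing-data handling and everything else the real alarm service does.

```python
from typing import Sequence


def would_alarm(
    datapoints: Sequence[float],
    threshold: float = 0.0,
    evaluation_periods: int = 2,
    datapoints_to_alarm: int = 2,
) -> bool:
    """True when enough of the most recent datapoints breach the threshold."""
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return False  # not enough history yet
    breaching = sum(1 for value in window if value > threshold)
    return breaching >= datapoints_to_alarm


# Weekly overfit_warn gauge values, oldest first.
print(would_alarm([1.0]))            # one warning week: no alarm
print(would_alarm([1.0, 1.0]))       # two consecutive warning weeks: alarm
print(would_alarm([1.0, 0.0, 1.0]))  # warnings not consecutive: no alarm
```

With both values set to 2, a single noisy retrain does not page anyone; the flag has to fire on two consecutive weekly cycles.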

What does NOT ship

Per-fold WF GBM training: the structurally-clean remediation. At 22 weeks of labeled history, per-fold rows fall below min_child_samples=30; the entry's own caveat at lines 1347-1353 documents this. Re-promoted to the entry's >=30 weeks (~2026-06-20) milestone.
Manifest schema break — additive only. Pre-existing consumers see identical fields.
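As an illustration of why additive fields do not break existing readers, consider the sketch below. The dictionaries are illustrative only, not an actual manifest: the values reuse the 2026-05-09 numbers from the Evidence section, and `read_legacy_fields` is a hypothetical stand-in for a consumer that reads only the keys it already knows.

```python
# Illustrative research_gbm manifest blocks, before and after this PR.
legacy_block = {"train_ic": 0.4328, "val_ic": 0.0857}
new_block = {
    **legacy_block,
    "train_val_ic_ratio": 5.05,  # additive field
    "overfit_warn": True,        # additive field
}


def read_legacy_fields(block: dict) -> dict:
    """Hypothetical pre-existing consumer: picks known keys, ignores the rest."""
    return {key: block[key] for key in ("train_ic", "val_ic")}


# The consumer sees the same values whichever block it is handed.
print(read_legacy_fields(new_block) == read_legacy_fields(legacy_block))
```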

Tests

pytest tests/ -q  →  1110 passed, 0 failed

Composes with: PR #113 (manifest train/val IC emit substrate), PR #114 (canonical-alpha cutover that surfaced the overfit by shrinking the fit set), [[feedback_component_baseline_validation]].

🤖 Generated with Claude Code

…e rail

ROADMAP L1816 P1 (promoted from P2 2026-05-15; "ship now anyway" direction 2026-05-19 despite the entry's own >=30-week-corpus advisory; Brian explicitly opted in at 22 weeks). Bounded-risk fix: the Ridge regularization absorbs today's overfit (`research_calibrator_prob` std-coef only +0.05; meta_IC driven by `sector_macro_modifier` + `research_composite_score`), so this is an integrity tightening, NOT a bleeding-alpha fix.

## Evidence the fix targets

The 2026-05-09 promote-true retrain at the new 21d horizon (post Track A PR 4/5 canonical-alpha cutover) showed:

train_ic=0.4328 / val_ic=0.0857 ⇒ ratio 5.05 on 496 rows

Threshold cited in alpha-engine-predictor PR #113's manifest spec is >2×; 5.05× is decisive overfit. Cause: the 21d horizon shrinks the labeled fit set vs the pre-cutover 5d horizon (fewer rows have finite `actual_fwd` labels) and `num_leaves=15 / max_depth=4` was sized for the 5d larger-corpus regime.

## What ships

1. **`model/research_gbm.py::_default_params`**: `num_leaves` 15→8 + `max_depth` 4→3. Caps the LightGBM hypothesis space below the row count of a 21d-horizon corpus. Docstring carries the ROADMAP citation + the bounded-risk rationale. Other params unchanged (lambda_l1/l2, min_child_samples, learning_rate kept; they weren't the overfit lever).

2. **`training/meta_trainer.py`**: formalize the observe rail:
   - `_compute_overfit_signal(train_ic, val_ic, warn_threshold=3.0)` returns `(ratio, warn)`. Magnitude-based denominator so a negative val_ic still measures train-fit dominance. Returns `(None, False)` when train_ic missing OR |val_ic| < 1e-3 noise floor, which surfaces "couldn't measure" as a distinct state from "measured and healthy".
   - `_emit_research_gbm_overfit_metrics(...)` dispatches CloudWatch gauges (namespace `AlphaEngine/Predictor`): `research_gbm_train_ic`, `research_gbm_val_ic`, `research_gbm_train_val_ic_ratio`, `research_gbm_overfit_warn`.
     Best-effort emit (mirrors existing `_emit_research_join_coverage_metrics` pattern); a CW failure WARNs + continues training.
   - The Step 6c block now computes the ratio + warn after `train_ic` and: (a) WARN-logs when the flag fires (loud, immediate signal); (b) emits the CW gauges; (c) carries the ratio + warn into the manifest's `research_gbm` block alongside `train_ic` + `val_ic`.
   - Threshold 3.0 chosen to sit between the 2× watch level and the 5.05× we observed, so the operator is alerted only on confirmed regression.
   - 2-cycle persistence alarm is satisfied by setting up the CW alarm on `research_gbm_overfit_warn > 0` with `DatapointsToAlarm=2 / EvaluationPeriods=2` (weekly cadence); no code change needed once the gauges exist.

3. **Tests** (+13 net):
   - `tests/test_research_gbm_scorer.py`: `test_default_params_overfit_tightening_2026_05_19` pins 8/3 vs a future revert.
   - `tests/test_research_gbm_overfit_signal.py` (new, 13 tests): ratio formula (incl. the historical 5.05 case), threshold boundary (strict >), None contracts (train_ic absent / val_ic noise-floor), negative-val_ic abs handling, custom threshold override, CW emit contract (full metric set when warn / 0-warn when healthy / no emit when can't-measure / defensive train_ic absent / best-effort on CloudWatch failure).

## What does NOT ship

- **Per-fold WF GBM training**: the structurally-clean remediation (currently the Step 6c fit is a single 80/20 temporal split). At 22 weeks of labeled history, per-fold rows fall below `min_child_samples=30`; the entry's own caveat at lines 1347-1353 documents this. Re-promoted to the entry's "do at >=30 weeks (~2026-06-20)" milestone.
- **Manifest schema change beyond additive fields**: `train_val_ic_ratio` + `overfit_warn` are additive only. Pre-existing consumers see identical fields; downstream `feature_drift` / dashboard reads ignore unknown keys (no contract break).
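The best-effort emit contract listed under the observe-rail item can be sketched roughly as follows. This is an assumption-laden illustration, not the shipped `_emit_research_gbm_overfit_metrics`: the function name, its signature, the boolean return value, and the injected `cw_client` parameter are all hypothetical. The only external call used is `put_metric_data`, which a boto3 CloudWatch client provides.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)
NAMESPACE = "AlphaEngine/Predictor"  # namespace named in the commit message


def emit_overfit_metrics(
    cw_client: Any,
    train_ic: Optional[float],
    val_ic: Optional[float],
    ratio: Optional[float],
    warn: bool,
) -> bool:
    """Send the four gauges; return False if skipped or if the emit failed."""
    # "Couldn't measure" case: emit nothing rather than a misleading zero.
    if train_ic is None or val_ic is None or ratio is None:
        return False
    gauges = {
        "research_gbm_train_ic": train_ic,
        "research_gbm_val_ic": val_ic,
        "research_gbm_train_val_ic_ratio": ratio,
        "research_gbm_overfit_warn": 1.0 if warn else 0.0,
    }
    try:
        cw_client.put_metric_data(
            Namespace=NAMESPACE,
            MetricData=[
                {"MetricName": name, "Value": float(value), "Unit": "None"}
                for name, value in gauges.items()
            ],
        )
    except Exception as exc:  # best-effort: never let metrics break training
        logger.warning("research_gbm overfit metric emit failed: %s", exc)
        return False
    return True
```

Passing the client in as a parameter is a design choice made for this sketch so the contract can be exercised with a fake object; in a real run the caller would presumably pass something like `boto3.client("cloudwatch")`.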
## Tests

pytest tests/ -q -> 1110 passed, 0 failed

Composes with: alpha-engine-predictor PR #113 (manifest train/val IC emit substrate), PR #114 (canonical-alpha cutover that surfaced the overfit by shrinking the fit set), feedback_component_baseline_validation (L1-component subsample gate already protects deploys from a degenerate research_gbm; this entry adds observe-rail visibility on top).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit e99e0b5 into main May 19, 2026
1 check passed

cipher813 deleted the feat/research-gbm-21d-overfitting-fix branch May 19, 2026 22:12
