
Commit ac4e769

cipher813 and claude authored
Add 6 features: horizon returns + overnight/intraday + dist-from-high (#36)
Predictor ROADMAP P2 diagnostic: test whether 5d forward is reversal regime vs momentum regime, and whether splitting close-to-close returns into overnight/intraday components improves signal.

New features (all technical group):

return_60d, return_120d
Longer-horizon momentum. Neutral name: meta ridge coefficient sign determines regime. At 5d forward, short-horizon returns load negative (reversal). If 60d/120d load positive, momentum persists at longer lookback, a well-documented pattern (Jegadeesh/Titman 1993).

overnight_return_5d, intraday_return_5d
5d sum of (Open_t / Close_{t-1} - 1) vs (Close_t / Open_t - 1). Lou/Polk/Skouras 2019 "A Tug of War" found overnight persists positive (earnings, news, macro) while intraday is noisier and often negative (microstructure, flow). Total momentum_5d ≈ overnight_5d + intraday_5d. Decomposing lets the model learn different dynamics. NaN when Open column is missing (no silent zero-fill per feedback_no_silent_fails).

dist_from_5d_high, dist_from_20d_high
Reversal-native signals. Distance from recent peak as fraction: (Close - rolling_max(High, N)) / rolling_max(High, N). Always ≤ 0. A stock at its 5d high has no short-term reversal room; a stock pulled back has more. Conceptually cleaner than past returns for reversal signal.

Registry: 6 FeatureEntry rows added under "v3.1 technical additions". FEATURES list in feature_engineer.py goes from 53 → 59. dropna still correct: rows missing any required feature are dropped.

## Test plan

- [x] Synthetic OHLCV smoke: all 6 features compute, values in sensible ranges (dist_from_5d_high ≤ 0 always, overnight/intraday small magnitudes, return_60d/120d larger).
- [x] Full suite: 43 passed.
- [ ] After merge: re-run alpha-engine-data backfill to populate historical rows in ArcticDB with the new columns.
- [ ] Predictor PR B2 (follow-up) adds features to MOMENTUM_FEATURES list + 21d forward IC diagnostic.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
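The additive decomposition described in the message can be checked numerically. The following is a minimal sketch on synthetic prices, not the project's code: the seed, series length, and noise scales are assumptions chosen only for illustration. It suggests that daily overnight and intraday returns compound exactly to the close-to-close return, while their 5d sums only approximate `momentum_5d`.

```python
import numpy as np
import pandas as pd

# Synthetic prices; seed, length, and noise scales are illustrative assumptions.
rng = np.random.default_rng(0)
n = 30
close = pd.Series(100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, n)))
# Each open gaps slightly away from the previous close (first value is NaN).
open_ = close.shift(1) * (1.0 + rng.normal(0.0, 0.005, n))

overnight_daily = open_ / close.shift(1) - 1.0
intraday_daily = close / open_ - 1.0

# Exact daily identity: (1 + overnight) * (1 + intraday) == Close_t / Close_{t-1}.
daily_total = close / close.shift(1) - 1.0
compounded = (1.0 + overnight_daily) * (1.0 + intraday_daily) - 1.0

window = 5
overnight_5d = overnight_daily.rolling(window, min_periods=window).sum()
intraday_5d = intraday_daily.rolling(window, min_periods=window).sum()
momentum_5d = close / close.shift(window) - 1.0

# Largest absolute gap between the additive sum and the compounded 5d return.
gap = (overnight_5d + intraday_5d - momentum_5d).abs().max()
```

The gap comes from products of daily returns, which are second-order terms, so it should stay small relative to the 5d return itself at this horizon.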
1 parent 0fbd6d5 commit ac4e769

2 files changed

Lines changed: 68 additions & 0 deletions


features/feature_engineer.py

Lines changed: 60 additions & 0 deletions
@@ -120,6 +120,17 @@
     "gross_margin",
     "roe",
     "current_ratio",
+    # v3.1 additions - longer-horizon + overnight/intraday decomposition +
+    # reversal-native signals. Predictor ROADMAP P2: collapse FLAT +
+    # test whether 5d is reversal or momentum regime. 2026-04-15: neutral
+    # names chosen - meta ridge coefficient sign determines whether the
+    # feature behaves as momentum (positive coef) or reversal (negative).
+    "return_60d",
+    "return_120d",
+    "overnight_return_5d",
+    "intraday_return_5d",
+    "dist_from_5d_high",
+    "dist_from_20d_high",
 ]
 
 MIN_ROWS_FOR_FEATURES = 265  # 252 warmup + buffer
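The sign convention for a past-return feature is easy to get backwards: a positive coefficient means high past returns predict high forward returns (momentum), and a negative coefficient means they predict low forward returns (reversal). The sketch below illustrates this with a closed-form single-feature ridge fit on synthetic data. The helper `ridge_coef`, the `alpha` value, and the simulated slopes are hypothetical and are not taken from the predictor.

```python
import numpy as np

def ridge_coef(x: np.ndarray, y: np.ndarray, alpha: float = 0.1) -> float:
    """Closed-form ridge slope for one centered feature (hypothetical helper)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc + alpha))

rng = np.random.default_rng(42)
past_return = rng.normal(0.0, 0.05, 2000)
noise = rng.normal(0.0, 0.01, 2000)

# Momentum regime: high past returns are followed by high forward returns.
forward_momentum = 0.3 * past_return + noise
# Reversal regime: high past returns are followed by low forward returns.
forward_reversal = -0.3 * past_return + noise

coef_momentum = ridge_coef(past_return, forward_momentum)  # positive slope
coef_reversal = ridge_coef(past_return, forward_reversal)  # negative slope
```

The ridge penalty shrinks both slopes toward zero but does not flip their signs, which is why the sign alone can be read as the regime indicator.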
@@ -272,6 +283,55 @@ def compute_features(
     _mom_short = _FC["momentum_short"]
     df["momentum_5d"] = (close / close.shift(_mom_short)) - 1.0
 
+    # ── v3.1: Longer-horizon returns ──────────────────────────────────────────
+    # ROADMAP P2 diagnostic - test whether 5d is the right label horizon.
+    # Neutral naming: meta ridge coefficient sign determines momentum vs
+    # reversal regime. Positive coef → momentum persists at this horizon
+    # (high past returns predict high future returns). Negative coef →
+    # reversal (high past returns predict low future returns).
+    df["return_60d"] = (close / close.shift(60)) - 1.0
+    df["return_120d"] = (close / close.shift(120)) - 1.0
+
+    # ── v3.1: Overnight / intraday decomposition ──────────────────────────────
+    # Lou/Polk/Skouras 2019 "A Tug of War": overnight returns
+    # (Open_t vs Close_{t-1}) have been historically persistent and positive
+    # (earnings, news, macro), while intraday returns (Close_t vs Open_t)
+    # have been noisier and often negative (microstructure, flow). Total
+    # 5d return = overnight_5d + intraday_5d (approximately; compounding
+    # differences are small at 5d horizons and this additive sum is the
+    # form used in the source literature).
+    if "Open" in df.columns:
+        open_ = df["Open"].astype(float)
+        overnight_daily = (open_ / close.shift(1)) - 1.0
+        intraday_daily = (close / open_) - 1.0
+        df["overnight_return_5d"] = overnight_daily.rolling(
+            window=_mom_short, min_periods=_mom_short,
+        ).sum()
+        df["intraday_return_5d"] = intraday_daily.rolling(
+            window=_mom_short, min_periods=_mom_short,
+        ).sum()
+    else:
+        # Without Open, these features are undefined, so NaN propagates and
+        # dropna will exclude the ticker. No silent zero-fill (per
+        # feedback_no_silent_fails).
+        df["overnight_return_5d"] = float("nan")
+        df["intraday_return_5d"] = float("nan")
+
+    # ── v3.1: Distance from recent highs (reversal-native) ────────────────────
+    # Distance from recent peak is a cleaner reversal signal than past
+    # returns: a stock at its 5d high has nowhere to "continue" in the
+    # short-term reversal regime, while a stock pulled back from its 5d
+    # high has more room to mean-revert. Values are always <= 0 (close
+    # cannot exceed the rolling max). Closer to zero = near high.
+    if "High" in df.columns:
+        high_col = df["High"].astype(float)
+    else:
+        high_col = close
+    rolling_max_5 = high_col.rolling(window=5, min_periods=5).max()
+    rolling_max_20 = high_col.rolling(window=20, min_periods=20).max()
+    df["dist_from_5d_high"] = (close - rolling_max_5) / rolling_max_5
+    df["dist_from_20d_high"] = (close - rolling_max_20) / rolling_max_20
+
     # ── Relative volume ratio ──────────────────────────────────────────────────
     rolling_mean_vol_20 = volume.rolling(window=_vol_slow, min_periods=_vol_slow).mean()
     df["rel_volume_ratio"] = volume / rolling_mean_vol_20.replace(0, float("nan"))
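As a rough illustration of the distance-from-high features in this hunk, the sketch below applies the same formula to a short hand-made series. The helper `dist_from_high` and the price values are hypothetical, and the `high = close + 0.5` relationship is an assumption made only to keep High at or above Close.

```python
import pandas as pd

def dist_from_high(close: pd.Series, high: pd.Series, window: int) -> pd.Series:
    """(Close - rolling max of High) / rolling max of High over `window` rows."""
    rolling_max = high.rolling(window=window, min_periods=window).max()
    return (close - rolling_max) / rolling_max

# Illustrative prices only.
close = pd.Series([10.0, 11.0, 12.0, 11.5, 11.0, 12.5, 12.0])
high = close + 0.5  # assumption: each day's High sits 0.5 above its Close

d5 = dist_from_high(close, high, window=5)
# Fallback path when no High column exists: rolling max of Close itself.
d5_close_only = dist_from_high(close, close, window=5)
```

With a real High column the value is typically strictly negative, because the close rarely equals the window's intraday peak. With the Close-only fallback it is exactly zero whenever the latest close is the window maximum, so the two paths are not numerically interchangeable.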

features/registry.py

Lines changed: 8 additions & 0 deletions
@@ -85,6 +85,14 @@ class FeatureEntry:
     FeatureEntry("iv_rank", "alternative", "IV percentile rank (0-1)", source="yfinance", refresh="weekly"),
     FeatureEntry("iv_vs_rv", "alternative", "Implied vol / realized vol ratio", source="yfinance", refresh="weekly"),
 
+    # ── v3.1 technical additions - horizon + decomposition + reversal-native ──
+    FeatureEntry("return_60d", "technical", "60-day price return (Close_t / Close_{t-60} - 1)", source="yfinance", refresh="daily"),
+    FeatureEntry("return_120d", "technical", "120-day price return (Close_t / Close_{t-120} - 1)", source="yfinance", refresh="daily"),
+    FeatureEntry("overnight_return_5d", "technical", "5d sum of overnight returns (Open_t vs Close_{t-1})", source="yfinance", refresh="daily"),
+    FeatureEntry("intraday_return_5d", "technical", "5d sum of intraday returns (Close_t vs Open_t)", source="yfinance", refresh="daily"),
+    FeatureEntry("dist_from_5d_high", "technical", "(Close - 5d rolling max High) / 5d rolling max High", source="yfinance", refresh="daily"),
+    FeatureEntry("dist_from_20d_high", "technical", "(Close - 20d rolling max High) / 20d rolling max High", source="yfinance", refresh="daily"),
+
     # ── Fundamental (8) - quarterly financials ────────────────────────────────
     FeatureEntry("pe_ratio", "fundamental", "Trailing P/E ratio, normalized (PE / 30)", source="fmp", refresh="quarterly"),
     FeatureEntry("pb_ratio", "fundamental", "Price-to-book ratio, normalized (PB / 5)", source="fmp", refresh="quarterly"),
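Because the feature names live in two places (the `FEATURES` list and the registry), a small consistency check may help catch drift between them. The following is a sketch under stated assumptions: the `FeatureEntry` field names, the `REGISTRY` variable, and the helper `missing_from_registry` are hypothetical, since the diff only shows positional arguments plus the `source` and `refresh` keywords.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureEntry:
    # Field names are assumptions; only the call shape is visible in the diff.
    name: str
    group: str
    description: str
    source: str = ""
    refresh: str = ""

# Hypothetical stand-in for the registry, limited to the six v3.1 rows.
REGISTRY = [
    FeatureEntry("return_60d", "technical", "60-day price return", source="yfinance", refresh="daily"),
    FeatureEntry("return_120d", "technical", "120-day price return", source="yfinance", refresh="daily"),
    FeatureEntry("overnight_return_5d", "technical", "5d sum of overnight returns", source="yfinance", refresh="daily"),
    FeatureEntry("intraday_return_5d", "technical", "5d sum of intraday returns", source="yfinance", refresh="daily"),
    FeatureEntry("dist_from_5d_high", "technical", "Distance from 5d high", source="yfinance", refresh="daily"),
    FeatureEntry("dist_from_20d_high", "technical", "Distance from 20d high", source="yfinance", refresh="daily"),
]

# The six names added to FEATURES in feature_engineer.py by this commit.
FEATURES = [
    "return_60d",
    "return_120d",
    "overnight_return_5d",
    "intraday_return_5d",
    "dist_from_5d_high",
    "dist_from_20d_high",
]

def missing_from_registry(features: list[str], registry: list[FeatureEntry]) -> list[str]:
    """Return feature names that have no registry entry, in input order."""
    registered = {entry.name for entry in registry}
    return [name for name in features if name not in registered]

unregistered = missing_from_registry(FEATURES, REGISTRY)
```

A check like this could run in the test suite so that a feature added to one list but not the other fails loudly rather than silently.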
