feat(universe_returns): key on trading days, decouple from signal folders#39
Merged
Merged
Conversation
…ders Rewrites the collector enumeration from "list signal folder dates in S3 and process each" to "walk backwards through NYSE trading days and process any whose 5d forward window has closed." The table's grain is now "one row per ticker per trading day" — the natural grain for downstream evaluation. Why (discussion on 2026-04-15): The backtester's _scanner_lift / _team_lift / _cio_lift joins on (ticker, eval_date) between scanner_evaluations etc. and universe_returns. Before this change, universe_returns eval_dates came from signal folder names — sporadic and timestamped with whatever research happened to stamp (run date, next trading day, etc.). The join's behaviour then depended on whether research happened to run that week and on which date-stamping convention was in effect. After this change, universe_returns stores a row for every NYSE trading day in a 90-day rolling window, regardless of research cadence. The evaluator can match any trading-day eval_date on the scanner/team/cio side; schedule drift or research misfires no longer blow holes in the grade surface. Implementation: - Added _trading_days_to_process(today, max_lookback, existing) that walks backwards through trading days (via the repo-root trading_calendar.is_trading_day for holiday awareness) and yields dates whose +5 business days window has closed and which are not already populated. - collect() now uses the enumerator directly. The signals_prefix arg is kept in the signature for call-site compatibility (weekly_collector.py passes it) but is unused. - Added max_lookback_trading_days arg (default 90) — enough for all rolling evaluator windows with plenty of slack. - Removed _list_signal_dates (dead) and the local _is_trading_day helper (superseded by the holiday-aware shared module function). - Removed the "deferred" concept from the return dict; the enumerator guarantees every enqueued date has a closed window. Builds on PR #38's still-valid changes (INSERT OR REPLACE + NULL-aware _get_existing_dates) which let reprocessed trading days actually overwrite stale NULL-return rows. Test suite: 46/46 unit tests pass. Smoke-tested _trading_days_to_process with today=2026-04-15, max_lookback=15 → returns 9 trading days (3/25 through 4/7, correctly skipping Good Friday 2026-04-03 and recent dates whose 5d window hasn't closed).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Decouples `universe_returns` from signal folder names. The collector now enumerates NYSE trading days directly (holiday-aware, via `trading_calendar.is_trading_day`) and populates one row per ticker per trading day within a configurable rolling window (default 90 trading days).
Builds on `#38` (merged earlier today): `INSERT OR REPLACE` + NULL-aware `_get_existing_dates` remain, so reprocessed dates actually overwrite stale NULL-return rows.
Why
The backtester's `_scanner_lift` / `_team_lift` / `_cio_lift` joins on `(ticker, eval_date)`. Previously `universe_returns.eval_date` came from signal folder names — sporadic, weekly, and subject to whatever stamping convention research was using (Saturday today, next_trading_day Monday, etc.). Every change to research's cadence or stamping produced a ripple through the join semantics.
After this PR, `universe_returns` has the grain it should have had from the start: "5d forward returns, one row per ticker per trading day." Evaluator joins succeed regardless of how often research runs or how it chose to stamp a given run.
Changes
Test plan
Context
Discussion in 2026-04-15 session on the `eval_date` semantics mismatch that left `grading.json`'s `research.scanner` component stuck at `n_passing=0`. Three layered bugs; this is the third and most architecturally correct fix.
🤖 Generated with Claude Code