fix(universe_returns): unstick NULL return_5d rows + drop weekend signals#38
Merged
Merged
Conversation
…nals
Two related bugs kept scanner/team/cio grading pinned at N/A even when
the merge on (ticker, eval_date) would otherwise have worked.
1. NULL return_5d rows got stuck forever. When a collector run inserted
a date whose 5d forward window hadn't closed yet, the row landed with
NULL return/spy/beat columns. The next run's `existing` set contained
that eval_date, so `dates_to_process` skipped it — the returns were
never backfilled. Combined with `INSERT OR IGNORE` this silently
orphaned Fri 2026-04-03 and other edge cases.
Fix: `_get_existing_dates` now returns only eval_dates where
`return_5d IS NOT NULL`; `_insert_rows` uses INSERT OR REPLACE so a
reprocessed row overwrites the stale NULL one.
2. Weekend signal folders have no market data. Research runs before
alpha-engine-research 9a94e34 (2026-04-13, "Stamp signals with
next_trading_day") wrote signals/{Sat,Sun}/. Polygon has no
grouped-daily data for those dates, so rows landed with NULL returns
and then hit bug #1. Post-fix research stamps trading days, but
the legacy weekend folders still exist in S3 and would keep
retriggering the empty-prices path.
Fix: filter eval_dates to trading days (Mon-Fri) before enqueuing
for processing.
Existing tests pass (46/46). The next Saturday Step Function run will
reprocess the stuck Fri 2026-04-03 row and populate return_5d for any
subsequent trading days whose 5d window has since closed.
Out of scope: market-holiday filtering (e.g. Good Friday). _is_trading_day
only screens weekends; a closed-market weekday will still be attempted —
polygon returns empty and the row is skipped at _build_rows_for_date,
which is acceptable for now. Proper NYSE calendar filtering belongs in
a follow-up.
cipher813
added a commit
that referenced
this pull request
Apr 15, 2026
…ders Rewrites the collector enumeration from "list signal folder dates in S3 and process each" to "walk backwards through NYSE trading days and process any whose 5d forward window has closed." The table's grain is now "one row per ticker per trading day" — the natural grain for downstream evaluation. Why (discussion on 2026-04-15): The backtester's _scanner_lift / _team_lift / _cio_lift joins on (ticker, eval_date) between scanner_evaluations etc. and universe_returns. Before this change, universe_returns eval_dates came from signal folder names — sporadic and timestamped with whatever research happened to stamp (run date, next trading day, etc.). The join's behaviour then depended on whether research happened to run that week and on which date-stamping convention was in effect. After this change, universe_returns stores a row for every NYSE trading day in a 90-day rolling window, regardless of research cadence. The evaluator can match any trading-day eval_date on the scanner/team/cio side; schedule drift or research misfires no longer blow holes in the grade surface. Implementation: - Added _trading_days_to_process(today, max_lookback, existing) that walks backwards through trading days (via the repo-root trading_calendar.is_trading_day for holiday awareness) and yields dates whose +5 business days window has closed and which are not already populated. - collect() now uses the enumerator directly. The signals_prefix arg is kept in the signature for call-site compatibility (weekly_collector.py passes it) but is unused. - Added max_lookback_trading_days arg (default 90) — enough for all rolling evaluator windows with plenty of slack. - Removed _list_signal_dates (dead) and the local _is_trading_day helper (superseded by the holiday-aware shared module function). - Removed the "deferred" concept from the return dict; the enumerator guarantees every enqueued date has a closed window. Builds on PR #38's still-valid changes (INSERT OR REPLACE + NULL-aware _get_existing_dates) which let reprocessed trading days actually overwrite stale NULL-return rows. Test suite: 46/46 unit tests pass. Smoke-tested _trading_days_to_process with today=2026-04-15, max_lookback=15 → returns 9 trading days (3/25 through 4/7, correctly skipping Good Friday 2026-04-03 and recent dates whose 5d window hasn't closed).
4 tasks
cipher813
added a commit
that referenced
this pull request
Apr 15, 2026
…ders (#39) Rewrites the collector enumeration from "list signal folder dates in S3 and process each" to "walk backwards through NYSE trading days and process any whose 5d forward window has closed." The table's grain is now "one row per ticker per trading day" — the natural grain for downstream evaluation. Why (discussion on 2026-04-15): The backtester's _scanner_lift / _team_lift / _cio_lift joins on (ticker, eval_date) between scanner_evaluations etc. and universe_returns. Before this change, universe_returns eval_dates came from signal folder names — sporadic and timestamped with whatever research happened to stamp (run date, next trading day, etc.). The join's behaviour then depended on whether research happened to run that week and on which date-stamping convention was in effect. After this change, universe_returns stores a row for every NYSE trading day in a 90-day rolling window, regardless of research cadence. The evaluator can match any trading-day eval_date on the scanner/team/cio side; schedule drift or research misfires no longer blow holes in the grade surface. Implementation: - Added _trading_days_to_process(today, max_lookback, existing) that walks backwards through trading days (via the repo-root trading_calendar.is_trading_day for holiday awareness) and yields dates whose +5 business days window has closed and which are not already populated. - collect() now uses the enumerator directly. The signals_prefix arg is kept in the signature for call-site compatibility (weekly_collector.py passes it) but is unused. - Added max_lookback_trading_days arg (default 90) — enough for all rolling evaluator windows with plenty of slack. - Removed _list_signal_dates (dead) and the local _is_trading_day helper (superseded by the holiday-aware shared module function). - Removed the "deferred" concept from the return dict; the enumerator guarantees every enqueued date has a closed window. Builds on PR #38's still-valid changes (INSERT OR REPLACE + NULL-aware _get_existing_dates) which let reprocessed trading days actually overwrite stale NULL-return rows. Test suite: 46/46 unit tests pass. Smoke-tested _trading_days_to_process with today=2026-04-15, max_lookback=15 → returns 9 trading days (3/25 through 4/7, correctly skipping Good Friday 2026-04-03 and recent dates whose 5d window hasn't closed).
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two collector bugs that kept scanner / team / CIO grading pinned at N/A (`n_passing=0, n_universe=0`) even though the evaluator's join logic is correct.
Bug 1 — NULL return_5d rows stuck forever
`_build_rows_for_date` returns rows even when forward-window columns are NULL (e.g. when the 5d window hadn't closed at insert time). The next run's `existing` set included those dates, so `dates_to_process` skipped them — returns were never backfilled. Combined with `INSERT OR IGNORE`, they couldn't be overwritten either.
Fix:
Impact: Fri 2026-04-03 (and any future partial-insert) will be backfilled by the next Saturday run.
Bug 2 — weekend signal folders have no market data
Research runs before `alpha-engine-research 9a94e34` (2026-04-13, "Stamp signals with next_trading_day") wrote `signals/{Sat,Sun}/…`. Polygon has no grouped-daily data for those dates, so rows landed NULL and hit bug #1. Post-fix research stamps trading days, but the legacy weekend folders remain in S3.
Fix: filter `eval_dates` to Mon–Fri before enqueuing.
Context — the broader eval_date alignment story
The `universe_returns.eval_date` ↔ `scanner_evaluations.eval_date` mismatch was diagnosed while debugging why `grading.json` had `research.scanner: N/A` with `n_passing=0`. Three layered issues:
Out of scope
Test plan
🤖 Generated with Claude Code