Skip to content

fix(universe_returns): unstick NULL return_5d rows + drop weekend signals#38

Merged
cipher813 merged 1 commit into
mainfrom
fix/universe-returns-backfill-null-returns
Apr 15, 2026
Merged

fix(universe_returns): unstick NULL return_5d rows + drop weekend signals#38
cipher813 merged 1 commit into
mainfrom
fix/universe-returns-backfill-null-returns

Conversation

@cipher813
Copy link
Copy Markdown
Owner

Summary

Two collector bugs that kept scanner / team / CIO grading pinned at N/A (`n_passing=0, n_universe=0`) even though the evaluator's join logic is correct.

Bug 1 — NULL return_5d rows stuck forever

`_build_rows_for_date` returns rows even when forward-window columns are NULL (e.g. when the 5d window hadn't closed at insert time). The next run's `existing` set included those dates, so `dates_to_process` skipped them — returns were never backfilled. Combined with `INSERT OR IGNORE`, they couldn't be overwritten either.

Fix:

  • `_get_existing_dates` now returns only dates where `return_5d IS NOT NULL`.
  • `_insert_rows` uses `INSERT OR REPLACE` so a reprocessed row overwrites the stale NULL one.

Impact: Fri 2026-04-03 (and any future partial-insert) will be backfilled by the next Saturday run.

Bug 2 — weekend signal folders have no market data

Research runs before `alpha-engine-research 9a94e34` (2026-04-13, "Stamp signals with next_trading_day") wrote `signals/{Sat,Sun}/…`. Polygon has no grouped-daily data for those dates, so rows landed NULL and hit bug #1. Post-fix research stamps trading days, but the legacy weekend folders remain in S3.

Fix: filter `eval_dates` to Mon–Fri before enqueuing.

Context — the broader eval_date alignment story

The `universe_returns.eval_date` ↔ `scanner_evaluations.eval_date` mismatch was diagnosed while debugging why `grading.json` had `research.scanner: N/A` with `n_passing=0`. Three layered issues:

  1. Research used to stamp `eval_date = today` (weekend possible) — fixed upstream in `alpha-engine-research 9a94e34` (2026-04-13). Going forward both tables write trading days.
  2. universe_returns had NULL-return orphans blocking the overlap — fixed by this PR.
  3. scanner_evaluations / team_candidates started being written only on 2026-04-12 (schema v8, 2026-04-04) — nothing to do, one week of history today, will accumulate.

Out of scope

  • Market-holiday filtering (e.g. Good Friday). `_is_trading_day` only screens weekends; a closed-market weekday will still be attempted and silently skipped at `_build_rows_for_date`. Proper NYSE calendar filtering is a follow-up.

Test plan

  • Unit suite (46 passed, 0 failed)
  • `_is_trading_day` smoke — Mon/Wed/Fri → True, Sat/Sun → False
  • Next Saturday Step Function run populates return_5d for 2026-04-03

🤖 Generated with Claude Code

…nals

Two related bugs kept scanner/team/cio grading pinned at N/A even when
the merge on (ticker, eval_date) would otherwise have worked.

1. NULL return_5d rows got stuck forever. When a collector run inserted
   a date whose 5d forward window hadn't closed yet, the row landed with
   NULL return/spy/beat columns. The next run's `existing` set contained
   that eval_date, so `dates_to_process` skipped it — the returns were
   never backfilled. Combined with `INSERT OR IGNORE` this silently
   orphaned Fri 2026-04-03 and other edge cases.

   Fix: `_get_existing_dates` now returns only eval_dates where
   `return_5d IS NOT NULL`; `_insert_rows` uses INSERT OR REPLACE so a
   reprocessed row overwrites the stale NULL one.

2. Weekend signal folders have no market data. Research runs before
   alpha-engine-research 9a94e34 (2026-04-13, "Stamp signals with
   next_trading_day") wrote signals/{Sat,Sun}/. Polygon has no
   grouped-daily data for those dates, so rows landed with NULL returns
   and then hit bug #1. Post-fix research stamps trading days, but
   the legacy weekend folders still exist in S3 and would keep
   retriggering the empty-prices path.

   Fix: filter eval_dates to trading days (Mon-Fri) before enqueuing
   for processing.

Existing tests pass (46/46). The next Saturday Step Function run will
reprocess the stuck Fri 2026-04-03 row and populate return_5d for any
subsequent trading days whose 5d window has since closed.

Out of scope: market-holiday filtering (e.g. Good Friday). _is_trading_day
only screens weekends; a closed-market weekday will still be attempted —
polygon returns empty and the row is skipped at _build_rows_for_date,
which is acceptable for now. Proper NYSE calendar filtering belongs in
a follow-up.
@cipher813 cipher813 merged commit 8a0e54b into main Apr 15, 2026
1 check passed
@cipher813 cipher813 deleted the fix/universe-returns-backfill-null-returns branch April 15, 2026 23:06
cipher813 added a commit that referenced this pull request Apr 15, 2026
…ders

Rewrites the collector enumeration from "list signal folder dates in
S3 and process each" to "walk backwards through NYSE trading days and
process any whose 5d forward window has closed." The table's grain is
now "one row per ticker per trading day" — the natural grain for
downstream evaluation.

Why (discussion on 2026-04-15):

The backtester's _scanner_lift / _team_lift / _cio_lift joins on
(ticker, eval_date) between scanner_evaluations etc. and
universe_returns. Before this change, universe_returns eval_dates
came from signal folder names — sporadic and timestamped with
whatever research happened to stamp (run date, next trading day,
etc.). The join's behaviour then depended on whether research
happened to run that week and on which date-stamping convention
was in effect.

After this change, universe_returns stores a row for every NYSE
trading day in a 90-day rolling window, regardless of research
cadence. The evaluator can match any trading-day eval_date on the
scanner/team/cio side; schedule drift or research misfires no
longer blow holes in the grade surface.

Implementation:
- Added _trading_days_to_process(today, max_lookback, existing) that
  walks backwards through trading days (via the repo-root
  trading_calendar.is_trading_day for holiday awareness) and yields
  dates whose +5 business days window has closed and which are not
  already populated.
- collect() now uses the enumerator directly. The signals_prefix arg
  is kept in the signature for call-site compatibility (weekly_collector.py
  passes it) but is unused.
- Added max_lookback_trading_days arg (default 90) — enough for all
  rolling evaluator windows with plenty of slack.
- Removed _list_signal_dates (dead) and the local _is_trading_day
  helper (superseded by the holiday-aware shared module function).
- Removed the "deferred" concept from the return dict; the enumerator
  guarantees every enqueued date has a closed window.

Builds on PR #38's still-valid changes (INSERT OR REPLACE +
NULL-aware _get_existing_dates) which let reprocessed trading days
actually overwrite stale NULL-return rows.

Test suite: 46/46 unit tests pass. Smoke-tested _trading_days_to_process
with today=2026-04-15, max_lookback=15 → returns 9 trading days
(3/25 through 4/7, correctly skipping Good Friday 2026-04-03 and
recent dates whose 5d window hasn't closed).
cipher813 added a commit that referenced this pull request Apr 15, 2026
…ders (#39)

Rewrites the collector enumeration from "list signal folder dates in
S3 and process each" to "walk backwards through NYSE trading days and
process any whose 5d forward window has closed." The table's grain is
now "one row per ticker per trading day" — the natural grain for
downstream evaluation.

Why (discussion on 2026-04-15):

The backtester's _scanner_lift / _team_lift / _cio_lift joins on
(ticker, eval_date) between scanner_evaluations etc. and
universe_returns. Before this change, universe_returns eval_dates
came from signal folder names — sporadic and timestamped with
whatever research happened to stamp (run date, next trading day,
etc.). The join's behaviour then depended on whether research
happened to run that week and on which date-stamping convention
was in effect.

After this change, universe_returns stores a row for every NYSE
trading day in a 90-day rolling window, regardless of research
cadence. The evaluator can match any trading-day eval_date on the
scanner/team/cio side; schedule drift or research misfires no
longer blow holes in the grade surface.

Implementation:
- Added _trading_days_to_process(today, max_lookback, existing) that
  walks backwards through trading days (via the repo-root
  trading_calendar.is_trading_day for holiday awareness) and yields
  dates whose +5 business days window has closed and which are not
  already populated.
- collect() now uses the enumerator directly. The signals_prefix arg
  is kept in the signature for call-site compatibility (weekly_collector.py
  passes it) but is unused.
- Added max_lookback_trading_days arg (default 90) — enough for all
  rolling evaluator windows with plenty of slack.
- Removed _list_signal_dates (dead) and the local _is_trading_day
  helper (superseded by the holiday-aware shared module function).
- Removed the "deferred" concept from the return dict; the enumerator
  guarantees every enqueued date has a closed window.

Builds on PR #38's still-valid changes (INSERT OR REPLACE +
NULL-aware _get_existing_dates) which let reprocessed trading days
actually overwrite stale NULL-return rows.

Test suite: 46/46 unit tests pass. Smoke-tested _trading_days_to_process
with today=2026-04-15, max_lookback=15 → returns 9 trading days
(3/25 through 4/7, correctly skipping Good Friday 2026-04-03 and
recent dates whose 5d window hasn't closed).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant