daily_append: backfill-safe writes for historical row updates #92

Merged
cipher813 merged 1 commit into main from feat/daily-append-backfill-safe on Apr 24, 2026

@cipher813 (Owner)

Summary

Surfaced by the 2026-04-24 historical VWAP repair after PR #91 deployed: ArcticDB's universe_lib.update() raises "index must be monotonic increasing or decreasing" when asked to insert behind the latest stored date. `daily_append` was originally designed for "append today's row at the head" — fine for the steady-state daily pass but useless for historical backfills.

This PR adds `_write_row_backfill_safe(lib, symbol, new_row, existing_series=None)` that routes by mode:

  • append (target_ts > all existing dates): `lib.update()` — fast, single-row write. Same as before for the steady-state daily pass.
  • backfill (target_ts ≤ some existing date): read full series, splice in the new row (replacing any existing same-date row, matching update() semantics), write the monotonic-sorted full series via `lib.write(prune_previous_versions=True)`. ~10-100x slower per ticker but only fires for rare backfill operations.
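A minimal, dependency-free sketch of the routing above, for illustration only (it is not the PR's code). Assumptions: `FakeLib` is a hypothetical in-memory stand-in for an ArcticDB library that models only `read`, `update`, and `write(..., prune_previous_versions=...)`, with a missing symbol raising `KeyError`; a series is a list of `(date, row)` pairs rather than a DataFrame; and the target date is passed as a separate `target` argument instead of being taken from `new_row`'s index.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FakeLib:
    """Hypothetical in-memory stand-in for an ArcticDB library.

    A series is a list of (date, row) pairs. read() raises KeyError for a
    missing symbol, and every call is logged so the routing can be inspected.
    """
    store: dict = field(default_factory=dict)
    calls: list = field(default_factory=list)

    def read(self, symbol):
        self.calls.append("read")
        return list(self.store[symbol])

    def update(self, symbol, rows):
        # Append-only model: only valid when rows sort after the stored index.
        self.calls.append("update")
        self.store[symbol] = self.store.get(symbol, []) + list(rows)

    def write(self, symbol, rows, prune_previous_versions=False):
        self.calls.append(("write", prune_previous_versions))
        self.store[symbol] = list(rows)


def write_row_backfill_safe(lib, symbol, target: date, row: dict,
                            existing_series=None) -> str:
    """Route one row to the append or backfill path and return the mode used."""
    if existing_series is None:
        try:
            existing_series = lib.read(symbol)
        except KeyError:
            # Nonexistent symbol: a plain write creates it. Labelled "append"
            # in this sketch because the new row is also the last row.
            lib.write(symbol, [(target, row)])
            return "append"

    dates = [d for d, _ in existing_series]
    # Strict >: target == latest falls through to backfill.
    if not dates or target > max(dates):
        lib.update(symbol, [(target, row)])  # fast single-row path
        return "append"

    merged = dict(existing_series)
    merged[target] = row  # replaces any same-date row, like update()
    spliced = sorted(merged.items(), key=lambda kv: kv[0])  # monotonic index
    lib.write(symbol, spliced, prune_previous_versions=True)
    return "backfill"
```

Passing an already-read series as `existing_series` (as the per-ticker site does with `hist`) skips the `read` call entirely. The strict `>` is what sends `target == latest` down the backfill path, where the splice replaces that row instead of duplicating it.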

Per-ticker write site + macro + sector ETF write sites all rewired through the helper. Each call's mode is captured into `macro_write_modes` so the post-write verification check applies the right assertion (append: readback last == target_ts; backfill: target_ts in readback index anywhere).
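The mode-dependent verification can be sketched as a small pure function. This assumes `macro_write_modes` is a dict from symbol to `"append"` or `"backfill"` and that each readback index is a date-sorted list; `verify_write` and `failed_symbols` are hypothetical names.

```python
from datetime import date


def verify_write(mode: str, readback_index: list[date], target: date) -> bool:
    """Mode-aware post-write check on a date-sorted readback index."""
    if not readback_index:
        return False
    if mode == "append":
        # The new row must be the head; a silently stale series fails here.
        return readback_index[-1] == target
    if mode == "backfill":
        # Later dates legitimately follow a backfilled historical date.
        return target in readback_index
    raise ValueError(f"unknown write mode: {mode!r}")


def failed_symbols(macro_write_modes: dict[str, str],
                   readbacks: dict[str, list[date]],
                   target: date) -> list[str]:
    """Symbols whose readback does not satisfy the assertion for their mode."""
    return [sym for sym, mode in macro_write_modes.items()
            if not verify_write(mode, readbacks.get(sym, []), target)]
```

Keeping the check keyed on the recorded mode, rather than re-deriving it from the readback, means a backfill is never judged by the append rule (whose "last index" would legitimately be a later date).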

Why now

Without this, `weekly_collector.py --morning-enrich --date ` is broken — exactly the use case PR #91 added the flag for. The going-forward path (Monday's SF enriches Friday's row, which IS the latest) works fine without this PR, but historical backfill of the 2026-04-17→2026-04-23 window cannot land until this merges.

Test plan

  • 9 new tests in `tests/test_daily_append_backfill_safe.py` cover both modes + boundary cases (empty series, nonexistent symbol, target == latest, prune_previous_versions, no-double-read)
  • Existing `test_daily_append_semantics.py` updated to match new helper-routed call sites while preserving regression intent (lib.append() forbidden, counters increment after write, etc.)
  • Full suite: 162/162 pass
  • After merge + deploy: re-run `for D in 2026-04-17 2026-04-20 2026-04-21 2026-04-22 2026-04-23; do python weekly_collector.py --morning-enrich --date $D; done` on ae-trading to repair the universally-NaN VWAP for the affected window

🤖 Generated with Claude Code

Surfaced by the 2026-04-24 historical VWAP repair after PR #91 deployed:
ArcticDB's universe_lib.update() raises "index must be monotonic
increasing or decreasing" when asked to insert behind the latest stored
date. daily_append was originally designed for "append today's row at
the head" — fine for the steady-state daily pass but useless for
historical backfills.

This commit adds _write_row_backfill_safe(lib, symbol, new_row,
existing_series=None) which routes by mode:

  * append (target_ts > all existing dates): use lib.update() — fast
    path, single-row write. Same behavior as before for the steady-
    state daily pass.

  * backfill (target_ts ≤ some existing date, including target_ts ==
    latest): read full series, splice in the new row (replacing any
    existing same-date row, matching update() semantics), write the
    monotonic-sorted full series via
    lib.write(prune_previous_versions=True). ~10-100x slower per ticker
    but only fires for rare backfill operations.

Per-ticker write site rewired to call _write_row_backfill_safe; the
existing `hist` (already read for feature warmup) is passed in as
existing_series so the helper doesn't double-read.

Macro + sector ETF write sites also rewired. Each call's mode is
captured into macro_write_modes so the post-write verification check
can apply the right correctness assertion:

  * append mode → readback last index must equal target_ts (catches
    the 2026-04-15 silent-stale failure)
  * backfill mode → target_ts must be IN the readback index, anywhere
    (the last date is naturally future relative to a backfilled
    historical date)

Existing semantics tests (test_daily_append_semantics.py) updated to
match the new helper-routed call sites while preserving regression
intent — lib.append() must never appear, counters increment after
write, etc.

9 new tests in tests/test_daily_append_backfill_safe.py:
  - append uses lib.update when target after latest
  - append when existing series is empty (first-write-after-empty)
  - first write to nonexistent symbol uses write()
  - backfill uses lib.write when target before latest
  - backfill replaces existing same-date row (semantic match w/ update)
  - backfill target in middle of series — sorts monotonic
  - target == latest takes backfill path (conservative >, not >=)
  - lib.write called with prune_previous_versions=True
  - passing existing_series avoids extra lib.read

Full suite: 162/162 pass.

After deploy to ae-trading: re-run the historical backfill loop
  for D in 2026-04-17 2026-04-20 2026-04-21 2026-04-22 2026-04-23; do
    python weekly_collector.py --morning-enrich --date $D
  done
to repair the universally-NaN VWAP column for the 2026-04-17→2026-04-23
window the polygon outage left behind.
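The loop above hardcodes the window's five trading days (2026-04-18 and 2026-04-19 fall on a weekend). A hypothetical Python driver could derive them instead; this sketch only skips weekends and knows nothing about exchange holidays, so a longer window would need a holiday calendar.

```python
from datetime import date, timedelta


def weekdays(start: date, end: date) -> list[date]:
    """Mon-Fri dates in [start, end]; exchange holidays are NOT excluded."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:  # 0=Mon ... 4=Fri
            days.append(d)
        d += timedelta(days=1)
    return days


def backfill_commands(start: date, end: date) -> list[list[str]]:
    """argv lists for one morning-enrich run per weekday in the window."""
    return [["python", "weekly_collector.py", "--morning-enrich",
             "--date", d.isoformat()] for d in weekdays(start, end)]
```

Running each argv with `subprocess.run(argv, check=True)` would stop at the first failing date, whereas the shell loop carries on; which is preferable depends on whether later dates can be repaired independently of earlier ones.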

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit f1e55d6 into main on Apr 24, 2026
1 check passed
cipher813 deleted the feat/daily-append-backfill-safe branch on April 24, 2026 21:12
cipher813 added a commit that referenced this pull request on Apr 27, 2026
…ma (#105)

Background — 2026-04-27 EOD-email blackout investigation
========================================================
The structural fix in PR #104 decoupled macro/SPY freshness from
stock-coverage correctness. Validation today exposed a second, latent
issue: with the universe-coverage guard now passing, daily_append's
per-stock writes finally execute — and 100% of them fail with an
ArcticDB schema-mismatch error.

Schema audit (2026-04-27 22:14 UTC) revealed heterogeneous universe state:

  - 816 symbols (~90%): 64 cols, no VWAP at all
  - 88  symbols (~10%): 65 cols, VWAP at idx=64 (appended at end)

daily_append writes via OHLCV_COLS = [Open, High, Low, Close, Volume,
VWAP, ...features], which puts VWAP at idx=5. ArcticDB update() requires
the column order to match, so both schema variants fail. Per-stock universe
writes have therefore been failing since the polygon-VWAP work landed
on 2026-04-24 (PRs #90/#91/#92), masked until today by the macro-coupled
universe-coverage guard.
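A per-symbol preflight that compares the stored column list with the write-side order makes this failure mode concrete. The function name is hypothetical, and `f1`, `f2` in the check stand in for the real FEATURES block; both audited variants diverge from the write order at idx=5.

```python
def first_order_mismatch(stored: list[str], incoming: list[str]):
    """Return (index, stored_col, incoming_col) at the first positional
    difference, or None when both lists have the same columns in the same order.
    A missing position on either side is reported as None."""
    for i in range(max(len(stored), len(incoming))):
        s = stored[i] if i < len(stored) else None
        n = incoming[i] if i < len(incoming) else None
        if s != n:
            return i, s, n
    return None
```

Logging this tuple per symbol before calling update() would have distinguished the two heterogeneous layouts without waiting for the write to fail.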

Operational design (yfinance EOD → polygon morning)
====================================================
- yfinance EOD post-close hook writes daily_closes parquet with
  VWAP=NaN (yfinance does not expose true volume-weighted VWAP).
- polygon morning enrichment overwrites the parquet with real VWAP
  values from polygon grouped-daily.
- daily_append runs end-of-day and writes whatever VWAP is in the
  parquet to the ArcticDB universe: NaN initially, real values after the
  morning enrichment re-runs daily_append.

For that flow to work, VWAP must be a first-class column in the
universe schema with a stable position. This migration normalizes
every symbol to the canonical layout:

    [Open, High, Low, Close, Volume, VWAP] + FEATURES

NaN-fills VWAP historically for the 816 symbols that didn't have it.
Repositions VWAP for the 88 symbols that had it appended at idx=64.
Existing FEATURES block keeps its relative order.
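A sketch of the normalization, loosely mirroring the `_canonical_column_order` and `_is_canonical` helpers named in the tests; the bodies here are an illustrative guess, not the migration's code. Rows are modeled as tuples aligned with the column list, and the sketch assumes the five OHLCV columns are always present (only VWAP may be missing or misplaced).

```python
import math

BASE = ["Open", "High", "Low", "Close", "Volume", "VWAP"]


def canonical_column_order(columns: list[str]) -> list[str]:
    """BASE first, then every other column in its existing relative order."""
    features = [c for c in columns if c not in BASE]
    return BASE + features


def is_canonical(columns: list[str]) -> bool:
    return list(columns) == canonical_column_order(columns)


def to_canonical_rows(columns: list[str], rows: list[tuple]):
    """Reorder row tuples to canonical order, NaN-filling VWAP when absent."""
    order = canonical_column_order(columns)
    index = {c: i for i, c in enumerate(columns)}
    new_rows = [tuple(r[index[c]] if c in index else math.nan for c in order)
                for r in rows]
    return order, new_rows
```

Relocating by name (not by position) is what preserves the existing VWAP values for the appended-VWAP variant while NaN-filling the missing-VWAP variant.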

Idempotent — symbols already in canonical order are skipped.
Per-symbol error isolation — one symbol's write failure does not abort
the batch (records into errors[], continues with the rest).
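The idempotent, error-isolated batch loop can be sketched as follows. `needs_migration` and `migrate_one` are hypothetical callables (for example, a column-order check and a per-symbol rewrite), and the `"ok"`/`"partial"` status strings are assumptions.

```python
def migrate_all(symbols, needs_migration, migrate_one) -> dict:
    """Skip canonical symbols, migrate the rest, isolate per-symbol failures."""
    migrated, skipped, errors = [], [], []
    for sym in symbols:
        try:
            if not needs_migration(sym):
                skipped.append(sym)  # already canonical: idempotent re-run
                continue
            migrate_one(sym)
            migrated.append(sym)
        except Exception as exc:  # one bad symbol must not abort the batch
            errors.append({"symbol": sym, "error": repr(exc)})
    return {
        "status": "partial" if errors else "ok",
        "migrated": migrated,
        "skipped": skipped,
        "errors": errors,
    }
```

A second run over the same symbols then reports everything already migrated as skipped, which is what makes a canary subset followed by a full `--apply` safe.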

Tests
=====
- _canonical_column_order: VWAP inserted at idx=5, feature block
  preserved in relative order, drops nothing.
- _is_canonical: recognizes correct layout, rejects appended-VWAP and
  missing-VWAP variants.
- migrate_universe_vwap apply path:
  - Inserts VWAP at idx=5 with FLOAT64 NaN when absent.
  - Relocates VWAP from idx=last when appended (preserving values).
  - Skips already-canonical symbols (idempotent).
  - Honors --tickers override for canary / subset runs.
  - Per-symbol error isolation — partial-status return on partial failure.
- All 275 tests pass (261 existing + 14 new).

Operational follow-up (not in this PR)
======================================
After merge, deploy + run:
    python -m builders.migrate_universe_vwap --apply
on ae-trading. Expected: 904 symbols migrated (816 + 88), audit JSON
written to s3://alpha-engine-research/builders/migrate_universe_vwap_audit/.
Then rerun alpha-engine-daily-data.service (per-stock writes succeed)
and alpha-engine-eod.service (held-stock close lookups succeed; EOD
email + 2026-04-27 eod_pnl row land).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>