fix(universe): migrate ArcticDB universe to canonical OHLCV+VWAP schema #105

Merged: cipher813 merged 1 commit into main from fix/universe-vwap-schema-normalize on Apr 27, 2026

@cipher813
Owner

Background: 2026-04-27 EOD-email blackout investigation
=======================================================
The structural fix in PR #104 decoupled macro/SPY freshness from stock-coverage correctness. Validation today exposed a second, latent issue: with the universe-coverage guard now passing, daily_append's per-stock writes finally execute, and 100% of them fail with an ArcticDB schema-mismatch error.

Schema audit (2026-04-27 22:14 UTC) revealed heterogeneous universe state:

  • 816 symbols (~90%): 64 cols, no VWAP at all
  • 88 symbols (~10%): 65 cols, VWAP at idx=64 (appended at end)

daily_append writes via OHLCV_COLS = [Open, High, Low, Close, Volume, VWAP, ...features], which puts VWAP at idx=5. ArcticDB update() requires the incoming column order to match the stored schema, so both stored variants fail: one has no VWAP column at all, the other has VWAP at idx=64 instead of idx=5. Per-stock universe writes have therefore been failing since the polygon-VWAP work landed on 2026-04-24 (PRs #90/#91/#92), masked until today by the macro-coupled universe-coverage guard.
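To make the mismatch concrete, here is a minimal sketch that classifies a stored column list against the write order. It is not the audit script from this PR: the `classify` helper and the `feat_N` feature names are hypothetical placeholders, chosen only so the column counts line up with the audit (64 and 65).

```python
OHLCV = ["Open", "High", "Low", "Close", "Volume"]
# Hypothetical feature names: 59 placeholders so OHLCV + features = 64 columns.
FEATURES = [f"feat_{i}" for i in range(59)]

# Order daily_append writes in: VWAP sits at index 5.
WRITE_COLS = OHLCV + ["VWAP"] + FEATURES


def classify(stored_cols):
    """Label a stored column list by how it relates to the write order."""
    if stored_cols == WRITE_COLS:
        return "canonical"
    if "VWAP" not in stored_cols:
        return "missing_vwap"
    if stored_cols[-1] == "VWAP":
        return "appended_vwap"
    return "other"


no_vwap = OHLCV + FEATURES               # 64 cols, no VWAP
appended = OHLCV + FEATURES + ["VWAP"]   # 65 cols, VWAP at idx 64

for name, cols in [("no_vwap", no_vwap), ("appended", appended), ("write", WRITE_COLS)]:
    print(name, len(cols), classify(cols))
```

Neither stored variant equals `WRITE_COLS`, which is the condition the failing writes ran into.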

Operational design (yfinance EOD → polygon morning)
===================================================

  • yfinance EOD post-close hook writes daily_closes parquet with VWAP=NaN (yfinance does not expose true volume-weighted VWAP).
  • polygon morning enrichment overwrites the parquet with real VWAP values from polygon grouped-daily.
  • daily_append runs end-of-day and writes whatever VWAP is in the parquet to ArcticDB universe — NaN initially, real values after the morning enrichment re-runs daily_append.
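The two-pass flow above can be sketched with plain dictionaries standing in for parquet rows. This is an illustration under assumptions, not the pipeline's code: `eod_rows`, `enrich`, the tickers, and the prices are all made up. The point it shows is that the VWAP field exists with the same position in both passes, so only its value changes.

```python
import math


def eod_rows(closes):
    """Post-close pass: VWAP is a NaN placeholder until enrichment."""
    return [{"ticker": t, "Close": c, "VWAP": math.nan} for t, c in closes.items()]


def enrich(rows, vwap_by_ticker):
    """Morning pass: overwrite VWAP where a value is available, else keep NaN."""
    return [
        {**row, "VWAP": vwap_by_ticker.get(row["ticker"], row["VWAP"])}
        for row in rows
    ]


rows = eod_rows({"AAA": 10.0, "BBB": 20.0})   # made-up tickers and prices
enriched = enrich(rows, {"AAA": 9.95})        # BBB has no VWAP yet

print(rows[0])
print(enriched[0])
print(enriched[1])
```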

For that flow to work, VWAP must be a first-class column in the universe schema with a stable position. This migration normalizes every symbol to the canonical layout:

[Open, High, Low, Close, Volume, VWAP] + FEATURES

The migration NaN-fills VWAP across all historical rows for the 816 symbols that lacked the column, and repositions VWAP for the 88 symbols that had it appended at idx=64. The existing FEATURES block keeps its relative order.
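A minimal sketch of the reordering rule, operating on column-name lists. The PR's tests name `_canonical_column_order` and `_is_canonical`; the functions below are an independent illustration and not their actual implementation, and the feature names are hypothetical. The sketch assumes the five OHLCV columns are always present.

```python
CORE = ["Open", "High", "Low", "Close", "Volume", "VWAP"]


def canonical_column_order(cols):
    """Return CORE followed by every other column in its original relative order."""
    features = [c for c in cols if c not in CORE]
    return CORE + features


def is_canonical(cols):
    """True only when the columns are already in the canonical layout."""
    return list(cols) == canonical_column_order(cols)


feats = ["rsi_14", "sma_20", "atr_14"]             # hypothetical feature names
missing = ["Open", "High", "Low", "Close", "Volume"] + feats
appended = missing + ["VWAP"]

print(canonical_column_order(missing))
print(is_canonical(appended), is_canonical(canonical_column_order(appended)))
```

With a pandas DataFrame, `df.reindex(columns=canonical_column_order(list(df.columns)))` would apply the same order and create any missing VWAP column filled with NaN, though whether the migration does it that way is not stated here.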

The migration is idempotent: symbols already in canonical order are skipped. Errors are isolated per symbol: one symbol's write failure does not abort the batch; it is recorded in errors[] and the run continues with the rest.
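The skip-and-continue behaviour can be sketched as a loop over an in-memory store. Everything here is an assumption for illustration: the `migrate` function, the dict-based store, the `flaky_write` callback, and the shape of the returned summary are hypothetical, and only the "partial" status and the errors list are taken from the description above.

```python
CORE = ["Open", "High", "Low", "Close", "Volume", "VWAP"]


def _canonical(cols):
    return CORE + [c for c in cols if c not in CORE]


def migrate(store, write, tickers=None):
    """Normalize each symbol; skip canonical ones and isolate per-symbol failures."""
    migrated, skipped, errors = [], [], []
    for symbol in (tickers if tickers is not None else sorted(store)):
        try:
            cols = store[symbol]
            target = _canonical(cols)
            if cols == target:
                skipped.append(symbol)      # already canonical: nothing to do
                continue
            write(symbol, target)
            migrated.append(symbol)
        except Exception as exc:            # record the failure and keep going
            errors.append({"symbol": symbol, "error": str(exc)})
    status = "ok" if not errors else "partial"
    return {"status": status, "migrated": migrated, "skipped": skipped, "errors": errors}


ohlcv = ["Open", "High", "Low", "Close", "Volume"]
store = {
    "AAA": ohlcv + ["feat_a"],                  # missing VWAP
    "BBB": ohlcv + ["feat_a", "VWAP"],          # VWAP appended at the end
    "CCC": ohlcv + ["VWAP", "feat_a"],          # already canonical
}


def flaky_write(symbol, cols):
    """Stand-in writer that fails for one symbol to exercise error isolation."""
    if symbol == "BBB":
        raise RuntimeError("simulated write failure")
    store[symbol] = cols


result = migrate(store, flaky_write)
print(result)
```

Running `migrate` a second time against the same store would skip AAA as well, since its columns are now canonical.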

Tests
=====

  • _canonical_column_order: VWAP inserted at idx=5, feature block preserved in relative order, drops nothing.
  • _is_canonical: recognizes correct layout, rejects appended-VWAP and missing-VWAP variants.
  • migrate_universe_vwap apply path:
    • Inserts VWAP at idx=5 with FLOAT64 NaN when absent.
    • Relocates VWAP from idx=last when appended (preserving values).
    • Skips already-canonical symbols (idempotent).
    • Honors --tickers override for canary / subset runs.
    • Per-symbol error isolation — partial-status return on partial failure.
  • All 275 tests pass (261 existing + 14 new).
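For the --tickers override, one way a canary subset could be parsed is sketched below with argparse. The flag names --apply and --tickers come from this description; the comma-separated format, the report-only default when --apply is absent, and the `parse_args` helper are assumptions, not the script's confirmed behaviour.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags named in this PR; the value formats are assumed."""
    parser = argparse.ArgumentParser(prog="migrate_universe_vwap")
    parser.add_argument(
        "--apply",
        action="store_true",
        help="write changes (assumed: without it, the run only reports)",
    )
    parser.add_argument(
        "--tickers",
        default=None,
        help="comma-separated subset of symbols (assumed format)",
    )
    args = parser.parse_args(argv)
    if args.tickers:
        # Split the subset and drop empty entries left by stray commas.
        args.tickers = [t.strip() for t in args.tickers.split(",") if t.strip()]
    else:
        args.tickers = None
    return args


canary = parse_args(["--tickers", "AAA, BBB"])
full = parse_args(["--apply"])
print(canary.apply, canary.tickers)
print(full.apply, full.tickers)
```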

Operational follow-up (not in this PR)
======================================

After merge, deploy and run on ae-trading:

    python -m builders.migrate_universe_vwap --apply

Expected: 904 symbols migrated (816 + 88), audit JSON written to s3://alpha-engine-research/builders/migrate_universe_vwap_audit/. Then rerun alpha-engine-daily-data.service (per-stock writes succeed) and alpha-engine-eod.service (held-stock close lookups succeed; EOD email + 2026-04-27 eod_pnl row land).


Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit 68062d7 into main Apr 27, 2026
1 check passed
@cipher813 cipher813 deleted the fix/universe-vwap-schema-normalize branch April 27, 2026 22:26
cipher813 added a commit that referenced this pull request May 5, 2026
The Phase 2 Lambda deploy job has been failing on:

  ERROR: Cannot find command 'git' - do you have 'git' installed and
  in your PATH?

The Lambda Python 3.12 base image (public.ecr.aws/lambda/python:3.12)
doesn't ship with git, but the Dockerfile uses `pip install ... @
git+https://...` which requires it. Surfaced today when PR #159's
post-merge Deploy fired against a fresh build (vs the prior in-flight
image which had a cached install layer that masked the gap).

Fix: same one-line microdnf install applied to alpha-engine-research
Dockerfile after PR #105's lib-public flip. AL2023 minimal package
manager; image-size impact ~25MB.

Out of scope: the FromPlatformFlagConstDisallowed warning (line 1)
about `--platform=linux/amd64` is a separate buildkit lint that
doesn't block builds — leave for a follow-up.

Test plan
- [x] Diff parity with alpha-engine-research Dockerfile (same line)
- [ ] Deploy workflow re-runs cleanly post-merge

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>