fix(universe): migrate ArcticDB universe to canonical OHLCV+VWAP schema #105
Merged
Background: 2026-04-27 EOD-email blackout investigation
=======================================================

The structural fix in PR #104 decoupled macro/SPY freshness from stock-coverage correctness. Validation today exposed a second, latent issue: with the universe-coverage guard now passing, daily_append's per-stock writes finally execute, and 100% of them fail with an ArcticDB schema-mismatch error.

Schema audit (2026-04-27 22:14 UTC) revealed heterogeneous universe state:

- 816 symbols (~90%): 64 cols, no VWAP at all
- 88 symbols (~10%): 65 cols, VWAP at idx=64 (appended at end)

daily_append writes via OHLCV_COLS = [Open, High, Low, Close, Volume, VWAP, ...features], which puts VWAP at idx=5. ArcticDB update() requires the column order to match the stored schema, so both schema variants fail. Per-stock universe writes have therefore been failing since the polygon-VWAP work landed on 2026-04-24 (PRs #90/#91/#92), masked until today by the macro-coupled universe-coverage guard.

Operational design (yfinance EOD → polygon morning)
===================================================

- yfinance EOD post-close hook writes the daily_closes parquet with VWAP=NaN (yfinance does not expose true volume-weighted VWAP).
- polygon morning enrichment overwrites the parquet with real VWAP values from polygon grouped-daily.
- daily_append runs end-of-day and writes whatever VWAP is in the parquet to the ArcticDB universe: NaN initially, real values after the morning enrichment re-runs daily_append.

For that flow to work, VWAP must be a first-class column in the universe schema with a stable position. This migration normalizes every symbol to the canonical layout:

    [Open, High, Low, Close, Volume, VWAP] + FEATURES

- NaN-fills VWAP historically for the 816 symbols that didn't have it.
- Repositions VWAP for the 88 symbols that had it appended at idx=64.
- Existing FEATURES block keeps its relative order.
- Idempotent: symbols already in canonical order are skipped.
- Per-symbol error isolation: one symbol's write failure does not abort the batch (it is recorded in errors[] and the run continues with the rest).

Tests
=====

- _canonical_column_order: VWAP inserted at idx=5, feature block preserved in relative order, drops nothing.
- _is_canonical: recognizes the correct layout, rejects the appended-VWAP and missing-VWAP variants.
- migrate_universe_vwap apply path:
  - Inserts VWAP at idx=5 with FLOAT64 NaN when absent.
  - Relocates VWAP from idx=last when appended (preserving values).
  - Skips already-canonical symbols (idempotent).
  - Honors the --tickers override for canary / subset runs.
  - Per-symbol error isolation: partial-status return on partial failure.
- All 275 tests pass (261 existing + 14 new).

Operational follow-up (not in this PR)
======================================

After merge, deploy and run the following on ae-trading:

    python -m builders.migrate_universe_vwap --apply

Expected: 904 symbols migrated (816 + 88), audit JSON written to s3://alpha-engine-research/builders/migrate_universe_vwap_audit/. Then rerun alpha-engine-daily-data.service (per-stock writes succeed) and alpha-engine-eod.service (held-stock close lookups succeed; EOD email + 2026-04-27 eod_pnl row land).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
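The normalization described above (insert or relocate VWAP to idx=5, keep the feature block in relative order, skip canonical symbols, isolate per-symbol failures) can be sketched in plain pandas. This is a minimal illustration, not the code in builders.migrate_universe_vwap: the names `CANONICAL_HEAD`, `canonical_column_order`, `is_canonical`, `to_canonical`, and `migrate_all` are hypothetical, the `write` callable stands in for whatever ArcticDB write the real migration performs, and the `"ok"` / `"partial"` status strings are assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical constant: the canonical leading block named in the description.
CANONICAL_HEAD = ["Open", "High", "Low", "Close", "Volume", "VWAP"]


def canonical_column_order(columns):
    """Canonical head first, then every other column in its original relative order."""
    features = [c for c in columns if c not in CANONICAL_HEAD]
    return CANONICAL_HEAD + features


def is_canonical(columns):
    """True only when the columns already match the canonical layout exactly."""
    return list(columns) == canonical_column_order(columns)


def to_canonical(df):
    """Return a copy with VWAP at idx=5; NaN-filled (float64) if it was absent."""
    out = df.copy()
    if "VWAP" not in out.columns:
        out["VWAP"] = np.nan  # scalar NaN yields a float64 column
    # Raises KeyError if any of Open/High/Low/Close/Volume is missing.
    return out[canonical_column_order(out.columns)]


def migrate_all(frames, write):
    """Migrate each symbol independently.

    frames: dict mapping symbol -> DataFrame (assumed already read from storage).
    write:  callable(symbol, df), a stand-in for the real storage write.
    """
    migrated, skipped, errors = [], [], []
    for symbol, df in frames.items():
        try:
            if is_canonical(df.columns):
                skipped.append(symbol)  # idempotent: nothing to do
                continue
            write(symbol, to_canonical(df))
            migrated.append(symbol)
        except Exception as exc:
            # One symbol's failure is recorded and the batch continues.
            errors.append({"symbol": symbol, "error": str(exc)})
    return {
        "status": "partial" if errors else "ok",  # assumed status strings
        "migrated": migrated,
        "skipped": skipped,
        "errors": errors,
    }
```

Running `migrate_all` a second time over already-migrated frames puts every symbol in `skipped`, which is the idempotence property the tests above check for.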
cipher813 added a commit that referenced this pull request on May 5, 2026
The Phase 2 Lambda deploy job has been failing on:

    ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

The Lambda Python 3.12 base image (public.ecr.aws/lambda/python:3.12) doesn't ship with git, but the Dockerfile uses `pip install ... @ git+https://...`, which requires it. The failure surfaced today when PR #159's post-merge Deploy fired against a fresh build; the prior in-flight image had a cached install layer that masked the gap.

Fix: the same one-line microdnf install that was applied to the alpha-engine-research Dockerfile after PR #105's lib-public flip. microdnf is the AL2023 minimal package manager; image-size impact is ~25MB.

Out of scope: the FromPlatformFlagConstDisallowed warning (line 1) about `--platform=linux/amd64` is a separate buildkit lint that doesn't block builds; leave it for a follow-up.

Test plan

- [x] Diff parity with alpha-engine-research Dockerfile (same line)
- [ ] Deploy workflow re-runs cleanly post-merge

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>