feat(risk-model): weekly F+D S3 persistence from PredictorTraining (C.2b) by cipher813 · Pull Request #202 · cipher813/alpha-engine-predictor

cipher813 · 2026-05-27T17:13:36Z

Summary

ROADMAP C.2b wires the C.2a math primitives (`risk_model.build_factor_risk_model`, #200) into the Saturday SF PredictorTraining stage and persists F + D parquets weekly to `s3://{bucket}/risk_model/{date}/{F,D}.parquet`, alongside a `metadata.json` with build params and shape diagnostics.

This is the production-persistence layer for the structural factor risk model. The actual wiring of Σ = B · F · Bᵀ + D into `executor.portfolio_optimizer.solve_target_weights` (workstream C.3) is gated on the B.5 cutover gate, per the plan doc's HARD SEQUENCING CONSTRAINT: two simultaneous Σ-substrate changes would make backtester regressions untraceable. C.2b ships the weekly persistence now so that by the time C.3 reads `risk_model/{date}/`, ≥4 weeks of F + D have accumulated.
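To make the "F is square" requirement concrete, here is a minimal NumPy sketch of how a consumer such as C.3 might assemble Σ = B · F · Bᵀ + D from the persisted artifacts. It assumes B is an (N × K_eff) loading matrix, F is the (K_eff × K_eff) factor covariance, and D is the (N,) idiosyncratic variance vector placed on the diagonal. `structural_covariance` and `portfolio_variance` are hypothetical helpers, not the actual `solve_target_weights` code.

```python
import numpy as np


def structural_covariance(B: np.ndarray, F: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Sigma = B @ F @ B.T + diag(D) for N assets and K_eff factors (illustrative)."""
    n, k = B.shape
    if F.shape != (k, k):  # F must be square and match B's factor dimension
        raise ValueError(f"F must be ({k}, {k}), got {F.shape}")
    if D.shape != (n,):  # one idiosyncratic variance per asset
        raise ValueError(f"D must be ({n},), got {D.shape}")
    return B @ F @ B.T + np.diag(D)


def portfolio_variance(w: np.ndarray, B: np.ndarray, F: np.ndarray, D: np.ndarray) -> float:
    """w' Sigma w computed without materialising the N x N Sigma."""
    x = B.T @ w  # (K_eff,) factor exposures of the portfolio
    return float(x @ F @ x + np.sum(D * w**2))
```

One benefit of persisting F and D separately rather than Σ itself is visible in `portfolio_variance`: the N × N matrix never has to be built to evaluate portfolio risk.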

What the stage does

~10s for the build, dominated by the cross-sectional OLS at each date:

1. Reads each ticker parquet from the `train_handler`'s `tmp_cache` (the same per-ticker files `download_from_arctic` just populated).
2. Extracts `Close` → log returns (T × N panel).
3. Extracts the 8 `*_zscore` factor-loading columns from C.1 (alpha-engine-data #324). Tickers missing all 8 are skipped at the per-ticker level; dates with any NaN per row are skipped at the per-date level.
4. Calls `build_factor_risk_model`: Fama-MacBeth (1973) cross-sectional OLS → (K_eff × K_eff) F covariance + (N,) D idiosyncratic variance + metadata.
5. Writes `F.parquet`, `D.parquet`, and `metadata.json` to `s3://{bucket}/risk_model/{date}/`.
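The read-and-regress steps above can be sketched as follows. This is an illustrative reimplementation, not the C.2a `build_factor_risk_model` primitive. `log_return_panel` and `fama_macbeth_f_d` are hypothetical names; loadings are assumed to arrive as a (T, N, K) array aligned with the (T, N) returns panel; rows with any NaN are dropped per date and a date is skipped when fewer than K + 1 usable rows remain; no intercept is fitted; and K_eff is simply K (no degenerate-factor pruning).

```python
from __future__ import annotations

import numpy as np
import pandas as pd


def log_return_panel(closes: dict[str, pd.Series]) -> pd.DataFrame:
    """T x N panel of log returns from per-ticker Close series (columns = tickers)."""
    close = pd.concat(closes, axis=1).sort_index()
    return np.log(close).diff().iloc[1:]  # first row is all-NaN after diff


def fama_macbeth_f_d(returns: np.ndarray, loadings: np.ndarray,
                     min_rows: int | None = None) -> tuple[np.ndarray, np.ndarray]:
    """Cross-sectional OLS per date, then F (K x K) and D (N,) from the time series.

    returns:  (T, N) log returns
    loadings: (T, N, K) factor loadings (e.g. the 8 *_zscore columns)
    """
    T, N, K = loadings.shape
    min_rows = K + 1 if min_rows is None else min_rows
    factor_returns = []
    resid = np.full((T, N), np.nan)
    for t in range(T):
        y, X = returns[t], loadings[t]
        ok = np.isfinite(y) & np.isfinite(X).all(axis=1)  # drop rows with any NaN
        if ok.sum() < min_rows:
            continue  # too few usable tickers on this date: skip the date
        f_t, *_ = np.linalg.lstsq(X[ok], y[ok], rcond=None)  # (K,) factor returns
        factor_returns.append(f_t)
        resid[t, ok] = y[ok] - X[ok] @ f_t
    f = np.asarray(factor_returns)              # (T_used, K)
    F = np.atleast_2d(np.cov(f, rowvar=False))  # factor covariance, always square
    D = np.nanvar(resid, axis=0, ddof=1)        # idiosyncratic variance per ticker
    return F, D
```

The per-date `lstsq` loop is the part whose cost grows with the number of dates, consistent with the build time being dominated by the cross-sectional OLS.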

Graceful degradation

- `status=skipped` when `returns_panel` has fewer than 60 dates (universe cache empty or freshly bootstrapped).
- `status=skipped` when fewer than 30 tickers carry all 8 loading columns (pre-2026-05-26 universe cache, before #324 shipped). The stage auto-activates once C.1 loadings have accumulated.
- Malformed parquets are skipped silently at read time, so one bad ticker can't abort the weekly build.
- S3 persistence is best-effort: a failure logs a WARN (mirroring the sweep-stage pattern) and the stage still returns `status=ok`. The build itself and the in-process model dict survive.
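A minimal sketch of the skip gates, the tolerant read, and the best-effort persist described above. The helpers (`coverage_gate`, `read_all`, `persist_best_effort`) are hypothetical, and an injected `put(key, body)` callable stands in for a real S3 client. In production the reader could be `pd.read_parquet` and the writer could wrap boto3, e.g. `lambda key, body: s3.put_object(Bucket=bucket, Key=key, Body=body)`. The 60-date and 30-ticker thresholds come from the text above; return-dict fields other than `status` are assumptions.

```python
from __future__ import annotations

import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

MIN_DATES = 60    # skip when the returns panel has fewer dates than this
MIN_TICKERS = 30  # skip when fewer tickers carry all 8 *_zscore loadings


def coverage_gate(n_dates: int, n_tickers_with_loadings: int) -> dict | None:
    """Return a skip result if the inputs are too sparse, else None."""
    if n_dates < MIN_DATES:
        return {"status": "skipped", "reason": f"returns_panel has {n_dates} < {MIN_DATES} dates"}
    if n_tickers_with_loadings < MIN_TICKERS:
        return {"status": "skipped",
                "reason": f"{n_tickers_with_loadings} < {MIN_TICKERS} tickers with C.1 loadings (#324)"}
    return None


def read_all(paths: Iterable[str], reader: Callable[[str], object]) -> dict[str, object]:
    """Read every path, silently dropping any that fail to parse."""
    frames = {}
    for path in paths:
        try:
            frames[path] = reader(path)
        except Exception:
            continue  # one malformed parquet must not abort the weekly build
    return frames


def persist_best_effort(put: Callable[[str, bytes], None], date: str,
                        artifacts: dict[str, bytes]) -> dict:
    """Write artifacts under risk_model/{date}/; a failure only logs a WARN."""
    prefix = f"risk_model/{date}/"
    try:
        for name, body in artifacts.items():
            put(prefix + name, body)
    except Exception as exc:
        logger.warning("risk_model persist to %s failed (best-effort): %s", prefix, exc)
        return {"status": "ok", "persisted": False, "prefix": prefix}
    return {"status": "ok", "persisted": True, "prefix": prefix}
```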

Wiring

Wired into `training/train_handler.py:main()` as Step 2d2 (after the training summary write, before the triple-barrier cutover gate). Non-blocking: a failure logs a WARN and does NOT abort the training pipeline. Skipped on `--dry-run` paths.
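The non-blocking wiring can be sketched as a guard around the stage call. Everything here is hypothetical (`step_2d2_risk_model`, the stub `main`, the `"failed"` status value); the point is only the control flow: `--dry-run` skips the step, and any exception is downgraded to a WARN so the triple-barrier cutover gate still runs.

```python
from __future__ import annotations

import argparse
import logging
from typing import Callable

logger = logging.getLogger("train_handler")  # hypothetical logger name


def step_2d2_risk_model(args: argparse.Namespace, build_and_persist: Callable[[], dict]) -> dict:
    """Step 2d2: optional risk-model build; never aborts training."""
    if args.dry_run:
        logger.info("Step 2d2 skipped on --dry-run")
        return {"status": "skipped", "reason": "dry_run"}
    try:
        return build_and_persist()
    except Exception as exc:  # non-blocking: log and carry on
        logger.warning("Step 2d2 risk-model build failed (non-blocking): %s", exc)
        return {"status": "failed", "error": str(exc)}


def main(args: argparse.Namespace, build_and_persist: Callable[[], dict]) -> list[str]:
    """Stub pipeline showing ordering only; returns the steps that ran."""
    ran = ["training summary write"]
    result = step_2d2_risk_model(args, build_and_persist)
    ran.append(f"risk model: {result['status']}")
    ran.append("triple-barrier cutover gate")  # still reached after a 2d2 failure
    return ran
```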

Tests (9 new)

| Class | Cases | Purpose |
|---|---|---|
| `TestSparseDataSkipPaths` | 4 | empty / <60 dates / <30 tickers / 0 loadings → `status=skipped` |
| `TestHappyPath` | 2 | persists F + D + metadata to the expected prefix; F is square (load-bearing for the C.3 Σ matmul) |
| `TestDryRun` | 1 | `dry_run=True` builds the model but skips the S3 write |
| `TestS3PersistFailure` | 1 | S3 persist failure → `status=ok` + WARN (best-effort) |
| `TestMalformedParquet` | 1 | unreadable parquet silently skipped |

Suite: 1201 → 1210 (+9).

Test plan

- Full predictor suite passes (1210 passed)
- 9 new stage tests pass
- On merge: the next Saturday SF run (Sat 2026-05-31 ~02:00 PT) writes the first `risk_model/{date}/{F,D}.parquet`. The coverage threshold may take a few weeks to clear, depending on the C.1 loadings accumulation rate.
- After ≥4 weeks of accumulation: C.3 unblocks (gated additionally on the B.5 cutover gate).

Composes with PR #200 (C.2a F+D math primitives), alpha-engine-data #324 (C.1 `*_zscore` factor loadings), and ROADMAP C.2b → unblocks C.3 wiring once B.5 cutover passes and ≥4 weeks of F + D are in S3.

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cipher813 merged commit cfccb20 into main May 27, 2026
1 check passed

cipher813 deleted the feat-c2b-risk-model-persist-260527 branch May 27, 2026 17:17
