feat(research): wire factor-profiles producer into the graph (un-orphan compute_and_write_factor_profiles)#203

Merged: cipher813 merged 1 commit into main from feat/wire-factor-profiles-producer on May 18, 2026

Conversation

@cipher813
Owner

What

Splice a new compute_factor_profiles_node into graph/research_graph.py so that the previously orphaned producer scoring.factor_scoring.compute_and_write_factor_profiles actually runs on every Saturday SF research run. Until now the producer had zero production callers (it was test-only), so s3://alpha-engine-research/factors/ was empty in prod.

Splice point + why

fetch_data → load_regime_substrate_node → macro_economist_node
  → compute_factor_profiles_node → compute_focus_list_node → dispatch_sectors_and_exit → … → score_aggregator

The node is spliced onto the macro_economist_node → compute_focus_list_node edge. This is a single re-route: that edge is replaced by macro_economist_node → compute_factor_profiles_node plus a new compute_factor_profiles_node → compute_focus_list_node edge. Reasons:

  • The producer needs sector_map + run_date, both populated in fetch_data and not mutated by load_regime_substrate_node / macro_economist_node.
  • It must write factors/profiles/{run_date}/by_ticker.json + latest.json before both consumers make their existing read_factor_profiles_from_s3() calls: compute_focus_list_node (:1198) and score_aggregator (:1322, downstream of the dispatch that hangs off compute_focus_list_node). This one splice satisfies both.
  • The fetch_data → load_regime_substrate_node edge was not used because the Stage-C serial chain is a pinned topology invariant (tests/test_regime_stage_b_graph_topology.py). The macro→focus-list edge is the cleanest splice preserving every pinned edge and the regime / macro / focus-list / dispatch chain + conditional edges.
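
The re-route described above can be sketched with a plain edge list. This is only an illustration: the node names come from this PR, but the edge-list representation and the helpers `splice_node` and `serial_order` are hypothetical and are not the API used in graph/research_graph.py.

```python
from typing import List, Tuple

Edge = Tuple[str, str]

# Serial chain before the splice (node names from the PR description;
# the list-of-tuples representation is an assumption for illustration).
EDGES_BEFORE: List[Edge] = [
    ("fetch_data", "load_regime_substrate_node"),
    ("load_regime_substrate_node", "macro_economist_node"),
    ("macro_economist_node", "compute_focus_list_node"),
    ("compute_focus_list_node", "dispatch_sectors_and_exit"),
]


def splice_node(edges: List[Edge], src: str, dst: str, new_node: str) -> List[Edge]:
    """Replace the edge src -> dst with src -> new_node -> dst."""
    if (src, dst) not in edges:
        raise ValueError(f"edge {src} -> {dst} not found")
    spliced: List[Edge] = []
    for edge in edges:
        if edge == (src, dst):
            spliced.extend([(src, new_node), (new_node, dst)])
        else:
            spliced.append(edge)  # every other edge is preserved untouched
    return spliced


def serial_order(edges: List[Edge], start: str) -> List[str]:
    """Walk a linear chain from `start` and return nodes in execution order."""
    successor = dict(edges)
    order = [start]
    while order[-1] in successor:
        order.append(successor[order[-1]])
    return order


EDGES_AFTER = splice_node(
    EDGES_BEFORE,
    "macro_economist_node",
    "compute_focus_list_node",
    "compute_factor_profiles_node",
)
```

Only the one macro → focus-list edge is rewritten; the pinned fetch_data → load_regime_substrate_node → macro_economist_node edges appear unchanged in `EDGES_AFTER`.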

No SF/infra change required — it is a plain in-graph serial node.

Graceful-degrade

The node wraps the producer so that any failure (missing run_date, missing or short features/{run_date}/*.parquet, S3 error, compute exception) is caught and logged where flow-doctor can see it (logger.warning/logger.error), and the node returns cleanly with {"factor_profiles_written": False, "factor_profiles_s3_key": ""}. The graph continues, and the consumers degrade exactly as they do today when the substrate is absent (both already guard with `if not factor_profiles` and skip), which is no worse than the prior orphaned state. The weekly research run is never hard-failed on this new dependency. Profiles are not threaded through state: consumers read from S3 by design, and only a small observability delta (factor_profiles_written / factor_profiles_s3_key) flows through the graph.
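
A minimal sketch of the graceful-degrade contract, under stated assumptions: the producer is injected as a callable taking `run_date` and `sector_map` keyword arguments and returning the S3 key it wrote. Neither that signature nor that return value is confirmed by this PR, and the function name `compute_factor_profiles_node_sketch` is hypothetical.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

# Observability delta returned on any failure (keys as given in the PR).
FAILED_DELTA: Dict[str, Any] = {
    "factor_profiles_written": False,
    "factor_profiles_s3_key": "",
}


def compute_factor_profiles_node_sketch(
    state: Dict[str, Any],
    producer: Callable[..., str],  # assumed: returns the S3 key it wrote
) -> Dict[str, Any]:
    """Run the producer; on any failure, log and return the failed delta."""
    run_date = state.get("run_date")
    if not run_date:
        logger.warning("factor profiles skipped: run_date missing from state")
        return dict(FAILED_DELTA)
    try:
        s3_key = producer(run_date=run_date, sector_map=state.get("sector_map", {}))
    except Exception:
        # Catch everything so the graph continues; the traceback is logged.
        logger.error("factor profiles producer failed", exc_info=True)
        return dict(FAILED_DELTA)
    # Only the small delta flows through state; profiles themselves stay in S3.
    return {"factor_profiles_written": True, "factor_profiles_s3_key": s3_key}
```

Returning a fresh `dict(FAILED_DELTA)` on each failure avoids handing callers a shared mutable object.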

Behavior-safety

No flag is flipped. config.FACTOR_BLEND_ENABLED and config.FOCUS_LIST_GATING_ENABLED stay default-false. This is substrate-only: it ensures the factor substrate exists and is ready, and lets the focus-list shadow audit populate scanner_evaluations.focus_*. No scoring or agent behavior changes.

Closes / unblocks

  • Closes ROADMAP P1 "Wire the orphaned factor-profiles producer into the Saturday SF".
  • Unblocks the FOCUS_LIST P0's real gate — its shadow audit now sees a populated factor substrate each run.

Tests

  • tests/test_factor_profiles_node.py (new): (a) node calls compute_and_write_factor_profiles with the state's run_date + sector_map and returns the observability delta on success; (b) producer exception / missing run_date → node logs + returns cleanly, no raise; (c) static-AST graph-wiring assertions (mirroring test_regime_stage_b_graph_topology.py) that the node is registered, runs after fetch_data via the macro chain, and strictly before compute_focus_list_node AND score_aggregator, without altering the sector dispatch.
  • tests/test_dry_run.py: fixed a pre-existing, order-dependent test-isolation bug that the new test file surfaced but did not cause. TestGraphModuleGuard.test_skips_late_bound_patches_when_graph_absent and the TestInstallRestore setup/teardown evicted or replaced the real sys.modules["graph.research_graph"] without restoring it. A later re-import then created a second module object, so the collection-time-bound _build_signals_payload in other test modules no longer saw their monkeypatch.setattr("graph.research_graph.<FLAG>", …). This is the exact leak the test_regime_stage_b_graph_topology.py docstring documents, previously only "sidestepped" via filename ordering. Both sites now snapshot + restore the real module.
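
The snapshot + restore pattern for the test_dry_run.py fix might look roughly like the following. The helper name `preserved_module` is hypothetical and the actual fix may be written as setup/teardown code rather than a context manager; the point is only that the original module object, not a fresh re-import, is put back.

```python
import sys
import types
from contextlib import contextmanager
from typing import Iterator

_MISSING = object()  # sentinel: the module was not imported before the test


@contextmanager
def preserved_module(name: str) -> Iterator[None]:
    """Snapshot sys.modules[name] and restore the same object on exit."""
    original = sys.modules.get(name, _MISSING)
    try:
        yield
    finally:
        if original is _MISSING:
            # Nothing was there before, so leave nothing behind.
            sys.modules.pop(name, None)
        else:
            # Restore the identical object so references bound at
            # collection time and later monkeypatches hit the same module.
            sys.modules[name] = original


def make_stub(name: str) -> types.ModuleType:
    """Build a throwaway stand-in module, as a test might install."""
    return types.ModuleType(name)
```

A test that needs the graph module absent could then wrap its body in `with preserved_module("graph.research_graph"):` and evict or replace the entry freely inside the block.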

Full suite: 1366 passed (~/Development/alpha-engine-research/.venv/bin/python -m pytest -q).
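
For the static-AST wiring assertions in (c), one possible shape is shown below. It assumes the graph module registers edges through `add_edge` calls with string-literal node names, which this PR does not confirm; `extract_edges` and `SAMPLE_SOURCE` are hypothetical stand-ins for reading graph/research_graph.py.

```python
import ast
from typing import Set, Tuple


def extract_edges(source: str) -> Set[Tuple[str, str]]:
    """Collect (src, dst) pairs from calls shaped like obj.add_edge("a", "b")."""
    edges: Set[Tuple[str, str]] = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "add_edge"
            and len(node.args) == 2
            and all(
                isinstance(arg, ast.Constant) and isinstance(arg.value, str)
                for arg in node.args
            )
        ):
            edges.add((node.args[0].value, node.args[1].value))
    return edges


# Stand-in for the real module source; parsed, never imported or executed.
SAMPLE_SOURCE = '''
graph.add_edge("fetch_data", "load_regime_substrate_node")
graph.add_edge("load_regime_substrate_node", "macro_economist_node")
graph.add_edge("macro_economist_node", "compute_factor_profiles_node")
graph.add_edge("compute_factor_profiles_node", "compute_focus_list_node")
'''
```

Because the source is only parsed, a check like this does not import the graph module and so avoids the `sys.modules` side effects described above.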

🤖 Generated with Claude Code

…an compute_and_write_factor_profiles)

Splice a new `compute_factor_profiles_node` into graph/research_graph.py
between `macro_economist_node` and `compute_focus_list_node`:

  fetch_data → load_regime_substrate_node → macro_economist_node
  → compute_factor_profiles_node → compute_focus_list_node → dispatch …

Splice point + why:
- The producer `scoring.factor_scoring.compute_and_write_factor_profiles`
  needs `sector_map` + `run_date`, both populated in `fetch_data` and
  NOT mutated by load_regime_substrate_node or macro_economist_node.
- It must land `factors/profiles/{run_date}/by_ticker.json` +
  `latest.json` in S3 BEFORE both consumers do their existing
  `read_factor_profiles_from_s3()`: `compute_focus_list_node` (~:1198)
  and `score_aggregator` (~:1322, downstream of the dispatch off
  compute_focus_list_node). Splicing on the
  macro→compute_focus_list_node edge satisfies both with one re-route.
- This edge was chosen over `fetch_data → load_regime_substrate_node`
  because the Stage-C serial chain (fetch_data → substrate loader →
  macro) is a pinned topology invariant
  (tests/test_regime_stage_b_graph_topology.py); the macro→focus-list
  edge is the cleanest splice that preserves every pinned edge and the
  regime / macro / focus-list / dispatch chain + conditional edges.

Graceful-degrade: the node wraps the producer so ANY failure (missing
run_date, missing/short `features/{run_date}/*.parquet`, S3 error,
compute exception) is caught, logged flow-doctor-visibly
(warning/error), and the node returns cleanly
({"factor_profiles_written": False, "factor_profiles_s3_key": ""}) so
the graph continues. The consumers then degrade exactly as they do
today when the substrate is absent (they already `if not
factor_profiles: skip`) — i.e. no worse than the prior orphaned state.
The weekly research run is never hard-failed on this new dependency.
Profiles are NOT threaded through state — consumers read from S3 by
design; only a small observability delta flows
(`factor_profiles_written` / `factor_profiles_s3_key`).

Behavior-safety: NO flag is flipped. `config.FACTOR_BLEND_ENABLED`
and `config.FOCUS_LIST_GATING_ENABLED` stay default-false — this is
substrate-only: it makes `s3://alpha-engine-research/factors/` exist
(it is empty in prod today since the producer was orphaned / test-only)
and lets the focus-list shadow audit populate
`scanner_evaluations.focus_*`. No scoring/agent behavior changes.

Closes ROADMAP P1 "Wire the orphaned factor-profiles producer into the
Saturday SF" and unblocks the FOCUS_LIST P0's real gate (its shadow
audit now sees a populated factor substrate each run).

Tests:
- tests/test_factor_profiles_node.py (new): (a) node calls
  compute_and_write_factor_profiles with the state's run_date +
  sector_map and returns the observability delta on success;
  (b) producer exception / missing run_date → node logs + returns
  cleanly, no raise (graph continues); (c) static-AST graph-wiring
  assertions (mirroring test_regime_stage_b_graph_topology.py) that the
  node is registered, runs after fetch_data via the macro chain, and
  strictly before compute_focus_list_node AND score_aggregator, without
  altering the sector dispatch.
- tests/test_dry_run.py: fixed a pre-existing order-dependent
  test-isolation bug surfaced (not caused) by the new test file.
  `TestGraphModuleGuard.test_skips_late_bound_patches_when_graph_absent`
  and `TestInstallRestore` setup/teardown evicted/replaced the real
  `sys.modules["graph.research_graph"]` WITHOUT restoring it. A later
  re-import then created a second module object, so other test
  modules' collection-time-bound `_build_signals_payload` no longer
  saw their `monkeypatch.setattr("graph.research_graph.<FLAG>", …)`
  (the leak the test_regime_stage_b_graph_topology.py docstring
  documents). Both sites now snapshot + restore the real module.
  Full suite: 1366 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cipher813 merged commit b4f60f3 into main on May 18, 2026
1 check passed
cipher813 deleted the feat/wire-factor-profiles-producer branch on May 18, 2026 23:35
cipher813 added a commit that referenced this pull request May 19, 2026
…ollow-up to #203) (#204)

#203 wired the producer with graceful-degrade (catch/log/continue).
Per Brian + feedback_no_silent_fails: that recreates the exact
orphaned-producer silent-failure class this wiring exists to fix — a
failing producer would log a warning nobody reads while focus-list +
factor-blend silently go inert again.

compute_factor_profiles_node now RAISES on any failure (missing
run_date, producer exception) → the Research SF state fails loudly +
alerts. Not spuriously fragile: features/{run_date}/*.parquet is
produced by DataPhase1 UPSTREAM in the same Saturday SF, so its
absence is already an incident (DataPhase1 should have failed) — this
surfaces real breakage, never fails a healthy run. Matches the
system's fail-loud norm (DataPhase2 populated-ratio gate; optimizer
PR5 empty-order-book-not-legacy-fallback). Still substrate-only — no
flag flipped, no scoring change. Docstring + 2 tests flipped
graceful-return → pytest.raises. Suite 1366 passed.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
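
The fail-loud behavior described in the follow-up commit can be sketched as follows. As before, the injected `producer` callable, its keyword arguments, and its return value are assumptions, and the function name and the `RuntimeError` type are illustrative rather than taken from the code.

```python
from typing import Any, Callable, Dict


def compute_factor_profiles_node_fail_loud(
    state: Dict[str, Any],
    producer: Callable[..., str],  # assumed: returns the S3 key it wrote
) -> Dict[str, Any]:
    """Run the producer and let every failure propagate to the caller."""
    run_date = state.get("run_date")
    if not run_date:
        # Missing run_date is treated as an incident, not a skip.
        raise RuntimeError("compute_factor_profiles_node: run_date missing from state")
    # Deliberately no try/except: a producer exception fails the node,
    # which in turn fails the surrounding run instead of logging quietly.
    s3_key = producer(run_date=run_date, sector_map=state.get("sector_map", {}))
    return {"factor_profiles_written": True, "factor_profiles_s3_key": s3_key}
```

With this shape a test asserts on the raised exception (for example with `pytest.raises`) rather than on a returned failure delta.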