fix(cio): reconcile CIO decisions vs candidate set, not raw count #200
Merged
The 2026-05-17 Saturday SF research run hard-failed in strict mode:
Sonnet's structured-output batch returned 19 decisions for 18
candidates (one stray extra/duplicate decision object). The
`len(decisions) != len(candidates)` assertion in run_cio (added
2026-05-02 for the partial-list edge) turned this benign LLM artifact
into a failure of the entire weekly pipeline.
Replace the raw count check with `_reconcile_cio_decisions`, which
validates against the candidate ticker SET:
* extraneous (ticker not in candidate set) -> dropped + logged
* duplicate -> collapsed conservative-wins (a duplicate can never
  upgrade a candidate into advancement; ties keep first occurrence)
* decisions emitted in candidate order, ticker normalised to the
  candidate's canonical spelling so _post_process_cio_decisions
  exact-match stays deterministic across LLM casing/whitespace drift
* genuine MISSING candidate after reconciliation -> strict-mode
  raise (preserves the 2026-05-02 partial-list protection)
Strictly stronger than the old check: 18-for-18 with one ticker
duplicated (a real candidate silently dropped) PASSED the count check;
set reconciliation correctly hard-fails it. Strict-mode message keeps
the "N decisions for M candidates" substrings for log/grep continuity.
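To illustrate why the set comparison is stricter than the count comparison, here is a toy example. The tickers are made up and this is not the pipeline's code; it only shows the shape of the "count-equal-but-missing" case.

```python
# Hypothetical candidate list and the tickers of the returned decisions.
candidates = ["AAA", "BBB", "CCC"]
decided = ["AAA", "AAA", "BBB"]  # AAA duplicated, CCC silently dropped

# Old-style check: only compares lengths, so this response looks fine.
count_check_passes = len(decided) == len(candidates)

# Set-based check: compares membership, so the dropped candidate surfaces.
missing = sorted(set(candidates) - set(decided))

print(count_check_passes)  # True
print(missing)             # ['CCC']
```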
Tests: +6 reconciliation cases (exact 2026-05-17 shape, hallucinated
ticker, conservative-wins, casing/whitespace, count-equal-but-missing,
first-wins tie) extending test_cio_per_candidate_invariant.py. Full
suite 1337 passed; the lone test_scoring RSI failure is a pre-existing
stale-local-config artifact (config clone 25 commits behind origin's
L1695 #209 revert), unrelated to this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
The 2026-05-17 Saturday SF research rerun hard-failed (strict mode): the CIO Sonnet batch returned 19 decisions for 18 candidates, one of them a stray extra/duplicate decision object. The `len(decisions) != len(candidates)` assertion in `run_cio` (added 2026-05-02 for the partial-list edge) turned this benign LLM structured-output artifact into a failure of the entire weekly pipeline. No downstream stages ran; nothing was promoted.
What
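Replace the brittle raw-count check with `_reconcile_cio_decisions`, validating against the candidate ticker set:
* extraneous decision (ticker not in the candidate set) -> dropped and logged
* duplicate -> collapsed conservative-wins (a duplicate can never upgrade a candidate into advancement; ties keep the first occurrence)
* decisions emitted in candidate order, with each ticker normalised to the candidate's canonical spelling so the `_post_process_cio_decisions` exact-match stays deterministic across LLM casing/whitespace drift
* genuinely missing candidate after reconciliation -> strict-mode raise (preserves the 2026-05-02 partial-list protection); the message keeps the `N decisions for M candidates` substrings for log/grep continuity
Strictly stronger than the old check: an 18-for-18 response with one ticker duplicated (a real candidate silently dropped) passed the count check; set reconciliation correctly hard-fails it.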
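The reconciliation rules described above could be sketched roughly as follows. This is a minimal illustration, not the code in the diff. The function name `reconcile_decisions_sketch`, the dict shape with `ticker` and `action` keys, the `REJECT`/`HOLD`/`ADVANCE` labels, and the non-strict behaviour (returning the partial list) are all assumptions made for the example.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical action labels; a lower rank means a more conservative decision.
_CONSERVATISM = {"REJECT": 0, "HOLD": 1, "ADVANCE": 2}


def _norm(ticker):
    """Normalise casing and surrounding whitespace for matching only."""
    return ticker.strip().upper()


def reconcile_decisions_sketch(decisions, candidates, strict=True):
    """Reconcile LLM decisions against the candidate ticker set."""
    # Map normalised ticker -> canonical spelling, preserving candidate order.
    canonical = {_norm(t): t for t in candidates}
    kept = {}
    for d in decisions:
        key = _norm(d["ticker"])
        if key not in canonical:
            # Extraneous ticker: drop it and leave a log trail.
            logger.warning("dropping extraneous decision for %r", d["ticker"])
            continue
        prev = kept.get(key)
        # Replace only when strictly more conservative, so ties keep the first
        # occurrence and a duplicate can never upgrade a candidate.
        if prev is None or (
            _CONSERVATISM[d["action"]] < _CONSERVATISM[prev["action"]]
        ):
            kept[key] = {**d, "ticker": canonical[key]}
    missing = [t for k, t in canonical.items() if k not in kept]
    if missing and strict:
        raise ValueError(
            f"{len(decisions)} decisions for {len(candidates)} candidates; "
            f"missing after reconciliation: {missing}"
        )
    # Emit in candidate order (assumed: non-strict mode returns what it has).
    return [kept[k] for k in canonical if k in kept]
```

With this sketch, a response containing an unknown ticker plus a duplicate of a real candidate reconciles cleanly, while a response whose count matches but which omits a candidate raises in strict mode.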
Tests
+6 reconciliation cases extending `test_cio_per_candidate_invariant.py` (exact 2026-05-17 shape, hallucinated ticker, conservative-wins, casing/whitespace normalisation, count-equal-but-candidate-missing, first-wins tie). Existing invariant tests unchanged and green.
Full suite: 1337 passed, 1 failed. The lone failure (`test_scoring.py::TestRSIScoring::test_bull_overbought_matches_neutral_post_revert`) is a pre-existing, unrelated stale-local-config artifact: the local `alpha-engine-config` clone is 25 commits behind origin's L1695 #209 RSI revert; CI (fresh config checkout) is green. It is not introduced by this change and is orthogonal to it (the diff is `ic_cio.py` + its test only).
Deploy
Research Lambda: not yet redeployed. Once merged, redeploy research (`deploy.sh main` / workflow_dispatch) before rerunning the Saturday SF, then relaunch the Research→downstream rerun.
🤖 Generated with Claude Code