
fix(qa): #1036 — structural_completeness WARN (not FATAL-cap) for authored campaigns #1037

Merged
100yenadmin merged 1 commit into main from fix/structural-completeness-authored-1036 on Jun 19, 2026

100yenadmin (Member) commented Jun 19, 2026

Closes #1036. The structural_completeness gate FATAL-capped authored golden-spine runs to 2.5. The campaign-arc quest (seeded from the adventure hook) is multi-session by design, and authored adventures define no closable sub-quests, so even a clean 25-beat run still goes RED: a self-inflicted false cap, and the one case sibling #1030 didn't reach.

Option A scope guard (Option B "DM under-drives sub-quests" isn't viable until content authors per-act sub-quests): is_authored_campaign = bool(tools.get("start_adventure") or state.get("scenes")); the unresolved_arc sub-check (b) demotes FATAL→WARN when authored and only the hook-seeded arc is open. Preserved FATAL: any non-authored run, AND an authored run with DM-added sub-quests (add_quest) left unresolved — genuine dropped threads. The approval-frozen sub-check (a) stays FATAL always.

Honesty proof (not score-gaming): new GREEN corpus fixture (authored → WARN/GREEN); the existing non-authored fixture stays FATAL RED; +4 unit tests (authored→WARN, non-authored→FATAL, add_quest→FATAL, scenes-signal→WARN) + a corpus green-case guard. 78 focused tests pass. Mirrors #1030's WARN-vs-FATAL discipline.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Behavioral gate now conditionally handles unresolved quest arcs: demotes severity from fatal to warning for authored campaigns when only auto-seeded quests remain unresolved.
  • Tests

    • Added comprehensive test coverage for authored-campaign quest-arc evaluation, including fixture cases and assertions for warning vs. failure scenarios.

…red campaigns (#1036)

ROOT CAUSE
The `structural_completeness` behavioral gate (qa/assert_behavioral.py) FATAL-capped
AUTHORED golden-spine runs to 2.5. Sub-check (b) `unresolved_arc` fires when an active
quest reaches session end open across a >=2-location arc with no quest-resolution call.
But the campaign-arc quest is SEEDED from the authored adventure `hook` and is multi-
session by design; the authored adventures (e.g. embergloom-pact) author NO closable
sub-quests, so the DM legitimately never calls complete_quest / set_quest_status — and
(b) FATAL-REDs even a clean 25-beat authored run. A self-inflicted false-cap. Sibling
#1030 fixed party_traveled / combat_not_left_active the same way but missed this one.

FIX (Option A scope guard — mirrors #1030's WARN-vs-FATAL discipline EXACTLY)
- Compute `is_authored_campaign = bool(tools.get("start_adventure") or state.get("scenes"))`.
  `start_adventure` is the authored cold-open call (server.py:697), always in the tool
  stream `_tally` sees; `state["scenes"]` is non-empty only for seeded authored adventures
  (server.py serializes it; content.py persists authored scenes).
- Demote ONLY sub-check (b) unresolved_arc from FATAL->WARN when authored AND the only open
  quest is the hook-seeded arc. The gate still APPENDS the WARN message (visibility kept);
  the run is no longer RED-capped on (b) alone.
- Clause (a) approval-frozen stays FATAL ALWAYS.
- PRESERVE FATAL for: any NON-authored run (the original narrated-not-engaged failure),
  AND an authored run that called add_quest (server.py:10165 — the DM's own quest-creation
  tool, distinguishable from the hook-seeded quest at gate time) and left it unresolved —
  a genuine dropped thread.

  New severity: `_unresolved_fatal = unresolved_arc and (not is_authored_campaign or
  bool(tools.get("add_quest")))`; `_structural_fatal = approval_frozen_run or _unresolved_fatal`.
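
The severity rule above can be sketched as a small standalone function. This is a minimal illustration, not the code in qa/assert_behavioral.py: the function name `structural_severity`, its signature, and the "OK" return value are assumptions made for the example; only the two boolean expressions are taken from the commit message.

```python
from typing import Any, Mapping


def structural_severity(
    tools: Mapping[str, int],
    state: Mapping[str, Any],
    unresolved_arc: bool,
    approval_frozen_run: bool,
) -> str:
    """Return "FATAL", "WARN", or "OK" (hypothetical labels) for the check."""
    # Authored runs either call start_adventure or carry seeded scenes in state.
    is_authored_campaign = bool(tools.get("start_adventure") or state.get("scenes"))
    # A quest the DM created with add_quest and left open is a dropped thread,
    # so it keeps clause (b) FATAL even on an authored run.
    dm_added_quest = bool(tools.get("add_quest"))

    unresolved_fatal = unresolved_arc and (not is_authored_campaign or dm_added_quest)
    structural_fatal = approval_frozen_run or unresolved_fatal

    if structural_fatal:
        return "FATAL"
    # The demoted case still reports a warning so the open arc stays visible.
    return "WARN" if unresolved_arc else "OK"


if __name__ == "__main__":
    # Authored run, only the hook-seeded arc open: demoted to WARN.
    print(structural_severity({"start_adventure": 1}, {}, True, False))
    # Authored run that also called add_quest: stays FATAL.
    print(structural_severity({"start_adventure": 1, "add_quest": 1}, {}, True, False))
    # Non-authored run with the same open arc: stays FATAL.
    print(structural_severity({}, {}, True, False))
```

Note that clause (a) short-circuits everything else: an approval-frozen run is FATAL whether or not the run is authored.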

ANTI-SCORE-GAMING DUAL CORPUS PROOF
- NEW GREEN fixture qa/gate_corpus/cases/structural_completeness_authored_warn/ (built by
  builder.py `case_structural_completeness_authored_warn`, recorded under a new RED-only-
  safe `green_cases` manifest key): authored profile (start_adventure + scenes + frozen
  companion + active hook quest + 2 locations + no resolution) -> gate exits GREEN with
  structural_completeness as [WARN]. Locked by test_behavioral_gate_corpus.py
  ::test_green_case_warns_but_stays_green (the inverse guard: re-promoting (b) to FATAL
  flips it RED and fails).
- The EXISTING non-authored qa/gate_corpus/cases/structural_completeness/ fixture (no
  start_adventure, no scenes) regenerated cleanly and STILL exits FATAL RED.
- 4 unit tests in qa/test_assert_behavioral.py: authored->WARN/GREEN, non-authored->FATAL/
  RED, authored+add_quest->FATAL (carve-out), authored-via-scenes->WARN.
- Coverage audit (test_manifest_covers_every_fatal_check) stays green — the gate still
  classifies structural_completeness as FATAL (fatal=<var>, not fatal=False), no drift.
- BEHAVIORAL_GATE_TAXONOMY.json hint updated to document the #1036 authored-WARN behavior.

Additive, no existing guard weakened. 78 focused tests pass single-process (no xdist).
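
As a rough sketch of what the GREEN-case guard asserts (exit code 0, the check absent from the failures, the check present in the warnings), the snippet below parses gate output by tag. The output line format (`[WARN] <check_name> ...`) and the helper names `checks_by_tag` and `green_case_holds` are assumptions for illustration; the real test lives in qa/test_behavioral_gate_corpus.py and may parse the output differently.

```python
import re
from typing import Dict, Set

# Assumed line shape: a tag in brackets, then the check name as the first word.
_TAG_LINE = re.compile(r"^\[(WARN|FAIL)\]\s+(\w+)", re.MULTILINE)


def checks_by_tag(output: str) -> Dict[str, Set[str]]:
    """Group check names by the [WARN] / [FAIL] tag that starts their line."""
    found: Dict[str, Set[str]] = {"WARN": set(), "FAIL": set()}
    for tag, check in _TAG_LINE.findall(output):
        found[tag].add(check)
    return found


def green_case_holds(rc: int, output: str, warn_check: str) -> bool:
    """True when the run is GREEN and warn_check is reported only as a warning."""
    tagged = checks_by_tag(output)
    return rc == 0 and warn_check not in tagged["FAIL"] and warn_check in tagged["WARN"]
```

Re-promoting sub-check (b) to FATAL would move the check from the WARN set to the FAIL set and change the exit code, so a guard of this shape fails in both directions of drift.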

coderabbitai Bot commented Jun 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 13d2cdb1-caaa-44df-a9de-79d82712c8e4

📥 Commits

Reviewing files that changed from the base of the PR and between f16f49e and 50e4c6c.

📒 Files selected for processing (8)
  • qa/BEHAVIORAL_GATE_TAXONOMY.json
  • qa/assert_behavioral.py
  • qa/gate_corpus/builder.py
  • qa/gate_corpus/cases/structural_completeness_authored_warn/run.jsonl
  • qa/gate_corpus/cases/structural_completeness_authored_warn/state.json
  • qa/gate_corpus/manifest.json
  • qa/test_assert_behavioral.py
  • qa/test_behavioral_gate_corpus.py

Walkthrough

The structural_completeness behavioral gate's unresolved_arc sub-check is changed from always-FATAL to conditional: authored-campaign runs (detected via start_adventure tool call or non-empty state["scenes"]) where only the hook-seeded quest remains unresolved now produce a WARN instead of FATAL. approval_frozen_run and DM-added (add_quest) unresolved quests remain FATAL. A GREEN corpus fixture, manifest entry, four unit tests, and a manifest-driven corpus test lock this relaxation.

Changes

structural_completeness Authored-Campaign FATAL→WARN Guard

| Layer | File(s) | Summary |
|---|---|---|
| Gate logic: authored-campaign detection and severity change | qa/assert_behavioral.py, qa/BEHAVIORAL_GATE_TAXONOMY.json | Detects authored runs via start_adventure/state["scenes"], checks add_quest usage, and demotes unresolved_arc from FATAL to WARN only for authored runs with no DM-added unresolved quest. approval_frozen_run stays always-FATAL. Taxonomy hint is updated to document the new clause. |
| GREEN corpus fixture and builder | qa/gate_corpus/builder.py, qa/gate_corpus/cases/structural_completeness_authored_warn/run.jsonl, qa/gate_corpus/cases/structural_completeness_authored_warn/state.json, qa/gate_corpus/manifest.json | Adds case_structural_completeness_authored_warn builder, _GREEN_CASES_SPEC registry, build() extension to write GREEN artifacts and embed green_cases in manifest.json, committed fixture files, and updated manifest _doc. |
| Unit tests: four authored-campaign edge cases | qa/test_assert_behavioral.py | Authored run → [WARN]/green; non-authored same profile → [FAIL]; authored + add_quest unresolved → [FAIL]; authored via state["scenes"] → [WARN]/green. |
| Corpus test: GREEN parametrized runner | qa/test_behavioral_gate_corpus.py | Adds _warned_checks helper, loads green_cases from manifest, and test_green_case_warns_but_stays_green asserts rc==0, warn_check not in [FAIL], warn_check in [WARN]. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • electricsheephq/WorldOS#961: Introduced the underlying structural_completeness FATAL check for unresolved quest engagement in qa/assert_behavioral.py that this PR now conditionally relaxes for authored campaigns.
  • electricsheephq/WorldOS#977: Also modified qa/BEHAVIORAL_GATE_TAXONOMY.json and qa/gate_corpus/manifest.json for structural_completeness gate metadata, the same files updated here.

Poem

🐇 Hop hop, the gate was strict — too harsh for authored tales,
A quest left open on the spine shouldn't tip the scales!
I sniffed start_adventure, checked if scenes were set,
Then softened FATAL down to WARN — no need to fret.
The GREEN corpus locks the guard, the tests all pass with cheer,
For authored runs, dear reviewer, the path is now more clear! ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The description includes a detailed summary of the change, the scope guard implementation, preserved FATAL cases, and test coverage; however, it is missing the required CLA checkbox attestation and explicit validation/checks-run documentation. | Add the complete CLA checklist with checked boxes and list the specific validation checks (unit tests, corpus tests, etc.) that were run to verify this change. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and clearly describes the main change: a conditional demotion of the structural_completeness gate from FATAL to WARN for authored campaigns, addressing issue #1036. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50e4c6c681

Comment thread qa/assert_behavioral.py

# Severity: clause (a) approval-frozen stays FATAL ALWAYS. Clause (b) unresolved_arc is
# FATAL unless it's an authored campaign whose ONLY open quest is the hook-seeded arc.
_unresolved_fatal = unresolved_arc and (not is_authored_campaign or dm_added_quest)

[P2] Restrict authored demotion to the actual seeded quest

For resumed authored campaigns, state["scenes"] remains true but the current transcript may not include the earlier add_quest call that created an active side quest. In that case dm_added_quest is false, so this line demotes unresolved_arc to a WARN for every active quest in the state, even when the open quest was a DM-created thread from a prior session rather than the hook-seeded campaign arc. That creates a false-green on the exact dropped-thread scenario the comment says should remain fatal; the demotion needs to verify the open quest is the single seeded hook (or otherwise track quest provenance), not just rely on this run’s tool counts.
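
One way to read the suggestion is to base the demotion on the quests actually open in state rather than on this run's tool counts. The sketch below is purely illustrative: the quest record shape (`status`, `origin`) and the `"hook"` origin marker are hypothetical, since the PR does not show how quest provenance is stored, if it is stored at all.

```python
from typing import Any, Mapping, Sequence


def only_seeded_arc_open(quests: Sequence[Mapping[str, Any]]) -> bool:
    """True when exactly one quest is still active and it came from the hook.

    Assumes (hypothetically) that each quest record has a "status" field and
    an "origin" field set to "hook" for the hook-seeded campaign arc.
    """
    open_quests = [q for q in quests if q.get("status") == "active"]
    return len(open_quests) == 1 and open_quests[0].get("origin") == "hook"


def unresolved_fatal(
    unresolved_arc: bool,
    is_authored_campaign: bool,
    quests: Sequence[Mapping[str, Any]],
) -> bool:
    """Keep clause (b) FATAL unless the sole open quest is the seeded arc."""
    demote = is_authored_campaign and only_seeded_arc_open(quests)
    return unresolved_arc and not demote
```

Under these assumptions, a resumed authored campaign with a DM-created side quest from an earlier session still fails, because the check reads persisted quest state and does not depend on an add_quest call appearing in the current transcript.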


Development

Successfully merging this pull request may close these issues.

structural_completeness still FATAL-caps authored-campaign runs (campaign-arc quest can't resolve in one session)
