Skip to content

feat(auto): safe-default closure mode + partial-unsafe blocker code (PR-B2)#1167

Merged
shaun0927 merged 1 commit into
Q00:mainfrom
shaun0927:feat/auto-interview-safe-default-closure
May 22, 2026
Merged

feat(auto): safe-default closure mode + partial-unsafe blocker code (PR-B2)#1167
shaun0927 merged 1 commit into
Q00:mainfrom
shaun0927:feat/auto-interview-safe-default-closure

Conversation

@shaun0927
Copy link
Copy Markdown
Collaborator

Summary

PR-B2 of the L4 Auto Envelope v2 freeze (#1157, #821, #1138).

Closes two gaps observed in live ooo auto runs against the #821 acceptance matrix where the safe-default policy could have produced a clean closure but the result envelope did not reflect what happened:

  • The existing safe-default success path closed the interview but never set state.interview_closure_mode. Callers could not distinguish mutual_agreement (default None) from a safe-default-applied closure. PR-B2 tags the envelope with interview_closure_mode = "safe_default" on that path.
  • The partial safe-default case — some required gaps were safely defaultable but at least one remained unsafe (e.g. CONFLICTING ledger entry, per-section unsafe-context flag) — used to fall through to the generic interview_max_rounds_exhausted blocker with no structured event and no rollback. PR-B2 routes it to a dedicated event auto.interview.safe_default_partial_unsafe_gaps and a new typed code interview_unsafe_gaps_remain. Partial defaults are rolled back via the existing _revert_safe_default_entries helper because synthesis was never pushed to the backend transcript — same invariant as the synthesis-failure rollback already in place.

The genuine-deadlock case (nothing defaultable at all) is unchanged — it continues to emit interview_max_rounds_exhausted.

Why this scope

This is the smallest first slice that closes the documented PR-B2 gap from the #1157 living SSOT and pushes the L4 Envelope v2 lane from 🟡 partial to closer-to-🟢-complete. After this lands, closure_mode taxonomy has four values (None / ledger_only / safe_default; blockers do not carry it), and stop_reason_code taxonomy has eight codes (3 interview-layer + 5 Ralph-layer).

What is NOT done here

Scope

  • src/ouroboros/auto/interview_driver.py
    • Tag safe_default closure_mode on the existing success branch.
    • Insert the new partial-unsafe branch with rollback + typed code, ahead of the existing mutual_agreement_deadlock_at_max_rounds event emission.
  • skills/auto/SKILL.md
    • Extend the canonical stop_reason_code taxonomy table to 8 codes (adds interview_unsafe_gaps_remain).
    • Add an Interview closure mode taxonomy table covering None / ledger_only / safe_default.
  • tests/unit/auto/test_interview_pipeline.py
    • Extend the existing safe-default success test to assert state.interview_closure_mode == "safe_default".
    • Add a new test for the partial-unsafe path that constructs a CONFLICTING ledger entry on one section and asserts the new typed code, the rollback invariant, and the preserved user-recorded entry.
    • Extend the existing unsafe-everything test (case C) with a regression guard that last_error_code stays interview_max_rounds_exhausted.

Test plan

  • uv run pytest tests/unit/auto/test_interview_pipeline.py -k "safe_default or unsafe_gaps or finalize" -q → 39 passed
  • uv run pytest tests/unit/auto tests/integration/auto -q → 908 passed (baseline 907 + 1 new)
  • uv run ruff check on touched files → clean
  • uv run ruff format --check on touched files → clean
  • uv run mypy src/ouroboros/auto/interview_driver.py → clean

Refs #1157 (L4 lane), #821 (autonomy acceptance matrix), #1138 (PR-A instrumentation), #1148 (PR-B1 ledger_only), #1151 (PR-E stop_reason_code).

PR-B2 of the L4 Auto Envelope v2 freeze (Q00#1157, Q00#821).

## Summary

Close two gaps observed in live ``ooo auto`` runs against the Q00#821
acceptance matrix where the safe-default policy could have produced a
clean closure but the result envelope did not reflect what happened:

- The existing safe-default success path closed the interview via line
  ~470 of ``interview_driver.py`` but never set
  ``state.interview_closure_mode``. Callers could not distinguish
  ``mutual_agreement`` (default ``None``) from a safe-default-applied
  closure. PR-B2 tags the envelope with ``interview_closure_mode =
  "safe_default"`` on that path.
- The partial safe-default case — some required gaps were safely
  defaultable but at least one remained unsafe (e.g. a CONFLICTING
  ledger entry, a per-section unsafe-context flag) — used to fall
  through to the generic ``interview_max_rounds_exhausted`` blocker
  with no structured event and no rollback. PR-B2 routes it to a
  dedicated event ``auto.interview.safe_default_partial_unsafe_gaps``
  and a new typed code ``interview_unsafe_gaps_remain``; the partial
  defaults are rolled back via the existing ``_revert_safe_default_entries``
  helper because synthesis was never pushed to the backend transcript
  (same invariant as the synthesis-failure rollback already in place).

The genuine-deadlock case (nothing defaultable at all) is preserved
unchanged — it continues to emit ``interview_max_rounds_exhausted``.

## Scope

- ``src/ouroboros/auto/interview_driver.py``: tag ``safe_default``
  closure_mode on the existing success branch; insert the new
  partial-unsafe branch before the existing ``ledger_done`` event.
- ``skills/auto/SKILL.md``: extend the stop_reason_code taxonomy
  (8 codes now) and add an Interview closure mode taxonomy table.
- ``tests/unit/auto/test_interview_pipeline.py``:
  - Extend the existing safe-default success test to assert
    ``state.interview_closure_mode == "safe_default"``.
  - Add a new test for the partial-unsafe path that constructs a
    CONFLICTING ledger entry on one section and asserts the new
    typed code, the rollback invariant, and the preserved
    user-recorded entry.
  - Extend the existing unsafe-everything test (case C) with a
    regression guard that ``last_error_code`` stays
    ``interview_max_rounds_exhausted``.

No schema change, no manifest change, no new envelope field. Pure
behavior + envelope-tag refinement on existing fields shipped by
Q00#1148 and Q00#1151.

## Test plan

- ``uv run pytest tests/unit/auto/test_interview_pipeline.py -k "safe_default or unsafe_gaps or finalize" -q`` → 39 passed
- ``uv run pytest tests/unit/auto tests/integration/auto -q`` → 908 passed (baseline 907 + 1 new)
- ``uv run ruff check`` on touched files → clean
- ``uv run ruff format --check`` on touched files → clean
- ``uv run mypy src/ouroboros/auto/interview_driver.py`` → clean

Refs Q00#1157 (L4 lane), Q00#821 (autonomy acceptance matrix),
Q00#1138 (PR-A instrumentation), Q00#1148 (PR-B1 ledger_only), Q00#1151 (PR-E stop_reason_code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ouroboros-agent[bot]

Verdict: APPROVE

Reviewing commit c2f34e8 for PR #1167

Review record: 30bb7c29-1ab2-4b29-9013-abbefa631dfa

Blocking Findings

No in-scope blocking findings remained after policy filtering.

Non-blocking Suggestions

None.

Design Notes

Unable to complete the review: every attempt to read the provided diff, changed-file list, comments, or source files failed before execution because the sandbox wrapper cannot create a user namespace (bwrap: No permissions to create a new namespace). I did not inspect the PR contents, so I cannot provide a valid architectural assessment.

Recovery Notes

First recoverable review artifact generated from codex analysis log.


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@Q00
Copy link
Copy Markdown
Owner

Q00 commented May 25, 2026

Follow-up policy note from #1219 / PR #1220: if the safe-default synthesis is successfully written to the persisted interview transcript but the backend returns a non-terminal follow-up turn, the auto driver should now fail forward as a safe_default closure. The ledger is already structurally complete and mirrored into the transcript, so rolling defaults back creates the observed cli-todo false block. The remaining fail-closed path is transcript-sync failure before the synthesis is persisted.

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants