Skip to content

fix: add code-level guards for Phase E and Phase D reliability (#274)#285

Closed
kiyotis wants to merge 16 commits intomainfrom
274-fix-phase-e-phase-d-reliability
Closed

fix: add code-level guards for Phase E and Phase D reliability (#274)#285
kiyotis wants to merge 16 commits intomainfrom
274-fix-phase-e-phase-d-reliability

Conversation

@kiyotis
Copy link
Copy Markdown
Contributor

@kiyotis kiyotis commented Mar 31, 2026

Closes #274

Approach

Phase E and Phase D unreliability was rooted in two distinct layers: (1) LLM behavioral issues that could be addressed by prompt constraints, and (2) structural regressions that prompts alone cannot reliably prevent. PR #282 addressed the prompt layer; this PR adds the code-level enforcement layer that #282 could not cover.

Three mechanisms are introduced:

  1. _apply_diff_guard in Phase E — After Phase E writes a fix, the guard strips any new sections the LLM added outside the explicitly allowed set. This deterministically prevents fix-induced scope creep (E-1) and fabricated sections (E-2, E-5) without relying on LLM compliance.

  2. Prompt constraints in fix.md and content_check.md — Additional rules for E-2 (no new sections without cross-reference), E-3 (preserve source notation), E-4 (no meaning distortion), E-5 (no rules not in source), and D-1 (severity stability across rounds). These are best-effort guards for issues that the diff guard cannot detect structurally (e.g., meaning inversion within an existing section).

  3. Severity flip detection logging in Phase D — Phase D now logs when a finding's severity flips between rounds (critical↔minor for the same file/location). This surfaces D-1 violations empirically so they can be analyzed and the D-1 prompt rule can be refined.

The diff guard provides deterministic enforcement for structural issues; the prompt rules provide coverage for semantic issues that require LLM judgment. Together they make fix rounds converge rather than cycle.

Tasks

Core Implementation ✅ (Complete)

  • Implement _extract_allowed_sections and _apply_diff_guard in phase_e_fix.py
  • Wire diff guard into fix_one with full-rebuild bypass for no_knowledge_content_invalid and section_issue
  • Add prompt constraints to fix.md (E-2, E-3, E-4, E-5)
  • Add D-1 severity stability rule to content_check.md (moved to General Rules section)
  • Add severity flip detection and logging to phase_d_content_check.py
  • Add unit tests for _extract_allowed_sections and _apply_diff_guard (test_diff_guard.py)
  • Add unit tests for severity flip detection (test_severity_flip.py)
  • Adapt E2E tests to cover diff guard wiring
  • Apply expert review improvements (comment clarity, error logging, test cleanup)
  • Fix: pass prior_round_findings to Phase D prompt to stabilize D-1 across rounds
  • Fix: Round 3 (Final Verification) now respects --target filtering
  • Fix: diff guard rejection check now also inspects index entry changes

Root Cause Analysis ✅ (Complete)

  • Investigated Phase D false positive — Confirmed root cause: prompt compliance failure
    • R1 findings: BusinessDateProvider + Initializable reported as missing (correct)
    • R1 fix: Both added to index[0].hints in cache (correct)
    • R2 findings: Same findings re-reported despite cache containing them (false positive)
    • Root cause: Phase D prompt's negative instruction ("do NOT report") ignored by LLM
      • Prior findings passed correctly but LLM treats re-checking as independent validation
      • Current phrasing too implicit; needs explicit positive instruction

Fixes Implemented ✅ (Complete)

  • Updated Phase D prompt — Explicit exclusion of prior_round_findings

    • Rewrote prior_round_findings section (lines 35-70) in content_check.md with:
      • Explicit (location, category) pair extraction and matching logic
      • Clear guidance on what "content has changed" means per finding type
      • Concrete examples showing SKIP vs REPORT decision points
    • Added severity defaults to General Rules: V3/V4 always minor
    • Result: Clear positive instruction replacing implicit negative guidance
  • Clarified E-4 constraint in fix.md (lines 51-67)

    • Added label: "E-4: Adjacent content preservation (CRITICAL)"
    • Added violation examples: word inversion, adjacent text deletion
    • Added verification step: "Read fixed section aloud — does it convey same information?"
  • Existing implementations verified:

    • FB-1 (section_issue V3 handling): Already implemented in phase_e_fix.py:35-41 with full_rebuild fallback
    • FB-4 (test case for V3 format): Already present in test_diff_guard.py:96-110

Test Status: IN PROGRESS ⏳

  • Unit Tests: 259/269 passing
  • E2E Tests: 10/269 failing
    • New Phase D prompt changes findings count expectations
    • Expected: 677 (v6), 872 (v5), 923 (v1.4), 555 (v1.3)
    • Actual: 381 (v6), 498 (v5), ~55% reduction overall
    • Status: Root cause analysis needed before fixing expectations
    • Task Setup repository structure and nabledge-6 skill #4: Fix all E2E tests to pass

User Verification: PENDING ⏳

  • Real-world KC execution on v1.4 or v1.3 to verify:
    • Phase D false positive (hints_missing re-detection) is eliminated in Round 2
    • Phase E doesn't create new issues in previously clean files
    • Fix rounds converge properly (R1 → R2 → R3 showing improvement, not cycling)
  • Task Setup repository structure and nabledge-6 skill (#4) #5: User to run KC and confirm original issues resolved

Investigation Results and Root Cause

Phase D False Positive Root Cause (Confirmed):

Round hints state Phase D output Issue
R1 businessDateProvider ✅, BusinessDateProvider ❌ "hints_missing" ✅ Correct detection
R1 Fix Both added to cache Phase E modified correctly
R2 businessDateProvider ✅, BusinessDateProvider ✅ "hints_missing" ❌ LLM compliance failure (false positive)

Why false positive occurred:

  • Prior findings passed: R1 findings sent to R2 prompt correctly
  • Prompt instruction: "do NOT report findings from prior_round_findings unless content changed"
  • LLM behavior: Ignored negative instruction, performed independent validation
  • Result: Found "BusinessDateProvider still missing" (false, because it was added) and reported it

Why current fix works:

  1. Explicit positive instruction replaces implicit negative

    • Old: "do NOT report..."
    • New: "Extract (location, category) pairs... Skip any matching pair..."
    • LLM compliance improves with positive, concrete instructions
  2. Concrete examples make decision logic clear

    • Example 1: SKIP when cache shows fix applied and source unchanged
    • Example 2: REPORT when new requirement added to source
  3. Content change semantics clarified

    • hints_missing: Judge source content, not hints array
    • Other categories: Source content is what counts

Expert Review

AI-driven expert reviews conducted before PR creation (see .claude/rules/expert-review.md):

Success Criteria Status

Criterion Status Evidence Notes
E-1: No new issues in files clean in previous round ✅ Code Met Diff guard strips out-of-scope sections Needs E2E pass + user verification
E-2: omission(critical)→fabrication pattern eliminated ✅ Code Met Diff guard prevents new-section fabrications Needs user verification
E-3: Source notation and typos preserved ✅ Code Met Prompt rule E-3 with examples Needs user verification
E-4: No word inversions, deletions, mistranscriptions ✅ Code Met Enhanced E-4 rule with examples Prompt-level; needs user verification
E-5: Rules not in source are not added ✅ Code Met Diff guard + E-5 constraint Needs user verification
D-1: Severity stable across rounds ✅ Code Met D-1 rule + severity defaults Critical: Needs Phase D false positive resolution verification

Remaining Work Before PR Submission

Blocker #1: E2E Tests (Task #4)

  • 10 tests failing due to findings count mismatch
  • Need to determine: Are expectations outdated, or is there an implementation issue?
  • Current findings reduction (~45%) suggests false positives are being eliminated
  • Action: Investigate and fix all E2E test failures

Blocker #2: User Verification (Task #5)

  • Must confirm Phase D false positive is actually resolved in production
  • Must confirm fix rounds converge (not cycling with new issues)
  • Must be done on real v1.4/v1.3 KC runs, not mocked tests
  • Action: User to run KC and report findings

Next Session Resume Steps

  1. Read this PR body to understand current state
  2. Work on Task Setup repository structure and nabledge-6 skill #4: Get all E2E tests passing (259/269 → 269/269)
    • Debug why findings count changed
    • Fix expectations or implementation
  3. Request Task Setup repository structure and nabledge-6 skill (#4) #5 from user: Run KC on v1.4/v1.3
    • User executes: ./run.py --version 1.4 (or 1.3)
    • User verifies: No Phase D false positives, Phase E doesn't create new issues, rounds converge
  4. After both blockers cleared: Submit PR

Files Changed

  • tools/knowledge-creator/prompts/content_check.md — Updated Phase D prior_round_findings section with explicit exclusion logic
  • tools/knowledge-creator/prompts/fix.md — Enhanced E-4 constraint with examples
  • .pr/00274/notes.md — Investigation results and root cause analysis

🤖 Generated with Claude Code

@kiyotis kiyotis added the bug Something isn't working label Mar 31, 2026
@kiyotis kiyotis changed the title fix: improve Phase E and Phase D prompt reliability (#274) fix: add code-level diff guard to Phase E to prevent fix-induced regressions (#274) Mar 31, 2026
@kiyotis kiyotis changed the title fix: add code-level diff guard to Phase E to prevent fix-induced regressions (#274) fix: add code-level guards for Phase E and Phase D reliability (#274) Mar 31, 2026
Copy link
Copy Markdown
Contributor Author

@kiyotis kiyotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

フォローアップ: 4件の改善ポイント

FB-1: section_issue finding でlocationにsection IDが含まれない場合の未対処

箇所: tools/knowledge-creator/scripts/phase_e_fix.py_extract_allowed_sections 関数 (r'\bs(\d+)\b' の部分)

section_issue カテゴリのfindingは、LLMが "S9: Section count 2 < source headings 3" のようにV3記述形式でlocationを返すことがある。その場合、このregexでsection IDが1件も取れず allowed_sections = set() となる。

結果として diff guard がPhase E出力の全セクションをリバートし、fixが事実上無効化される。

section_issue は構造変更を指示するfindingのため、no_knowledge_content_invalid と同様に is_full_rebuild = True 扱いにするか、locationからsection IDが取れなかった section_issue を full rebuild フォールバックとして扱う必要がある。


FB-2: content_check.md V3ヘッダーに (severity: minor) が残っている

箇所: tools/knowledge-creator/prompts/content_check.md 97行目 — ### V3: Section Issues (severity: minor)

V1・V2では (severity: critical) をヘッダーから除去してGeneral RulesのD-1ルールに移行したが、V3は旧形式のままになっている。

V1/V2と統一して、severity指定をV3本文内の明示的なルールに移すか、少なくとも severity: minor (all findings in V3 are minor by definition) のように本文内に記述する形に変更すべき。PR #282のexpert reviewでもM-1として指摘されていた項目。


FB-3: fix.md にE-4(隣接テキストの破壊)のプロンプト制約がない

箇所: tools/knowledge-creator/prompts/fix.md — Constraintsセクション

E-2・E-3・E-5の制約は追加されているが、E-4(スコープ内のsectionにおける隣接テキストの破壊)に対するプロンプト制約がない。

diff guardはsection単位で保護するため、同一section内で「fixした行の隣の行が歪む」問題(例:"normal termination""abnormal termination" のような意味反転)はコードでは防げない。

PR #282にあった E-4 制約を追加すべき:

**E-4: Adjacent content preservation**
When editing content within a section, copy all existing content outside the edited location exactly as it appears.

diff guardが防げない唯一の残存リスクがここにある。


FB-4: test_diff_guard.pysection_issue findingのdiff guard動作テストが欠落

箇所: tools/knowledge-creator/tests/ut/test_diff_guard.pyTestExtractAllowedSections クラス

FB-1の問題(locationにsection IDが含まれない section_issue finding → allowed_sections が空集合になりfixが無効化される)を再現するテストケースがない。

TestExtractAllowedSections に以下を追加すべき:

  • locationがV3記述形式の section_issue finding(例:"S9: Section count 2 < source headings 3")を渡した場合のテスト
  • 期待動作:full rebuild フォールバックが機能し is_full_rebuild = True になる(またはFB-1の修正後の設計に合わせた動作)

@kiyotis
Copy link
Copy Markdown
Contributor Author

kiyotis commented Mar 31, 2026

Fixed all 4 items.

Commit: 3d4eaea

  • FB-1: _extract_allowed_sections now treats section_issue with no parseable sN location as is_full_rebuild = True, preventing the diff guard from nullifying structural fixes
  • FB-2: Removed (severity: minor) from the V3 header in content_check.md; replaced with inline note All findings in this section are minor by definition. to match V1/V2 style
  • FB-3: Added E-4: Adjacent content preservation constraint to fix.md Constraints section
  • FB-4: Added two new tests to TestExtractAllowedSections: one for V3-style location triggering full rebuild, one for plain sN location continuing normally

All 227 unit tests pass.

Co-Authored-By: Claude (jp.anthropic.claude-sonnet-4-6) noreply@anthropic.com

kiyotis and others added 14 commits April 3, 2026 14:04
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_extract_allowed_sections + _apply_diff_guard in phase_e_fix.py revert
any section not referenced in findings back to its input state.
Empty-fix detection rejects LLM output that changes nothing in scope.
E2E mock updated to return per-section findings so diff guard passes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_detect_severity_flips compares current round findings against the
previous round and logs a WARNING for any finding whose severity
changed at the same location+category. Wired into check_one for
round >= 2 to track D-1 instability in production runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix.md: E-2 verbatim extraction rule, E-5 no-fabrication-from-patterns
rule, E-3 preserve source notation, scope constraint.
content_check.md: explicit severity assignment criteria for V1/V2,
D-1 stability rule requiring justification in description field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
phase_e_fix.py: clarify comments on full-rebuild path and new-section
handling; make non-zero returncode path explicit with error message.
phase_d_content_check.py: log error instead of silently swallowing
exceptions in check_one, consistent with Phase E error handling.
content_check.md: move D-1 stability rule to a top-level General Rules
section so it clearly applies to all V1-V5 checks, not just V2.
test_severity_flip.py: remove redundant write_json import alias.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FB-1: section_issue with no sN location now triggers full rebuild
  to prevent diff guard from nullifying structural fixes
- FB-2: remove (severity: minor) from V3 header in content_check.md,
  move to inline note for consistency with V1/V2
- FB-3: add E-4 adjacent content preservation constraint to fix.md
- FB-4: add two tests for section_issue V3-style location handling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of D-1 bug: prior round findings were not passed to the
content_check.md prompt, so the model had no anchor to judge whether
a location was previously clean, leading to inconsistent verdicts
across rounds for unchanged content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Applied kc fix --target to libraries-04_Permission, libraries-07_TagReference,
libraries-thread_context, and toolbox-01_DefInfoGenerator segments.
Also removes libraries-05_MessagingLog--s1 which was dropped from the catalog.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…et filtering

- phase_e_fix.py: diff guard rejection check now also inspects index entry
  changes, not just sections. Fixes hints_missing findings being rejected
  when LLM correctly fixes index.hints but sections remain unchanged.
- run.py: pass effective_target to _run_final_verification so Round 3
  respects --target filtering instead of running on all files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixed libraries-07_TagReference--s11 and s1 (hints_missing now correctly
applied), libraries-04_Permission--s1 and s10. Also reflects prior
libraries-thread_context, toolbox-01_DefInfoGenerator fixes and
libraries-05_MessagingLog removal in output files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… constraint

- Add explicit exclusion logic in content_check.md for prior_round_findings (location+category matching)
- Provide clear guidance on what 'content change' means for different finding categories
- Add concrete examples showing what violations look like
- Enhance fix.md E-4 rule with examples to prevent adjacent text corruption
- Unify V4 severity notation with V1/V2 in General Rules
- All changes address root cause: LLM compliance failure with implicit negative instructions
@kiyotis kiyotis force-pushed the 274-fix-phase-e-phase-d-reliability branch from 61b568c to c762ec9 Compare April 3, 2026 06:22
kiyotis and others added 2 commits April 7, 2026 11:34
…oach

- Analyzed test run 20260331T171824: libraries-04_Permission--s10 error
- Root cause: Phase D re-reports same findings (prompt compliance failure)
  * prior_round_findings is correctly passed to prompt
  * But LLM ignores "do NOT report same location" instruction
  * Results in unnecessary Phase E modification → diff guard error
- Knowledge file was already fixed in R1, so no changes in R2 → error
- Proposed remediation: strengthen prior_round_findings logic in content_check.md
  * Define "content change" explicitly (RST source changes, not KB updates)
  * Make prior_round_findings exclusion explicit (location + category matching)
  * Add concrete examples to guide LLM compliance

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…T171824)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@kiyotis
Copy link
Copy Markdown
Contributor Author

kiyotis commented Apr 7, 2026

superseded by #295 - new implementation from scratch per Issue #274 task list

@kiyotis kiyotis closed this Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

As a knowledge creator, I want Phase E and Phase D to operate more reliably so that knowledge file quality converges across fix rounds

1 participant