fix: add code-level guards for Phase E and Phase D reliability (#274)#285
fix: add code-level guards for Phase E and Phase D reliability (#274)#285
Conversation
kiyotis
left a comment
There was a problem hiding this comment.
フォローアップ: 4件の改善ポイント
FB-1: section_issue finding でlocationにsection IDが含まれない場合の未対処
箇所: tools/knowledge-creator/scripts/phase_e_fix.py — _extract_allowed_sections 関数 (r'\bs(\d+)\b' の部分)
section_issue カテゴリのfindingは、LLMが "S9: Section count 2 < source headings 3" のようにV3記述形式でlocationを返すことがある。その場合、このregexでsection IDが1件も取れず allowed_sections = set() となる。
結果として diff guard がPhase E出力の全セクションをリバートし、fixが事実上無効化される。
section_issue は構造変更を指示するfindingのため、no_knowledge_content_invalid と同様に is_full_rebuild = True 扱いにするか、locationからsection IDが取れなかった section_issue を full rebuild フォールバックとして扱う必要がある。
FB-2: content_check.md V3ヘッダーに (severity: minor) が残っている
箇所: tools/knowledge-creator/prompts/content_check.md 97行目 — ### V3: Section Issues (severity: minor)
V1・V2では (severity: critical) をヘッダーから除去してGeneral RulesのD-1ルールに移行したが、V3は旧形式のままになっている。
V1/V2と統一して、severity指定をV3本文内の明示的なルールに移すか、少なくとも severity: minor (all findings in V3 are minor by definition) のように本文内に記述する形に変更すべき。PR #282のexpert reviewでもM-1として指摘されていた項目。
FB-3: fix.md にE-4(隣接テキストの破壊)のプロンプト制約がない
箇所: tools/knowledge-creator/prompts/fix.md — Constraintsセクション
E-2・E-3・E-5の制約は追加されているが、E-4(スコープ内のsectionにおける隣接テキストの破壊)に対するプロンプト制約がない。
diff guardはsection単位で保護するため、同一section内で「fixした行の隣の行が歪む」問題(例:"normal termination" → "abnormal termination" のような意味反転)はコードでは防げない。
PR #282にあった E-4 制約を追加すべき:
**E-4: Adjacent content preservation**
When editing content within a section, copy all existing content outside the edited location exactly as it appears.
diff guardが防げない唯一の残存リスクがここにある。
FB-4: test_diff_guard.py にsection_issue findingのdiff guard動作テストが欠落
箇所: tools/knowledge-creator/tests/ut/test_diff_guard.py — TestExtractAllowedSections クラス
FB-1の問題(locationにsection IDが含まれない section_issue finding → allowed_sections が空集合になりfixが無効化される)を再現するテストケースがない。
TestExtractAllowedSections に以下を追加すべき:
- locationがV3記述形式の
section_issuefinding(例:"S9: Section count 2 < source headings 3")を渡した場合のテスト - 期待動作:full rebuild フォールバックが機能し
is_full_rebuild = Trueになる(またはFB-1の修正後の設計に合わせた動作)
|
Fixed all 4 items. Commit: 3d4eaea
All 227 unit tests pass. Co-Authored-By: Claude (jp.anthropic.claude-sonnet-4-6) noreply@anthropic.com |
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_extract_allowed_sections + _apply_diff_guard in phase_e_fix.py revert any section not referenced in findings back to its input state. Empty-fix detection rejects LLM output that changes nothing in scope. E2E mock updated to return per-section findings so diff guard passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_detect_severity_flips compares current round findings against the previous round and logs a WARNING for any finding whose severity changed at the same location+category. Wired into check_one for round >= 2 to track D-1 instability in production runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fix.md: E-2 verbatim extraction rule, E-5 no-fabrication-from-patterns rule, E-3 preserve source notation, scope constraint. content_check.md: explicit severity assignment criteria for V1/V2, D-1 stability rule requiring justification in description field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
phase_e_fix.py: clarify comments on full-rebuild path and new-section handling; make non-zero returncode path explicit with error message. phase_d_content_check.py: log error instead of silently swallowing exceptions in check_one, consistent with Phase E error handling. content_check.md: move D-1 stability rule to a top-level General Rules section so it clearly applies to all V1-V5 checks, not just V2. test_severity_flip.py: remove redundant write_json import alias. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FB-1: section_issue with no sN location now triggers full rebuild to prevent diff guard from nullifying structural fixes - FB-2: remove (severity: minor) from V3 header in content_check.md, move to inline note for consistency with V1/V2 - FB-3: add E-4 adjacent content preservation constraint to fix.md - FB-4: add two tests for section_issue V3-style location handling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause of D-1 bug: prior round findings were not passed to the content_check.md prompt, so the model had no anchor to judge whether a location was previously clean, leading to inconsistent verdicts across rounds for unchanged content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Applied kc fix --target to libraries-04_Permission, libraries-07_TagReference, libraries-thread_context, and toolbox-01_DefInfoGenerator segments. Also removes libraries-05_MessagingLog--s1 which was dropped from the catalog. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…et filtering - phase_e_fix.py: diff guard rejection check now also inspects index entry changes, not just sections. Fixes hints_missing findings being rejected when LLM correctly fixes index.hints but sections remain unchanged. - run.py: pass effective_target to _run_final_verification so Round 3 respects --target filtering instead of running on all files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixed libraries-07_TagReference--s11 and s1 (hints_missing now correctly applied), libraries-04_Permission--s1 and s10. Also reflects prior libraries-thread_context, toolbox-01_DefInfoGenerator fixes and libraries-05_MessagingLog removal in output files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… constraint - Add explicit exclusion logic in content_check.md for prior_round_findings (location+category matching) - Provide clear guidance on what 'content change' means for different finding categories - Add concrete examples showing what violations look like - Enhance fix.md E-4 rule with examples to prevent adjacent text corruption - Unify V4 severity notation with V1/V2 in General Rules - All changes address root cause: LLM compliance failure with implicit negative instructions
61b568c to
c762ec9
Compare
…oach - Analyzed test run 20260331T171824: libraries-04_Permission--s10 error - Root cause: Phase D re-reports same findings (prompt compliance failure) * prior_round_findings is correctly passed to prompt * But LLM ignores "do NOT report same location" instruction * Results in unnecessary Phase E modification → diff guard error - Knowledge file was already fixed in R1, so no changes in R2 → error - Proposed remediation: strengthen prior_round_findings logic in content_check.md * Define "content change" explicitly (RST source changes, not KB updates) * Make prior_round_findings exclusion explicit (location + category matching) * Add concrete examples to guide LLM compliance Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
…T171824) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Closes #274
Approach
Phase E and Phase D unreliability was rooted in two distinct layers: (1) LLM behavioral issues that could be addressed by prompt constraints, and (2) structural regressions that prompts alone cannot reliably prevent. PR #282 addressed the prompt layer; this PR adds the code-level enforcement layer that #282 could not cover.
Three mechanisms are introduced:
_apply_diff_guardin Phase E — After Phase E writes a fix, the guard strips any new sections the LLM added outside the explicitly allowed set. This deterministically prevents fix-induced scope creep (E-1) and fabricated sections (E-2, E-5) without relying on LLM compliance.Prompt constraints in
fix.mdandcontent_check.md— Additional rules for E-2 (no new sections without cross-reference), E-3 (preserve source notation), E-4 (no meaning distortion), E-5 (no rules not in source), and D-1 (severity stability across rounds). These are best-effort guards for issues that the diff guard cannot detect structurally (e.g., meaning inversion within an existing section).Severity flip detection logging in Phase D — Phase D now logs when a finding's severity flips between rounds (critical↔minor for the same file/location). This surfaces D-1 violations empirically so they can be analyzed and the D-1 prompt rule can be refined.
The diff guard provides deterministic enforcement for structural issues; the prompt rules provide coverage for semantic issues that require LLM judgment. Together they make fix rounds converge rather than cycle.
Tasks
Core Implementation ✅ (Complete)
_extract_allowed_sectionsand_apply_diff_guardinphase_e_fix.pyfix_onewith full-rebuild bypass forno_knowledge_content_invalidandsection_issuefix.md(E-2, E-3, E-4, E-5)content_check.md(moved to General Rules section)phase_d_content_check.py_extract_allowed_sectionsand_apply_diff_guard(test_diff_guard.py)test_severity_flip.py)prior_round_findingsto Phase D prompt to stabilize D-1 across rounds--targetfilteringindexentry changesRoot Cause Analysis ✅ (Complete)
Fixes Implemented ✅ (Complete)
Updated Phase D prompt — Explicit exclusion of prior_round_findings
content_check.mdwith:Clarified E-4 constraint in
fix.md(lines 51-67)Existing implementations verified:
section_issueV3 handling): Already implemented inphase_e_fix.py:35-41with full_rebuild fallbacktest_diff_guard.py:96-110Test Status: IN PROGRESS ⏳
User Verification: PENDING ⏳
Investigation Results and Root Cause
Phase D False Positive Root Cause (Confirmed):
Why false positive occurred:
Why current fix works:
Explicit positive instruction replaces implicit negative
Concrete examples make decision logic clear
Content change semantics clarified
hints_missing: Judge source content, not hints arrayExpert Review
AI-driven expert reviews conducted before PR creation (see
.claude/rules/expert-review.md):Success Criteria Status
Remaining Work Before PR Submission
Blocker #1: E2E Tests (Task #4)
Blocker #2: User Verification (Task #5)
Next Session Resume Steps
./run.py --version 1.4(or 1.3)Files Changed
tools/knowledge-creator/prompts/content_check.md— Updated Phase D prior_round_findings section with explicit exclusion logictools/knowledge-creator/prompts/fix.md— Enhanced E-4 constraint with examples.pr/00274/notes.md— Investigation results and root cause analysis🤖 Generated with Claude Code