fix: add code-level guards for Phase E and Phase D reliability (#274) by kiyotis · Pull Request #285 · nablarch/nabledge-dev

kiyotis · 2026-03-31T01:21:22Z

Closes #274

Approach

Phase E and Phase D unreliability was rooted in two distinct layers: (1) LLM behavioral issues that could be addressed by prompt constraints, and (2) structural regressions that prompts alone cannot reliably prevent. PR #282 addressed the prompt layer; this PR adds the code-level enforcement layer that #282 could not cover.

Three mechanisms are introduced:

_apply_diff_guard in Phase E — After Phase E writes a fix, the guard strips any new sections the LLM added outside the explicitly allowed set. This deterministically prevents fix-induced scope creep (E-1) and fabricated sections (E-2, E-5) without relying on LLM compliance.
Prompt constraints in fix.md and content_check.md — Additional rules for E-2 (no new sections without cross-reference), E-3 (preserve source notation), E-4 (no meaning distortion), E-5 (no rules not in source), and D-1 (severity stability across rounds). These are best-effort guards for issues that the diff guard cannot detect structurally (e.g., meaning inversion within an existing section).
Severity flip detection logging in Phase D — Phase D now logs when a finding's severity flips between rounds (critical↔minor for the same file/location). This surfaces D-1 violations empirically so they can be analyzed and the D-1 prompt rule can be refined.

The diff guard provides deterministic enforcement for structural issues; the prompt rules provide coverage for semantic issues that require LLM judgment. Together they make fix rounds converge rather than cycle.

Tasks

Core Implementation ✅ (Complete)

Root Cause Analysis ✅ (Complete)

Investigated Phase D false positive — Confirmed root cause: prompt compliance failure
- R1 findings: BusinessDateProvider + Initializable reported as missing (correct)
- R1 fix: Both added to index[0].hints in cache (correct)
- R2 findings: Same findings re-reported despite cache containing them (false positive)
- Root cause: Phase D prompt's negative instruction ("do NOT report") ignored by LLM
  - Prior findings passed correctly but LLM treats re-checking as independent validation
  - Current phrasing too implicit; needs explicit positive instruction

Fixes Implemented ✅ (Complete)

Test Status: IN PROGRESS ⏳

Unit Tests: 259/269 passing ✅
E2E Tests: 10/269 failing ❌
- New Phase D prompt changes findings count expectations
- Expected: 677 (v6), 872 (v5), 923 (v1.4), 555 (v1.3)
- Actual: 381 (v6), 498 (v5), ~55% reduction overall
- Status: Root cause analysis needed before fixing expectations
- Task Setup repository structure and nabledge-6 skill #4: Fix all E2E tests to pass

User Verification: PENDING ⏳

Real-world KC execution on v1.4 or v1.3 to verify:
- Phase D false positive (hints_missing re-detection) is eliminated in Round 2
- Phase E doesn't create new issues in previously clean files
- Fix rounds converge properly (R1 → R2 → R3 showing improvement, not cycling)
Task Setup repository structure and nabledge-6 skill (#4) #5: User to run KC and confirm original issues resolved

Investigation Results and Root Cause

Phase D False Positive Root Cause (Confirmed):

Round	hints state	Phase D output	Issue
R1	businessDateProvider ✅, BusinessDateProvider ❌	"hints_missing" ✅	Correct detection
R1 Fix	Both added to cache	—	Phase E modified correctly
R2	businessDateProvider ✅, BusinessDateProvider ✅	"hints_missing" ❌	LLM compliance failure (false positive)

Why false positive occurred:

Prior findings passed: R1 findings sent to R2 prompt correctly
Prompt instruction: "do NOT report findings from prior_round_findings unless content changed"
LLM behavior: Ignored negative instruction, performed independent validation
Result: Found "BusinessDateProvider still missing" (false, because it was added) and reported it

Why current fix works:

Explicit positive instruction replaces implicit negative
- Old: "do NOT report..."
- New: "Extract (location, category) pairs... Skip any matching pair..."
- LLM compliance improves with positive, concrete instructions
Concrete examples make decision logic clear
- Example 1: SKIP when cache shows fix applied and source unchanged
- Example 2: REPORT when new requirement added to source
Content change semantics clarified
- hints_missing: Judge source content, not hints array
- Other categories: Source content is what counts

Expert Review

AI-driven expert reviews conducted before PR creation (see .claude/rules/expert-review.md):

Software Engineer - Rating: 4/5

Success Criteria Status

Criterion	Status	Evidence	Notes
E-1: No new issues in files clean in previous round	✅ Code Met	Diff guard strips out-of-scope sections	Needs E2E pass + user verification
E-2: omission(critical)→fabrication pattern eliminated	✅ Code Met	Diff guard prevents new-section fabrications	Needs user verification
E-3: Source notation and typos preserved	✅ Code Met	Prompt rule E-3 with examples	Needs user verification
E-4: No word inversions, deletions, mistranscriptions	✅ Code Met	Enhanced E-4 rule with examples	Prompt-level; needs user verification
E-5: Rules not in source are not added	✅ Code Met	Diff guard + E-5 constraint	Needs user verification
D-1: Severity stable across rounds	✅ Code Met	D-1 rule + severity defaults	Critical: Needs Phase D false positive resolution verification

Remaining Work Before PR Submission

Blocker #1: E2E Tests (Task #4)

10 tests failing due to findings count mismatch
Need to determine: Are expectations outdated, or is there an implementation issue?
Current findings reduction (~45%) suggests false positives are being eliminated
Action: Investigate and fix all E2E test failures

Blocker #2: User Verification (Task #5)

Must confirm Phase D false positive is actually resolved in production
Must confirm fix rounds converge (not cycling with new issues)
Must be done on real v1.4/v1.3 KC runs, not mocked tests
Action: User to run KC and report findings

Next Session Resume Steps

Read this PR body to understand current state
Work on Task Setup repository structure and nabledge-6 skill #4: Get all E2E tests passing (259/269 → 269/269)
- Debug why findings count changed
- Fix expectations or implementation
Request Task Setup repository structure and nabledge-6 skill (#4) #5 from user: Run KC on v1.4/v1.3
- User executes: ./run.py --version 1.4 (or 1.3)
- User verifies: No Phase D false positives, Phase E doesn't create new issues, rounds converge
After both blockers cleared: Submit PR

Files Changed

tools/knowledge-creator/prompts/content_check.md — Updated Phase D prior_round_findings section with explicit exclusion logic
tools/knowledge-creator/prompts/fix.md — Enhanced E-4 constraint with examples
.pr/00274/notes.md — Investigation results and root cause analysis

🤖 Generated with Claude Code

kiyotis

フォローアップ: 4件の改善ポイント

FB-1: `section_issue` finding でlocationにsection IDが含まれない場合の未対処

箇所: tools/knowledge-creator/scripts/phase_e_fix.py — _extract_allowed_sections 関数 (r'\bs(\d+)\b' の部分)

section_issue カテゴリのfindingは、LLMが "S9: Section count 2 < source headings 3" のようにV3記述形式でlocationを返すことがある。その場合、このregexでsection IDが1件も取れず allowed_sections = set() となる。

結果として diff guard がPhase E出力の全セクションをリバートし、fixが事実上無効化される。

section_issue は構造変更を指示するfindingのため、no_knowledge_content_invalid と同様に is_full_rebuild = True 扱いにするか、locationからsection IDが取れなかった section_issue を full rebuild フォールバックとして扱う必要がある。

FB-2: `content_check.md` V3ヘッダーに `(severity: minor)` が残っている

箇所: tools/knowledge-creator/prompts/content_check.md 97行目 — ### V3: Section Issues (severity: minor)

V1・V2では (severity: critical) をヘッダーから除去してGeneral RulesのD-1ルールに移行したが、V3は旧形式のままになっている。

V1/V2と統一して、severity指定をV3本文内の明示的なルールに移すか、少なくとも severity: minor (all findings in V3 are minor by definition) のように本文内に記述する形に変更すべき。PR #282のexpert reviewでもM-1として指摘されていた項目。

FB-3: `fix.md` にE-4（隣接テキストの破壊）のプロンプト制約がない

箇所: tools/knowledge-creator/prompts/fix.md — Constraintsセクション

E-2・E-3・E-5の制約は追加されているが、E-4（スコープ内のsectionにおける隣接テキストの破壊）に対するプロンプト制約がない。

diff guardはsection単位で保護するため、同一section内で「fixした行の隣の行が歪む」問題（例："normal termination" → "abnormal termination" のような意味反転）はコードでは防げない。

PR #282にあった E-4 制約を追加すべき：

**E-4: Adjacent content preservation**
When editing content within a section, copy all existing content outside the edited location exactly as it appears.

diff guardが防げない唯一の残存リスクがここにある。

FB-4: `test_diff_guard.py` に`section_issue` findingのdiff guard動作テストが欠落

箇所: tools/knowledge-creator/tests/ut/test_diff_guard.py — TestExtractAllowedSections クラス

FB-1の問題（locationにsection IDが含まれない section_issue finding → allowed_sections が空集合になりfixが無効化される）を再現するテストケースがない。

TestExtractAllowedSections に以下を追加すべき：

locationがV3記述形式の section_issue finding（例："S9: Section count 2 < source headings 3"）を渡した場合のテスト
期待動作：full rebuild フォールバックが機能し is_full_rebuild = True になる（またはFB-1の修正後の設計に合わせた動作）

kiyotis · 2026-03-31T02:30:02Z

Fixed all 4 items.

Commit: 3d4eaea

FB-1: _extract_allowed_sections now treats section_issue with no parseable sN location as is_full_rebuild = True, preventing the diff guard from nullifying structural fixes
FB-2: Removed (severity: minor) from the V3 header in content_check.md; replaced with inline note All findings in this section are minor by definition. to match V1/V2 style
FB-3: Added E-4: Adjacent content preservation constraint to fix.md Constraints section
FB-4: Added two new tests to TestExtractAllowedSections: one for V3-style location triggering full rebuild, one for plain sN location continuing normally

All 227 unit tests pass.

Co-Authored-By: Claude (jp.anthropic.claude-sonnet-4-6) noreply@anthropic.com

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_extract_allowed_sections + _apply_diff_guard in phase_e_fix.py revert any section not referenced in findings back to its input state. Empty-fix detection rejects LLM output that changes nothing in scope. E2E mock updated to return per-section findings so diff guard passes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

_detect_severity_flips compares current round findings against the previous round and logs a WARNING for any finding whose severity changed at the same location+category. Wired into check_one for round >= 2 to track D-1 instability in production runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix.md: E-2 verbatim extraction rule, E-5 no-fabrication-from-patterns rule, E-3 preserve source notation, scope constraint. content_check.md: explicit severity assignment criteria for V1/V2, D-1 stability rule requiring justification in description field. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

phase_e_fix.py: clarify comments on full-rebuild path and new-section handling; make non-zero returncode path explicit with error message. phase_d_content_check.py: log error instead of silently swallowing exceptions in check_one, consistent with Phase E error handling. content_check.md: move D-1 stability rule to a top-level General Rules section so it clearly applies to all V1-V5 checks, not just V2. test_severity_flip.py: remove redundant write_json import alias. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- FB-1: section_issue with no sN location now triggers full rebuild to prevent diff guard from nullifying structural fixes - FB-2: remove (severity: minor) from V3 header in content_check.md, move to inline note for consistency with V1/V2 - FB-3: add E-4 adjacent content preservation constraint to fix.md - FB-4: add two tests for section_issue V3-style location handling Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Root cause of D-1 bug: prior round findings were not passed to the content_check.md prompt, so the model had no anchor to judge whether a location was previously clean, leading to inconsistent verdicts across rounds for unchanged content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Applied kc fix --target to libraries-04_Permission, libraries-07_TagReference, libraries-thread_context, and toolbox-01_DefInfoGenerator segments. Also removes libraries-05_MessagingLog--s1 which was dropped from the catalog. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…et filtering - phase_e_fix.py: diff guard rejection check now also inspects index entry changes, not just sections. Fixes hints_missing findings being rejected when LLM correctly fixes index.hints but sections remain unchanged. - run.py: pass effective_target to _run_final_verification so Round 3 respects --target filtering instead of running on all files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fixed libraries-07_TagReference--s11 and s1 (hints_missing now correctly applied), libraries-04_Permission--s1 and s10. Also reflects prior libraries-thread_context, toolbox-01_DefInfoGenerator fixes and libraries-05_MessagingLog removal in output files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sults

… constraint - Add explicit exclusion logic in content_check.md for prior_round_findings (location+category matching) - Provide clear guidance on what 'content change' means for different finding categories - Add concrete examples showing what violations look like - Enhance fix.md E-4 rule with examples to prevent adjacent text corruption - Unify V4 severity notation with V1/V2 in General Rules - All changes address root cause: LLM compliance failure with implicit negative instructions

…oach - Analyzed test run 20260331T171824: libraries-04_Permission--s10 error - Root cause: Phase D re-reports same findings (prompt compliance failure) * prior_round_findings is correctly passed to prompt * But LLM ignores "do NOT report same location" instruction * Results in unnecessary Phase E modification → diff guard error - Knowledge file was already fixed in R1, so no changes in R2 → error - Proposed remediation: strengthen prior_round_findings logic in content_check.md * Define "content change" explicitly (RST source changes, not KB updates) * Make prior_round_findings exclusion explicit (location + category matching) * Add concrete examples to guide LLM compliance Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

…T171824) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

kiyotis · 2026-04-07T07:25:44Z

superseded by #295 - new implementation from scratch per Issue #274 task list

kiyotis added the bug Something isn't working label Mar 31, 2026

kiyotis changed the title ~~fix: improve Phase E and Phase D prompt reliability (#274)~~ fix: add code-level diff guard to Phase E to prevent fix-induced regressions (#274) Mar 31, 2026

kiyotis changed the title ~~fix: add code-level diff guard to Phase E to prevent fix-induced regressions (#274)~~ fix: add code-level guards for Phase E and Phase D reliability (#274) Mar 31, 2026

kiyotis commented Mar 31, 2026

View reviewed changes

kiyotis and others added 14 commits April 3, 2026 14:04

chore: initialize work notes for issue #274 (revised approach)

1c824c7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: save software engineer expert review for PR #274

8ccdf08

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update work notes with today's findings and next task

842a0a9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: update work notes with Phase D false positive investigation re…

9697dea

…sults

kiyotis force-pushed the 274-fix-phase-e-phase-d-reliability branch from 61b568c to c762ec9 Compare April 3, 2026 06:22

kiyotis and others added 2 commits April 7, 2026 11:34

chore: add knowledge generation report for nabledge-1.4 (run 20260331…

f430e02

…T171824) Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

kiyotis closed this Apr 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add code-level guards for Phase E and Phase D reliability (#274)#285

fix: add code-level guards for Phase E and Phase D reliability (#274)#285
kiyotis wants to merge 16 commits intomainfrom
274-fix-phase-e-phase-d-reliability

kiyotis commented Mar 31, 2026 •

edited

Loading

Uh oh!

kiyotis left a comment

Uh oh!

kiyotis commented Mar 31, 2026

Uh oh!

kiyotis commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kiyotis commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approach

Tasks

Core Implementation ✅ (Complete)

Root Cause Analysis ✅ (Complete)

Fixes Implemented ✅ (Complete)

Test Status: IN PROGRESS ⏳

User Verification: PENDING ⏳

Investigation Results and Root Cause

Expert Review

Success Criteria Status

Remaining Work Before PR Submission

Blocker #1: E2E Tests (Task #4)

Blocker #2: User Verification (Task #5)

Next Session Resume Steps

Files Changed

Uh oh!

kiyotis left a comment

Choose a reason for hiding this comment

フォローアップ: 4件の改善ポイント

FB-1: section_issue finding でlocationにsection IDが含まれない場合の未対処

FB-2: content_check.md V3ヘッダーに (severity: minor) が残っている

FB-3: fix.md にE-4（隣接テキストの破壊）のプロンプト制約がない

FB-4: test_diff_guard.py にsection_issue findingのdiff guard動作テストが欠落

Uh oh!

kiyotis commented Mar 31, 2026

Uh oh!

kiyotis commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kiyotis commented Mar 31, 2026 •

edited

Loading

FB-1: `section_issue` finding でlocationにsection IDが含まれない場合の未対処

FB-2: `content_check.md` V3ヘッダーに `(severity: minor)` が残っている

FB-3: `fix.md` にE-4（隣接テキストの破壊）のプロンプト制約がない

FB-4: `test_diff_guard.py` に`section_issue` findingのdiff guard動作テストが欠落