fix: per-section fix + severity lock + retry limit (#274)#295
Open
fix: per-section fix + severity lock + retry limit (#274)#295
Conversation
Root cause: Phase E outputs entire knowledge file, causing LLM to
re-generate untouched sections with mutations (E-1 through E-5).
Solution: three structural changes that guarantee convergence.
1. Per-section fix (E-1/E-2/E-3/E-4/E-5):
- Only pass target section to LLM, output section text only
- Other sections never enter LLM context -> no mutation possible
- Structural findings (section_issue, no_knowledge_content_invalid)
retain full-output mode
- Hints-only findings update index hints without modifying sections
2. Severity lock (D-1):
- When knowledge content unchanged since prior round, lock severity
to prior round value via code (not prompt)
- When content changed (Phase E modified it), accept new severity
- Uses section hash comparison to detect changes
3. Per-finding retry limit:
- Same (file_id, location, category) persistent for 2+ rounds
-> exclude from Phase E, flag for human review
- Prevents fix/recheck oscillation
Implementation:
- Add _group_findings_by_section() to categorize findings
- Update Phase E fix_one() for per-section prompt strategy
- Add severity lock logic to Phase D check_one()
- Filter Phase E targets in run.py CDE loop
- Add prior findings context to content_check.md prompt
- New test suites: test_section_fix.py, test_severity_lock.py
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
34 tasks
kiyohome
pushed a commit
to kiyohome/nabledge-dev
that referenced
this pull request
Apr 8, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> https://claude.ai/code/session_019YQtqhsU9CLMG9moyAfX4U
- FB-1: _normalize_location uses regex to extract sN from complex locations - FB-2: _lock_severity normalizes location in prior_map, current key, and prior_hash lookup - FB-3: retry limit excludes file only when ALL findings are persistent - FB-4: _section_hash attached in round 1 for severity lock in round 2 - FB-5: add reporting standards to content_check.md prompt (English) - FB-6: track consecutive critical-zero rounds; exclude confirmed-clean files from Phase D - E2E: update mock to handle per-section fix compound file_ids Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Fixed missing detail in libraries-thread_context--s1: clarified that thread context values can be set not only by ThreadContextHandler, but also by other handlers and business action handlers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Corrected help text: --max-rounds default was shown as 1 but actual default in run.py is 2. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…274) Ensures Phase D only processes target files in the final verification round when fixing specific files with --target flag. Maintains proper scoping of per-section findings during final verification after D/E loops. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updated test mocks and assertions to reflect per-section fix behavior: - Phase D mock now returns findings for all sections, not just s1 - Phase E runs only in round 1 (retry limit prevents round 2) - Final verification includes target-scoped Phase D assertion - Added override_cache and override_merged parameters to _assert_full_output All 269 tests (227 UT + 35 E2E) pass. CRITICAL assertion in test_fix_target verifies Phase D only runs on target files in final round. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regenerated cache files reflect updated Phase D mock that returns findings for all sections. Catalog now excludes split-1 entries for merged files (testing-framework-batch-02_RequestUnitTest--s1 removed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ize location keys for persistent findings (#274) - phase_d_content_check.py: Fixed _lock_severity() to check `prior_hash is not None` before truthiness check, preventing empty string default from `.get("_section_hash", "")` to incorrectly skip severity lock - run.py: Added _norm_loc() helper to normalize location strings using same regex as phase_d (`\bs(\d+)\b`) before creating persistent_findings keys, preventing LLM variations (e.g., "s1" vs "S1") from being treated as different findings Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Expert review conducted for per-section fix verification (bug fixes in phase_d_content_check.py and run.py): - review-by-software-engineer.md: Code quality assessment - review-by-qa-engineer.md: Test coverage and change impact analysis Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When Phase D reports findings with non-existent section locations (e.g., "index and sections" or "s17-s22 missing"), per-section fix cannot add those sections. Accumulate such findings and run a full- knowledge fix to allow LLM to add missing sections, restoring the original full-knowledge fix behavior. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TestMissingSectionFix verifies that missing-section findings do not trigger full-knowledge fix (which reintroduces E-1 risk). Currently fails — passes once _build_section_add_prompt is implemented. Also defines SECTION_ADD_SCHEMA as the target output contract. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bd37807 to
4647ff4
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #274
Approach
This PR implements a comprehensive solution to systematic Phase D/E convergence issues confirmed across nabledge-1.2, nabledge-1.3, and nabledge-1.4 runs. The root causes fall into three categories: unscoped mutations (E-1 through E-5) where Phase E modifies sections beyond the fix target, severity oscillation (D-1) where Phase D assigns inconsistent ratings to unchanged content, and fix-loop thrashing where persistent findings cycle indefinitely.
The solution implements three structural guarantees working together:
Per-section fix (E-1/E-2/E-3/E-4/E-5): Phase E now processes individual sections in isolation. The LLM receives only the target section text and fix instruction, outputs section text only, and never sees untouched sections. This eliminates mutation risk structurally.
Severity lock (D-1): When knowledge content is unchanged since the prior round (detected via section hash comparison), Phase D code locks severity to the prior round value.
Per-finding retry limit: After 2+ consecutive rounds, persistent findings are excluded from Phase E and flagged for human review.
Missing-section path: When Phase D reports entire sections as missing, a dedicated section-add path (
_build_section_add_prompt+prompts/section_add.md) handles generation without falling back to full-knowledge fix, eliminating E-1 risk for this case.Tasks
TestMissingSectionFix)_build_section_add_promptinphase_e_fix.py(replace fallback with section-add path)prompts/section_add.md(section-add prompt — outputsnew_sectionsonly)TestMissingSectionFix::test_missing_section_does_not_use_full_knowledge_fixpassesExpert Review
AI-driven expert reviews conducted before PR creation (see
.claude/rules/expert-review.md):Success Criteria Check
🤖 Generated with Claude Code