fix(classification): guard against empty content in dedup check #600
norrietaylor wants to merge 1 commit into
DeduplicationChecker.check() crashed with IndexError when the most similar entry returned by find_similar had empty content, because "".splitlines() returns [] and indexing [0] is out of range.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause
`DeduplicationChecker.check()` in `src/distillery/classification/dedup.py` extracted the first line of the most similar entry's content by calling `splitlines()` on it and indexing `[0]`.

When the most similar entry returned by `find_similar` has empty content, `"".splitlines()` returns `[]`, so indexing `[0]` raises `IndexError: list index out of range`. `Entry` performs no validation forbidding empty content, so such an entry can legitimately exist in the store and be returned as a near-duplicate match, crashing the entire dedup check (and any caller, e.g. the `distillery_store` MCP tool).

Fix
Guard the index access.
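The diff itself is not reproduced in this description, so the following is only a minimal sketch of the failure and of one possible guard, not the actual change in `dedup.py`. The function names `first_line_unguarded` and `first_line_guarded` are hypothetical and exist only for illustration.

```python
def first_line_unguarded(content: str) -> str:
    """Pre-fix pattern (hypothetical helper): index the first line unconditionally."""
    # "".splitlines() returns [], so this raises IndexError for empty content.
    return content.splitlines()[0]


def first_line_guarded(content: str) -> str:
    """Guarded pattern (hypothetical helper): fall back to "" when there are no lines."""
    lines = content.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    try:
        first_line_unguarded("")
    except IndexError as exc:
        print(f"unguarded raised IndexError: {exc}")

    print(repr(first_line_guarded("")))
    print(repr(first_line_guarded("title\nbody")))
```

The empty string is the only input for which `splitlines()` returns an empty list, so checking the list and checking the string for emptiness should behave the same here; the sketch checks the list because that is the value being indexed.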
When content is empty, `first_line` becomes `""` and the reasoning string degrades gracefully instead of raising. This is a surgical, single-spot change; action selection and all other behavior are unchanged.

Acceptance
- `TestEmptyContentMatch::test_empty_content_match_returns_result` in `tests/test_dedup.py` reproduces the crash (fails with `IndexError` before the fix) and passes after.
- `tests/test_dedup.py`, `tests/test_classification_engine.py`, `tests/test_classification/`, `tests/test_mcp_dedup.py`, `tests/test_store_dedup_response.py` (119 passed).
- `ruff check` and `mypy --strict` clean on touched files.

🤖 Generated with Claude Code