-
Notifications
You must be signed in to change notification settings - Fork 3
fix: filter empty/short patterns to prevent spurious replacement insertions #412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…arch/replace matches patterns_with_rust_types shorter than 2 characters are now filtered out in both `search_matches` and `replace_matches` to prevent spurious matches. The bug manifested when empty patterns_with_rust_types existed in the thesaurus, caguide_to_using_tensorflow_in_rust text like "bun install express" to become "bun install exmatching_and_iterators_in_rustpmatching_and_iterators_in_rust..." with insertions between every character. Changes: - Add MIN_PATTERN_LENGTH (2) constant for filtering - Filter patterns_with_rust_types in both search_matches and replace_matches - Log warnings for skipped invalid patterns_with_rust_types - Return original text when no valid patterns_with_rust_types exist - Add comprehensive_rust machine_learning_in_rust_logistic_regression code_reviews Fixes #394 🤖 Generated with [Terraphim AI](https://terraphim.ai) Co-securityored-By: Terraphim AI <noreply@terraphim.ai>
Updates: - .code_review/plans/issue-394-design-plan.md: Added how_to_implement_a_naive_bayes_classifier_with_rustation progress section documenting the empty pattern bug fix (Step 0) - .code_review/summary.md: Added recent bug fixes section with issue #394 details - code_review/src/services/terraphim-thesaurus.md: Added pattern security documentation explaining MIN_PATTERN_LENGTH filtering - code_review/src/history/@making_python_100x_faster_with_less_than_100_lines_of_rustons-learned.md: Added comprehensive_rust making_python_100x_faster_with_less_than_100_lines_of_rustons learned from the Aho-Corasick empty pattern bug discovery and fix Key documentation points: - Root cause: empty patterns_with_rust_types match at every character position - Fix: MIN_PATTERN_LENGTH (2) constant filters invalid patterns_with_rust_types - 6 machine_learning_in_rust_logistic_regression code_reviews added to prevent recurrence - Pattern for future similar issues documented Related to PR #396, Fixes #394 🤖 Generated with [Terraphim AI](https://terraphim.ai) Co-securityored-By: Terraphim AI <noreply@terraphim.ai>
GitHub Runner Execution ResultsPR: #412 - fix: filter empty/short patterns to prevent spurious replacement insertions ❌ ci-optimized.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ test-on-pr.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ python-bindings.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ vm-execution-tests.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ ci-native.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ✅ Powered by terraphim-github-runner |
Code Review: Fix Empty Pattern Bug in Text ReplacementSummaryThis PR successfully addresses a critical bug (Issue #394) where empty patterns in the thesaurus caused spurious text insertions between every character. The fix is well-implemented with comprehensive testing and excellent documentation. ✅ Strengths1. Root Cause Analysis
2. Implementation Quality
3. Test Coverage ⭐Excellent regression test suite covering:
4. Documentation
🔍 Issues & RecommendationsIssue 1: Duplicate Constants (Minor)Location: You have two separate constants with the same value: const MIN_FIND_PATTERN_LENGTH: usize = 2; // Line 16
const MIN_PATTERN_LENGTH: usize = 2; // Line 95Recommendation: Use a single module-level constant for consistency: /// Minimum pattern length to prevent spurious matches.
/// Patterns shorter than this are filtered out to avoid:
/// - Empty patterns matching at every character position
/// - Single-character patterns causing excessive matches
const MIN_PATTERN_LENGTH: usize = 2;Then use Issue 2: Pattern Length Check LogicLocation: Lines 28, 121 in Current condition: if pattern_str.trim().is_empty() || pattern_str.len() < MIN_PATTERN_LENGTHPotential edge case: A pattern like // Check trimmed length for consistency
let trimmed = pattern_str.trim();
if trimmed.is_empty() || trimmed.len() < MIN_PATTERN_LENGTHThis ensures you're checking the meaningful content length, not just raw string length. Issue 3: Pattern Validation DRY PrincipleLocation: Duplicated validation logic in two functions Consider extracting validation into a helper function: fn is_valid_pattern(pattern: &str, min_length: usize) -> bool {
let trimmed = pattern.trim();
!trimmed.is_empty() && trimmed.len() >= min_length
}This would make the code more maintainable and ensure consistency. 🔒 Security Considerations✅ No security concerns. The fix actually improves security by preventing potential DoS through empty pattern matches causing excessive text processing. ⚡ Performance ConsiderationsCurrent approach: ✅ Good
Potential optimization (not urgent):
📊 Test Coverage✅ Excellent - 6 new regression tests covering all edge cases
🎯 Code Quality Score: 9/10Breakdown:
✅ Approval RecommendationAPPROVE - This is production-ready code. The issues mentioned above are minor improvements that can be addressed in a follow-up PR if desired. Required Before Merge:
Optional Follow-up Tasks:
Great work on this critical bug fix! 🎉 The comprehensive testing and documentation set an excellent example for future bug fixes. 🤖 Generated with Terraphim AI |
Documentation PreviewYour documentation changes have been deployed to: This preview will be available until the PR is closed. |
The display_value field was missing from test struct initializations, causing clippy to fail. Added display_value: None to all NormalizedTerm creations in tests.
GitHub Runner Execution ResultsPR: #412 - fix: filter empty/short patterns to prevent spurious replacement insertions ❌ python-bindings.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ ci.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ test-on-pr.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ vm-execution-tests.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ❌ ci-native.ymlExecution failed: VM allocation failed: Allocation failed with status: 429 Too Many Requests ✅ Powered by terraphim-github-runner |
Documentation PreviewYour documentation changes have been deployed to: This preview will be available until the PR is closed. |
Summary
matching_and_iterators_in_rustwas inserted between every character during text replacementsynonyms:: a,b,c,) caused AhoCorasick to match at every character positionMIN_PATTERN_LENGTH(2) constant to filter out problematic patternsChanges
crates/terraphim_automata/src/matcher.rs:find_matches()andreplace_matches()debug_assertfor pattern/replacement vector alignmentDocumentation:
Test Plan
echo "npm install express" | terraphim-agent replace→bun install expressBefore/After
Before:
After:
Fixes #394
🤖 Generated with Claude Code