48 changes: 48 additions & 0 deletions .docs/plans/issue-394-design-plan.md
@@ -338,3 +338,51 @@ The implementation is broken into 8 incremental steps, each keeping the system d
---

**Do you approve this plan as-is, or would you like to adjust any part?**

---

## Implementation Progress

### Completed Steps ✅

#### Step 0: Fix Empty Pattern Bug (CRITICAL - Added during implementation)
- **Date**: 2026-01-04
- **PR**: #396
- **Commit**: `f4c60fc1`
- **Problem Discovered**: Empty patterns in thesaurus caused spurious text insertions between every character
- **Root Cause**: An empty pattern in an Aho-Corasick automaton matches at every position (index 0, 1, 2, ...)
- **Fix Applied**:
  - Added `MIN_PATTERN_LENGTH` constant (2) to filter invalid patterns
  - Updated both `find_matches()` and `replace_matches()` in `crates/terraphim_automata/src/matcher.rs`
  - Added logging for skipped invalid patterns
  - Return original text when no valid patterns exist
- **Tests Added**: 6 regression tests:
  - `test_empty_pattern_does_not_cause_spurious_insertions`
  - `test_single_char_pattern_is_filtered`
  - `test_whitespace_only_pattern_is_filtered`
  - `test_valid_replacement_still_works`
  - `test_empty_thesaurus_returns_original`
  - `test_find_matches_filters_empty_patterns`
- **Status**: ✅ Merged to `fix/replacement-empty-pattern-bug` branch
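
The filtering described in Step 0 can be sketched roughly as follows. This is a minimal illustration, not the code in `matcher.rs`: the helper names `is_valid_pattern` and `replace_with_valid_patterns` are hypothetical, the length check on the trimmed pattern (in bytes) is an assumption inferred from the whitespace-only test name, and the standard library's `str::replace` stands in for the Aho-Corasick matcher.

```rust
/// Mirrors the `MIN_PATTERN_LENGTH` (2) constant named in the fix.
const MIN_PATTERN_LENGTH: usize = 2;

/// Hypothetical helper: a pattern is valid when, after trimming
/// whitespace, it is at least `MIN_PATTERN_LENGTH` bytes long.
/// (Trimming is an assumption, not confirmed by the source.)
fn is_valid_pattern(pattern: &str) -> bool {
    pattern.trim().len() >= MIN_PATTERN_LENGTH
}

/// Hypothetical stand-in for a replacement pass. Invalid patterns are
/// skipped; if none are valid, the original text comes back unchanged.
/// Uses sequential `str::replace` calls instead of Aho-Corasick.
fn replace_with_valid_patterns(text: &str, rules: &[(&str, &str)]) -> String {
    let mut output = text.to_string();
    for &(pattern, replacement) in rules.iter().filter(|rule| is_valid_pattern(rule.0)) {
        output = output.replace(pattern, replacement);
    }
    output
}
```

With the rules `[("", "junk"), ("npm", "bun")]`, the empty pattern is dropped and `npm install express` becomes `bun install express`. Filtering before the automaton is built keeps the check in one place for both the find and replace paths.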

### Steps 1-8: Original Plan (Completed)

The original design plan steps for case preservation and URL protection were completed in a prior PR:
- Step 1: Add `display_value` field to `NormalizedTerm` - ✅ Completed in prior PR
- Step 2: Update `NormalizedTerm::new()` with builder method - ✅ Completed in prior PR
- Step 3: Update `index_inner()` to store original case - ✅ Completed in prior PR
- Step 4: Create `url_protector` module - ✅ Completed in prior PR
- Step 5: Update `replace_matches()` to use display_value - ✅ Completed in prior PR
- Step 6: Integrate URL protection into `replace_matches()` - ✅ Completed in prior PR
- Step 7: Update integration tests - ✅ Completed in prior PR
- Step 8: Verify WASM compatibility - ✅ Verified
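
For orientation, the case-preservation idea behind Steps 1, 2, and 5 can be sketched as below. This is a simplified, hypothetical stand-in for `NormalizedTerm`: the real type uses `NormalizedTermValue` and carries a `url` field, and the method names `with_display_value` and `display` are assumptions made for illustration only.

```rust
/// Simplified, hypothetical stand-in for the project's `NormalizedTerm`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct NormalizedTerm {
    id: u64,
    /// Lowercased value used for matching.
    value: String,
    /// Optional original-case form used when writing replacements.
    display_value: Option<String>,
}

#[allow(dead_code)]
impl NormalizedTerm {
    /// Builds a term with a lowercased matching value and no display value.
    fn new(id: u64, value: &str) -> Self {
        Self {
            id,
            value: value.to_lowercase(),
            display_value: None,
        }
    }

    /// Builder-style setter (hypothetical name) that records the original case.
    fn with_display_value(mut self, display_value: &str) -> Self {
        self.display_value = Some(display_value.to_string());
        self
    }

    /// Text to emit in a replacement: the display value when present,
    /// otherwise the normalized value.
    fn display(&self) -> &str {
        self.display_value.as_deref().unwrap_or(self.value.as_str())
    }
}
```

Keeping `display_value` optional means existing thesaurus entries can set it to `None` and fall back to the normalized value.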

### Bug Discovery Notes

The empty pattern bug was discovered during implementation when testing the replacement functionality. The symptom was:
- Input: `npm install express`
- Output: `bun install exmatching_and_iterators_in_rustpmatching_and_iterators_in_rustpmatching...`

Investigation revealed that an empty pattern `""` in the thesaurus was matching at every character boundary, causing the replacement value to be inserted between each character.
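
The same behaviour can be reproduced with the Rust standard library alone. The sketch below deliberately uses `str::match_indices` and `str::replace` rather than the Aho-Corasick matcher, so it only illustrates the general "empty pattern matches everywhere" effect; the function names are hypothetical.

```rust
/// Byte offsets at which `pattern` matches in `text`, using the standard
/// library's substring search (a stand-in for the Aho-Corasick matcher).
fn match_positions(text: &str, pattern: &str) -> Vec<usize> {
    text.match_indices(pattern).map(|(index, _)| index).collect()
}

/// Plain substring replacement with no pattern validation.
fn naive_replace(text: &str, pattern: &str, replacement: &str) -> String {
    text.replace(pattern, replacement)
}
```

For the three-byte input `npm`, `match_positions("npm", "")` reports a match at each of the offsets 0 through 3, and `naive_replace("npm", "", "X")` yields `XnXpXmX`: the replacement is inserted at every character boundary.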

This bug is now fixed and documented for future reference.
13 changes: 12 additions & 1 deletion .docs/summary.md
@@ -4,7 +4,7 @@

Terraphim AI is a privacy-first, locally-running AI assistant featuring multi-agent systems, knowledge graph intelligence, and secure code execution in Firecracker microVMs. The project combines Rust-based backend services with vanilla JavaScript frontends, emphasizing security, performance, and production-ready architecture.

-**Current Status**: v1.0.0 RELEASED - Production-ready with comprehensive multi-language package ecosystem
+**Current Status**: v1.4.0 RELEASED - Production-ready with comprehensive multi-language package ecosystem
**Primary Technologies**: Rust (async/tokio), Svelte/Vanilla JS, Firecracker VMs, OpenRouter/Ollama LLMs, NAPI, PyO3
**Test Coverage**: 99+ comprehensive tests with 59 passing in main workspace

@@ -461,6 +461,17 @@ cd desktop && yarn run check
- Code intelligence and security validation
- Multi-language support operational

### Recent Bug Fixes (2026-01-04) ✅

**Issue #394: Empty Pattern Bug in Text Replacement**
- **Problem**: Empty patterns in thesaurus caused spurious text insertions between every character
- **Symptom**: `npm install express` → `bun install exmatching...pmatching...`
- **Root Cause**: An empty pattern in an Aho-Corasick automaton matches at every position (index 0, 1, 2, ...)
- **Fix**: Added `MIN_PATTERN_LENGTH` (2) constant to filter invalid patterns
- **Files Changed**: `crates/terraphim_automata/src/matcher.rs`
- **Tests Added**: 6 comprehensive regression tests
- **PR**: #396

### In Progress/Pending 🔄

1. **TruthForge Deployment**:
@@ -23,6 +23,7 @@ fn create_wrangler_thesaurus() -> Thesaurus {

    for (pattern, normalized, id) in wrangler_patterns {
        let normalized_term = NormalizedTerm {
            display_value: None,
            id,
            value: NormalizedTermValue::from(normalized),
            url: Some("https://developers.cloudflare.com/workers/wrangler/".to_string()),
@@ -71,6 +72,7 @@ fn create_comprehensive_thesaurus() -> Thesaurus {

    for (pattern, normalized, id, url) in patterns {
        let normalized_term = NormalizedTerm {
            display_value: None,
            id,
            value: NormalizedTermValue::from(normalized),
            url: Some(url.to_string()),