Problem
When multiple compiled patterns are used on large inputs, memory usage becomes excessive. On LangArena's LogParser benchmark (13 patterns × 7MB text), coregex uses 237MB vs stdlib 28MB (8.5x more) vs Rust 11MB (21x more).
Root Causes
- BoundedBacktracker visited tables: Each
FindAllStringIndex call allocates a visited table sized len(haystack) * nfa_states. With uint16 generation counters, each entry is 2 bytes. For 7MB × 35 states = 245MB per pattern scan.
- Lazy DFA cache: Each compiled pattern has its own DFA state cache. 13 patterns × independent caches.
- No sharing: Unlike Rust, visited tables and DFA caches are not shared across patterns.
Proposed Solutions
- Visited table reuse via sync.Pool — Pool visited tables by size class, reuse across
FindAll calls
- Visited table cap — Limit visited table to search span, not full haystack (partially done in v0.12.6 span fix, but could be more aggressive)
- Shared DFA cache pool — Allow multiple compiled patterns to share a DFA cache pool
- 1-bit visited with lazy reset — Use 1-bit per entry with segmented lazy clear (avoid O(n) full clear)
Impact
- LogParser: 237MB → ~30-50MB target
- Any workload with multiple patterns on large inputs
References
Problem
When multiple compiled patterns are used on large inputs, memory usage becomes excessive. On LangArena's LogParser benchmark (13 patterns × 7MB text), coregex uses 237MB vs stdlib 28MB (8.5x more) vs Rust 11MB (21x more).
Root Causes
FindAllStringIndexcall allocates a visited table sizedlen(haystack) * nfa_states. With uint16 generation counters, each entry is 2 bytes. For 7MB × 35 states = 245MB per pattern scan.Proposed Solutions
FindAllcallsImpact
References