-
Notifications
You must be signed in to change notification settings - Fork 5
perf: SIMD acceleration for sparse char_class matches #64
Copy link
Copy link
Open
Labels
arch: amd64x86-64 specific (AVX2, SSSE3)x86-64 specific (AVX2, SSSE3)area: simdLow-level SIMD assembly (AVX2, SSSE3)Low-level SIMD assembly (AVX2, SSSE3)effort: 8Very large, ~1 weekVery large, ~1 weekpriority: lowBacklog, nice to haveBacklog, nice to havestrategy: char-classCharClassSearcher optimizationCharClassSearcher optimizationtype: enhancementImprove existing featureImprove existing feature
Metadata
Metadata
Assignees
Labels
arch: amd64x86-64 specific (AVX2, SSSE3)x86-64 specific (AVX2, SSSE3)area: simdLow-level SIMD assembly (AVX2, SSSE3)Low-level SIMD assembly (AVX2, SSSE3)effort: 8Very large, ~1 weekVery large, ~1 weekpriority: lowBacklog, nice to haveBacklog, nice to havestrategy: char-classCharClassSearcher optimizationCharClassSearcher optimizationtype: enhancementImprove existing featureImprove existing feature
Summary
Add SIMD-accelerated search for CharClassSearcher when matches are sparse (>64 non-matching bytes between matches).
Background
In PR #63, we evaluated SIMD for char_class patterns but found it slower for the common case where matches are frequent (30-50% of positions). However, SIMD can provide significant speedup when:
[0-9]+in mostly alphabetic textProposed Implementation
Hybrid Approach
SIMD Functions (already implemented)
simd.MemchrWord(haystack)- find first[A-Za-z0-9_]simd.MemchrNotWord(haystack)- find first non-word charsimd.MemchrInTable(haystack, table)- generic 256-byte table lookupAcceptance Criteria
MemchrInTablefor sparse searchReferences
simd/memchr_class_amd64.s: AVX2 implementation readyPriority
Low - Current streaming implementation is already 4.9x faster than stdlib. This is an optimization for edge cases.