This document provides detailed information about the features supported by regexr.
Match literal characters:
let re = Regex::new("hello").unwrap();
assert!(re.is_match("hello world"));

- `.` - Any character except newline
- `\d` - Digit `[0-9]`
- `\D` - Non-digit `[^0-9]`
- `\w` - Word character `[a-zA-Z0-9_]`
- `\W` - Non-word character `[^a-zA-Z0-9_]`
- `\s` - Whitespace `[ \t\n\r\f\v]`
- `\S` - Non-whitespace

let re = Regex::new(r"\d+").unwrap();
assert!(re.is_match("123"));

let re = Regex::new(r"[aeiou]").unwrap();
assert!(re.is_match("hello"));
let re = Regex::new(r"[^aeiou]").unwrap(); // Negated
assert!(re.is_match("xyz"));
let re = Regex::new(r"[a-z]").unwrap(); // Range
assert!(re.is_match("hello"));

- `*` - Zero or more
- `+` - One or more
- `?` - Zero or one
- `{n}` - Exactly n times
- `{n,}` - n or more times
- `{n,m}` - Between n and m times

let re = Regex::new(r"\d+").unwrap();
assert_eq!(re.find("abc123def").unwrap().as_str(), "123");
let re = Regex::new(r"\w{3,5}").unwrap();
assert!(re.is_match("hello"));

Add `?` after any quantifier to make it non-greedy:
- `*?` - Zero or more (non-greedy)
- `+?` - One or more (non-greedy)
- `??` - Zero or one (non-greedy)
- `{n,}?` - n or more (non-greedy)
- `{n,m}?` - Between n and m (non-greedy)

let re = Regex::new(r#"<.*?>"#).unwrap();
assert_eq!(re.find("<a><b>").unwrap().as_str(), "<a>");
let re = Regex::new(r#"<.*>"#).unwrap(); // Greedy version
assert_eq!(re.find("<a><b>").unwrap().as_str(), "<a><b>");

- `^` - Start of string
- `$` - End of string
- `\b` - Word boundary
- `\B` - Non-word boundary

let re = Regex::new(r"^\d+").unwrap();
assert!(re.is_match("123abc"));
assert!(!re.is_match("abc123"));
let re = Regex::new(r"\bword\b").unwrap();
assert!(re.is_match("a word here"));
assert!(!re.is_match("awordhere"));

let re = Regex::new(r"cat|dog|bird").unwrap();
assert!(re.is_match("I have a dog"));
assert!(re.is_match("She has a cat"));

let re = Regex::new(r"(\w+)@(\w+)").unwrap();
let caps = re.captures("user@example").unwrap();
assert_eq!(&caps[1], "user");
assert_eq!(&caps[2], "example");

let re = Regex::new(r"(?P<user>\w+)@(?P<domain>\w+)").unwrap();
let caps = re.captures("user@example").unwrap();
assert_eq!(&caps["user"], "user");
assert_eq!(&caps["domain"], "example");

let re = Regex::new(r"(?:cat|dog)+").unwrap();
assert!(re.is_match("catdogcat"));

Match the same text as a previous capture group:
let re = Regex::new(r"(\w+)\s+\1").unwrap();
assert!(re.is_match("hello hello"));
assert!(!re.is_match("hello world"));

Backreferences require the BacktrackingVm or BacktrackingJit engine.
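
To illustrate why a backreference needs an engine that remembers captured text, here is a minimal hand-written sketch of what `(\w+)\s+\1` checks. It uses only the standard library and treats word and whitespace characters as ASCII-only, which is a simplification. The helper names `is_word` and `has_repeated_word` are hypothetical and not part of regexr.

```rust
/// ASCII-only approximation of `\w` (an assumption of this sketch).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hand-written equivalent of an unanchored search for `(\w+)\s+\1`.
fn has_repeated_word(text: &str) -> bool {
    let bytes = text.as_bytes();
    for start in 0..bytes.len() {
        // Group 1: the run of word characters beginning at `start`.
        let word_end = start + bytes[start..].iter().take_while(|&&b| is_word(b)).count();
        if word_end == start {
            continue;
        }
        // `\s+`: at least one whitespace character must follow.
        let ws_end = word_end
            + bytes[word_end..]
                .iter()
                .take_while(|b| b.is_ascii_whitespace())
                .count();
        if ws_end == word_end {
            continue;
        }
        // `\1`: the remaining input must repeat the captured bytes exactly.
        // This comparison depends on what was captured, which is why a
        // plain DFA cannot express it.
        let captured = &bytes[start..word_end];
        if bytes[ws_end..].starts_with(captured) {
            return true;
        }
    }
    false
}

fn main() {
    println!("{}", has_repeated_word("hello hello"));
    println!("{}", has_repeated_word("hello world"));
}
```

The comparison against `captured` is the part that requires remembering earlier input, which fits the note above about backtracking engines.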
- `(?=...)` - Positive lookahead
- `(?!...)` - Negative lookahead

let re = Regex::new(r"\w+(?=@)").unwrap();
assert_eq!(re.find("user@example").unwrap().as_str(), "user");
let re = Regex::new(r"\w+(?!@)").unwrap();
assert_eq!(re.find("user example").unwrap().as_str(), "user");

- `(?<=...)` - Positive lookbehind
- `(?<!...)` - Negative lookbehind

let re = Regex::new(r"(?<=@)\w+").unwrap();
assert_eq!(re.find("user@example").unwrap().as_str(), "example");
let re = Regex::new(r"(?<!@)\w+").unwrap();
assert_eq!(re.find("user example").unwrap().as_str(), "user");

Lookaround assertions require the PikeVm or TaggedNfa engine.
Patterns can match Unicode characters using escape sequences:
let re = Regex::new(r"\w+").unwrap();
assert!(re.is_match("café")); // Matches Unicode word characters

Match characters by Unicode properties:
// Letter category
let re = Regex::new(r"\p{Letter}+").unwrap();
assert!(re.is_match("hello"));
assert!(re.is_match("привет")); // Cyrillic
// Number category
let re = Regex::new(r"\p{Number}+").unwrap();
assert!(re.is_match("123"));
assert!(re.is_match("①②③")); // Unicode numbers
// Script
let re = Regex::new(r"\p{Greek}+").unwrap();
assert!(re.is_match("αβγ"));
let re = Regex::new(r"\p{Cyrillic}+").unwrap();
assert!(re.is_match("привет"));

- Letter (L): All letters
- Number (N): All numbers
- Mark (M): Combining marks
- Punctuation (P): Punctuation characters
- Symbol (S): Symbols
- Separator (Z): Separators

- Arabic, Armenian, Bengali, Cyrillic, Devanagari
- Georgian, Greek, Gujarati, Gurmukhi, Han
- Hangul, Hebrew, Hiragana, Kannada, Katakana
- Khmer, Lao, Latin, Malayalam, Myanmar
- Oriya, Sinhala, Tamil, Telugu, Thai, Tibetan

And many more. See Unicode Character Database for the complete list.
Unicode-aware case folding:
let re = Regex::new(r"(?i)hello").unwrap();
assert!(re.is_match("HELLO"));
assert!(re.is_match("Hello"));
let re = Regex::new(r"(?i)café").unwrap();
assert!(re.is_match("CAFÉ")); // Unicode case folding

Use RegexBuilder to enable JIT compilation:
use regexr::RegexBuilder;
let re = RegexBuilder::new(r"\w+@\w+\.\w+")
    .jit(true)
    .build()
    .unwrap();
assert!(re.is_match("user@example.com"));

JIT compilation is beneficial when:
- Pattern will be matched many times: JIT has higher compilation cost but faster execution
- Performance is critical: JIT generates native code for maximum speed
- Pattern has effective prefilters: Combines SIMD literal search with native DFA execution
- Available on x86-64 (Linux, macOS, Windows) and ARM64 (Linux, macOS)
- Requires the `jit` feature flag
- Automatically falls back to interpreted engines if compilation fails
When JIT is enabled, the engine is selected based on:
- Backreferences: Uses BacktrackingJit
- Lookaround: Falls back to PikeVm (requires NFA semantics)
- Non-greedy quantifiers: Uses TaggedNfa
- Large Unicode classes: Uses LazyDfa (avoids state explosion)
- Alternations without prefilter: Uses JitShiftOr
- General patterns: Uses DFA JIT with SIMD prefiltering
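
The rules above can be read as a decision chain. The sketch below models that chain with a hypothetical `PatternTraits` struct and `Engine` enum, neither of which is part of regexr's API. It assumes the rules are checked in the order listed, which the list itself does not state.

```rust
/// Hypothetical summary of a parsed pattern (illustrative field names).
#[derive(Default)]
struct PatternTraits {
    has_backreferences: bool,
    has_lookaround: bool,
    has_non_greedy: bool,
    has_large_unicode_class: bool,
    is_alternation_without_prefilter: bool,
}

/// Engine names taken from the list above.
#[derive(Debug, PartialEq)]
enum Engine {
    BacktrackingJit,
    PikeVm,
    TaggedNfa,
    LazyDfa,
    JitShiftOr,
    DfaJit,
}

/// Applies the rules top to bottom; the first rule that fires wins.
/// The priority order is an assumption of this sketch.
fn select_engine(t: &PatternTraits) -> Engine {
    if t.has_backreferences {
        Engine::BacktrackingJit
    } else if t.has_lookaround {
        Engine::PikeVm
    } else if t.has_non_greedy {
        Engine::TaggedNfa
    } else if t.has_large_unicode_class {
        Engine::LazyDfa
    } else if t.is_alternation_without_prefilter {
        Engine::JitShiftOr
    } else {
        Engine::DfaJit
    }
}

fn main() {
    let backref = PatternTraits {
        has_backreferences: true,
        ..Default::default()
    };
    println!("{:?}", select_engine(&backref));
    println!("{:?}", select_engine(&PatternTraits::default()));
}
```

In practice, calling `engine_name()` on a compiled regex is the reliable way to see which engine was chosen.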
SIMD acceleration is enabled by default through the simd feature. It provides:
- AVX2-accelerated literal search using Teddy algorithm
- Fast multi-pattern matching for 2-8 literals
- Automatic fallback to scalar implementations when SIMD is unavailable
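
The automatic fallback can be pictured as a runtime feature check. The sketch below uses the standard library's `is_x86_feature_detected!` macro. The function names are hypothetical, and the "avx2" branch simply reuses the scalar search because the point here is the dispatch, not a vectorized kernel. How regexr performs this check internally is not described in this document.

```rust
/// Returns true when the CPU reports AVX2 support at runtime.
/// On non-x86-64 targets the check is compiled out and this returns false.
fn avx2_available() -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            return true;
        }
    }
    false
}

/// Scalar literal search, used when no SIMD path applies.
fn find_literal_scalar(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}

/// Picks a search path and reports which one was chosen.
fn find_literal(haystack: &str, needle: &str) -> (Option<usize>, &'static str) {
    if avx2_available() {
        // Placeholder: a real engine would call its AVX2 routine here.
        (find_literal_scalar(haystack, needle), "avx2")
    } else {
        (find_literal_scalar(haystack, needle), "scalar")
    }
}

fn main() {
    let (pos, path) = find_literal("abc hello", "hello");
    println!("found at {:?} via the {} path", pos, path);
}
```

Both paths must return the same result; only the speed differs.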
The SIMD prefilter:
- Extracts required literals from the pattern
- Uses SIMD instructions to scan for candidates
- Verifies candidates with the full regex engine
- Returns matches
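
The four steps above can be sketched with the standard library alone. The example hard-codes the literal `hello` as the extracted prefix for the pattern `hello\w+`, uses `str::find` in place of a SIMD scan, and approximates `\w` as ASCII-only. The function names `verify_at` and `find_with_prefilter` are hypothetical and not part of regexr.

```rust
/// Full verification of `hello\w+` starting at `start`.
/// Returns the end offset of the match, if any.
fn verify_at(haystack: &str, start: usize) -> Option<usize> {
    let rest = haystack[start..].strip_prefix("hello")?;
    // `\w+` approximated as one or more ASCII word characters.
    let word_len = rest
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
        .count();
    if word_len == 0 {
        None
    } else {
        Some(start + "hello".len() + word_len)
    }
}

/// Prefilter loop: scan for the literal, then verify each candidate.
fn find_with_prefilter(haystack: &str) -> Option<(usize, usize)> {
    let mut from = 0;
    // Step 2: scan for the required literal (a SIMD routine in a real engine).
    while let Some(offset) = haystack[from..].find("hello") {
        let candidate = from + offset;
        // Step 3: run the full matcher only at the candidate position.
        if let Some(end) = verify_at(haystack, candidate) {
            // Step 4: report the verified match.
            return Some((candidate, end));
        }
        // False positive: resume scanning just past this candidate.
        from = candidate + 1;
    }
    None
}

fn main() {
    let text = "say hello, helloworld!";
    if let Some((start, end)) = find_with_prefilter(text) {
        println!("{}", &text[start..end]);
    }
}
```

The first `hello` is followed by a comma, so the verifier rejects it and the scan continues to the second occurrence. The expensive matcher only runs where the cheap scan found a candidate.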
Example pattern with effective prefilter:
let re = Regex::new(r"hello\w+").unwrap();
// SIMD scans for "hello", then engine verifies \w+

Build without SIMD:
cargo build --no-default-features

For patterns with many literal alternatives (common in tokenizers), prefix optimization merges common prefixes into a trie structure:
use regexr::RegexBuilder;
let re = RegexBuilder::new(r"(function|for|finally|from)")
    .optimize_prefixes(true)
    .build()
    .unwrap();

Without optimization:
(function|for|finally|from)
Creates separate NFA branches for each alternative.
With optimization:
f(unction|or|inally|rom)
Merges the common prefix f, reducing active NFA threads from O(vocabulary_size) to O(token_length).
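
The merging step can be illustrated with a small trie built from the same four keywords. This is a standard-library sketch of the general technique, not regexr's internal representation. The `TrieNode` type and its methods are hypothetical, and `longest_match` returns the longest keyword rather than following regex alternation order.

```rust
use std::collections::BTreeMap;

/// One node per shared prefix character.
#[derive(Default)]
struct TrieNode {
    children: BTreeMap<char, TrieNode>,
    is_end: bool,
}

impl TrieNode {
    /// Inserts a keyword, reusing nodes for any prefix already present.
    fn insert(&mut self, word: &str) {
        let mut node = self;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.is_end = true;
    }

    /// Byte length of the longest keyword that prefixes `text`, if any.
    /// The walk follows a single path, one step per input character.
    fn longest_match(&self, text: &str) -> Option<usize> {
        let mut node = self;
        let mut best = None;
        for (i, ch) in text.char_indices() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => break,
            }
            if node.is_end {
                best = Some(i + ch.len_utf8());
            }
        }
        best
    }
}

fn main() {
    let mut root = TrieNode::default();
    for kw in ["function", "for", "finally", "from"] {
        root.insert(kw);
    }
    // One branch at the root ('f') instead of four separate alternatives.
    println!("root branches: {}", root.children.len());
    println!("branches after 'f': {}", root.children[&'f'].children.len());
    println!("{:?}", root.longest_match("format"));
}
```

Because every keyword shares the leading `f`, the root has a single branch, and matching advances along one path whose length is bounded by the token length rather than by the number of alternatives.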
Enable prefix optimization when:
- Pattern has many literal alternatives (>10)
- Alternatives share common prefixes
- Used in tokenization or keyword matching
let re = Regex::new(r"\d+").unwrap();
// Check if pattern matches
if re.is_match("abc123") {
    println!("Match found");
}
// Find first match
if let Some(m) = re.find("abc123def") {
    println!("Found at {}-{}: {}", m.start(), m.end(), m.as_str());
}
// Find all matches
for m in re.find_iter("123 456 789") {
    println!("{}", m.as_str());
}

let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
let caps = re.captures("2024-01-15").unwrap();
println!("Year: {}", &caps[1]);
println!("Month: {}", &caps[2]);
println!("Day: {}", &caps[3]);
// Iterate over all captures in text
for caps in re.captures_iter("2024-01-15 2024-02-20") {
    println!("{}", &caps[0]);
}

let re = Regex::new(r"\d+").unwrap();
// Replace first match
let result = re.replace("Price: 100 dollars", "200");
assert_eq!(result, "Price: 200 dollars");
// Replace all matches
let result = re.replace_all("Price: 100 and 200 dollars", "X");
assert_eq!(result, "Price: X and X dollars");

let re = Regex::new(r"(?P<year>\d{4})-(?P<month>\d{2})").unwrap();
// Get original pattern
println!("Pattern: {}", re.as_str());
// Get capture names
for name in re.capture_names() {
    println!("Capture group: {}", name);
}
// Get engine name (for debugging)
println!("Engine: {}", re.engine_name());

All regex compilation returns `Result<Regex, Error>`:
use regexr::Regex;
match Regex::new(r"(unclosed") {
    // The compiled regex is not used here, so it is not bound to a name
    Ok(_) => println!("Compiled successfully"),
    Err(e) => eprintln!("Compilation error: {}", e),
}

Common errors:
- Unclosed groups
- Invalid escape sequences
- Invalid repetition
- Invalid backreference
- Unsupported features
let re = RegexBuilder::new(pattern)
    .jit(true)
    .build()
    .unwrap();

let re = RegexBuilder::new(keyword_pattern)
    .optimize_prefixes(true)
    .build()
    .unwrap();

// Better
let re = Regex::new(r"^\d+").unwrap();
// Slower (must search entire string)
let re = Regex::new(r"\d+").unwrap();

// Better
let re = Regex::new(r"[abc]").unwrap();
// Slower
let re = Regex::new(r"(a|b|c)").unwrap();

// Better (non-capturing group)
let re = Regex::new(r"(?:cat|dog)+").unwrap();
// Slower (capturing group not needed)
let re = Regex::new(r"(cat|dog)+").unwrap();

Use engine_name() to verify the selected engine:
let re = Regex::new(pattern).unwrap();
println!("Using engine: {}", re.engine_name());

Ensure the engine matches your expectations for the pattern type.
- SIMD: Only available on x86-64 with AVX2 support
- JIT: Not available on WASM — falls back to interpreted engines automatically
- Multiline mode: Currently `.` never matches newline
- Backreferences: Cannot be combined with JIT DFA (uses BacktrackingJit instead)
- Variable-width lookbehind: Limited support (fixed-width lookbehind only)
| Platform | JIT Support | SIMD Support |
|---|---|---|
| Linux x86-64 | ✓ | ✓ (AVX2) |
| Linux ARM64 | ✓ | ✗ |
| macOS x86-64 | ✓ | ✓ (AVX2) |
| macOS ARM64 (Apple Silicon) | ✓ | ✗ |
| Windows x86-64 | ✓ | ✓ (AVX2) |
| WASM (wasm32) | ✗ | ✗ |
| Other | ✗ | ✗ |
| Feature | PikeVm | ShiftOr | LazyDFA | JIT DFA | BacktrackingJit |
|---|---|---|---|---|---|
| Backreferences | ✓ | ✗ | ✗ | ✗ | ✓ |
| Lookaround | ✓ | ✗ | ✗ | ✗ | ✗ |
| Non-greedy | ✓ | ✗ | ✗ | ✗ | ✗ |
| Word boundaries | ✓ | ✗ | ✓ | ✓ | ✓ |
| Anchors | ✓ | ✗ | ✓ | ✓ | ✓ |
| Captures | ✓ | ✓* | ✓* | ✓ | ✓ |
*ShiftOr and LazyDFA fall back to PikeVm for capture extraction.