A high-performance, lightweight Python library for fuzzy string matching and ranking, implemented in C++ with Pybind11.
- Blazing Fast: C++ core delivering a 2-5x speedup over pure-Python alternatives.
- Multiple Scorers: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, WRatio, and Partial Ratio.
- Partial Matching: Find the best substring matches using `mode="partial"`.
- Hybrid Scoring: Combine multiple scorers with custom weights for complex matching tasks.
- Pandas & NumPy Integration: Native support for Series and Arrays via a dedicated accessor.
- Batch Processing: Parallelized matching for large datasets using OpenMP.
- Unicode Support: Handles international characters and basic normalization.
- Benchmarking Tools: Built-in utilities to measure and compare performance.
- Thread Safe: Releases the GIL inside the C++ core, so matching can run concurrently across Python threads.
- Type Safe: Includes PEP 561 type stubs for full IDE and MyPy support.
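To make the scorers listed above concrete, here is a minimal pure-Python sketch of how they are commonly defined. It assumes a typical normalization (Levenshtein similarity as 1 - distance / length of the longer string, whitespace tokenization for Jaccard and Token Sort, a sliding window for partial matching, and a weighted average for hybrid scores). The function names are hypothetical and this is not fuzzybunny's C++ implementation, so its scores may differ from the library's.

```python
"""Illustrative pure-Python fuzzy scorers (hypothetical names, not fuzzybunny's API)."""
from typing import Callable, Dict, List, Tuple


def levenshtein_distance(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / max(len(a), len(b))


def jaccard_sim(a: str, b: str) -> float:
    """Jaccard similarity of whitespace-separated token sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def token_sort_sim(a: str, b: str) -> float:
    """Sort tokens alphabetically, then compare with Levenshtein similarity."""
    return levenshtein_sim(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def partial_sim(a: str, b: str) -> float:
    """Best similarity of the shorter string against each same-length window of the longer."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 1.0 if not long_ else 0.0
    n = len(short)
    return max(levenshtein_sim(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


SCORERS: Dict[str, Callable[[str, str], float]] = {
    "levenshtein": levenshtein_sim,
    "jaccard": jaccard_sim,
    "token_sort": token_sort_sim,
    "partial": partial_sim,
}


def hybrid_sim(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted average of named scorers; assumes positive weights (need not sum to 1)."""
    total = sum(weights.values())
    return sum(w * SCORERS[name](a, b) for name, w in weights.items()) / total


def rank(query: str, candidates: List[str],
         scorer: Callable[[str, str], float] = levenshtein_sim,
         top_n: int = 5) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_n (candidate, score) pairs, best first."""
    scored = [(c, scorer(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


if __name__ == "__main__":
    print(round(levenshtein_sim("kitten", "sitting"), 2))  # 0.57
    print(rank("app", ["apple", "apricot", "banana", "cherry"], top_n=2))
    print(partial_sim("apple", "apple pie"))  # 1.0
    print(hybrid_sim("apple banana", "banana apple",
                     {"levenshtein": 0.3, "token_sort": 0.7}))
```

Computing every scorer from one edit-distance core mirrors how such libraries are typically layered; the C++ core exists precisely to make that inner double loop fast.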
```bash
pip install fuzzybunny
```

```python
import fuzzybunny

# Basic matching
score = fuzzybunny.levenshtein("kitten", "sitting")
print(f"Similarity: {score:.2f}")

# Ranking candidates
candidates = ["apple", "apricot", "banana", "cherry"]
results = fuzzybunny.rank("app", candidates, top_n=2)
# [('apple', 0.6), ('apricot', 0.42)]
```

Combine different algorithms using custom weights:
```python
results = fuzzybunny.rank(
    "apple banana",
    ["banana apple"],
    scorer="hybrid",
    weights={"levenshtein": 0.3, "token_sort": 0.7},
)
```

Find the best substring match:
```python
score = fuzzybunny.partial_ratio("apple", "apple pie")  # 1.0

# Using rank with partial mode
results = fuzzybunny.rank("apple", ["apple pie", "banana"], mode="partial")
# [('apple pie', 1.0), ('banana', 0.18)]
```

Use the specialized `.fuzzy` accessor on a pandas Series:
```python
import pandas as pd
import fuzzybunny

df = pd.DataFrame({"names": ["apple pie", "banana bread", "cherry tart"]})
results = df["names"].fuzzy.match("apple", mode="partial")
```

Compare performance on your specific data:
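```python
# `candidates` is any list of strings, e.g. the one from the ranking example
perf = fuzzybunny.benchmark("query", candidates)
print(f"Levenshtein mean time: {perf['levenshtein']['mean']:.6f}s")
```

License: MIT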
