Fast MLX port of the ZeroEntropy zerank-2 cross-encoder reranker: 10x faster than PyTorch on the MPS backend on Apple Silicon. Runs in bf16 and has been validated.
Topics: macos, metal, transformers, semantic-search, mlx, fast-inference, reranker, rag, huggingface, apple-silicon, cross-encoder, llm-inference, retrieval-augmented-generation, llm-optimization, qwen3, zeroentropy, zerank, m4-max
Language: Python. Updated Apr 9, 2026.
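A cross-encoder reranker scores each (query, document) pair jointly in one forward pass and reorders retrieved candidates by that score, which is why per-pair inference speed matters in a RAG pipeline. The sketch below shows only that reranking loop in plain Python, under stated assumptions: `rerank` and `toy_score_pair` are hypothetical names, and the toy word-overlap scorer merely stands in for the zerank-2 model. It is not this port's actual API.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k by score."""
    # A cross-encoder sees query and document together, so each pair needs
    # its own model call; this loop is where per-pair latency adds up.
    scored = [(doc, score_pair(query, doc)) for doc in documents]
    # Highest score first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_score_pair(query: str, document: str) -> float:
    """Hypothetical stand-in for the model: fraction of query words in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


if __name__ == "__main__":
    docs = [
        "MLX is an array framework for Apple silicon",
        "PyTorch supports the MPS backend on macOS",
        "Bananas are rich in potassium",
    ]
    for doc, score in rerank("apple silicon array framework", docs, toy_score_pair, top_k=2):
        print(f"{score:.2f}  {doc}")
```

Keeping the scorer as an injected callable means the same loop works whether scores come from a toy function, a PyTorch model, or an MLX model; only `score_pair` changes.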