# 🚀 RuvLLM v2.3 - High-Performance LLM Inference for Apple Silicon

Run Large Language Models locally on your Mac with maximum performance.

## 🎯 What's New in v2.3

### 🧠 RuvLTRA-Medium 3B Model

A purpose-built 3B model optimized for Claude Flow agent orchestration:
| Spec | Value |
|------|-------|
| Parameters | 3.0B |
| Hidden Size | 2560 |
| Layers | 42 |
| Context | 256K tokens |
| Features | Flash Attention 2, Speculative Decoding, SONA Hooks |
### 🔌 HuggingFace Hub Integration

Full Hub integration for model distribution:

```rust
use ruvllm::hub::{DownloadConfig, ModelDownloader, ModelUploader, RuvLtraRegistry};

// Download from the Hub
let downloader = ModelDownloader::new(DownloadConfig::default());
let path = downloader.download("ruvector/ruvltra-small-q4km", None)?;

// Upload to the Hub
let uploader = ModelUploader::new("hf_token");
uploader.upload("./model.gguf", "username/my-model", metadata)?;
```
### 🎯 Task-Specific LoRA Adapters

Five pre-trained adapters optimized for Claude Flow agent types:

| Adapter | Rank | Alpha | Targets | Use Case |
|---------|------|-------|---------|----------|
| Coder | 16 | 32.0 | Q,K,V,O | Code generation, refactoring |
| Researcher | 8 | 16.0 | Q,K,V | Information analysis |
| Security | 16 | 32.0 | Attention + MLP | Vulnerability detection |
| Architect | 12 | 24.0 | Q,V + Gate,Up | System design |
| Reviewer | 8 | 16.0 | Q,V | Code review |
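The rank and alpha values above plug into the standard LoRA update, y = Wx + (α/r)·B(Ax), where A projects down to the adapter rank and B projects back up. A minimal pure-Rust sketch of that computation (illustrative only — `lora_forward` and `matvec` are hypothetical names, not the ruvllm adapter API):

```rust
/// Dense matrix-vector product: m is row-major, rows x cols.
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect()
}

/// Apply a low-rank LoRA update on top of a frozen base projection:
/// y = W x + (alpha / rank) * B (A x)
fn lora_forward(
    base: &[Vec<f32>], // frozen weight W (d_out x d_in)
    a: &[Vec<f32>],    // LoRA down-projection A (rank x d_in)
    b: &[Vec<f32>],    // LoRA up-projection B (d_out x rank)
    alpha: f32,
    x: &[f32],
) -> Vec<f32> {
    let rank = a.len() as f32;
    let base_out = matvec(base, x);
    let low = matvec(a, x);      // project into the rank-dim bottleneck
    let delta = matvec(b, &low); // project back up to d_out
    base_out
        .iter()
        .zip(&delta)
        .map(|(y, d)| y + (alpha / rank) * d)
        .collect()
}
```

The α/r scaling is why the table pairs rank 16 with alpha 32.0: the effective update magnitude stays at 2× regardless of rank.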
### 🔄 Adapter Merging & Hot-Swap

Advanced adapter composition strategies:

| Strategy | Description |
|----------|-------------|
| TIES | Trim, Elect, Merge for robust composition |
| DARE | Drop And REscale for sparse merging |
| SLERP | Spherical interpolation for smooth transitions |
| TaskArithmetic | Add/subtract task vectors |

```rust
// Hot-swap adapters at runtime
let mut manager = HotSwapManager::new();
manager.set_active(coder_adapter);
manager.prepare_standby(security_adapter);
manager.swap()?; // Zero-downtime switch
```
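Of the strategies above, SLERP is the simplest to show concretely: it interpolates along the great circle between two weight vectors instead of cutting a straight chord through them, which preserves the norm of the merged weights. A self-contained sketch over flattened adapter weights (illustrative only, not the ruvllm merge API):

```rust
/// Spherical linear interpolation between two weight vectors.
/// t = 0 returns a, t = 1 returns b.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let cos = (dot / (na * nb)).clamp(-1.0, 1.0);
    let theta = cos.acos(); // angle between the two vectors

    if theta.abs() < 1e-6 {
        // Nearly parallel: fall back to plain linear interpolation
        return a.iter().zip(b).map(|(x, y)| x + t * (y - x)).collect();
    }

    let s = theta.sin();
    let wa = ((1.0 - t) * theta).sin() / s;
    let wb = (t * theta).sin() / s;
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```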
### 📊 Claude Dataset Training

2,700+ training examples for Claude Flow optimization:

- Code generation (900 examples)
- Research & analysis (450 examples)
- Security review (450 examples)
- Architecture design (450 examples)
- Code review (450 examples)
## 📈 v2.0-2.2 Features

### 🧠 Apple Neural Engine (ANE) Backend - 261-989x Faster Matmul

Native Core ML integration with Apple's Neural Engine:

| Component | Technology | Benefit |
|-----------|------------|---------|
| Matrix Multiply | Core ML → ANE | 261-989x faster vs NEON |
| Attention | Metal GPU | Optimized for M4 Pro |
| Activations | ARM NEON SIMD | 2.2x faster than ANE |
| Auto-Dispatch | Hybrid Pipeline | Best of all worlds |
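The auto-dispatch row captures the key design choice: route each op to the backend where it wins, since ANE excels at large matmuls while small ops aren't worth the Core ML round-trip. A toy routing rule in that spirit (illustrative only — `Backend`, `Op`, `dispatch`, and the size threshold are hypothetical, not the ruvllm dispatcher):

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    Ane,   // Core ML / Apple Neural Engine
    Metal, // Metal GPU
    Neon,  // ARM NEON SIMD on CPU
}

enum Op {
    MatMul { m: usize, n: usize, k: usize },
    Attention,
    Activation,
}

/// Route each operation to the backend where it performs best.
fn dispatch(op: &Op) -> Backend {
    match op {
        // Large matmuls amortize the ANE submission cost; small ones stay on CPU.
        Op::MatMul { m, n, k } if m * n * k >= 1 << 18 => Backend::Ane,
        Op::MatMul { .. } => Backend::Neon,
        Op::Attention => Backend::Metal,   // per the table above
        Op::Activation => Backend::Neon,   // NEON beats ANE on activations
    }
}
```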
### 🔄 SONA Self-Learning System

Three-tier learning loops for continuous optimization:

- Instant Loop → <1ms per request (MicroLoRA)
- Background Loop → ~10s hourly (BaseLoRA + EWC++)
- Deep Loop → ~10min weekly (Pattern consolidation)
### 🤖 RuvLTRA-Small: Qwen 0.5B Optimized for Claude Flow

| Spec | Value |
|------|-------|
| Base Model | Qwen2.5-0.5B-Instruct |
| Parameters | 494M |
| Hidden Size | 896 |
| Layers | 24 |
| Context | 32K tokens |
## 🏎️ Performance Benchmarks (M4 Pro)

### Inference Speed

| Model | Quant | Prefill (tok/s) | Decode (tok/s) | Memory |
|-------|-------|-----------------|----------------|--------|
| RuvLTRA-Small | Q4K | 3,500 | 135 | 491 MB |
| RuvLTRA-Medium | Q4K | 2,200 | 85 | 1.8 GB |
| Qwen2.5-7B | Q4K | 2,800 | 95 | 4.2 GB |
| Llama3-8B | Q4K | 2,600 | 88 | 4.8 GB |
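Decode throughput converts directly to per-token latency as 1000 / tok_s milliseconds; for example, RuvLTRA-Small's 135 tok/s works out to roughly 7.4 ms per generated token. A one-liner for the conversion (the function name is ours, not part of ruvllm):

```rust
/// Per-token decode latency in milliseconds, given throughput in tokens/s.
fn ms_per_token(tokens_per_sec: f32) -> f32 {
    1000.0 / tokens_per_sec
}
```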
### Kernel Performance

| Kernel | Single-thread | Multi-thread (10-core) |
|--------|---------------|------------------------|
| GEMM 4096×4096 | 1.2 GFLOPS | 12.7 GFLOPS |
| Flash Attention (2048) | 850μs | 320μs |
| HNSW Search (k=10) | 24.0μs | - |
| SONA Adapt | <1ms | - |
## 📦 Installation

### Rust

```toml
[dependencies]
ruvllm = { version = "2.3", features = ["inference-metal", "coreml", "parallel"] }
```

### npm

```bash
npm install @ruvector/ruvllm
```
## 🔗 Links

## ✅ Implementation Status