# 🚀 RuvLLM v2.3 - High-Performance LLM Inference for Apple Silicon

Run Large Language Models locally on your Mac with maximum performance.

## 🎯 What's New in v2.3

### 🧠 RuvLTRA-Medium 3B Model

A purpose-built 3B model optimized for Claude Flow agent orchestration:
| Spec | Value |
|------|-------|
| Parameters | 3.0B |
| Hidden Size | 2560 |
| Layers | 42 |
| Context | 256K tokens |
| Features | Flash Attention 2, Speculative Decoding, SONA Hooks |
### 🔌 HuggingFace Hub Integration

Full Hub integration for model distribution:

```rust
use ruvllm::hub::{DownloadConfig, ModelDownloader, ModelUploader, RuvLtraRegistry};

// Download from the Hub
let downloader = ModelDownloader::new(DownloadConfig::default());
let path = downloader.download("ruvector/ruvltra-small-q4km", None)?;

// Upload to the Hub
let uploader = ModelUploader::new("hf_token");
uploader.upload("./model.gguf", "username/my-model", metadata)?;
```
### 🎯 Task-Specific LoRA Adapters

Five pre-trained adapters optimized for Claude Flow agent types:

| Adapter | Rank | Alpha | Targets | Use Case |
|---------|------|-------|---------|----------|
| Coder | 16 | 32.0 | Q,K,V,O | Code generation, refactoring |
| Researcher | 8 | 16.0 | Q,K,V | Information analysis |
| Security | 16 | 32.0 | Attention + MLP | Vulnerability detection |
| Architect | 12 | 24.0 | Q,V + Gate,Up | System design |
| Reviewer | 8 | 16.0 | Q,V | Code review |
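The rank and alpha values above plug into the standard LoRA update, y = Wx + (α/r)·B(Ax), where A projects down to the adapter rank and B projects back up. A minimal pure-Rust sketch of that computation (illustrative only — `lora_forward` and `matvec` are hypothetical names, not the ruvllm adapter API):

```rust
/// Dense matrix-vector product: m is row-major, rows x cols.
fn matvec(m: &[Vec<f32>], x: &[f32]) -> Vec<f32> {
    m.iter()
        .map(|row| row.iter().zip(x).map(|(w, xi)| w * xi).sum())
        .collect()
}

/// Apply a low-rank LoRA update on top of a frozen base projection:
/// y = W x + (alpha / rank) * B (A x)
fn lora_forward(
    base: &[Vec<f32>], // frozen weight W (d_out x d_in)
    a: &[Vec<f32>],    // LoRA down-projection A (rank x d_in)
    b: &[Vec<f32>],    // LoRA up-projection B (d_out x rank)
    alpha: f32,
    x: &[f32],
) -> Vec<f32> {
    let rank = a.len() as f32;
    let base_out = matvec(base, x);
    let low = matvec(a, x);      // project into the rank-dim bottleneck
    let delta = matvec(b, &low); // project back up to d_out
    base_out
        .iter()
        .zip(&delta)
        .map(|(y, d)| y + (alpha / rank) * d)
        .collect()
}
```

The α/r scaling is why the table pairs rank 16 with alpha 32.0: the effective update magnitude stays at 2× regardless of rank.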
### 🔄 Adapter Merging & Hot-Swap

Advanced adapter composition strategies:

| Strategy | Description |
|----------|-------------|
| TIES | Trim, Elect, Merge for robust composition |
| DARE | Drop And REscale for sparse merging |
| SLERP | Spherical interpolation for smooth transitions |
| TaskArithmetic | Add/subtract task vectors |

```rust
// Hot-swap adapters at runtime
let mut manager = HotSwapManager::new();
manager.set_active(coder_adapter);
manager.prepare_standby(security_adapter);
manager.swap()?; // Zero-downtime switch
```
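Of the strategies above, SLERP is the simplest to show concretely: it interpolates along the great circle between two weight vectors instead of cutting a straight chord through them, which preserves the norm of the merged weights. A self-contained sketch over flattened adapter weights (illustrative only, not the ruvllm merge API):

```rust
/// Spherical linear interpolation between two weight vectors.
/// t = 0 returns a, t = 1 returns b.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    let cos = (dot / (na * nb)).clamp(-1.0, 1.0);
    let theta = cos.acos(); // angle between the two vectors

    if theta.abs() < 1e-6 {
        // Nearly parallel: fall back to plain linear interpolation
        return a.iter().zip(b).map(|(x, y)| x + t * (y - x)).collect();
    }

    let s = theta.sin();
    let wa = ((1.0 - t) * theta).sin() / s;
    let wb = (t * theta).sin() / s;
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}
```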
### 📊 Claude Dataset Training

2,700+ training examples for Claude Flow optimization:

- Code generation (900 examples)
- Research & analysis (450 examples)
- Security review (450 examples)
- Architecture design (450 examples)
- Code review (450 examples)
## 📈 v2.0-2.2 Features

### 🧠 Apple Neural Engine (ANE) Backend - 261-989x Faster Matmul

Native Core ML integration with Apple's Neural Engine:

| Component | Technology | Benefit |
|-----------|------------|---------|
| Matrix Multiply | Core ML → ANE | 261-989x faster vs NEON |
| Attention | Metal GPU | Optimized for M4 Pro |
| Activations | ARM NEON SIMD | 2.2x faster than ANE |
| Auto-Dispatch | Hybrid Pipeline | Best of all worlds |
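The auto-dispatch row captures the key design choice: route each op to the backend where it wins, since ANE excels at large matmuls while small ops aren't worth the Core ML round-trip. A toy routing rule in that spirit (illustrative only — `Backend`, `Op`, `dispatch`, and the size threshold are hypothetical, not the ruvllm dispatcher):

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    Ane,   // Core ML / Apple Neural Engine
    Metal, // Metal GPU
    Neon,  // ARM NEON SIMD on CPU
}

enum Op {
    MatMul { m: usize, n: usize, k: usize },
    Attention,
    Activation,
}

/// Route each operation to the backend where it performs best.
fn dispatch(op: &Op) -> Backend {
    match op {
        // Large matmuls amortize the ANE submission cost; small ones stay on CPU.
        Op::MatMul { m, n, k } if m * n * k >= 1 << 18 => Backend::Ane,
        Op::MatMul { .. } => Backend::Neon,
        Op::Attention => Backend::Metal,   // per the table above
        Op::Activation => Backend::Neon,   // NEON beats ANE on activations
    }
}
```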
### 🔄 SONA Self-Learning System

Three-tier learning loops for continuous optimization:

- Instant Loop → <1ms per request (MicroLoRA)
- Background Loop → ~10s hourly (BaseLoRA + EWC++)
- Deep Loop → ~10min weekly (Pattern consolidation)
### 🤖 RuvLTRA-Small: Qwen 0.5B Optimized for Claude Flow

| Spec | Value |
|------|-------|
| Base Model | Qwen2.5-0.5B-Instruct |
| Parameters | 494M |
| Hidden Size | 896 |
| Layers | 24 |
| Context | 32K tokens |
## 🏎️ Performance Benchmarks (M4 Pro)

### Inference Speed

| Model | Quant | Prefill (tok/s) | Decode (tok/s) | Memory |
|-------|-------|-----------------|----------------|--------|
| RuvLTRA-Small | Q4K | 3,500 | 135 | 491 MB |
| RuvLTRA-Medium | Q4K | 2,200 | 85 | 1.8 GB |
| Qwen2.5-7B | Q4K | 2,800 | 95 | 4.2 GB |
| Llama3-8B | Q4K | 2,600 | 88 | 4.8 GB |
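Decode throughput converts directly to per-token latency as 1000 / tok_s milliseconds; for example, RuvLTRA-Small's 135 tok/s works out to roughly 7.4 ms per generated token. A one-liner for the conversion (the function name is ours, not part of ruvllm):

```rust
/// Per-token decode latency in milliseconds, given throughput in tokens/s.
fn ms_per_token(tokens_per_sec: f32) -> f32 {
    1000.0 / tokens_per_sec
}
```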
### Kernel Performance

| Kernel | Single-thread | Multi-thread (10-core) |
|--------|---------------|------------------------|
| GEMM 4096×4096 | 1.2 GFLOPS | 12.7 GFLOPS |
| Flash Attention (2048) | 850μs | 320μs |
| HNSW Search (k=10) | 24.0μs | - |
| SONA Adapt | <1ms | - |
## 📦 Installation

### Rust

```toml
[dependencies]
ruvllm = { version = "2.3", features = ["inference-metal", "coreml", "parallel"] }
```

### npm

```bash
npm install @ruvector/ruvllm
```
## 🔗 Links

## ✅ Implementation Status