# entrenar

**Training Framework for the Sovereign AI Stack**

A pure Rust training framework providing autograd, LoRA/QLoRA fine-tuning, quantization (Int4/Int8), model merging, knowledge distillation, and Compiler-in-the-Loop (CITL) training. Built on trueno for SIMD-accelerated compute and aprender for ML algorithms.
- What is entrenar?
- Installation
- Usage
- Features
- Architecture
- Quality
- Sovereign AI Stack
- Documentation
- Contributing
- License
## What is entrenar?

Entrenar (Spanish: "to train") is a production-grade neural network training library in pure Rust. It provides everything needed to train, fine-tune, quantize, merge, and distill models -- with no Python dependency.
Core capabilities:
- Autograd Engine -- Tape-based reverse-mode automatic differentiation
- Optimizers -- SGD, Adam, AdamW with cosine scheduling and gradient clipping
- LoRA / QLoRA -- Parameter-efficient fine-tuning with 4-bit quantized base weights
- Quantization -- QAT, PTQ, GGUF-compatible Q4_0/Q8_0, NF4 training
- Model Merging -- TIES, DARE, SLERP algorithms
- Knowledge Distillation -- Multi-teacher, progressive layer-wise
- CITL -- Compiler-in-the-Loop training for transpiler optimization
- GPU Training -- WGPU backend (AMD/Intel/cross-platform), CUDA/cuBLAS (NVIDIA)
- Monitoring -- Real-time metrics, drift detection, Andon alerts
Part of the PAIML Sovereign AI Stack.
## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
entrenar = "0.7"
```

Or install the CLI:

```sh
cargo install entrenar
```

Or build from source:

```sh
git clone https://github.com/paiml/entrenar
cd entrenar
cargo install --path .
```

## Usage

### Training loop

```rust
use entrenar::train::{Trainer, TrainConfig, MSELoss, EarlyStopping};
use entrenar::optim::Adam;
use entrenar::Tensor;

// `batches` and `model` are defined elsewhere.
let params = vec![Tensor::zeros(784 * 128, true)];
let optimizer = Adam::new(0.001, 0.9, 0.999, 1e-8);

let mut trainer = Trainer::new(params, Box::new(optimizer), TrainConfig::default());
trainer.set_loss(Box::new(MSELoss));
trainer.add_callback(EarlyStopping::new(5, 0.001));

let result = trainer.train(100, || batches.clone(), |x| model.forward(x));
println!("Final loss: {:.4}", result.final_loss);
```

### Autograd operations

```rust
use entrenar::autograd::{matmul, softmax, layer_norm, attention};

let y = matmul(&x, &w);
let s = softmax(&logits);
let n = layer_norm(&x, &gamma, &beta);
let a = attention(&q, &k, &v);
```

### LoRA / QLoRA

```rust
use entrenar::lora::{LoRALayer, QLoRALayer};

// Standard LoRA
let lora = LoRALayer::new(4096, 4096, 16, 32.0);

// QLoRA: 4-bit base + FP16 adapters (7B model: 28GB -> 3.5GB)
let qlora = QLoRALayer::new(base_weights, 16, 32.0);
```

### Quantization

```rust
use entrenar::quant::{FakeQuantize, PTQCalibrator, GGUFQuantizer};

let fq = FakeQuantize::new(8, true);               // QAT with STE
let calibrator = PTQCalibrator::percentile(0.999); // Post-training
let quantizer = GGUFQuantizer::q4_0();             // GGUF export
```

### Model merging

```rust
use entrenar::merge::{TiesMerge, DareMerge, SlerpMerge};

let merged = TiesMerge::new(0.2).merge(&models, &weights);
let merged = DareMerge::new(0.9).merge(&base, &finetuned);
let merged = SlerpMerge::new().merge(&a, &b, 0.5);
```

### Declarative YAML configuration

```yaml
# train.yaml
model:
  path: base-model.gguf
data:
  train: train.parquet
  batch_size: 8
optimizer:
  name: adamw
  lr: 0.0001
lora:
  rank: 64
  alpha: 16
training:
  epochs: 10
  grad_clip: 1.0
```

Run it with:

```sh
entrenar train train.yaml
```

### CLI

```sh
entrenar train config.yaml --epochs 10
entrenar quantize model.safetensors --bits 4 --output model_q4.json
entrenar merge model1.safetensors model2.safetensors --method ties
entrenar bench config.yaml --warmup 5 --iterations 100
entrenar inspect model.safetensors -v
entrenar audit predictions.parquet --type bias --threshold 0.8
entrenar monitor data.parquet --threshold 0.2
```

## Features

### Autograd Engine

Tape-based reverse-mode automatic differentiation with verified gradients. Supports matmul, softmax, layer normalization, and scaled dot-product attention. All gradients are validated against finite-difference reference implementations.
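The finite-difference validation mentioned above can be sketched independently of entrenar's internals: compute an analytic gradient, estimate the same gradient with a central difference, and assert that the two agree. The function names below are illustrative, not entrenar API.

```rust
// Finite-difference gradient checking on a toy function.
// f(x) = sum(x_i^2); analytic gradient is 2 * x_i.
fn f(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

fn analytic_grad(x: &[f64]) -> Vec<f64> {
    x.iter().map(|v| 2.0 * v).collect()
}

/// Central difference: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps).
fn numeric_grad(x: &[f64], eps: f64) -> Vec<f64> {
    (0..x.len())
        .map(|i| {
            let mut plus = x.to_vec();
            let mut minus = x.to_vec();
            plus[i] += eps;
            minus[i] -= eps;
            (f(&plus) - f(&minus)) / (2.0 * eps)
        })
        .collect()
}

fn max_abs_diff(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max)
}

fn main() {
    let x = [0.5, -1.25, 3.0];
    let diff = max_abs_diff(&analytic_grad(&x), &numeric_grad(&x, 1e-5));
    assert!(diff < 1e-6, "gradient check failed: {diff}");
    println!("max |analytic - numeric| = {diff:.2e}");
}
```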
### LoRA / QLoRA

Parameter-efficient fine-tuning with up to 99.75% parameter reduction. QLoRA combines 4-bit NF4-quantized base weights with FP16 low-rank adapters, reducing 7B-model memory from 28 GB to 3.5 GB. PEFT-compatible adapter export for interoperability with HuggingFace tooling.
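As a back-of-the-envelope check on where the reduction comes from: a frozen `d x k` weight gains trainable adapters `B (d x r)` and `A (r x k)`, so the trainable fraction is `r*(d + k) / (d*k)`. The helper below is purely illustrative, not part of entrenar's API.

```rust
// Trainable-parameter fraction for a LoRA-adapted d x k layer at rank r.
fn lora_trainable_fraction(d: usize, k: usize, r: usize) -> f64 {
    (r * (d + k)) as f64 / (d * k) as f64
}

fn main() {
    // The 4096 x 4096, rank-16 layer from the usage example above:
    // only ~0.78% of the layer's parameters are trainable.
    let frac = lora_trainable_fraction(4096, 4096, 16);
    println!("trainable fraction: {:.4}%", frac * 100.0);
}
```

Smaller ranks and larger layers push the fraction lower still, which is where figures like 99.75% reduction come from.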
### Quantization

Three quantization strategies: Quantization-Aware Training (QAT) with straight-through estimator, Post-Training Quantization (PTQ) with percentile calibration, and GGUF-compatible Q4_0/Q8_0 export for llama.cpp interoperability. NF4 training with cuBLAS backward-pass support.
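To make the core idea concrete, here is a generic symmetric 8-bit round-trip: scale by the max magnitude, round to integers, and dequantize. This is a first-principles sketch only; entrenar's GGUF Q4_0/Q8_0 kernels use block-wise scales rather than a single per-tensor scale.

```rust
// Symmetric per-tensor int8 quantization (illustrative sketch).
fn quantize_i8(x: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = x
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let x = [0.1f32, -0.5, 0.25, 1.0];
    let (q, scale) = quantize_i8(&x);
    let y = dequantize_i8(&q, scale);
    // Round-trip error is bounded by half the quantization step.
    let err = x.iter().zip(&y).map(|(a, b)| (a - b).abs()).fold(0.0f32, f32::max);
    assert!(err <= scale / 2.0 + 1e-6);
    println!("max round-trip error: {err}");
}
```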
### Model Merging

Three model merging algorithms for combining fine-tuned checkpoints: TIES (Trim, Elect Sign, Merge) for multi-model consolidation, DARE (Dropout And Rescale) for parameter-efficient merging, and SLERP (Spherical Linear Interpolation) for smooth two-model blending.
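The SLERP idea can be sketched in a few lines: interpolate along the great circle between two weight vectors instead of along the straight line. This is the underlying math, not entrenar's `SlerpMerge` implementation.

```rust
// Spherical linear interpolation between two weight vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn norm(a: &[f32]) -> f32 {
    dot(a, a).sqrt()
}

fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    let cos_theta = (dot(a, b) / (norm(a) * norm(b))).clamp(-1.0, 1.0);
    let theta = cos_theta.acos();
    if theta.abs() < 1e-6 {
        // Nearly parallel vectors: fall back to linear interpolation.
        return a.iter().zip(b).map(|(x, y)| (1.0 - t) * x + t * y).collect();
    }
    let wa = ((1.0 - t) * theta).sin() / theta.sin();
    let wb = (t * theta).sin() / theta.sin();
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}

fn main() {
    // Halfway between orthogonal unit vectors lands on the quarter circle.
    let mid = slerp(&[1.0, 0.0], &[0.0, 1.0], 0.5);
    println!("{mid:?}");
}
```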
### Knowledge Distillation

Temperature-scaled KD loss with configurable alpha weighting between hard and soft targets. Multi-teacher ensemble distillation with weighted aggregation. Progressive layer-wise distillation for large-to-small model transfer.
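The temperature-scaled loss can be written out from first principles: blend the hard cross-entropy with a `T^2`-scaled cross-entropy between teacher and student soft distributions. The function below is a sketch of the standard formulation; entrenar's actual `distill` API may differ.

```rust
// L = alpha * CE(student, hard_label)
//   + (1 - alpha) * T^2 * CE(softmax(teacher/T), softmax(student/T))
fn softmax_t(logits: &[f64], t: f64) -> Vec<f64> {
    // Subtract the max for numerical stability before exponentiating.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exp: Vec<f64> = logits.iter().map(|l| ((l - max) / t).exp()).collect();
    let sum: f64 = exp.iter().sum();
    exp.iter().map(|e| e / sum).collect()
}

fn kd_loss(student: &[f64], teacher: &[f64], hard_label: usize, t: f64, alpha: f64) -> f64 {
    let hard = -softmax_t(student, 1.0)[hard_label].ln();
    let p_t = softmax_t(teacher, t);
    let p_s = softmax_t(student, t);
    // T^2 keeps soft-target gradient magnitudes comparable across temperatures.
    let soft: f64 = -p_t.iter().zip(&p_s).map(|(pt, ps)| pt * ps.ln()).sum::<f64>();
    alpha * hard + (1.0 - alpha) * t * t * soft
}

fn main() {
    let student = [2.0, 0.5, -1.0];
    let teacher = [2.5, 0.0, -1.5];
    let loss = kd_loss(&student, &teacher, 0, 4.0, 0.5);
    assert!(loss.is_finite() && loss > 0.0);
    println!("KD loss: {loss:.4}");
}
```

With `alpha = 1.0` this reduces to plain cross-entropy on the hard labels; lowering alpha shifts weight toward imitating the teacher's distribution.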
### Compiler-in-the-Loop (CITL)

Training loop that incorporates compiler feedback for transpiler optimization. Uses RAG-based fix suggestions via trueno-rag to guide training toward compilable outputs. Designed for the depyler/bashrs/decy transpilation stack.
### GPU Training

WGPU backend for cross-platform GPU training (AMD, Intel, Apple Silicon). NVIDIA CUDA/cuBLAS backend for dedicated GPU acceleration. NVML integration for real-time GPU monitoring. VRAM ledger with file-based locking for multi-process coordination.
### Monitoring

Toyota Way-inspired quality monitoring with real-time metrics collection, drift detection (z-score based), and Andon alert system for automatic anomaly notification. NaN/Inf detection, gradient explosion guards, and loss divergence tracking.
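The z-score drift check described above amounts to flagging observations that sit too many standard deviations from a baseline window. A minimal sketch (illustrative; entrenar's `monitor` module wires this into Andon alerts):

```rust
// Population mean and standard deviation of a baseline window.
fn mean_std(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

/// Flags a value whose z-score against the baseline exceeds `threshold`.
fn is_drift(baseline: &[f64], value: f64, threshold: f64) -> bool {
    let (mean, std) = mean_std(baseline);
    if std == 0.0 {
        // Constant baseline: any deviation at all counts as drift.
        return value != mean;
    }
    ((value - mean) / std).abs() > threshold
}

fn main() {
    let baseline = [0.52, 0.48, 0.50, 0.51, 0.49];
    assert!(!is_drift(&baseline, 0.50, 3.0)); // in-distribution
    assert!(is_drift(&baseline, 0.90, 3.0));  // drifted
    println!("drift checks passed");
}
```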
### Feature Flags

| Flag | Purpose |
|---|---|
| `gpu` | GPU-accelerated training via wgpu |
| `cuda` | NVIDIA CUDA/cuBLAS training |
| `citl` | Compiler-in-the-Loop with trueno-rag |
| `monitor` | Training monitoring with trueno-db persistence |
| `server` | REST/HTTP API server via axum |
| `parquet` | Parquet batch loading via alimentar |
| `hub` | HuggingFace Hub model fetching |
| `wasm` | Browser-compatible WASM build |
| `tracing` | Renacer distributed tracing integration |
| `nvml` | Real GPU monitoring via NVIDIA NVML |
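Flags are enabled through Cargo's standard feature syntax, for example:

```toml
[dependencies]
entrenar = { version = "0.7", features = ["gpu", "monitor"] }
```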
## Architecture

```text
entrenar/
  autograd/   Tape-based automatic differentiation
  optim/      SGD, Adam, AdamW, schedulers
  lora/       LoRA, QLoRA fine-tuning
  quant/      QAT, PTQ, GGUF quantization
  merge/      TIES, DARE, SLERP merging
  distill/    Knowledge distillation
  finetune/   ClassifyPipeline, ClassifyTrainer, evaluation
  eval/       Classification metrics, drift detection, Andon
  train/      Trainer, callbacks, metrics, WGPU transformer trainer
  monitor/    Real-time monitoring, Andon alerts
  config/     Declarative YAML configuration
  io/         Model persistence (SafeTensors, APR)
```
## Quality

| Metric | Value |
|---|---|
| Tests | 7,527+ passing |
| Coverage | 96% |
| TDG Score | A+ (96.8/100) |
| Critical Defects | 0 |
| Property Tests | 200K+ iterations |
| Gradient Checking | Finite-difference validated |
| Mutation Testing | >80% kill rate |
| MSRV | 1.87 |
## Sovereign AI Stack

| Crate | Purpose | Version |
|---|---|---|
| trueno | SIMD/GPU compute primitives | 0.16.x |
| aprender | ML algorithms, APR v2 format | 0.27.x |
| entrenar | Training and optimization | 0.7.x |
| realizar | Inference engine (APR/GGUF/SafeTensors) | 0.8.x |
| repartir | Distributed compute (CPU/GPU/Remote) | 2.0.x |
| whisper-apr | Pure Rust Whisper ASR | 0.2.x |
| simular | Simulation engine | 0.3.x |
| batuta | Stack orchestration | 0.7.x |
## Documentation

- API Reference -- Generated from source
- Book -- Comprehensive guide with examples
- Examples -- Runnable training, merging, and monitoring examples
## Contributing

- Fork the repository
- Create your changes on the `master` branch
- Run quality gates: `make lint && make test`
- Run coverage: `make coverage`
- Submit a pull request

See entrenar-cookbook for examples and recipes.

## License

MIT