8-bit Adafactor Optimizer with Fused CUDA Kernels
machine-learning deep-learning cuda pytorch transformer quantization memory-efficient fine-tuning low-rank diffusion-models optimizers stable-diffusion llm-training 8bit-optimizer adafactor memory-efficient-training optimizer-research apollo-optimizer
Updated Jun 16, 2026 - Python