Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
Updated Mar 28, 2026 - Python
RTX 5090 & RTX 5060 Docker container with PyTorch + TensorFlow. First fully tested Blackwell GPU support for ML/AI. CUDA 12.8, Python 3.11, Ubuntu 24.04. Works with RTX 50-series (5090/5080/5070/5060) and RTX 40-series cards.
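A quick way to verify that a container like this actually has working Blackwell support is to ask PyTorch for the device's compute capability. This is a minimal sketch using only standard PyTorch calls, nothing specific to this image:

```python
import torch

# Confirm the container's PyTorch build can see the GPU.
# An RTX 50-series card should report compute capability (12, 0),
# i.e. sm_120; RTX 40-series cards report (8, 9), i.e. sm_89.
if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {name} (sm_{major}{minor})")
    print(f"CUDA runtime: {torch.version.cuda}")
else:
    print("CUDA not available - check the container's GPU passthrough.")
```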
Pre-built onnxruntime-gpu 1.24.1 with Blackwell sm_120 CUDA kernels (RTX 5090/5080/5070)
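To check that a pre-built onnxruntime-gpu wheel exposes its CUDA kernels, list the available execution providers and open a session that prefers CUDA with a CPU fallback. A minimal sketch; `model.onnx` is a placeholder path, not a file shipped with the wheel:

```python
import onnxruntime as ort

# The CUDA-enabled wheel should list "CUDAExecutionProvider" here.
print(ort.get_available_providers())

# Create a session that prefers CUDA and falls back to CPU.
session = ort.InferenceSession(
    "model.onnx",  # placeholder: any valid ONNX model file
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# Confirms which provider was actually bound at session creation.
print(session.get_providers())
```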
Provides tested tools and configs to run Qwen 3.5 GGUF models efficiently on a single 16GB NVIDIA GPU using llama.cpp locally.
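As a rough illustration of what fitting a GGUF model into 16GB involves, the llama-cpp-python bindings let you offload all layers to the GPU while capping the context window so the KV cache stays within VRAM. The model filename and parameter values below are illustrative assumptions, not this repo's shipped configs:

```python
from llama_cpp import Llama

# Illustrative 16GB settings: a ~4-bit quantized GGUF plus a bounded
# context window usually leaves headroom for the KV cache.
llm = Llama(
    model_path="qwen3.5-q4_k_m.gguf",  # hypothetical quantized model file
    n_gpu_layers=-1,   # offload every layer to the GPU
    n_ctx=8192,        # smaller context -> smaller KV cache in VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```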