The Quantization for Ryzen AI usage guide documents two quantizer paths:
- RyzenAIOnnxQuantizer — Vitis AI-based, fully documented, INT8 PTQ for timm CNN models.
- BrevitasQuantizer — wraps Xilinx Brevitas, recommended for "other" model types beyond timm CNNs. The doc currently says "Coming soon." (source)
I'm building a 1-bit / ternary inference engine for AMD Strix Halo. The GPU lane (ROCm/Vulkan llama.cpp) runs 1-bit (IQ1_S, TQ2_0) cleanly today. The NPU lane is currently UINT4-AWQ via FastFlowLM — no public ternary path exists for XDNA 2.
Brevitas is the natural home for sub-INT8, BitNet-style quantization targeting Ryzen AI, since upstream Brevitas already supports arbitrary bit widths down to binary. The integration through BrevitasQuantizer is the missing link.
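For concreteness, here is a minimal sketch of the kind of weight quantization we'd want the Brevitas path to eventually support: BitNet-b1.58-style "absmean" ternary quantization. This is pure Python for illustration only — the function names are ours, not a Brevitas or Ryzen AI API.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize a flat list of float weights to {-1, 0, +1} with a
    per-tensor scale, following the absmean scheme used by BitNet b1.58:
    scale = mean(|w|), q = clamp(round(w / scale), -1, 1)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dequantize(q, scale):
    """Recover approximate float weights from the ternary codes."""
    return [v * scale for v in q]

if __name__ == "__main__":
    w = [0.9, -0.05, -1.3, 0.4]
    q, s = ternary_quantize(w)
    print(q)   # [1, 0, -1, 1] — every code is in {-1, 0, +1}
    print(ternary_dequantize(q, s))
```

At 1.58 bits per weight the codes pack far below INT8, which is exactly the regime the Vitis AI path doesn't cover and where the Brevitas integration would matter.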
Asks
- Is there a target window for promoting BrevitasQuantizer from "Coming soon" to documented + tested? Even a rough quarter would help downstream projects plan.
- Are sub-INT8 / 1-bit weights on the agenda for the Ryzen AI Brevitas integration, or will the initial release focus on INT8 parity with the Vitis AI path?
- Is there a working code branch / draft PR / staging example we could follow or pre-test against? Happy to file bugs against pre-release code.
The "ResNet only" perception of AMD's official Linux NPU support is real for now, and the Brevitas path is the visible bridge to richer quantization. Any clarity on its trajectory would be appreciated.