
Timeline for BrevitasQuantizer Ryzen AI integration (sub-INT8 / 1-bit support)? #178

@bong-water-water-bong

Description


The Quantization for Ryzen AI usage guide documents two quantizer paths:

  1. RyzenAIOnnxQuantizer — Vitis AI based, fully documented, INT8 PTQ for timm CNN models.
  2. BrevitasQuantizer — wraps Xilinx Brevitas, recommended for model types beyond timm CNNs. The doc currently marks it "Coming soon." (source)

I'm building a 1-bit / ternary inference engine for AMD Strix Halo. The GPU lane (ROCm/Vulkan llama.cpp) runs 1-bit (IQ1_S, TQ2_0) cleanly today. The NPU lane is currently UINT4-AWQ via FastFlowLM — no public ternary path exists for XDNA 2.

Brevitas is the natural home for sub-INT8 BitNet-style quantization targeting Ryzen AI, since upstream Brevitas already supports arbitrary bit widths down to binary. The integration through BrevitasQuantizer is the missing link.
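For concreteness, here is the kind of sub-INT8 scheme I mean. This is a minimal, framework-free sketch of BitNet-b1.58-style "absmean" ternary weight quantization; it is illustrative only and is not the Brevitas or BrevitasQuantizer API:

```python
def ternary_quantize(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} with one per-tensor scale.

    BitNet-b1.58-style absmean scheme:
        scale = mean(|w|)
        q_i   = clip(round(w_i / scale), -1, +1)
    Dequantized value is q_i * scale.
    """
    scale = sum(abs(w) for w in weights) / max(len(weights), 1) + eps
    quant = [max(-1, min(1, round(w / scale))) for w in weights]
    return quant, scale

def dequantize(quant, scale):
    """Recover approximate float weights from ternary codes."""
    return [q * scale for q in quant]

if __name__ == "__main__":
    w = [0.9, -0.4, 0.05, -1.2, 0.0, 0.6]
    q, s = ternary_quantize(w)
    print(q)  # every entry is -1, 0, or +1
    print(dequantize(q, s))
```

A Brevitas-backed path would presumably express the same idea through its weight quantizer configuration (learned or fixed scales, per-channel variants) rather than this hand-rolled round-and-clip, but the storage and compute win is the same: 2-bit codes plus one scale per tensor.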

Asks

  1. Is there a target window for promoting the BrevitasQuantizer from "Coming soon" to documented + tested? Even a rough quarter would help downstream projects plan.
  2. Are sub-INT8 / 1-bit weights on the agenda for the Ryzen AI Brevitas integration, or will the initial release focus on INT8 parity with the Vitis AI path?
  3. Is there a working code branch / draft PR / staging example we could follow or pre-test against? Happy to file bugs against pre-release code.

The perception that AMD's official Linux NPU support is "ResNet only" is accurate for now, and the Brevitas path is the visible bridge to richer quantization. Any clarity on its trajectory would be appreciated.
