Roadmap

# BitsAndBytes

* https://github.com/agrocylo/bitsandbytes-rocm
* ~~https://github.com/are-we-gfx1100-yet/bitsandbytes-rocm~~

# GPTQ for LLaMA

https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm

# AutoGPTQ

https://github.com/are-we-gfx1100-yet/AutoGPTQ-rocm

Good performance: 43 it/s for 7B, 25 it/s for 13B, 15 it/s for 30B, 0.25 it/s for 40B 3-bit, 1 beam.
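
For anyone who wants to compare their own numbers, an it/s figure can be collected with a plain timing loop. This is a minimal sketch, not the script used for the results above. `iterations_per_second` and the placeholder `step` callable are hypothetical names; in practice `step` would wrap one generation step of the model.

```python
import time


def iterations_per_second(step, n_iters=100, warmup=5):
    """Return the average number of calls to `step` completed per second."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


if __name__ == "__main__":
    # Placeholder CPU workload; replace with one generation step.
    # For GPU work, `step` must wait for the device to finish
    # (e.g. by calling torch.cuda.synchronize()) or the timing is meaningless.
    rate = iterations_per_second(lambda: sum(range(10_000)))
    print(f"{rate:.1f} it/s")
```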

# Triton

Navi 3x support is currently a work in progress. Stay tuned.

Currently reaches [13% of rocBLAS performance](https://github.com/are-we-gfx1100-yet/triton/issues/2) when running the [03-matrix-multiplication](https://triton-lang.org/main/getting-started/tutorials/03-matrix-multiplication.html) tutorial with [this branch](https://github.com/ROCmSoftwarePlatform/triton/pull/244), which was recently merged back.

There is still a lot of room for improvement.
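
A relative figure like the 13% above is a ratio of achieved throughputs. As a rough sketch of the arithmetic, assuming the usual count of 2·M·N·K floating-point operations for an M×K by K×N matrix multiplication: the helper names below are hypothetical, and the timings in the usage section are placeholders, not measurements.

```python
def matmul_tflops(m, n, k, seconds):
    """Achieved TFLOPS for an (m x k) @ (k x n) matmul that took `seconds`."""
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e12


def relative_performance(candidate_tflops, baseline_tflops):
    """Candidate throughput as a fraction of the baseline (0.13 means 13%)."""
    return candidate_tflops / baseline_tflops


if __name__ == "__main__":
    # Placeholder timings for illustration only; substitute measured values.
    candidate = matmul_tflops(4096, 4096, 4096, seconds=0.020)
    baseline = matmul_tflops(4096, 4096, 4096, seconds=0.004)
    print(f"{relative_performance(candidate, baseline):.0%} of baseline")
```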

# AITemplate

Navi 3x support is currently a work in progress. Stay tuned.

Reaches 25 it/s when generating a 512x512 image with Stable Diffusion, using [this branch](https://github.com/ROCmSoftwarePlatform/AITemplate/tree/navi3_rel_ver_1.0).

Somewhat disappointing. Is this really the limit of the RX 7900 XTX?

# Flash Attention

To be ported to Navi 3x.

# ROCm

ROCm 5.6.0 is available now, but we can't find Windows support anywhere.

I think it might be more appropriate to call it ROCm 5.5.2.

