Skip to content
This repository was archived by the owner on Jun 28, 2024. It is now read-only.
This repository was archived by the owner on Jun 28, 2024. It is now read-only.

Roadmap #2

@evshiron

Description

@evshiron

BitsAndBytes

GPTQ for LLaMA

https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm

AutoGPTQ

https://github.com/are-we-gfx1100-yet/AutoGPTQ-rocm

Good performance. 43it/s for 7B, 25it/s for 13B, 15it/s for 30B, 0.25it/s for 40B 3bit, 1 beam.

Triton

Navi 3x support is currently work in progress. Stay tuned.

13% performance compared to rocBLAS, when running 03-matrix-multiplication, with this branch, which is merged back recently.

There is still a lot of room for improvement.

AITemplate

Navi 3x support is currently work in progress. Stay tuned.

Reach 25it/s in generating a 512x512 image with Stable Diffusion, with this branch.

Somewhat disappointing. Is this really the limit of the RX 7900 XTX?

Flash Attention

To be ported to Navi 3x.

ROCm

ROCm 5.6.0 is available now, but we can't find Windows support anywhere.

I think it might be more appropriate to call it ROCm 5.5.2.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions