I have an AMD Strix Halo machine and would like it to be recognized as supported hardware on Hugging Face.
Currently, optimum-amd focuses heavily on datacenter Instinct hardware. Now that the Strix Halo (Ryzen AI Max) platform is becoming available, there is no official validation or metadata tagging for this high-performance APU. In particular, the RDNA 3.5 (gfx1151) architecture combined with a massive unified memory pool (128 GB) creates a unique opportunity for local LLM inference that isn't currently addressed.
### Describe the solution you'd like

- **Hardware tagging:** Add `strix-halo` or `ryzen-ai-max` to the supported hardware metadata on the Hugging Face Hub.
- **Validation:** Officially validate optimum-amd workflows on RDNA 3.5 (gfx1151) with Linux kernel 6.18.8+.
- **Unified memory optimization:** Ensure that memory allocation strategies in optimum account for the unified memory architecture of this chip, in particular supporting GTT allocations beyond the default 50%-of-RAM cap.
### Describe alternatives you've considered

I am currently using manual `amdgpu` kernel parameters (`amdgpu.gttsize=126976`) and Vulkan/ROCm nightly builds to achieve stability. This works for power users, but native support in optimum would lower the barrier for the broader AI community adopting these APUs.
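For context, the workaround looks roughly like this on Fedora (using `grubby`; other distros edit the kernel command line differently). `amdgpu.gttsize` is specified in MiB, so 126976 corresponds to ~124 GiB:

```shell
# Raise the GTT (system-memory-backed GPU pool) limit beyond the
# default ~50%-of-RAM cap. Requires a reboot to take effect.
sudo grubby --update-kernel=ALL --args="amdgpu.gttsize=126976"

# Verify after reboot (this sysfs file reports the value in bytes;
# card0 may differ on multi-GPU systems):
cat /sys/class/drm/card0/device/mem_info_gtt_total
```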
### Additional context

- **Hardware:** AMD Ryzen AI Max 395 (Strix Halo)
- **RAM:** 128 GB unified memory (LPDDR5X)
- **OS:** Fedora 43
- **Kernel:** 6.18.8 (includes critical VGPR-mismatch and ROCm initialization fixes)
- **Architecture:** RDNA 3.5 (gfx1151) + XDNA 2 NPU

**Performance note:** With this setup I am successfully running large models (30B+ MoE) that typically require datacenter hardware, thanks to the massive unified memory pool.