Add Apple Metal backend support #103

Open
ArturSkowronski wants to merge 1 commit into beehive-lab:main from ArturSkowronski:feat/metal-backend-support

ArturSkowronski commented Apr 7, 2026

Add Apple Metal backend support to the llama-tornado launcher, enabling GPULlama3 to run on macOS with TornadoVM's native Metal driver (shipped in TornadoVM 4.0+).

Changes (launcher only, no Java code changes):

  • Add METAL variant to the Backend enum
  • Add --metal CLI flag for backend selection
  • Configure Metal-specific module path (tornado.drivers.metal) and export list (metal-exports)

The TornadoVM API is backend-agnostic, so the Java inference code works without modification - only the launcher needed updating.
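As a rough illustration of the three launcher changes listed above, here is a minimal, self-contained Python sketch. It is not the actual llama-tornado code: the parser name, the helper functions, the `tornado.drivers.<backend>` / `<backend>-exports` naming pattern for the non-Metal backends, and the OpenCL default are all assumptions made for the example. Only the `METAL` variant, the `--metal` flag, `tornado.drivers.metal`, and `metal-exports` come from this PR description.

```python
import argparse
from enum import Enum


class Backend(Enum):
    OPENCL = "opencl"
    PTX = "ptx"
    METAL = "metal"  # the new variant described in this PR


def driver_module(backend: Backend) -> str:
    # Assumed pattern; the PR only states the Metal value (tornado.drivers.metal).
    return f"tornado.drivers.{backend.value}"


def export_list(backend: Backend) -> str:
    # Assumed pattern; the PR only states the Metal value (metal-exports).
    return f"{backend.value}-exports"


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser name; the real launcher has many more options.
    parser = argparse.ArgumentParser(prog="llama-tornado-sketch")
    group = parser.add_mutually_exclusive_group()
    for backend in Backend:
        group.add_argument(
            f"--{backend.value}",
            dest="backend",
            action="store_const",
            const=backend,
            help=f"use the {backend.value} backend",
        )
    # Assumption for this sketch: OpenCL is used when no flag is given.
    parser.set_defaults(backend=Backend.OPENCL)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--metal"])
    print(args.backend.name, driver_module(args.backend), export_list(args.backend))
```

Keeping the flags in one mutually exclusive group means a combination such as `--metal --ptx` is rejected by the parser instead of silently picking one backend.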

Motivation

TornadoVM 4.0 shipped a Metal backend (PR #796), but llama-tornado only supported --opencl and --ptx. The GPULlama3 README already notes:

"TornadoVM does not have a Metal backend yet [...] until we add a Metal backend to TornadoVM and start optimizing it."

TornadoVM 4.0 has now added it - this PR enables GPULlama3 to use it 😊

Benchmark Results

Tested on Apple M1 Pro (macOS, ARM64) with Llama-3.2-1B-Instruct-f16.gguf.

LLM Inference (GPULlama3)

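| Backend | TornadoVM | JDK | tok/s | Tokens | Time (s) |
|---------|-----------|-----|-------|--------|----------|
| OpenCL | 2.2.0 | 21 (GraalVM CE) | 6.35 | 47 | 7.40 |
| OpenCL | 3.0.0 | 25 (Temurin) | 6.87 | 47 | 6.84 |
| OpenCL | 4.0.0 | 21 (GraalVM CE) | 6.48 | 57 | 8.79 |
| Metal | 4.0.0 | 21 (GraalVM CE) | 0.23 | 44 | 189.85 |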

VectorAdd (10M elements)

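| Backend | TornadoVM | Best (ms) | Throughput (GB/s) |
|---------|-----------|-----------|-------------------|
| OpenCL | 2.2.0 | 2.704 | 41.34 |
| OpenCL | 3.0.0 | 2.427 | 46.04 |
| OpenCL | 4.0.0 | 2.695 | 41.47 |
| Metal | 4.0.0 | 2.392 | 46.72 |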
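The PR does not say how the throughput column was computed. The sketch below shows one reading that appears to reproduce the reported values to within about 0.01: three 4-byte float arrays of 10M elements each (two inputs, one output), with a gigabyte taken as 2^30 bytes. Both of those choices are assumptions, as is the helper name.

```python
GIB = 2**30  # assumption: the "GB" in the table is a binary gigabyte


def vector_add_throughput(elements: int, best_ms: float,
                          bytes_per_element: int = 4, arrays: int = 3) -> float:
    """Bytes moved per second, in units of 2**30 bytes.

    Assumes two input arrays and one output array of 4-byte floats.
    """
    bytes_moved = arrays * elements * bytes_per_element
    return bytes_moved / GIB / (best_ms / 1000.0)


# Best times (ms) copied from the VectorAdd results above.
best_ms = {
    "OpenCL 2.2.0": 2.704,
    "OpenCL 3.0.0": 2.427,
    "OpenCL 4.0.0": 2.695,
    "Metal 4.0.0": 2.392,
}

if __name__ == "__main__":
    for label, ms in best_ms.items():
        print(f"{label}: {vector_add_throughput(10_000_000, ms):.2f}")
```

If the benchmark actually counts bytes differently (for example decimal gigabytes, or doubles instead of floats), the absolute numbers shift but the relative ordering of the backends does not.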

Analysis

VectorAdd: Metal is competitive with OpenCL (46.72 GB/s vs 41.34 to 46.04 GB/s across the three OpenCL runs), so it is marginally faster on this simple parallel kernel. This matches expectations - the Metal backend handles straightforward array operations well.

LLM inference: Metal is ~28x slower than OpenCL on the same TornadoVM 4.0.0 / GraalVM CE 21 setup (0.23 vs 6.48 tok/s). This is consistent with the known state of the Metal backend:

  • TornadoVM PR #796 notes that MSL-specific optimizations (threadgroup memory, SIMD shuffle, async copies) are not yet implemented
  • The Metal backend test pass rate does not yet match OpenCL/PTX/SPIR-V
  • LLM inference involves complex control flow and frequent CPU ↔ GPU transfers - precisely the workloads where the immature Metal backend struggles
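For reference, the tok/s column and the ~28x figure can be re-derived from the token counts and wall-clock times in the LLM inference results. The snippet below is only a sanity check on the reported numbers; the function names are made up for this example.

```python
# (backend, TornadoVM version, tokens generated, time in seconds), from the results above.
runs = [
    ("OpenCL", "2.2.0", 47, 7.40),
    ("OpenCL", "3.0.0", 47, 6.84),
    ("OpenCL", "4.0.0", 57, 8.79),
    ("Metal", "4.0.0", 44, 189.85),
]


def tokens_per_second(tokens: int, seconds: float) -> float:
    return tokens / seconds


def slowdown(baseline_tps: float, candidate_tps: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return baseline_tps / candidate_tps


if __name__ == "__main__":
    for backend, version, tokens, seconds in runs:
        print(f"{backend} {version}: {tokens_per_second(tokens, seconds):.2f} tok/s")

    # Compare the two TornadoVM 4.0.0 runs, which share the same JDK.
    opencl_tps = tokens_per_second(57, 8.79)
    metal_tps = tokens_per_second(44, 189.85)
    print(f"Metal slowdown vs OpenCL: {slowdown(opencl_tps, metal_tps):.1f}x")
```

Note that the runs generated different numbers of tokens (44 to 57), so comparing tok/s rather than total time is the fairer measure here.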

Add --metal flag to enable running GPULlama3 with TornadoVM's Metal
backend on macOS. This requires TornadoVM 4.0+ which ships the Metal
driver (tornado.drivers.metal).

Tested on Apple M1 Pro with TornadoVM 4.0.0-jdk21 Metal SDK.

CLAassistant commented Apr 7, 2026

CLA assistant check
All committers have signed the CLA.

mikepapadim self-requested a review April 7, 2026 21:27
ArturSkowronski changed the title from "feat: add Apple Metal backend support" to "Add Apple Metal backend support" Apr 7, 2026

ArturSkowronski commented Apr 7, 2026

Hey @mikepapadim - sharing my numbers and analysis 😁 I wanted to test it for the new JVM Weekly; here are my results.

I will also update my repo with the new TornadoVM: https://github.com/ArturSkowronski/conference-jvm-in-age-ai-2026


mikepapadim commented Apr 8, 2026

Hello @ArturSkowronski, thank you for your contribution! That's great, actually.

Can you let me know which models you tested with the Metal backend?
I need to add them to the CI workflow; otherwise, I am quite happy to merge.

Also, can you please sign the CLA?

mikepapadim marked this pull request as ready for review April 8, 2026 08:35

ArturSkowronski commented

@mikepapadim - Here you will find the whole "benchmark" I use 😊

ArturSkowronski/conference-jvm-in-age-ai-2026#13

Model under test from my side: Llama-3.2-1B-Instruct-f16.gguf
