Add Apple Metal backend support#103
Open
ArturSkowronski wants to merge 1 commit into beehive-lab:main from
Conversation
Add --metal flag to enable running GPULlama3 with TornadoVM's Metal backend on macOS. This requires TornadoVM 4.0+ which ships the Metal driver (tornado.drivers.metal). Tested on Apple M1 Pro with TornadoVM 4.0.0-jdk21 Metal SDK.
Author
Hey @mikepapadim - sharing my numbers and analysis 😁 I wanted to test it for the new JVM Weekly; here are my results. I will also update my repo with the new TornadoVM: https://github.com/ArturSkowronski/conference-jvm-in-age-ai-2026
Member
Hello @ArturSkowronski, thank you for your contribution! That's great, actually. Can you let me know which models you tested with the Metal backend? Also, can you please sign the CLA?
Author
@mikepapadim - Here you will find the whole "benchmark" I use 😊: ArturSkowronski/conference-jvm-in-age-ai-2026#13. Model under test on my side: Llama-3.2-1B-Instruct-f16.gguf
Add Apple Metal backend support to the llama-tornado launcher, enabling GPULlama3 to run on macOS with TornadoVM's native Metal driver (shipped in TornadoVM 4.0+).

Changes (launcher only, no Java code changes):
- Add a METAL variant to the Backend enum
- Add a --metal CLI flag for backend selection
- Add the Metal driver (tornado.drivers.metal) and export list (metal-exports)

The TornadoVM API is backend-agnostic, so the Java inference code works without modification; only the launcher needed updating.
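A minimal sketch of the launcher-side change described above, assuming a Python launcher with an enum of backends and mutually exclusive backend flags. The `Backend` enum and `--metal`/`--opencl`/`--ptx` flag names follow the PR description; the function name `parse_backend` and the exact CLI structure are hypothetical, and the real llama-tornado launcher may organize this differently.

```python
import argparse
from enum import Enum


class Backend(Enum):
    OPENCL = "opencl"
    PTX = "ptx"
    METAL = "metal"  # new variant added by this PR


def parse_backend(argv):
    """Parse the backend-selection flags and return a Backend member."""
    parser = argparse.ArgumentParser(prog="llama-tornado")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--opencl", action="store_const", dest="backend",
                       const=Backend.OPENCL, help="use the OpenCL backend")
    group.add_argument("--ptx", action="store_const", dest="backend",
                       const=Backend.PTX, help="use the NVIDIA PTX backend")
    group.add_argument("--metal", action="store_const", dest="backend",
                       const=Backend.METAL,
                       help="use the Apple Metal backend (macOS, TornadoVM 4.0+)")
    parser.set_defaults(backend=Backend.OPENCL)  # assumed default
    return parser.parse_args(argv).backend


print(parse_backend(["--metal"]).value)  # metal
```

Because the chosen backend only affects which TornadoVM driver module and export list the launcher passes to the JVM, the Java inference code never needs to branch on it.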
Motivation
TornadoVM 4.0 shipped a Metal backend (PR #796), but
llama-tornado only supported --opencl and --ptx. The GPULlama3 README already notes that Metal support was pending; TornadoVM 4.0 has now added it, and this PR enables GPULlama3 to use it 😊
Benchmark Results
Tested on Apple M1 Pro (macOS, ARM64) with Llama-3.2-1B-Instruct-f16.gguf.
LLM Inference (GPULlama3)
VectorAdd (10M elements)
Analysis
VectorAdd: Metal is competitive with OpenCL (~46 GB/s) and slightly faster on simple parallel kernels. This matches expectations: the Metal backend handles straightforward array operations well.
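As a sanity check on that ~46 GB/s figure, the implied kernel time for a 10M-element vector add can be estimated with simple memory-traffic arithmetic (assuming float32 elements and two reads plus one write per element; the actual element type and timing methodology in the benchmark may differ):

```python
# Back-of-the-envelope: a vector add c[i] = a[i] + b[i] moves
# three arrays through memory: two reads (a, b) and one write (c).
elements = 10_000_000
bytes_per_float = 4          # assuming float32
arrays_touched = 3           # a read, b read, c write
total_bytes = elements * arrays_touched * bytes_per_float  # 120 MB

bandwidth_bytes_per_s = 46e9  # ~46 GB/s, as measured above
kernel_time_s = total_bytes / bandwidth_bytes_per_s

print(round(kernel_time_s * 1e3, 2), "ms")  # ~2.61 ms
```

A few milliseconds per launch is small enough that launch overhead can dominate, which is one reason a simple VectorAdd says little about sustained LLM-inference performance.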
LLM inference: Metal is ~28x slower than OpenCL (0.23 vs 6.48 tok/s). This is consistent with the known state of the Metal backend: