Switch to CompilerCaching.jl by maleadt · Pull Request #777 · JuliaGPU/Metal.jl

maleadt · 2026-05-12T09:41:51Z

Replace `cached_compilation` with a `MetalResults` struct attached to each `CodeInstance` via `CompilerCaching`: `metallib` + entry name are session-portable (cached through precompilation), and the `MTLComputePipelineState` is materialized lazily per session. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a `bitcode` field to `MetalResults` and overrides `GPUCompiler.bitcode` / `bitcode!`. Per-function runtime library bitcode now rides on the same precompilation path as `metallib`/`entry`, so cross-session loads can skip the runtime rebuild. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the single `pipeline` slot on `MetalResults` with a small linear cache of `(MTLDevice, MTLComputePipelineState)` pairs. The cache partition already covers the macOS / AIR / Metal versions that affect codegen, but two `MTLDevice`s on a single Mac (e.g. integrated + discrete) share the same `metallib` and need separate `MTLComputePipelineState`s. Hot-path cost is unchanged: one field load + one `===` compare. The common case (single device) stays at n=1. `link_pipeline` now takes the target `MTLDevice` explicitly instead of calling `device()` internally, so the call site captures the device once under `mtlfunction_lock`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
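The per-device lookup described above can be sketched roughly as follows. This is a minimal illustration, not Metal.jl's code: `FakeDevice`, `FakePipeline`, `SketchResults`, `pipeline_for!` and `LINK_COUNT` are hypothetical stand-ins, `link_pipeline` here only mimics the shape of the real function, and the locking done under `mtlfunction_lock` is omitted.

```julia
# Hypothetical stand-ins for MTLDevice / MTLComputePipelineState.
# FakeDevice is mutable so that `===` compares object identity.
mutable struct FakeDevice
    name::String
end

struct FakePipeline
    device_name::String
end

# Hypothetical results record: session-portable bytes plus a small
# linear cache of (device, pipeline) pairs.
mutable struct SketchResults
    metallib::Vector{UInt8}
    pipelines::Vector{Tuple{FakeDevice,FakePipeline}}
end

const LINK_COUNT = Ref(0)   # counts how often the slow path runs

# Stand-in for the expensive link step; takes the device explicitly.
function link_pipeline(r::SketchResults, dev::FakeDevice)
    LINK_COUNT[] += 1
    return FakePipeline(dev.name)
end

# Linear scan: with a single device this is one load and one `===` compare.
function pipeline_for!(r::SketchResults, dev::FakeDevice)
    for (d, p) in r.pipelines
        d === dev && return p
    end
    p = link_pipeline(r, dev)
    push!(r.pipelines, (dev, p))
    return p
end
```

A linear scan is a reasonable choice here because the list is expected to hold one entry in the common case and only a handful on multi-GPU machines.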

`something(lookup(...), compile_metal!(...))` evaluated `compile_metal!` eagerly even on a cache hit, so every kernel launch silently re-ran the full LLVM compile pipeline. Branch explicitly on the lookup result. Warm-cache `mtlfunction` cost: ~3.4 ms → ~380 ns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
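The pitfall is easy to reproduce outside Metal.jl. In the sketch below, `CACHE`, `COMPILE_COUNT`, `lookup`, `compile!`, `eager` and `lazy` are hypothetical names used only for illustration. Because `something` is an ordinary function, Julia evaluates both arguments before calling it.

```julia
# Hypothetical stand-ins; not Metal.jl's actual functions.
const CACHE = Dict{Symbol,String}()
const COMPILE_COUNT = Ref(0)

lookup(key) = get(CACHE, key, nothing)

function compile!(key)
    COMPILE_COUNT[] += 1              # count how often the expensive path runs
    CACHE[key] = "compiled_$key"      # assignment evaluates to the stored value
end

# Buggy: both arguments are evaluated before `something` is called,
# so compile! runs even when lookup already found an entry.
eager(key) = something(lookup(key), compile!(key))

# Fixed: branch explicitly on the lookup result.
function lazy(key)
    cached = lookup(key)
    cached !== nothing && return cached
    return compile!(key)
end

lazy(:k)    # cache miss: compiles once
lazy(:k)    # cache hit: COMPILE_COUNT[] stays at 1
eager(:k)   # cache hit, but compile! still runs: COMPILE_COUNT[] becomes 2
```

On Julia 1.7 and later, the `@something` macro short-circuits and would be another way to avoid the eager evaluation; the commit itself branches explicitly.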

Adds `bitcode_opaque` / `bitcode_typed` fields to `MetalResults` and overrides the new `GPUCompiler.bitcode` / `bitcode!` trait pair. Runtime functions (`gpu_malloc`, `gpu_report_exception`, …) now persist their post-irgen renamed LLVM bitcode on their own `CodeInstance` on 1.11+, which carries through package precompilation. 1.10 keeps falling back to the session-local `_runtime_libs` cache. The two slots reflect that opaque-pointer and typed-pointer LLVM IR aren't interchangeable. In practice modern LLVM always uses opaque pointers; the typed slot exists for older Julia/LLVM combinations. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Matches GPUCompiler's simplified `bitcode`/`bitcode!` trait pair: LLVM's pointer mode is fixed across a precompile/load pair of the same Julia version, so one slot per `CodeInstance` is enough. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The cache-hit path in `compile_or_lookup` just needs to distinguish a CI that completed compilation from one that was only inferred as a callee. A direct `r.metallib !== nothing` check at the single call site is clearer than routing through a one-line trait override. Also switch to `GPUCompiler.cache_view(job)` instead of constructing the `CacheView` manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
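As a rough illustration of why the check is on the artifact rather than on the existence of a results record: a record can exist for a callee that was only inferred, with its fields still empty. The names `InferredOrCompiled`, `RECORDS` and `compile_or_lookup_sketch!` below are hypothetical and only mirror the shape of the logic described in the commit.

```julia
# Hypothetical results record: fields stay `nothing` until compilation finishes.
mutable struct InferredOrCompiled
    metallib::Union{Nothing,Vector{UInt8}}
    entry::Union{Nothing,String}
end
InferredOrCompiled() = InferredOrCompiled(nothing, nothing)

# Hypothetical per-function store; a record may be created by inference alone.
const RECORDS = Dict{Symbol,InferredOrCompiled}()

function compile_or_lookup_sketch!(name::Symbol, compile)
    r = get!(InferredOrCompiled, RECORDS, name)
    # Direct check at the single call site: the record existing is not enough.
    if r.metallib !== nothing
        return r
    end
    lib, entry = compile()
    r.metallib = lib
    r.entry = entry
    return r
end
```

Here `compile` is any zero-argument callable returning a `(bytes, entry_name)` pair, which is an assumption of this sketch rather than the signature used in the PR.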

…ut!. `MetalResults` now keeps every kernel pipeline stage (LLVM IR bytes, AIR, the final metallib) instead of just the last one. The intermediates were already being computed in `compile_to_metallib` and then discarded; storing them is ~free and lets reflection dump any phase post-hoc without re-running the compile. The single byte slot is named `llvm_ir` (not `bitcode`) because AIR is also a form of bitcode; the field always holds the LLVM-stage output. For runtime-library function CIs it's the final artifact; for kernel CIs it's an intermediate before the AIR downgrade. The content is binary bitcode (`write(io, mod)`), ~10× smaller than textual IR. Since each `MetalResults` is attached to a single CI (either a kernel MI or a runtime-function MI), the two roles never share a slot, so one field covers both cleanly. `cache_get` / `cache_put!` overrides on `MetalCompilerJob` implement GPUCompiler's new caching protocol, routing the `:llvm_ir` key through a CompilerCaching lookup to the relevant CI's `MetalResults.llvm_ir`. `compile_to_metallib` now returns a NamedTuple of all artifacts; the caller (`compile_metal!` on 1.11+, `_legacy_link` on 1.10) stores them on the appropriate `MetalResults`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

GPUCompiler 2.0 now mediates all interaction with the compilation cache through a single version-agnostic entry point, `cached_results`, so Metal no longer needs to depend on CompilerCaching, fork its compile path on the Julia version, or implement the `cache_get`/`cache_put!` protocol for runtime library functions (GPUCompiler caches those itself now). This restores Julia 1.10 support: the same `compile_or_lookup` runs against the integrated code cache on 1.11+ and against GPUCompiler's session-local store on 1.10. `MetalResults` loses its `llvm_ir` slot, which only existed to serve the runtime library protocol, and keeps `air`/`metallib`/`entry` (session-portable) plus the per-device pipeline cache (session-local). The precompilation workload now compiles and links an actual kernel again: on 1.11+ the inference results and metallib bytes are stored in the package image (attached to the kernel's `CodeInstance`), so a fresh session can launch it without invoking the compiler. This was verified to hit the image, link in ~0.2s, and keep ObjectiveC handles out of the image (`mtlfunction` skips the session-local pipeline cache while generating output). Launching the kernel during the workload is not possible: committing a command buffer and waiting for it hangs during precompilation. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

maleadt and others added 15 commits July 3, 2026 07:10

Use development version of GPUCompiler.jl.

06cdd88

Remove 1.10 from CI etc.

7fc3dc2

Restore 1.10 compat.

6fdfc2a

Rename.

fa7d16c

Partition Metal cache by debug level

8d72e74

Fix sources entry.

90d2673

maleadt force-pushed the tb/compilercaching branch from 00d69c1 to 90d2673 Compare July 3, 2026 06:13

maleadt wants to merge 15 commits into main from tb/compilercaching
