This repository was archived by the owner on May 12, 2026. It is now read-only.

calculate_residuals fails on GPU with Tracker #600

@avik-pal

MWE:

using DiffEqBase, Flux
using ForwardDiff: Dual, partials, value

dual(x, p) = x
dual(x::Real, p) = Dual(x, p)

function mypartial(f, Δs, i, args::Vararg)
    dargs = ntuple(j -> dual(args[j], i == j), length(args))
    a = f(dargs...)
end

mypartial.(
    DiffEqBase.calculate_residuals,
    0.036217686f0,
    1,
    [6.051575f-9] |> gpu,  # Remove this gpu call and it works fine
    1.0f0,
    1.1397732f0,
    0.001f0,
    0.001f0,
    DiffEqBase.ODE_DEFAULT_NORM,
    0.0f0
)[1]

When used with Tracker, DiffEqBase.calculate_residuals fails to compile on the GPU. Running the MWE above produces:

InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Dual{Nothing,Float32,1},1,1}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(mypartial),Tuple{CUDA.CuRefValue{typeof(DiffEqBase.calculate_residuals)},Float32,Int64,Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Float32,Float32,Float32,Float32,CUDA.CuRefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)},Float32}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to the Julia runtime (call to jl_f_tuple)
Stacktrace:
 [1] _broadcast_getindex_evalf at broadcast.jl:648
 [2] _broadcast_getindex at broadcast.jl:621
 [3] getindex at broadcast.jl:575
 [4] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:62
Reason: unsupported call to the Julia runtime (call to jl_f_apply_type)
Stacktrace:
 [1] mypartial at In[103]:2
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:62
Reason: unsupported call to the Julia runtime (call to jl_new_structv)
Stacktrace:
 [1] mypartial at In[103]:2
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:62
Reason: unsupported dynamic function invocation (call to ntuple)
Stacktrace:
 [1] mypartial at In[103]:2
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:62
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] mypartial at In[103]:3
 [2] _broadcast_getindex_evalf at broadcast.jl:648
 [3] _broadcast_getindex at broadcast.jl:621
 [4] getindex at broadcast.jl:575
 [5] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:62

Stacktrace:
 [1] check_ir(::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget,CUDA.CUDACompilerParams}, ::LLVM.Module) at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/validation.jl:123
 [2] macro expansion at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:239 [inlined]
 [3] macro expansion at /home/avikpal/.julia/packages/TimerOutputs/ZmKD7/src/TimerOutput.jl:206 [inlined]
 [4] codegen(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:237
 [5] compile(::Symbol, ::GPUCompiler.CompilerJob; libraries::Bool, deferred_codegen::Bool, optimize::Bool, strip::Bool, validate::Bool, only_entry::Bool) at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:39
 [6] compile at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/driver.jl:35 [inlined]
 [7] cufunction_compile(::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/avikpal/.julia/packages/CUDA/BIYoG/src/compiler/execution.jl:310
 [8] cufunction_compile(::GPUCompiler.FunctionSpec) at /home/avikpal/.julia/packages/CUDA/BIYoG/src/compiler/execution.jl:305
 [9] check_cache(::Dict{UInt64,Any}, ::Any, ::Any, ::GPUCompiler.FunctionSpec{GPUArrays.var"#broadcast_kernel#12",Tuple{CUDA.CuKernelContext,CuDeviceArray{Dual{Nothing,Float32,1},1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(mypartial),Tuple{CUDA.CuRefValue{typeof(DiffEqBase.calculate_residuals)},Float32,Int64,Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Float32,Float32,Float32,Float32,CUDA.CuRefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)},Float32}},Int64}}, ::UInt64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:40
 [10] broadcast_kernel at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:60 [inlined]
 [11] cached_compilation at /home/avikpal/.julia/packages/GPUCompiler/uTpNx/src/cache.jl:65 [inlined]
 [12] cufunction(::GPUArrays.var"#broadcast_kernel#12", ::Type{Tuple{CUDA.CuKernelContext,CuDeviceArray{Dual{Nothing,Float32,1},1,1},Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(mypartial),Tuple{CUDA.CuRefValue{typeof(DiffEqBase.calculate_residuals)},Float32,Int64,Base.Broadcast.Extruded{CuDeviceArray{Float32,1,1},Tuple{Bool},Tuple{Int64}},Float32,Float32,Float32,Float32,CUDA.CuRefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)},Float32}},Int64}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/avikpal/.julia/packages/CUDA/BIYoG/src/compiler/execution.jl:297
 [13] cufunction at /home/avikpal/.julia/packages/CUDA/BIYoG/src/compiler/execution.jl:294 [inlined]
 [14] #launch_heuristic#853 at /home/avikpal/.julia/packages/CUDA/BIYoG/src/gpuarrays.jl:19 [inlined]
 [15] launch_heuristic at /home/avikpal/.julia/packages/CUDA/BIYoG/src/gpuarrays.jl:17 [inlined]
 [16] copyto! at /home/avikpal/.julia/packages/GPUArrays/ZxsKE/src/host/broadcast.jl:66 [inlined]
 [17] copyto! at ./broadcast.jl:886 [inlined]
 [18] copy at ./broadcast.jl:862 [inlined]
 [19] materialize(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Nothing,typeof(mypartial),Tuple{Base.RefValue{typeof(DiffEqBase.calculate_residuals)},Float32,Int64,CuArray{Float32,1},Float32,Float32,Float32,Float32,Base.RefValue{typeof(DiffEqBase.ODE_DEFAULT_NORM)},Float32}}) at ./broadcast.jl:837
 [20] top-level scope at In[118]:1
 [21] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091

NOTE: The MWE above is extracted from Tracker's backward pass (https://github.com/FluxML/Tracker.jl/blob/master/src/lib/array.jl#L546), which is where the error originally occurred.
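
One possible direction, not verified on a GPU here: the "unsupported dynamic function invocation (call to ntuple)" entry in the trace suggests that the tuple length is not known at compile time inside the kernel. The sketch below is a hypothetical variant, mypartial_static, that takes the argument count from the Vararg{Any,N} type parameter and passes Val(N) to ntuple, and that forces specialization on f through the type parameter F. The name mypartial_static and the stand-in function g are assumptions for illustration only. The example runs on the CPU and only shows that the variant computes the same kind of result as mypartial.

```julia
using ForwardDiff: Dual, partials, value

# Same helpers as in the MWE: wrap real arguments in a Dual, pass others through.
dual(x, p) = x
dual(x::Real, p) = Dual(x, p)

# Hypothetical variant of mypartial (assumption, not Tracker's code):
# N comes from the Vararg type parameter, so ntuple receives a static length.
@inline function mypartial_static(f::F, Δs, i, args::Vararg{Any,N}) where {F,N}
    dargs = ntuple(j -> dual(args[j], i == j), Val(N))
    return f(dargs...)
end

# Stand-in for DiffEqBase.calculate_residuals (assumption, CPU only).
g(a, b) = a * b + b

# Differentiate with respect to the first argument (i = 1), broadcast over a vector.
out = mypartial_static.(g, 1.0f0, 1, Float32[2, 3], 4.0f0)

# d/da (a * b + b) = b, so the partial should equal 4 for every element.
println(value.(out))
println(partials.(out, 1))
```

Whether this is enough for the kernel to compile with a CuArray in place of the Float32 vector is an open question; the remaining runtime calls in the trace (jl_f_tuple, jl_f__apply_iterate) may have other causes.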

Package Versions:

  [fbb218c0] BSON v0.2.6
  [052768ef] CUDA v2.2.1
  [f68482b8] Cthulhu v1.3.0
  [2b5f629d] DiffEqBase v6.48.2
  [459566f4] DiffEqCallbacks v2.14.1
  [aae7a2af] DiffEqFlux v1.23.0
  [41bf760c] DiffEqSensitivity v6.34.0
  [31c24e10] Distributions v0.23.12
  [ced4e74d] DistributionsAD v0.6.9
  [da5c29d0] EllipsisNotation v1.0.0
  [587475ba] Flux v0.11.2
  [f6369f11] ForwardDiff v0.10.12
  [a75be94c] GalacticOptim v0.4.1
  [f67ccb44] HDF5 v0.13.6
  [cc2ba9b6] MLDataUtils v0.5.2
  [eb30cadb] MLDatasets v0.5.2
  [1914dd2f] MacroTools v0.5.6
  [872c559c] NNlib v0.7.6
  [15e1cf62] NPZ v0.4.0
  [429524aa] Optim v1.2.0
  [1dea7af3] OrdinaryDiffEq v5.45.0
  [91a5bcdd] Plots v1.9.0
  [33c8b6b6] ProgressLogging v0.1.3
  [37e2e3b7] ReverseDiff v1.4.3
  [9f7883ad] Tracker v0.2.12 `/mnt/research/Tracker`
  [ddb6d928] YAML v0.4.2
  [e88e6eb3] Zygote v0.5.9
  [37e2e46d] LinearAlgebra
  [de0858da] Printf
  [10745b16] Statistics