I encountered unexpected backend allocation behavior when using TensorOperations.jl together with AMDGPU arrays.

It seems that `@tensor ... := ...` always allocates a CPU `Array` for the output, even when all input tensors live on the GPU. I would expect the output array type to follow the backend of the inputs, as BLAS `*` and OMEinsum.jl both do. As it stands, this causes a silent GPU → CPU fallback.
## Minimal reproducible example

```julia
using AMDGPU
using TensorOperations
using OMEinsum

A = AMDGPU.rand(ComplexF64, 2, 2)

C1 = A * A                          # BLAS-style multiply
@tensor C2[1,3] := A[1,2] * A[2,3]  # TensorOperations contraction
C3 = ein"ab,bc->ac"(A, A)           # OMEinsum contraction

@show typeof(C1)
@show typeof(C2)
@show typeof(C3)
```
Output:

```
typeof(C1) = ROCArray{ComplexF64,2,...}
typeof(C2) = Matrix{ComplexF64}
typeof(C3) = ROCArray{ComplexF64,2,...}
```
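As a workaround sketch (assuming the issue is only the allocation of the `:=` output, not the contraction kernels themselves), pre-allocating the output as a `ROCArray` and using the in-place `=` form of `@tensor` keeps the result on the device, since the macro then writes into the existing array instead of allocating a new one:

```julia
using AMDGPU
using TensorOperations

A = AMDGPU.rand(ComplexF64, 2, 2)

# Pre-allocate the output on the GPU so `@tensor` assigns in place
# rather than allocating a fresh (CPU) array as `:=` does.
C2 = similar(A)                       # ROCArray, same eltype/shape as A
@tensor C2[a, c] = A[a, b] * A[b, c]  # in-place contraction

@show typeof(C2)  # expected to remain a ROCArray
```

This sidesteps the allocation question but doesn't fix it; the `:=` form ideally should dispatch on the input array type.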
## Environment

- TensorOperations v5.5.0
- AMDGPU v2.2.1
- OMEinsum v0.7.6
- Julia 1.11