Handle dense, contiguous non-Array matrices (e.g. FixedSizeArray) in defaultalg / dense LU caches#1038
Merged
ChrisRackauckas merged 1 commit intoJun 14, 2026
Conversation
ChrisRackauckas
approved these changes
Jun 13, 2026
85143f6 to
bb1a156
Compare
A `FixedSizeArray` (and any other dense, contiguous CPU array that isn't a `Base.Array`) is `<: DenseMatrix` and directly LAPACK/BLAS-eligible, but several parts of LinearSolve assumed the dense operator/RHS is specifically a `Matrix`/ `Array`. Fixes SciML#1034. defaultalg routing (now keyed on dense memory, not `Array`): - The dense direct-factorization branch, the Krylov-cacheval skip, and the dense-vs-sparse QR-fallback-reuse split were gated on `A isa Matrix`/ `cache.A isa Array`. Broaden to `DenseMatrix`, so a dense non-`Array` matrix routes to a direct factorization instead of the matrix-free `KrylovJL_GMRES` default (with its accuracy loss). - The MKL/AppleAccelerate default-selection guards were `b isa Array`. Broaden to `b isa DenseArray && !(b isa AnyGPUArray)` so those BLAS backends are used for dense CPU memory regardless of being a `Base.Array` — matching the intent that the choice is about dense memory, not the concrete container. Routing is now container-agnostic: a `FixedSizeMatrix` resolves to the same algorithm a `Matrix` of the same size/eltype does. GPU arrays keep their own `defaultalg` dispatch and are unaffected. init_cacheval (the type-parameterized `cacheval` slot must match what `solve!` stores): - `GenericLUFactorization`/`RFLUFactorization`: `solve!` stores a `Vector{BlasInt}` pivot, but `lu_instance(A)` types the pivot after `A`'s container (e.g. a `FixedSizeVector`). Rebuild the instance with the `Vector` pivot. - `MKL`/`OpenBLAS`/`AppleAccelerate`/`BLIS`: the preallocated cache was built from `rand(0,0)` (a `Matrix`), so a non-`Array` dense factorization was the wrong type for the slot. Build the instance from a 0×0 array of `A`'s own type, and extend the dense BLAS-eltype method to `DenseMatrix{<:BLASELTYPES}`. Adds `test/core/fixedsizearrays.jl` covering default routing (incl. container-agnostic equivalence with `Array`), the dense LU algorithms across Float32/Float64/ComplexF64, default solve/re-solve, and a binary-independent check that the BLAS-LU cacheval slot tracks `A`'s container. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
bb1a156 to
16380bc
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1034. A
FixedSizeArray(and any other dense, contiguous, pointer-addressable CPU array that isn't aBase.Array) is<: DenseMatrixand directly LAPACK/BLAS-eligible, but several parts of LinearSolve assumed the dense operator/RHS is specifically aMatrix/Array, so such matrices were either routed to a matrix-free iterative solver or hit acachevalsetfield!TypeError.Problem 1 —
defaultalgrouting keyed on the concreteArraytypeThe selection logic is really about dense, pointer-addressable CPU memory, but it was written against
Array:cachevalskip, and the dense-vs-sparse QR-fallback-reuse split were gated onA isa Matrix/cache.A isa Array. These are broadened toDenseMatrix, so a dense non-Arraymatrix routes to a direct factorization instead of falling through toKrylovJL_GMRES(with its accuracy loss).MKLLUFactorization/AppleAccelerateLUFactorizationdefault-selection guards wereb isa Array. These are broadened tob isa DenseArray && !(b isa GPUArraysCore.AnyGPUArray)so those BLAS backends are used for dense CPU memory whenever they're faster, regardless of the concrete container. Routing is now container-agnostic: aFixedSizeMatrixresolves to the same algorithm aMatrixof the same size/eltype does (including MKL/Apple where present).GPU arrays keep their own
defaultalgdispatch and are unaffected.Problem 2 — some dense LU caches assumed
Array-typed containersThe
cachevalfield is type-parameterized, so its type is fixed byinit_cachevaland must match whatsolve!stores:GenericLUFactorization/RFLUFactorization:solve!stores aVector{BlasInt}pivot, butlu_instance(A)types the pivot afterA's container (e.g. aFixedSizeVector). The instance is rebuilt with theVectorpivot. (RFLUFactorizationis included because the broadeneddefaultalgcan select it for these matrices; it had the same latent bug.)MKLLUFactorization/OpenBLASLUFactorization/AppleAccelerateLUFactorization/BLISLUFactorization: the preallocated cache was built fromrand(0,0)(aMatrix), so a non-Arraydense factorization was the wrong type for the slot. The instance is now built from a 0×0 array ofA's own type, and the dense BLAS-eltype method is extended toDenseMatrix{<:BLASELTYPES}(which also now coversFloat64dense containers).Tests
Adds
test/core/fixedsizearrays.jl(overFixedSizeArrays) covering:FixedSizeMatrixto a direct factorization (not GMRES) with direct-LU accuracy;defaultalgon aFixedSizeMatrixequalsdefaultalgon aMatrixof the same size/eltype, across sizes/eltypes (exercises the MKL/Apple guards wherever those binaries exist on CI);LUFactorization,GenericLUFactorization,QRFactorization,OpenBLASLUFactorizationacrossFloat32/Float64/ComplexF64;MKL/OpenBLAS/AppleAcceleratecachevalslot tracksA's container (runs even where those binaries are absent).Local verification
On Julia 1.12 / Linux (OpenBLAS; MKL & AppleAccelerate binaries not present locally):
LUFactorizationwithresid ≈ 2e-15(wasKrylovJL_GMRES,resid ≈ 1.4e-8).GenericLUFactorizationandRFLUFactorizationno longer throw;OpenBLASLUFactorizationworks through the realgetrf!ccall on aFixedSizeArray— the same ccall pattern MKL/Apple/BLIS use.core/basictests.jlandcore/default_algs.jlpass with no regressions.FixedSizeArrayis verified at the routing level (container-agnostic equality test) and the cache-type level (slot-container test) plus the analogous OpenBLAS end-to-end path; the MKL/Apple ccall itself isn't exercised locally for want of those binaries.Please ignore until reviewed by @ChrisRackauckas.
🤖 Generated with Claude Code