Skip to content

Handle dense, contiguous non-Array matrices (e.g. FixedSizeArray) in defaultalg / dense LU caches#1038

Merged
ChrisRackauckas merged 1 commit into
SciML:mainfrom
ChrisRackauckas-Claude:fix-dense-nonarray-1034
Jun 14, 2026
Merged

Handle dense, contiguous non-Array matrices (e.g. FixedSizeArray) in defaultalg / dense LU caches#1038
ChrisRackauckas merged 1 commit into
SciML:mainfrom
ChrisRackauckas-Claude:fix-dense-nonarray-1034

Conversation

@ChrisRackauckas-Claude

@ChrisRackauckas-Claude ChrisRackauckas-Claude commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #1034. A FixedSizeArray (and any other dense, contiguous, pointer-addressable CPU array that isn't a Base.Array) is <: DenseMatrix and directly LAPACK/BLAS-eligible, but several parts of LinearSolve assumed the dense operator/RHS is specifically a Matrix/Array, so such matrices were either routed to a matrix-free iterative solver or hit a cacheval setfield! TypeError.

Problem 1 — defaultalg routing keyed on the concrete Array type

The selection logic is really about dense, pointer-addressable CPU memory, but it was written against Array:

  • The dense direct-factorization branch, the Krylov-cacheval skip, and the dense-vs-sparse QR-fallback-reuse split were gated on A isa Matrix / cache.A isa Array. These are broadened to DenseMatrix, so a dense non-Array matrix routes to a direct factorization instead of falling through to KrylovJL_GMRES (with its accuracy loss).
  • The MKLLUFactorization / AppleAccelerateLUFactorization default-selection guards were b isa Array. These are broadened to b isa DenseArray && !(b isa GPUArraysCore.AnyGPUArray) so those BLAS backends are used for dense CPU memory whenever they're faster, regardless of the concrete container. Routing is now container-agnostic: a FixedSizeMatrix resolves to the same algorithm a Matrix of the same size/eltype does (including MKL/Apple where present).

GPU arrays keep their own defaultalg dispatch and are unaffected.

Problem 2 — some dense LU caches assumed Array-typed containers

The cacheval field is type-parameterized, so its type is fixed by init_cacheval and must match what solve! stores:

  • GenericLUFactorization / RFLUFactorization: solve! stores a Vector{BlasInt} pivot, but lu_instance(A) types the pivot after A's container (e.g. a FixedSizeVector). The instance is rebuilt with the Vector pivot. (RFLUFactorization is included because the broadened defaultalg can select it for these matrices; it had the same latent bug.)
  • MKLLUFactorization / OpenBLASLUFactorization / AppleAccelerateLUFactorization / BLISLUFactorization: the preallocated cache was built from rand(0,0) (a Matrix), so a non-Array dense factorization was the wrong type for the slot. The instance is now built from a 0×0 array of A's own type, and the dense BLAS-eltype method is extended to DenseMatrix{<:BLASELTYPES} (which also now covers Float64 dense containers).

Tests

Adds test/core/fixedsizearrays.jl (over FixedSizeArrays) covering:

  • default routing of a dense FixedSizeMatrix to a direct factorization (not GMRES) with direct-LU accuracy;
  • container-agnostic routing: defaultalg on a FixedSizeMatrix equals defaultalg on a Matrix of the same size/eltype, across sizes/eltypes (exercises the MKL/Apple guards wherever those binaries exist on CI);
  • LUFactorization, GenericLUFactorization, QRFactorization, OpenBLASLUFactorization across Float32/Float64/ComplexF64;
  • default solve + re-solve;
  • a binary-independent check that the MKL/OpenBLAS/AppleAccelerate cacheval slot tracks A's container (runs even where those binaries are absent).

Local verification

On Julia 1.12 / Linux (OpenBLAS; MKL & AppleAccelerate binaries not present locally):

  • The issue's reproducer routes to LUFactorization with resid ≈ 2e-15 (was KrylovJL_GMRES, resid ≈ 1.4e-8).
  • GenericLUFactorization and RFLUFactorization no longer throw; OpenBLASLUFactorization works through the real getrf! ccall on a FixedSizeArray — the same ccall pattern MKL/Apple/BLIS use.
  • New tests pass (47/47); core/basictests.jl and core/default_algs.jl pass with no regressions.
  • MKL/AppleAccelerate routing for FixedSizeArray is verified at the routing level (container-agnostic equality test) and the cache-type level (slot-container test) plus the analogous OpenBLAS end-to-end path; the MKL/Apple ccall itself isn't exercised locally for want of those binaries.

Please ignore until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

@ChrisRackauckas ChrisRackauckas marked this pull request as ready for review June 14, 2026 08:26
A `FixedSizeArray` (and any other dense, contiguous CPU array that isn't a
`Base.Array`) is `<: DenseMatrix` and directly LAPACK/BLAS-eligible, but several
parts of LinearSolve assumed the dense operator/RHS is specifically a `Matrix`/
`Array`. Fixes SciML#1034.

defaultalg routing (now keyed on dense memory, not `Array`):
- The dense direct-factorization branch, the Krylov-cacheval skip, and the
  dense-vs-sparse QR-fallback-reuse split were gated on `A isa Matrix`/
  `cache.A isa Array`. Broaden to `DenseMatrix`, so a dense non-`Array` matrix
  routes to a direct factorization instead of the matrix-free `KrylovJL_GMRES`
  default (with its accuracy loss).
- The MKL/AppleAccelerate default-selection guards were `b isa Array`. Broaden
  to `b isa DenseArray && !(b isa AnyGPUArray)` so those BLAS backends are used
  for dense CPU memory regardless of being a `Base.Array` — matching the intent
  that the choice is about dense memory, not the concrete container. Routing is
  now container-agnostic: a `FixedSizeMatrix` resolves to the same algorithm a
  `Matrix` of the same size/eltype does. GPU arrays keep their own `defaultalg`
  dispatch and are unaffected.

init_cacheval (the type-parameterized `cacheval` slot must match what `solve!`
stores):
- `GenericLUFactorization`/`RFLUFactorization`: `solve!` stores a `Vector{BlasInt}`
  pivot, but `lu_instance(A)` types the pivot after `A`'s container (e.g. a
  `FixedSizeVector`). Rebuild the instance with the `Vector` pivot.
- `MKL`/`OpenBLAS`/`AppleAccelerate`/`BLIS`: the preallocated cache was built
  from `rand(0,0)` (a `Matrix`), so a non-`Array` dense factorization was the
  wrong type for the slot. Build the instance from a 0×0 array of `A`'s own
  type, and extend the dense BLAS-eltype method to `DenseMatrix{<:BLASELTYPES}`.

Adds `test/core/fixedsizearrays.jl` covering default routing (incl.
container-agnostic equivalence with `Array`), the dense LU algorithms across
Float32/Float64/ComplexF64, default solve/re-solve, and a binary-independent
check that the BLAS-LU cacheval slot tracks `A`'s container.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas ChrisRackauckas merged commit b10b6a2 into SciML:main Jun 14, 2026
7 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Dense, contiguous non-Array matrices (e.g. FixedSizeArray) aren't handled by defaultalg / some dense LU caches

2 participants