
Segmentation fault in Julia v1.11.4 due to unaligned SIMD loads #57713

@dzhang314

There is a serious regression in Julia v1.11 that makes it completely unusable for my applications built on top of SIMD.jl + MultiFloats.jl. I think it was supposed to be fixed in #56937 and #56938, but I still observe the issue in v1.11.4.

If we have a struct with a member of type NTuple{N,VecElement{T}}, reading that struct member from memory generates an aligned vector load instruction. This is a serious problem because allocations are no longer guaranteed to be 64-byte-aligned in Julia v1.11, and aligned AVX-512 loads (vmovaps on a zmm register) fault if the target address is not 64-byte aligned. This makes it impossible to work with these structs, which badly breaks SIMD.jl. We cannot even print them:

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.4 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> struct S; data::NTuple{8,VecElement{Float64}}; end

julia> for _ = 1:10; v = Vector{S}(undef, 1); println(v); end
S[
[22569] signal 11 (128): Segmentation fault
in expression starting at REPL[2]:1
getindex at ./essentials.jl:917 [inlined]
show_delim_array at ./show.jl:1397
show_delim_array at ./show.jl:1387 [inlined]
show_vector at ./arrayshow.jl:530
show_vector at ./arrayshow.jl:515 [inlined]
show at ./arrayshow.jl:486 [inlined]
print at ./strings/io.jl:35
print at ./strings/io.jl:46
println at ./strings/io.jl:75
unknown function (ip: 0x70b1b3f04e46)
println at ./coreio.jl:4
top-level scope at ./REPL[2]:1
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:886
ijl_toplevel_eval_in at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/toplevel.c:994
eval at ./boot.jl:430 [inlined]
eval_user_input at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:245
repl_backend_loop at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:342
#start_repl_backend#59 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:327
start_repl_backend at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:324
#run_repl#72 at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:483
run_repl at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/usr/share/julia/stdlib/v1.11/REPL/src/REPL.jl:469
jfptr_run_repl_10102 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
#1150 at ./client.jl:446
jfptr_YY.1150_14761 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/share/julia/compiled/v1.11/REPL/u0gqU_PBQaY.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
jl_f__call_latest at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/builtins.c:875
#invokelatest#2 at ./essentials.jl:1055 [inlined]
invokelatest at ./essentials.jl:1052 [inlined]
run_main_repl at ./client.jl:430
repl_main at ./client.jl:567 [inlined]
_start at ./client.jl:541
jfptr__start_73560 at /home/dkzhang/.julia/juliaup/julia-1.11.4+0.x64.linux.gnu/lib/julia/sys.so (unknown line)
jl_apply at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/julia.h:2157 [inlined]
true_main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:900
jl_repl_entrypoint at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/src/jlapi.c:1059
main at /cache/build/builder-amdci5-5/julialang/julia-release-1-dot-11/cli/loader_exe.c:58
unknown function (ip: 0x70b1b522a1c9)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 800305 (Pool: 800275; Big: 30); GC: 1
Segmentation fault (core dumped)

Here, I'm performing 10 trials to be conservative -- on my machine, I always get a segfault in the first 3-4 tries. We can see the problematic load instruction using code_native:

julia> code_native(getindex, (Vector{S}, Int))
	.text
	.file	"getindex"
	.globl	julia_getindex_399              # -- Begin function julia_getindex_399
	.p2align	4, 0x90
	.type	julia_getindex_399,@function
julia_getindex_399:                     # @julia_getindex_399
; Function Signature: getindex(Array{Main.S, 1}, Int64)
; ┌ @ essentials.jl:914 within `getindex`
# %bb.0:                                # %top
; │ @ essentials.jl within `getindex`
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] $rsi
	#DEBUG_VALUE: getindex:i <- $rdx
	#DEBUG_VALUE: getindex:A <- [DW_OP_deref] 0
	push	rbp
	mov	rbp, rsp
	sub	rsp, 16
; │ @ essentials.jl:916 within `getindex`
	lea	rax, [rdx - 1]
	cmp	rax, qword ptr [rsi + 16]
	jae	.LBB0_2
# %bb.1:                                # %L15
; │ @ essentials.jl:917 within `getindex`
	mov	rcx, qword ptr [rsi]
	shl	rax, 6
	vmovaps	zmm0, zmmword ptr [rcx + rax]
	vmovaps	zmmword ptr [rdi], zmm0
	mov	rax, rdi
	add	rsp, 16
	pop	rbp
	vzeroupper
	ret
.LBB0_2:                                # %L12
; │ @ essentials.jl:916 within `getindex`
	mov	qword ptr [rbp - 8], rdx
	movabs	rcx, offset j_throw_boundserror_411
	lea	rax, [rbp - 8]
	mov	rdi, rsi
	mov	rsi, rax
	call	rcx
.Lfunc_end0:
	.size	julia_getindex_399, .Lfunc_end0-julia_getindex_399
; └
                                        # -- End function
	.type	".L+Main.S#401",@object         # @"+Main.S#401"
	.section	.rodata,"a",@progbits
	.p2align	3, 0x0
".L+Main.S#401":
	.quad	".L+Main.S#401.jit"
	.size	".L+Main.S#401", 8

.set ".L+Main.S#401.jit", 124970931979792
	.size	".L+Main.S#401.jit", 8
	.section	".note.GNU-stack","",@progbits

Either that vmovaps needs to be a vmovups, or all allocations need to be 64-byte-aligned again.
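
For anyone who wants to check the alignment claim without triggering the crash, here is a minimal sketch that only inspects the data pointer of each freshly allocated vector and never loads an element. The helper name count_misaligned is hypothetical and not part of Base or SIMD.jl; the struct S is the same one as in the reproducer above.

```julia
# Same struct as in the reproducer: one 64-byte SIMD vector of 8 Float64 lanes.
struct S
    data::NTuple{8,VecElement{Float64}}
end

# Hypothetical helper (not part of Base or SIMD.jl): allocate `trials`
# one-element vectors and count how many have a data pointer that is
# not a multiple of 64 bytes. Only the address is inspected; no element
# is ever read, so the aligned load shown above is never executed.
function count_misaligned(trials::Integer)
    misaligned = 0
    for _ in 1:trials
        v = Vector{S}(undef, 1)
        addr = UInt(pointer(v))          # address of the first element
        misaligned += (addr % 64 != 0)   # Bool adds as 0 or 1
    end
    return misaligned
end

println("sizeof(S) = ", sizeof(S), " bytes")
println(count_misaligned(1000), " of 1000 allocations were not 64-byte aligned")
```

Any nonzero count means getindex on such a vector can hit the vmovaps with an address that is not 64-byte aligned. The exact count will vary with the allocator state, so no particular number should be expected.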
