A Julia package for computing neighbour lists in molecular simulations. Originally a port of the neighbourlist from matscipy, now extended with multi-threaded CPU and portable GPU support.
The package can be used stand-alone or with AtomsBase.jl.
using Pkg
Pkg.add("NeighbourLists")The neighbour_list() function provides a unified entry point that works on both CPU and GPU with the same API. The backend is automatically detected from the array type.
Note: The legacy
PairListconstructor using linked-list algorithm is retained as a reference implementation for testing. New code should useneighbour_list()instead. See DEPRECATIONS.md for migration details.
using NeighbourLists, StaticArrays, LinearAlgebra
# Create positions (CPU Vector)
L = 10.0
X = [SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000]
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)
# Build neighbour list (uses sort-based algorithm with multi-threading)
nlist = neighbour_list(X, 3.0, cell, pbc)
# Access neighbours of atom 1
j, R, S = neighbours(nlist, 1)using NeighbourLists, StaticArrays, LinearAlgebra
using CUDA # or AMDGPU, Metal, oneAPI
# Create positions on GPU (only difference: use CuArray)
L = 10.0
X = CuArray([SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000])
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)
# Same API - backend auto-detected from array type
nlist = neighbour_list(X, 3.0, cell, pbc)What's the same: The neighbour_list() API is identical on CPU and GPU. Cell matrix, cutoff, and boundary conditions work the same way.
What's different: Only the array type changes (Vector vs CuArray/ROCArray/etc.). The backend is automatically detected - no need to specify it manually.
For large systems where materializing all pairs is memory-intensive, use lazy mode:
# Returns a SortedCellList instead of materializing all pairs
clist = neighbour_list(X, 3.0, cell, pbc; lazy=true)
# Iterate without storing all pairs in memory
for i in 1:nsites(clist)
for_each_neighbour(clist, i) do j, R, S
# process neighbour j with distance vector R and shift S
end
endusing AtomsBuilder, NeighbourLists, Unitful
sys = bulk(:Cu, cubic=true) * (4, 4, 4)
nlist = neighbour_list(sys, 5.0u"Å")
j, R, S = neighbours(nlist, 1) # neighbours of atom 1
# Lazy mode also works with AtomsBase systems
clist = neighbour_list(sys, 5.0u"Å"; lazy=true)
for_each_neighbour(clist, 1) do j, R, S
# process neighbour
endThe implementation uses KernelAbstractions.jl for portable parallelism and AcceleratedKernels.jl for portable sorting. On CPU this enables multi-threading; on GPU it runs native parallel kernels.
The package provides two cell list implementations:
| Implementation | Algorithm | Parallelism | Status |
|---|---|---|---|
| Sort-based | Sort by cell ID | Multi-threaded CPU, GPU | Recommended |
| Legacy | Linked-list | Single-threaded | Reference implementation for testing |
Both produce identical results (validated in tests).
API Selection:
neighbour_list()always uses the sort-based implementation (recommended)PairList(system::AbstractSystem, cutoff)uses sort-based (for AtomsBase)PairList(X::Vector{SVec}, cutoff, cell, pbc)uses legacy linked-list (reference implementation)
The legacy linked-list implementation (CellList, _celllist_, _pairlist_) is retained as a reference for testing correctness, but new code should use the unified API.
Recommended changes:
- Use
neighbour_list(X, cutoff, cell, pbc)instead ofPairList(X, cutoff, cell, pbc) - Use
neighbours(nlist, i)instead ofneigss(nlist, i)(both still work) - For memory-efficient iteration, use
neighbour_list(...; lazy=true)withfor_each_neighbour
See DEPRECATIONS.md for the complete migration guide.
Benchmarks on NVIDIA RTX A4500 (cutoff = 5.0 Å, density = 0.05 atoms/ų):
| Atoms | Pairs | Legacy | CPU (1T) | CPU (8T) | GPU | Speedup |
|---|---|---|---|---|---|---|
| 1,000 | 26k | 8 ms | 3.6 ms | 3.4 ms | 2.3 ms | 3.5x |
| 5,000 | 131k | 38 ms | 17 ms | 3.9 ms | 2.2 ms | 17x |
| 10,000 | 262k | 84 ms | 35 ms | 7.8 ms | 2.4 ms | 36x |
| 50,000 | 1.3M | 516 ms | 201 ms | 31 ms | 4.2 ms | 124x |
| 100,000 | 2.6M | 1.1 s | 400 ms | 62 ms | 6.9 ms | 160x |
GPU throughput: ~370 million pairs/second for large systems.
Note: Speedup is GPU vs Legacy. Run julia --project -t N scripts/benchmark.jl to reproduce.
- Original inspiration from matscipy neighbourlist written by Lars Pastewka
- Linked-list approach was implemented by Christoph Ortner
- Sort-based approach idea proposed by Teemu Järvinen and Timon Gutleb, and implemented by James Kermode