Skip to content

JuliaMolSim/NeighbourLists.jl

Repository files navigation

NeighbourLists.jl

A Julia package for computing neighbour lists in molecular simulations. Originally a port of the neighbourlist from matscipy, now extended with multi-threaded CPU and portable GPU support.

The package can be used stand-alone or with AtomsBase.jl.

Installation

using Pkg
Pkg.add("NeighbourLists")

Unified API (Recommended)

The neighbour_list() function provides a unified entry point that works on both CPU and GPU with the same API. The backend is automatically detected from the array type.

Note: The legacy PairList constructor using linked-list algorithm is retained as a reference implementation for testing. New code should use neighbour_list() instead. See DEPRECATIONS.md for migration details.

CPU Example (Multi-threaded)

using NeighbourLists, StaticArrays, LinearAlgebra

# Create positions (CPU Vector)
L = 10.0
X = [SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000]
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)

# Build neighbour list (uses sort-based algorithm with multi-threading)
nlist = neighbour_list(X, 3.0, cell, pbc)

# Access neighbours of atom 1
j, R, S = neighbours(nlist, 1)

GPU Example (CUDA, ROCm, Metal, oneAPI)

using NeighbourLists, StaticArrays, LinearAlgebra
using CUDA  # or AMDGPU, Metal, oneAPI

# Create positions on GPU (only difference: use CuArray)
L = 10.0
X = CuArray([SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000])
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)

# Same API - backend auto-detected from array type
nlist = neighbour_list(X, 3.0, cell, pbc)

What's the same: The neighbour_list() API is identical on CPU and GPU. Cell matrix, cutoff, and boundary conditions work the same way.

What's different: Only the array type changes (Vector vs CuArray/ROCArray/etc.). The backend is automatically detected - no need to specify it manually.

Lazy Mode (Memory Efficient)

For large systems where materializing all pairs is memory-intensive, use lazy mode:

# Returns a SortedCellList instead of materializing all pairs
clist = neighbour_list(X, 3.0, cell, pbc; lazy=true)

# Iterate without storing all pairs in memory
for i in 1:nsites(clist)
    for_each_neighbour(clist, i) do j, R, S
        # process neighbour j with distance vector R and shift S
    end
end

AtomsBase.jl Integration

using AtomsBuilder, NeighbourLists, Unitful

sys = bulk(:Cu, cubic=true) * (4, 4, 4)
nlist = neighbour_list(sys, 5.0u"Å")
j, R, S = neighbours(nlist, 1)  # neighbours of atom 1

# Lazy mode also works with AtomsBase systems
clist = neighbour_list(sys, 5.0u"Å"; lazy=true)
for_each_neighbour(clist, 1) do j, R, S
    # process neighbour
end

The implementation uses KernelAbstractions.jl for portable parallelism and AcceleratedKernels.jl for portable sorting. On CPU this enables multi-threading; on GPU it runs native parallel kernels.

Two Implementations

The package provides two cell list implementations:

Implementation Algorithm Parallelism Status
Sort-based Sort by cell ID Multi-threaded CPU, GPU Recommended
Legacy Linked-list Single-threaded Reference implementation for testing

Both produce identical results (validated in tests).

API Selection:

  • neighbour_list() always uses the sort-based implementation (recommended)
  • PairList(system::AbstractSystem, cutoff) uses sort-based (for AtomsBase)
  • PairList(X::Vector{SVec}, cutoff, cell, pbc) uses legacy linked-list (reference implementation)

Migration Guide

The legacy linked-list implementation (CellList, _celllist_, _pairlist_) is retained as a reference for testing correctness, but new code should use the unified API.

Recommended changes:

  • Use neighbour_list(X, cutoff, cell, pbc) instead of PairList(X, cutoff, cell, pbc)
  • Use neighbours(nlist, i) instead of neigss(nlist, i) (both still work)
  • For memory-efficient iteration, use neighbour_list(...; lazy=true) with for_each_neighbour

See DEPRECATIONS.md for the complete migration guide.

Benchmarks

Benchmarks on NVIDIA RTX A4500 (cutoff = 5.0 Å, density = 0.05 atoms/ų):

Atoms Pairs Legacy CPU (1T) CPU (8T) GPU Speedup
1,000 26k 8 ms 3.6 ms 3.4 ms 2.3 ms 3.5x
5,000 131k 38 ms 17 ms 3.9 ms 2.2 ms 17x
10,000 262k 84 ms 35 ms 7.8 ms 2.4 ms 36x
50,000 1.3M 516 ms 201 ms 31 ms 4.2 ms 124x
100,000 2.6M 1.1 s 400 ms 62 ms 6.9 ms 160x

GPU throughput: ~370 million pairs/second for large systems.

Note: Speedup is GPU vs Legacy. Run julia --project -t N scripts/benchmark.jl to reproduce.

Acknowledgements

  • Original inspiration from matscipy neighbourlist written by Lars Pastewka
  • Linked-list approach was implemented by Christoph Ortner
  • Sort-based approach idea proposed by Teemu Järvinen and Timon Gutleb, and implemented by James Kermode

About

neighbour list for particle simulations based on matscipy

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages