NeighbourLists.jl

A Julia package for computing neighbour lists in molecular simulations. Originally a port of the neighbourlist from matscipy, now extended with multi-threaded CPU and portable GPU support.

The package can be used stand-alone or with AtomsBase.jl.

Installation

using Pkg
Pkg.add("NeighbourLists")

Unified API (Recommended)

The neighbour_list() function provides a unified entry point that works on both CPU and GPU with the same API. The backend is automatically detected from the array type.

Note: The legacy PairList constructor using linked-list algorithm is retained as a reference implementation for testing. New code should use neighbour_list() instead. See DEPRECATIONS.md for migration details.

CPU Example (Multi-threaded)

using NeighbourLists, StaticArrays, LinearAlgebra

# Create positions (CPU Vector)
L = 10.0
X = [SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000]
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)

# Build neighbour list (uses sort-based algorithm with multi-threading)
nlist = neighbour_list(X, 3.0, cell, pbc)

# Access neighbours of atom 1
j, R, S = neighbours(nlist, 1)

GPU Example (CUDA, ROCm, Metal, oneAPI)

using NeighbourLists, StaticArrays, LinearAlgebra
using CUDA  # or AMDGPU, Metal, oneAPI

# Create positions on GPU (only difference: use CuArray)
L = 10.0
X = CuArray([SVector{3,Float64}(L*rand(), L*rand(), L*rand()) for _ in 1:10000])
cell = SMatrix{3,3,Float64}(L*I)
pbc = SVector{3,Bool}(true, true, true)

# Same API - backend auto-detected from array type
nlist = neighbour_list(X, 3.0, cell, pbc)

What's the same: The neighbour_list() API is identical on CPU and GPU. Cell matrix, cutoff, and boundary conditions work the same way.

What's different: Only the array type changes (Vector vs CuArray/ROCArray/etc.). The backend is automatically detected - no need to specify it manually.

Lazy Mode (Memory Efficient)

For large systems where materializing all pairs is memory-intensive, use lazy mode:

# Returns a SortedCellList instead of materializing all pairs
clist = neighbour_list(X, 3.0, cell, pbc; lazy=true)

# Iterate without storing all pairs in memory
for i in 1:nsites(clist)
    for_each_neighbour(clist, i) do j, R, S
        # process neighbour j with distance vector R and shift S
    end
end

AtomsBase.jl Integration

using AtomsBuilder, NeighbourLists, Unitful

sys = bulk(:Cu, cubic=true) * (4, 4, 4)
nlist = neighbour_list(sys, 5.0u"Å")
j, R, S = neighbours(nlist, 1)  # neighbours of atom 1

# Lazy mode also works with AtomsBase systems
clist = neighbour_list(sys, 5.0u"Å"; lazy=true)
for_each_neighbour(clist, 1) do j, R, S
    # process neighbour
end

The implementation uses KernelAbstractions.jl for portable parallelism and AcceleratedKernels.jl for portable sorting. On CPU this enables multi-threading; on GPU it runs native parallel kernels.

Two Implementations

The package provides two cell list implementations:

Implementation	Algorithm	Parallelism	Status
Sort-based	Sort by cell ID	Multi-threaded CPU, GPU	Recommended
Legacy	Linked-list	Single-threaded	Reference implementation for testing

Both produce identical results (validated in tests).

API Selection:

neighbour_list() always uses the sort-based implementation (recommended)
PairList(system::AbstractSystem, cutoff) uses sort-based (for AtomsBase)
PairList(X::Vector{SVec}, cutoff, cell, pbc) uses legacy linked-list (reference implementation)

Migration Guide

The legacy linked-list implementation (CellList, _celllist_, _pairlist_) is retained as a reference for testing correctness, but new code should use the unified API.

Recommended changes:

Use neighbour_list(X, cutoff, cell, pbc) instead of PairList(X, cutoff, cell, pbc)
Use neighbours(nlist, i) instead of neigss(nlist, i) (both still work)
For memory-efficient iteration, use neighbour_list(...; lazy=true) with for_each_neighbour

See DEPRECATIONS.md for the complete migration guide.

Benchmarks

Benchmarks on NVIDIA RTX A4500 (cutoff = 5.0 Å, density = 0.05 atoms/Å³):

Atoms	Pairs	Legacy	CPU (1T)	CPU (8T)	GPU	Speedup
1,000	26k	8 ms	3.6 ms	3.4 ms	2.3 ms	3.5x
5,000	131k	38 ms	17 ms	3.9 ms	2.2 ms	17x
10,000	262k	84 ms	35 ms	7.8 ms	2.4 ms	36x
50,000	1.3M	516 ms	201 ms	31 ms	4.2 ms	124x
100,000	2.6M	1.1 s	400 ms	62 ms	6.9 ms	160x

GPU throughput: ~370 million pairs/second for large systems.

Note: Speedup is GPU vs Legacy. Run julia --project -t N scripts/benchmark.jl to reproduce.

Acknowledgements

Original inspiration from matscipy neighbourlist written by Lars Pastewka
Linked-list approach was implemented by Christoph Ortner
Sort-based approach idea proposed by Teemu Järvinen and Timon Gutleb, and implemented by James Kermode

Name		Name	Last commit message	Last commit date
Latest commit History 189 Commits
.github/workflows		.github/workflows
ext		ext
profile		profile
scripts		scripts
src		src
test		test
.codecov.yml		.codecov.yml
.gitignore		.gitignore
.travis.yml		.travis.yml
CondaPkg.toml		CondaPkg.toml
DEPRECATIONS.md		DEPRECATIONS.md
LICENSE.md		LICENSE.md
Project.toml		Project.toml
README.md		README.md
appveyor.yml		appveyor.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NeighbourLists.jl

Installation

Unified API (Recommended)

CPU Example (Multi-threaded)

GPU Example (CUDA, ROCm, Metal, oneAPI)

Lazy Mode (Memory Efficient)

AtomsBase.jl Integration

Two Implementations

Migration Guide

Benchmarks

Acknowledgements

About

Uh oh!

Releases 9

Packages

Uh oh!

Contributors 6

Languages

License

JuliaMolSim/NeighbourLists.jl

Folders and files

Latest commit

History

Repository files navigation

NeighbourLists.jl

Installation

Unified API (Recommended)

CPU Example (Multi-threaded)

GPU Example (CUDA, ROCm, Metal, oneAPI)

Lazy Mode (Memory Efficient)

AtomsBase.jl Integration

Two Implementations

Migration Guide

Benchmarks

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 9

Packages 0

Uh oh!

Contributors 6

Languages

Packages