This package provides a blazing fast, memory-efficient GPU implementation of 3D Binary Thinning (skeletonization) using CUDA and PyTorch.
It is based on the 3D thinning algorithm by Lee, Kashyap, and Chu (1994), which uses Euler characteristic invariance and 26-connectivity checks to safely erode a 3D binary volume down to a one-voxel-wide skeleton without altering its fundamental topology.
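For intuition, the 26-connected neighborhood referenced above is the full 3×3×3 block of voxels around a center voxel, minus the center itself. A minimal sketch (the `neighbors26` helper is illustrative, not part of this package's API):

```python
from itertools import product

def neighbors26(z, y, x):
    # All 26 voxels that share a face, edge, or corner with (z, y, x):
    # every offset in {-1, 0, 1}^3 except the center itself.
    return [(z + dz, y + dy, x + dx)
            for dz, dy, dx in product((-1, 0, 1), repeat=3)
            if (dz, dy, dx) != (0, 0, 0)]
```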
This implementation provides two topologically safe operating modes to suit your needs:
- **Mode 0: GPU Subgrid 8-Color Parallel** (`mode=0`, default)
  - Speed: Extremely fast (~300x speedup over CPU)
  - Behavior: Operates entirely on the GPU. It avoids race conditions by partitioning the volume into an 8-color 3D checkerboard, then re-checking and deleting voxels of one color at a time in parallel, because same-color voxels are mathematically guaranteed not to touch each other.
  - Topology: Topologically safe; produces a mathematically valid skeleton. Note: because the deletion order differs slightly from a strict CPU raster scan, the exact voxel placement may differ very slightly from ITK (e.g. a 0.003% difference), but the overall global topology is preserved perfectly.
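The safety of the 8-color scheme can be illustrated in plain Python (the `color_of` helper is illustrative, not the package's internal code): a voxel's color is the parity pattern of its coordinates, and any two distinct same-color voxels sit at Chebyshev distance ≥ 2, so they are never 26-adjacent and can be deleted in parallel without racing.

```python
from itertools import product

def color_of(z, y, x):
    # Color = parity pattern of (z, y, x): one of 8 classes forming a
    # 3D checkerboard. (Illustrative helper, not the package API.)
    return (z % 2) * 4 + (y % 2) * 2 + (x % 2)

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

# Same-color voxels agree in coordinate parity, so along every axis
# where they differ at all they differ by at least 2 -> never 26-adjacent.
voxels = list(product(range(4), repeat=3))
for p in voxels:
    for q in voxels:
        if p != q and color_of(*p) == color_of(*q):
            assert chebyshev(p, q) >= 2
```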
- **Mode 1: Hybrid CPU-GPU Sequential** (`mode=1`)
  - Speed: Fast (~100x speedup over CPU)
  - Behavior: Calculates Euler invariance on the GPU in parallel, but performs the final 26-connectivity re-checks strictly sequentially on the CPU (using zero-overhead memory compaction and host-side sorting).
  - Topology: 100% identical to ITK. Guaranteed to produce the exact same voxel output as standard sequential CPU implementations such as `itk.BinaryThinningImageFilter3D`.
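The host-side sorting step can be pictured as flattening candidate coordinates into raster order, so the sequential recheck visits voxels exactly as a CPU scan would. A minimal sketch with a hypothetical `raster_sort` helper (not the package's actual function):

```python
def raster_sort(coords, shape):
    # Sort (z, y, x) candidate voxels by their linear raster index
    # z * (sy * sx) + y * sx + x, matching a sequential CPU scan order.
    # (Hypothetical helper for illustration only.)
    sz, sy, sx = shape
    return sorted(coords, key=lambda c: (c[0] * sy + c[1]) * sx + c[2])
```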
Requirements:

- Python 3.10+
- PyTorch (with CUDA support)
- A CUDA-capable GPU
You can install the package directly from PyPI. Note that since the package contains CUDA C++ extensions, they will be compiled on your machine during installation.

```shell
pip install binary-thinning-3d-cuda
```

For development or to run benchmarks, you can install from the source:
```shell
git clone https://github.com/sychen52/binary_thinning_3d_cuda.git
cd binary_thinning_3d_cuda

# Standard install
pip install --no-build-isolation -e .

# Install with development dependencies (for running benchmarks)
pip install --no-build-isolation -e ".[dev]"
```

(Note: `itk-thickness3d` and `SimpleITK` are not hard dependencies; they are included in the `[dev]` extras only for benchmarking and validating against the CPU implementation.)
The input can be a 3D PyTorch uint8 (Byte) tensor located on either a CPU or CUDA device.
- If the tensor is on a CUDA device, the operation is performed in-place.
- If the tensor is on the CPU, it is automatically moved to the GPU for processing and copied back to the original CPU tensor in-place.
All non-zero values are treated as foreground (0 for background, >0 for foreground).
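Since any non-zero value counts as foreground, a float or multi-label mask can be binarized and cast to `uint8` first. A minimal sketch (the `to_binary_uint8` helper is illustrative, not part of the package):

```python
import torch

def to_binary_uint8(mask: torch.Tensor) -> torch.Tensor:
    # Map every non-zero value to 1 and cast to the uint8 dtype the
    # filter expects. (Illustrative helper, not part of the package API.)
    return (mask != 0).to(torch.uint8)
```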
```python
import torch
from binary_thinning_3d import binary_thinning

# Create or load a 3D binary mask (CPU or GPU)
tensor = torch.zeros((100, 100, 100), dtype=torch.uint8)
tensor[25:75, 25:75, 25:75] = 1  # Solid block

# 1. GPU Subgrid (default: max speed, topologically safe)
# Modifies the tensor in-place (handles CPU<->GPU transfer automatically)
binary_thinning(tensor, mode=0)

# 2. Hybrid CPU-GPU (exact ITK match)
binary_thinning(tensor, mode=1)
```

The following benchmark was run on a (767, 512, 512) NIfTI volume (a CT airways label) containing 451,530 foreground voxels.
Hardware:
- CPU: AMD Ryzen 7 2700 Eight-Core Processor
- GPU: NVIDIA GeForce RTX 2070
The benchmark compares this CUDA implementation against itk.BinaryThinningImageFilter3D (which is run sequentially on the CPU). The CUDA timings include the time for CPU-to-GPU and GPU-to-CPU data transfers.
| Method | Output Voxel Count | Time (s) | Speedup vs ITK | Matches ITK CPU? |
|---|---|---|---|---|
| Mode 0 (GPU Subgrid) | 4,286 | 0.38 | 331x | Topologically equivalent |
| Mode 1 (Hybrid CPU) | 4,281 | 1.22 | 101x | Yes (100% identical) |
| ITK (CPU Baseline) | 4,281 | 139.90 | 1x | Baseline |
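To time the calls yourself with wall-clock measurements that, like the table above, include any host↔device transfer time, a simple wrapper is enough (illustrative, not part of the package):

```python
import time

def time_call(fn, *args, **kwargs):
    # Wall-clock timing around the full call, so any CPU<->GPU transfer
    # the function performs internally is included in the measurement.
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0
```

For example, `_, seconds = time_call(binary_thinning, tensor, mode=0)` measures one full thinning pass including transfers.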
To reproduce these benchmarks yourself:

```shell
# Ensure you installed with dev dependencies: pip install -e ".[dev]"
python examples/process_nifti.py
```

(The script will cache the slow ITK result to disk on the first run, so subsequent runs finish instantly.)