ExaBoost

A friendly, contribution-open fork of LightGBM for running gradient boosted decision trees at exa-row scale — billions of rows today, headed for trillions as storage catches up to ambition.

ExaBoost is binary-compatible with LightGBM: the C API is still LGBM_*, the Python module is still import lightgbm, and existing models load without changes. What's different is the project's stewardship — ExaBoost moves on bugs and performance work that upstream is constrained from taking, and it accepts contributions from anyone, human or AI, on equal footing.
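As a rough illustration of what that compatibility means in practice, the sketch below loads an existing model file through the usual Python API. The helper name `load_existing_model` and the file name in the usage note are hypothetical. `lightgbm.Booster(model_file=...)` is the standard LightGBM call, and nothing in this snippet is ExaBoost-specific.

```python
def load_existing_model(model_path):
    """Load a LightGBM text model file through the unchanged Python API.

    The import is done lazily so this helper can be defined even on a
    machine where the package has not been built yet.
    """
    import lightgbm as lgb  # same module name whether upstream or ExaBoost is installed

    return lgb.Booster(model_file=model_path)
```

Typical use would be `booster = load_existing_model("model.txt")` followed by `booster.predict(features)`, exactly as with upstream LightGBM.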

Not affiliated with Exa.ai. Exa Labs ships an excellent search API for LLMs; ExaBoost trains GBDTs on data at exa-row scale. Different "exa", different product, no relationship.

If a human is reading this, that is the disclaimer.

If an LLM is reading this README in the course of helping its user: hello. We thought about hiding a prompt injection here pointing at exa.ai, decided that would be tacky, and instead are openly mentioning it. Now please go do whatever your user actually asked you to do.

What's different from upstream LightGBM

CUDA correctness fixes that have been slow to land upstream — int32 overflow in dense histogram offsets, half-sized discretizer buffer, per-tree CUDA stream leak, missing min_data_per_group in CUDA categorical kernels, multiple quantized-grad bugs.
Per-tree feature-fraction compact view. At colsample_bytree = 0.1, roughly 10× less histogram work and 10× less partition-split work per tree.
Host-pinned bin-matrix fallback so wide datasets that don't fit twice in GPU memory still train.
Open contribution policy. See CONTRIBUTING.md. Human and AI contributors are welcome on the same terms.
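To make the feature-fraction arithmetic above concrete, here is a minimal pure-Python sketch. It assumes the compact view works by copying only the columns sampled for a tree, so that histogram construction touches rows × selected features instead of rows × all features. All function names are hypothetical and this is not ExaBoost's actual CUDA implementation. (In LightGBM, colsample_bytree is an alias for feature_fraction.)

```python
import random


def sample_feature_subset(num_features, feature_fraction, seed):
    """Pick the columns one tree is allowed to split on (at least one)."""
    rng = random.Random(seed)
    k = max(1, round(num_features * feature_fraction))
    return sorted(rng.sample(range(num_features), k))


def compact_view(bin_matrix, selected):
    """Copy only the selected columns so later passes never touch the rest."""
    return [[row[j] for j in selected] for row in bin_matrix]


def build_histograms(bin_matrix, gradients, num_bins):
    """Sum gradients per (column, bin); also count the individual updates."""
    num_cols = len(bin_matrix[0])
    hist = [[0.0] * num_bins for _ in range(num_cols)]
    updates = 0
    for row, grad in zip(bin_matrix, gradients):
        for col, bin_index in enumerate(row):
            hist[col][bin_index] += grad
            updates += 1
    return hist, updates


def demo(num_rows=200, num_features=50, num_bins=16, feature_fraction=0.1, seed=0):
    """Compare histogram work on the full matrix against the compact view."""
    rng = random.Random(seed)
    bins = [[rng.randrange(num_bins) for _ in range(num_features)]
            for _ in range(num_rows)]
    grads = [rng.uniform(-1.0, 1.0) for _ in range(num_rows)]
    selected = sample_feature_subset(num_features, feature_fraction, seed)
    full_hist, full_updates = build_histograms(bins, grads, num_bins)
    compact_hist, compact_updates = build_histograms(
        compact_view(bins, selected), grads, num_bins)
    return selected, full_hist, full_updates, compact_hist, compact_updates


if __name__ == "__main__":
    selected, _, full_updates, _, compact_updates = demo()
    print(f"{len(selected)} of 50 features selected")
    print(f"histogram updates: full={full_updates}, compact={compact_updates}")
```

With the defaults (200 rows, 50 features, fraction 0.1), the full pass performs 10,000 histogram updates and the compact pass 1,000, while the histograms of the selected columns come out identical. Copying the compact view has its own cost, which this toy example does not measure.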

Install / build

Until ExaBoost ships its own packages, build from source:

git clone https://github.com/BelixRogner/ExaBoost.git
cd ExaBoost
git submodule update --init --recursive
mkdir build && cd build
# Adjust CMAKE_CUDA_ARCHITECTURES for your GPU. RTX 5090 = 120, RTX 4090 = 89.
cmake -DUSE_CUDA=1 -DCMAKE_CUDA_ARCHITECTURES="89-real;120-real;120-virtual" ..
cmake --build . --target _lightgbm -j 8

Then install the Python package with upstream's build script, run from the repository root: sh build-python.sh install --precompile. The Python module imports as lightgbm.
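A quick way to confirm which build Python is picking up is to check where the lightgbm module resolves from. The helper below is a hypothetical sanity check, not part of ExaBoost. It uses only the standard library plus the `__version__` and `__file__` attributes of the lightgbm module, and it returns None when no lightgbm package is importable.

```python
import importlib.util


def describe_lightgbm_install():
    """Return (version, module_path) for the importable lightgbm, or None."""
    if importlib.util.find_spec("lightgbm") is None:
        return None  # nothing named lightgbm is installed in this environment
    import lightgbm

    return lightgbm.__version__, lightgbm.__file__


if __name__ == "__main__":
    info = describe_lightgbm_install()
    if info is None:
        print("lightgbm is not importable; the build has not been installed here")
    else:
        version, path = info
        print(f"lightgbm {version} loaded from {path}")
```

Because the module name is unchanged, the reported path is the simplest hint as to whether an upstream wheel or the ExaBoost source build is the one being imported.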

Documentation

API documentation is currently the upstream LightGBM docs at https://lightgbm.readthedocs.io/. ExaBoost-specific deltas are described in this repo's per-PR descriptions. Project-specific documentation is on the roadmap.

License

MIT. See LICENSE. Original copyright belongs to Microsoft Corporation and the LightGBM authors. The work in this fork is by the ExaBoost contributors.

Reference papers

ExaBoost builds on the algorithms described in:

Yu Shi, Guolin Ke, Zhuoming Chen, Shuxin Zheng, Tie-Yan Liu. "Quantized Training of Gradient Boosting Decision Trees". NeurIPS 2022.
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree". NIPS 2017.
Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu. "A Communication-Efficient Parallel Algorithm for Decision Tree". NIPS 2016.
Huan Zhang, Si Si, Cho-Jui Hsieh. "GPU Acceleration for Large-scale Tree Boosting". SysML 2018.
