RentedNoodle/llama.den

 
 

GGUF inference engine for Blackwell SM120. NVFP4 tensor core path via OMMA.SF.16864 — the native 4-bit instruction instead of the DP4A fallback.
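For readers unfamiliar with the format: NVFP4, as NVIDIA describes it, stores 4-bit E2M1 values (1 sign bit, 2 exponent bits, 1 mantissa bit) that share a scale factor per small block. The sketch below decodes such values in plain Python. It is only an illustration: the function names, the low-nibble-first packing order, and passing the block scale as an already-decoded float are assumptions, and nothing here reflects how this repository lays out its tensors or kernels.

```python
# Illustrative only: decode 4-bit E2M1 codes and apply a shared block scale.
# Names and packing order are hypothetical, not taken from this repository.

# Magnitudes representable by the 3 low bits of an E2M1 code.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit E2M1 code (bit 3 is the sign) to a Python float."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def dequantize_block(packed: bytes, scale: float) -> list[float]:
    """Unpack two codes per byte (low nibble first, an assumption) and scale them."""
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F) * scale)
        values.append(decode_e2m1(byte >> 4) * scale)
    return values
```

For example, `dequantize_block(bytes([0x21]), 2.0)` decodes the codes `0x1` and `0x2` (0.5 and 1.0) and scales each by 2.0.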

Not really a llama.cpp fork at this point. The NVFP4 stack, MoE dispatch, SSM kernels, governor FSM, and RT core integration are all custom. The upstream inheritance is basically just the GGML type system.

Build

Requires CUDA 12.8.

cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="120a"
cmake --build build -j$(nproc)

Quick test

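cuobjdump --dump-sass build/ggml/src/libggml.so | grep -c "OMMA.SF.16864"

This prints the number of SASS lines that mention OMMA.SF.16864. A count of 0 means the instruction was not emitted in that library.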
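The same check can be scripted, for instance to fail a CI job when the instruction is missing. The following is a minimal sketch, assuming `cuobjdump` is on `PATH`; the helper names `count_opcode` and `dump_sass` are hypothetical and not part of this repository. Unlike the `grep` pattern, where `.` matches any character, it matches the opcode as a literal substring.

```python
import subprocess
import sys


def count_opcode(sass_text: str, opcode: str = "OMMA.SF.16864") -> int:
    """Count the lines of a SASS dump that contain the opcode literally."""
    return sum(1 for line in sass_text.splitlines() if opcode in line)


def dump_sass(library_path: str) -> str:
    """Run `cuobjdump --dump-sass` on a library and return its stdout."""
    result = subprocess.run(
        ["cuobjdump", "--dump-sass", library_path],
        capture_output=True,
        text=True,
        check=True,  # raise if cuobjdump exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    # Default path copied from the quick test command.
    path = sys.argv[1] if len(sys.argv) > 1 else "build/ggml/src/libggml.so"
    count = count_opcode(dump_sass(path))
    print(count)
    # Exit non-zero when the instruction is absent so a CI step can fail on it.
    sys.exit(0 if count > 0 else 1)
```

Keeping the counting separate from the `cuobjdump` call lets the counting logic be exercised on saved dump text without a GPU toolchain installed.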

License

MIT.

About

Den experimental kernel forge — raw inline PTX tensor core path for Blackwell SM120. OMMA.SF.16864 cubins, SASS verification, fragment mapping. Where kernels are proven before promotion to den-nv.

Languages

  • C++ 54.8%
  • Cuda 21.4%
  • C 12.8%
  • Python 4.2%
  • Metal 2.1%
  • Jinja 1.7%
  • Other 3.0%