
reymg/llama_custom_kernel


Custom RMSNorm kernel for Llama 3 8B.

This repository provides a custom CUDA RMSNorm kernel, together with the integration code needed to use it as a drop-in replacement in Transformer-based large language models, with explicit support for LLaMA models.
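For orientation, RMSNorm rescales each hidden vector by the reciprocal of its root mean square and then multiplies elementwise by a learned weight. The sketch below is a pure-Python reference for that arithmetic only; it is not this repository's CUDA kernel, and the function name `rms_norm` and the default `eps=1e-5` are assumptions chosen for illustration.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Reference RMSNorm over a single hidden vector.

    x and weight are equal-length sequences of floats.
    eps is an assumed stabilizer value, not taken from this repository.
    """
    # Mean of squared activations across the hidden dimension.
    mean_sq = sum(v * v for v in x) / len(x)
    # Reciprocal root mean square, with eps guarding against division by zero.
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Normalize, then apply the learned per-channel scale.
    return [v * inv_rms * w for v, w in zip(x, weight)]
```

A reference like this can help when checking a custom kernel's output on small inputs, within a floating-point tolerance.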

Usage

Start the server:

uvicorn app:app --host 0.0.0.0 --port 8000

Generate text:

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain RMSNorm",
    "max_new_tokens": 128
  }'
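The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server is running locally on port 8000 as above. The helper names `build_generate_request` and `generate` are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=128,
                           base_url="http://localhost:8000"):
    """Build the POST request for /generate (hypothetical helper)."""
    payload = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=128,
             base_url="http://localhost:8000", timeout=60):
    """Send the request and return the raw response body as text.

    The response schema is an unknown here, so nothing is parsed.
    """
    req = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(generate("Explain RMSNorm"))` should print whatever the endpoint returns.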
