
This project benchmarks from-scratch quantization techniques for LLMs (Absmax, Zeropoint, and LLM.int8), comparing them on perplexity and memory footprint. It highlights mixed-precision INT8+FP16 as an effective way to preserve accuracy while reducing model size by nearly 3×.


LLM.int8 vs Naive 8-bit Weight Quantization Using PyTorch

Large Language Models (LLMs) are compute-hungry beasts. Their memory footprint is roughly the number of parameters times the bytes per parameter (the precision): for example, a 125M-parameter model takes about 500 MB at FP32 (4 bytes/parameter) but only about 125 MB at INT8 (1 byte/parameter). To reduce memory and accelerate inference, I explored quantization techniques that compress weights from FP32 to INT8.

I ran two from-scratch methods:

  • Absmax (symmetric) quantization
  • Zeropoint (asymmetric) quantization

Both reduced memory significantly… but at a cost: higher perplexity, due to sensitivity to outliers.
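As a rough illustration, here is a minimal, tensor-wise PyTorch sketch of both methods (the repository's actual implementation may differ in granularity and details):

```python
import torch

def absmax_quantize(W: torch.Tensor):
    """Symmetric (absmax) quantization: map [-max|W|, max|W|] onto [-127, 127]."""
    scale = 127 / torch.max(torch.abs(W))       # one scale for the whole tensor
    W_q = (scale * W).round().to(torch.int8)    # quantize
    W_dq = W_q.to(torch.float32) / scale        # dequantize (to measure the error)
    return W_q, W_dq

def zeropoint_quantize(W: torch.Tensor):
    """Asymmetric (zeropoint) quantization: map [min(W), max(W)] onto [-128, 127]."""
    value_range = max((torch.max(W) - torch.min(W)).item(), 1e-8)  # avoid div-by-zero
    scale = 255 / value_range
    zeropoint = (-scale * torch.min(W) - 128).round()
    W_q = torch.clamp((W * scale + zeropoint).round(), -128, 127).to(torch.int8)
    W_dq = (W_q.to(torch.float32) - zeropoint) / scale
    return W_q, W_dq

# Example: quantize one weight matrix and compare reconstruction errors
W = torch.randn(4, 8)
_, W_absmax = absmax_quantize(W)
_, W_zp = zeropoint_quantize(W)
print("absmax MAE:   ", (W - W_absmax).abs().mean().item())
print("zeropoint MAE:", (W - W_zp).abs().mean().item())
```

Absmax maps the symmetric range around zero onto [-127, 127], while zeropoint shifts the asymmetric range [min(W), max(W)] onto [-128, 127], which uses the full INT8 range when the weights are not centered at zero.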

The Outlier Problem

Outliers are extreme values (positive or negative) that show up throughout transformer layers. Although they are rare within any single tensor, they stretch the quantization range and squeeze the remaining values into a handful of integer levels, hurting precision. Removing them outright, however, degrades model performance.
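A tiny, made-up example of the effect on absmax quantization: a single large weight sets the scale, so all the other weights lose resolution.

```python
import torch

# Toy weight row: mostly small values plus one outlier (8.0).
w = torch.tensor([0.1, -0.2, 0.3, -0.1, 0.2, 8.0])

scale = 127 / w.abs().max()                 # scale is dominated by the outlier
w_q = (w * scale).round().to(torch.int8)    # -> tensor([  2,  -3,   5,  -2,   3, 127])
w_dq = w_q.to(torch.float32) / scale        # dequantized values

# Without the outlier the scale would be 127 / 0.3 ≈ 423, i.e. roughly 27x finer
# resolution for the small weights that carry most of the signal.
print((w - w_dq).abs())
```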

Solution: LLM.int8() (bitsandbytes)

This method applies vector-wise quantization + mixed precision:

  • Most weights → INT8
  • Outliers (~0.1%) → FP16

The result: much better accuracy than the naive INT8 methods with a 2.9× smaller memory footprint.
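In practice, LLM.int8() is available through bitsandbytes and Hugging Face Transformers. A minimal loading sketch, using "gpt2" only as a placeholder model name:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# LLM.int8(): vector-wise INT8 quantization with an FP16 path for outliers.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,   # activation magnitudes above this stay in FP16
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_int8 = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    quantization_config=bnb_config,
    device_map="auto",
)

print(f"INT8 footprint: {model_int8.get_memory_footprint() / 1e6:.0f} MB")
```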

Memory Footprint

  • Original (FP16): 510 MB
  • INT8: 176 MB
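These numbers can be reproduced approximately with Transformers' built-in footprint helper (again with "gpt2" as a stand-in for whichever model the repo benchmarks):

```python
import torch
from transformers import AutoModelForCausalLM

# FP16 baseline; compare with model_int8.get_memory_footprint() from the sketch above.
model_fp16 = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
print(f"FP16 footprint: {model_fp16.get_memory_footprint() / 1e6:.0f} MB")
```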

🧠 Pro tip: Mixed INT8+FP16 handles outliers without affecting model performance, which makes it ideal for real-world LLM deployments.

Perplexity Plot

(Figure: perplexity of the quantized models; see the plot image in the repository.)
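For reference, a minimal sketch of how perplexity could be computed for each model (on a single text snippet; the actual benchmark presumably averages over a larger corpus):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood of the tokens)."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # HF returns mean cross-entropy loss
    return torch.exp(out.loss).item()

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(perplexity(model, tokenizer, "Quantization trades precision for memory."))
```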

