A lightweight deep learning framework built from scratch on raw CUDA, with a friendly Python API covering the core building blocks of deep learning:
- Fully connected layer
- Convolutional layer
- GPU acceleration (roughly 15x faster than a plain NumPy implementation, and on par with PyTorch at smaller batch sizes)
- Flatten layer
- Max pooling layer
- ReLU activation
- Softmax layer
- Model save/load
- Cross Entropy Loss & MSE Loss
- Model & Sequential classes
- Training & eval loop
- Mini-batching
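
To give a feel for how these pieces fit together, here's a minimal usage sketch. The layer and loss names (Linear, ReLU, CrossEntropyLoss, Model) come from the example runs below; the module name (`tiny_ml_lib`), constructor signatures, and method names (`fit`, `evaluate`, `save`) are assumptions made for illustration, not the library's documented API.

```python
import numpy as np

# Hypothetical import path -- the real module name may differ.
from tiny_ml_lib import Model, Linear, ReLU, CrossEntropyLoss

# Dummy data standing in for flattened MNIST (60000 x 784 inputs, integer labels).
x_train = np.random.rand(60000, 784).astype(np.float32)
y_train = np.random.randint(0, 10, size=60000)

# Stack layers and a loss the way the printed model summaries below suggest.
model = Model(
    Linear(784, 512), ReLU(),
    Linear(512, 512), ReLU(),
    Linear(512, 10),
    loss=CrossEntropyLoss(),
)

# Assumed method names: mini-batched training, an eval pass, and a pickle-based save.
model.fit(x_train, y_train, epochs=10, batch_size=512, lr=0.1)
model.evaluate(x_train, y_train)
model.save("mlp-weights.pkl")
```
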
Convolutional NN on MNIST
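
The run below prints the summary for this CNN. A rough sketch of how that stack might be written: going from a 28x28 input to five 24x24 feature maps implies a 5x5 kernel with stride 1 and no padding, and Flatten then yields 5 * 24 * 24 = 2880 features. The constructor arguments are assumptions, not the library's exact signatures.

```python
# Sketch of the CNN printed in the run below; argument names are assumed.
model = Model(
    Conv2d(in_channels=1, out_channels=5, kernel_size=5),  # (1, 28, 28) -> (5, 24, 24)
    ReLU(),
    Flatten(),                                              # (5, 24, 24) -> 2880
    Linear(2880, 128),
    ReLU(),
    Linear(128, 10),
    loss=CrossEntropyLoss(),
)
```
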
>>> py main.py
Input shape: (60000, 784)
Labels shape: (60000,)
Model(
[0] Conv2d ((1, 28, 28) → (5, 24, 24))
[1] ReLU
[2] Flatten
[3] Linear (2880 → 128)
[4] ReLU
[5] Linear (128 → 10)
Loss: CrossEntropyLoss
Total parameters: 373,063
)
TRAINING...
EPOCH 1/10, Loss: 0.1227
...
EPOCH 10/10, Loss: 0.0347
Time spent training: 437.89s
EVALUATING...
Sample labels: [9 2 9 8 9 7 1 2 4 3]
Sample preds: [9 2 9 8 9 7 1 2 4 3]
Accuracy: 98.32%
Save weights? (y/n) >>> y
File name? (empty for default) >>> cnn-weights
Saved model weights to cnn-weights.pkl

With MaxPool (GPU ver.)
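
Inserting a 2x2 max pool after the convolution halves each 24x24 feature map to 12x12, so Flatten produces 5 * 12 * 12 = 720 features instead of 2880 and the first Linear layer shrinks accordingly, which is where most of the drop in parameter count comes from. A sketch of the pooled stack (pool window inferred from the printed shapes, constructor arguments assumed):

```python
# Pooled variant of the CNN above; a 2x2 window is assumed from the printed shapes.
model = Model(
    Conv2d(in_channels=1, out_channels=5, kernel_size=5),  # (1, 28, 28) -> (5, 24, 24)
    ReLU(),
    MaxPool(2),                                             # (5, 24, 24) -> (5, 12, 12)
    Flatten(),                                              # (5, 12, 12) -> 720
    Linear(720, 128),
    ReLU(),
    Linear(128, 10),
    loss=CrossEntropyLoss(),
)
```
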
Model(
[0] Conv2d ((1, 28, 28) → (5, 24, 24))
[1] ReLU
[2] MaxPool # on gpu
[3] Flatten
[4] Linear (720 → 128)
[5] ReLU
[6] Linear (128 → 10)
Loss: CrossEntropyLoss
Total parameters: 96,583
Device: CPU
)
TRAINING...
EPOCH 1/5, Loss: 1.7549
...
EPOCH 5/5, Loss: 0.4145
Finished in: 492.62s # vs ~1200s fully on CPU
EVALUATING...
Sample labels: [8 5 6 4 2 4 2 4 1 3]
Sample preds: [8 5 6 4 4 4 2 4 1 3]
Accuracy: 89.95%

MLP on MNIST (GPU)
>>> py main.py
Input shape: (60000, 784)
Labels shape: (60000,)
Model(
[0] Linear (784 → 512)
[1] ReLU
[2] Linear (512 → 512)
[3] ReLU
[4] Linear (512 → 512)
[5] ReLU
[6] Linear (512 → 10)
Loss: CrossEntropyLoss
Total parameters: 932,362
Device: GPU
)
TRAINING...
EPOCH 1/10, Loss: 0.5499
...
EPOCH 10/10, Loss: 0.2297
Finished in: 9.51s
EVALUATING...
Sample labels: [7 3 1 1 0 8 0 8 6 4]
Sample preds: [7 3 1 1 0 0 0 8 6 4]
Accuracy: 95.70%
Save weights? (y/n) >>> n

With pretrained weights:
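
The weights saved earlier to a .pkl file can be loaded back to skip training and jump straight to evaluation. A sketch, with the load method name assumed rather than taken from the actual API:

```python
# Hypothetical method name; the real load call may differ.
model.load("mlp-weights.pkl")
model.evaluate(x_test, y_test)
```
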
Loaded model weights from mlp-weights.pkl
EVALUATING...
Sample labels: [2 0 1 9 6 5 5 6 7 8]
Sample preds: [2 0 1 9 6 5 5 6 7 8]
Accuracy: 98.13%

Note
This library doesn't have autograd (yet), graph tracing, mixed precision, tensor cores, cuDNN, cuBLAS, or any of the other fancy stuff PyTorch does.
It only runs "faster" because it's lightweight.
Still, it beats PyTorch at batch sizes <= 512 on MNIST, so that's a win in my book.
All benchmarks were run on an RTX 4060, training a simple MNIST NN from scratch using this library's GPU backend.
Model:
Linear(784 → 512) → ReLU → Linear(512 → 512) → ReLU → Linear(512 → 512) → ReLU → Linear(512 → 10)
Loss: CrossEntropy
Optimizer: SGD, lr=0.1
Epochs: 10
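
For reference, the PyTorch side of the comparison presumably corresponds to the standard equivalent below. The benchmark script itself isn't shown here, so treat this as an assumed reconstruction of the setup described above.

```python
import torch
from torch import nn

# Assumed PyTorch equivalent of the benchmark model described above.
model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
).cuda()

criterion = nn.CrossEntropyLoss()                          # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # SGD, lr=0.1
# (mini-batched training loop over 10 epochs omitted for brevity)
```
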
| Batch Size | Framework | Time (10 Epochs) |
|---|---|---|
| 64 | PyTorch | 27.2s |
| 64 | This lib | 20.2s |
| 512 | PyTorch | 9.7s |
| 512 | This lib | 9.5s |
- GPU programming seemed like a really fun problem space
- Wanted to implement a bunch of things on my own
- Wanted to experiment with building a framework and learn cool stuff along the way
- Python 3.10+
- pip
- CUDA Toolkit
- CMake
- gcc or g++
Clone the repo:
git clone https://github.com/sidsurakanti/tiny-ml-lib.git
cd tiny-ml-lib

Create a virtual environment (optional but recommended):
python3 -m venv venv
source venv/bin/activate # windows: venv\Scripts\activate

Install dependencies:
pip install -r requirements.txt

Build the core CUDA lib:
mkdir build && cd build
cmake .. && make && make install
cd ..

Run it:
python3 main.py

or

python main.py

- MLP basic functionality
- Add Conv2d
- Add pooling layer
- Add weight inits
- Cuda remake
- Add more activations, etc.
Need help? Ping me on Discord.