stevenpal/zero-to-hero


Coursework for Neural Networks: Zero to Hero

This repo contains my notes and lab work for Andrej Karpathy's Neural Networks: Zero to Hero lecture series on YouTube.

I learned a ton from working through the course and especially appreciated his pedagogical approach: everything is explained from first principles, and nothing is left as an abstraction you have to accept at face value. I learned more this way than from lectures where you just click "run" on already (or mostly) completed Jupyter Notebooks that lean heavily on frameworks.

Lectures & Labs

I've broken up my notes and lab work into folders, one per lecture. Within each folder, there's a Jupyter Notebook with my notes from the lecture (e.g. "01-Lecture-Building-micrograd.ipynb") and a separate Jupyter Notebook with my lab/exercise solutions (e.g. "01-Exercises-micrograd.ipynb"). I'm sharing these in case other folks who are working through the course find them useful. Having it all on GitHub also lets me easily refer back to the notes I took =)

Lecture Key Concepts
1 - Intro to Neural Networks and Backpropagation
* backpropagation (with scalars)
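The core idea of the first lecture can be sketched as a tiny scalar autograd engine in the spirit of micrograd: each `Value` records the operation that produced it, and `backward()` walks the graph in reverse topological order applying the chain rule. This is a simplified sketch (only `+` and `*`), not the repo's actual notebook code.

```python
class Value:
    """Minimal scalar autograd node, in the spirit of micrograd (a sketch)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to route out.grad into the children
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, so the gradient passes through.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a (chain rule for products).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = Value(-3.0)
c = a * b + a        # dc/da = b + 1 = -2.0, dc/db = a = 2.0
c.backward()
print(a.grad, b.grad)
```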
2 - Intro to Language Modeling
* intro to torch.tensor
* language modeling process (training, sampling, loss evaluation)
* numpy tutorial (optional)
3 - MLP
* multilayer perceptron (MLP) model
* training
* learning rate tuning
* hyperparameters
* evaluation
* splits
* under/overfitting
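The train/dev/test split idea from lecture 3 can be sketched in a few lines: shuffle once, then carve out three disjoint subsets. The 80/10/10 proportions and the integer "dataset" below are illustrative, not the lecture's exact settings.

```python
import random

random.seed(42)
data = list(range(1000))   # stand-in for a dataset of examples
random.shuffle(data)

# Common 80/10/10 split: train for fitting parameters, dev for tuning
# hyperparameters, test for a final unbiased estimate of generalization.
n1 = int(0.8 * len(data))
n2 = int(0.9 * len(data))
train, dev, test = data[:n1], data[n1:n2], data[n2:]
print(len(train), len(dev), len(test))
```

Comparing train loss against dev loss is also the basic diagnostic for under/overfitting: a large gap suggests overfitting, while both being high suggests underfitting.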
4 - Activations & Gradients, BatchNorm
* statistics of forward-pass activations and backward-pass gradients, and their pitfalls
* diagnostic and visualization tools to understand network health
* fragility of network training
* batch normalization
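The batch-normalization forward pass from lecture 4 boils down to: normalize each feature to zero mean and unit variance over the batch, then scale and shift by learnable parameters. A one-feature sketch in plain Python (forward pass only; the lecture also covers the running statistics used at inference time):

```python
import math

def batchnorm_1d(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean / unit variance,
    then scale by gamma and shift by beta (forward pass only)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # eps guards against division by zero when the batch variance is tiny.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

out = batchnorm_1d([1.0, 2.0, 3.0, 4.0])
print(out)   # roughly zero mean, unit variance
```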
5 - Backprop Ninja!
* backpropagation (with tensors)
* intuition for how gradients flow and are computed
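A habit lecture 5 builds is checking hand-derived tensor gradients against a numerical estimate. A tiny sketch (plain Python lists standing in for tensors): for the loss `L = sum(W @ x)`, the chain rule gives `dL/dW[i][j] = x[j]`, which we verify with finite differences.

```python
# Hand-derived gradient of L = sum(W @ x) with respect to W, checked numerically.
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]

def loss(W):
    # Sum of all entries of the matrix-vector product W @ x.
    return sum(sum(W[i][j] * x[j] for j in range(3)) for i in range(2))

# Analytic gradient from the chain rule: dL/dW[i][j] = x[j] for every row i.
analytic = [[x[j] for j in range(3)] for _ in range(2)]

# Numerical check: nudge each weight by h and measure the change in the loss.
base, h = loss(W), 1e-6
for i in range(2):
    for j in range(3):
        W[i][j] += h
        numeric = (loss(W) - base) / h
        W[i][j] -= h
        assert abs(numeric - analytic[i][j]) < 1e-4
print("gradients match")
```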
6 - WaveNet
* convolutional neural network
* torch.nn
* development process (reading docs, tracking tensor shapes, using notebooks and scripts)
7 - Transformers
* transformer architecture
* self-attention
* build a Generatively Pretrained Transformer (GPT) from scratch
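The heart of lecture 7 is causal self-attention: each position computes similarity scores against all earlier positions, softmaxes them into weights, and takes a weighted average of the value vectors. A stripped-down sketch with identity query/key/value projections (the real layer learns separate projection matrices and uses multiple heads):

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(x):
    """Single-head self-attention over a list of d-dim vectors, with identity
    Q/K/V projections for brevity (a sketch, not the full transformer layer)."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # Scaled dot-product scores of position t against positions <= t
        # (the causal mask: no attending to the future).
        scores = [sum(x[t][k] * x[s][k] for k in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        w = softmax(scores)
        # Weighted average of the (value) vectors up to position t.
        out.append([sum(w[s] * x[s][k] for s in range(t + 1)) for k in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x)
print(y[0])   # the first position can only attend to itself
```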
8 - Tokenizers
* character sets (ASCII, Unicode)
* encoding
* decoding
* byte-pair encoding (BPE)
* common frameworks (tiktoken, sentencepiece)
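One step of byte-pair encoding, the core of lecture 8, can be sketched in a few lines: find the most frequent adjacent pair of token ids, then replace every occurrence with a fresh id beyond the byte range. The training string below is a toy example, not the lecture's dataset.

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair in a token-id sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every (left-to-right) occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"
ids = list(text.encode("utf-8"))   # start from raw UTF-8 bytes (ids 0..255)
pair = most_common_pair(ids)       # ('a','a') is the most frequent pair here
ids = merge(ids, pair, 256)        # new token ids start after the byte range
print(pair, ids)
```

Real tokenizers like tiktoken and sentencepiece repeat this merge loop until a target vocabulary size is reached and store the merge order for encoding and decoding.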
9 - Reproduce GPT-2 from scratch
* build a network matching the GPT-2/GPT-3 papers
* intro to GPU chip architecture
* scaling to multi-GPU machines using Distributed Data Parallel (DDP)
* optimizations to speed up training (precision, torch.compile, FlashAttention, nice/ugly numbers)
* hyperparameter tuning (AdamW, gradient clipping, learning rate scheduling, gradient accumulation)
* training on Internet-scale datasets (FineWeb EDU)
* evaluation and benchmarking (using HellaSwag)
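A piece of arithmetic that ties several lecture-9 topics together is the effective batch size: gradient accumulation and DDP both multiply the tokens processed per optimizer step. The specific numbers below are illustrative, not the exact settings of the GPT-2 run.

```python
# Tokens per optimizer step when combining micro-batches, gradient
# accumulation, and data parallelism across GPUs (illustrative numbers).
micro_batch = 16      # sequences per GPU per forward/backward pass
seq_len = 1024        # tokens per sequence
accum_steps = 4       # gradients summed over this many micro-batches
num_gpus = 8          # DDP world size

tokens_per_step = micro_batch * seq_len * accum_steps * num_gpus
print(tokens_per_step)
```

This is why gradient accumulation matters: it lets a large "nice number" batch size fit on limited GPU memory by splitting it across sequential micro-batches.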
