stevenpal/zero-to-hero


Coursework for Neural Networks: Zero to Hero

This repo contains my notes and lab work for Andrej Karpathy's Neural Networks: Zero to Hero lecture series on YouTube.

I learned a ton from working through the course and especially appreciated his pedagogical approach: everything is explained from first principles, and nothing is left as an abstraction you have to accept at face value. I learned more this way than from lectures where you just click "run" on already (or mostly) completed Jupyter Notebooks that lean heavily on frameworks.

Lectures & Labs

I've broken up my notes and lab work into folders, one per lecture. Within each folder, there's a Jupyter Notebook with my notes from the lecture (e.g. "01-Lecture-Building-micrograd.ipynb") and a separate Jupyter Notebook with my lab/exercise solutions (e.g. "01-Exercises-micrograd.ipynb"). I'm sharing these in case other folks who are working through the course find them useful. Having it all on GitHub also lets me easily refer back to the notes I took =)

Lecture Key Concepts
1 - Intro to Neural Networks and Backpropagation
* backpropagation (with scalars)
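The core idea of the first lecture can be sketched as a tiny scalar autograd engine in the spirit of micrograd: each `Value` records the operation that produced it, and `backward()` walks the graph in reverse topological order applying the chain rule. This is a simplified sketch (only `+` and `*`), not the repo's actual notebook code.

```python
class Value:
    """Minimal scalar autograd node, in the spirit of micrograd (a sketch)."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None   # how to route out.grad into the children
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1 and d(a+b)/db = 1, so the gradient passes through.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a (chain rule for products).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a = Value(2.0)
b = Value(-3.0)
c = a * b + a        # dc/da = b + 1 = -2.0, dc/db = a = 2.0
c.backward()
print(a.grad, b.grad)
```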
2 - Intro to Language Modeling
* intro to torch.tensor
* language modeling process (training, sampling, loss evaluation)
* numpy tutorial (optional)
3 - MLP
* multilayer perceptron (MLP) model
* training
* learning rate tuning
* hyperparameters
* evaluation
* splits
* under/overfitting
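The train/dev/test split idea from lecture 3 can be sketched in a few lines: shuffle once, then carve out three disjoint subsets. The 80/10/10 proportions and the integer "dataset" below are illustrative, not the lecture's exact settings.

```python
import random

random.seed(42)
data = list(range(1000))   # stand-in for a dataset of examples
random.shuffle(data)

# Common 80/10/10 split: train for fitting parameters, dev for tuning
# hyperparameters, test for a final unbiased estimate of generalization.
n1 = int(0.8 * len(data))
n2 = int(0.9 * len(data))
train, dev, test = data[:n1], data[n1:n2], data[n2:]
print(len(train), len(dev), len(test))
```

Comparing train loss against dev loss is also the basic diagnostic for under/overfitting: a large gap suggests overfitting, while both being high suggests underfitting.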
4 - Activations & Gradients, BatchNorm
* statistics of forward-pass activations and backward-pass gradients, and their pitfalls
* diagnostic and visualization tools to understand network health
* fragility of network training
* batch normalization
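The batch-normalization forward pass from lecture 4 boils down to: normalize each feature to zero mean and unit variance over the batch, then scale and shift by learnable parameters. A one-feature sketch in plain Python (forward pass only; the lecture also covers the running statistics used at inference time):

```python
import math

def batchnorm_1d(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean / unit variance,
    then scale by gamma and shift by beta (forward pass only)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    # eps guards against division by zero when the batch variance is tiny.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]

out = batchnorm_1d([1.0, 2.0, 3.0, 4.0])
print(out)   # roughly zero mean, unit variance
```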
5 - Backprop Ninja!
* backpropagation (with tensors)
* intuition for how gradients flow and are computed
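A habit lecture 5 builds is checking hand-derived tensor gradients against a numerical estimate. A tiny sketch (plain Python lists standing in for tensors): for the loss `L = sum(W @ x)`, the chain rule gives `dL/dW[i][j] = x[j]`, which we verify with finite differences.

```python
# Hand-derived gradient of L = sum(W @ x) with respect to W, checked numerically.
x = [1.0, 2.0, 3.0]
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]

def loss(W):
    # Sum of all entries of the matrix-vector product W @ x.
    return sum(sum(W[i][j] * x[j] for j in range(3)) for i in range(2))

# Analytic gradient from the chain rule: dL/dW[i][j] = x[j] for every row i.
analytic = [[x[j] for j in range(3)] for _ in range(2)]

# Numerical check: nudge each weight by h and measure the change in the loss.
base, h = loss(W), 1e-6
for i in range(2):
    for j in range(3):
        W[i][j] += h
        numeric = (loss(W) - base) / h
        W[i][j] -= h
        assert abs(numeric - analytic[i][j]) < 1e-4
print("gradients match")
```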
6 - WaveNet
* convolutional neural network
* torch.nn
* development process (reading docs, tracking tensor shapes, using notebooks and scripts)
7 - Transformers
* transformer architecture
* self-attention
* build a Generatively Pretrained Transformer (GPT) from scratch
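The heart of lecture 7 is causal self-attention: each position computes similarity scores against all earlier positions, softmaxes them into weights, and takes a weighted average of the value vectors. A stripped-down sketch with identity query/key/value projections (the real layer learns separate projection matrices and uses multiple heads):

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def causal_self_attention(x):
    """Single-head self-attention over a list of d-dim vectors, with identity
    Q/K/V projections for brevity (a sketch, not the full transformer layer)."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # Scaled dot-product scores of position t against positions <= t
        # (the causal mask: no attending to the future).
        scores = [sum(x[t][k] * x[s][k] for k in range(d)) / math.sqrt(d)
                  for s in range(t + 1)]
        w = softmax(scores)
        # Weighted average of the (value) vectors up to position t.
        out.append([sum(w[s] * x[s][k] for s in range(t + 1)) for k in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x)
print(y[0])   # the first position can only attend to itself
```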
8 - Tokenizers
* character sets (ASCII, Unicode)
* encoding
* decoding
* byte-pair encoding (BPE)
* common frameworks (tiktoken, sentencepiece)
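One step of byte-pair encoding, the core of lecture 8, can be sketched in a few lines: find the most frequent adjacent pair of token ids, then replace every occurrence with a fresh id beyond the byte range. The training string below is a toy example, not the lecture's dataset.

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair in a token-id sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every (left-to-right) occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"
ids = list(text.encode("utf-8"))   # start from raw UTF-8 bytes (ids 0..255)
pair = most_common_pair(ids)       # ('a','a') is the most frequent pair here
ids = merge(ids, pair, 256)        # new token ids start after the byte range
print(pair, ids)
```

Real tokenizers like tiktoken and sentencepiece repeat this merge loop until a target vocabulary size is reached and store the merge order for encoding and decoding.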
9 - Reproduce GPT-2 from scratch
* build a network matching the GPT-2/GPT-3 papers
* intro to GPU chip architecture
* scaling to multi-GPU machines using Distributed Data Parallel (DDP)
* optimizations to speed up training (precision, torch.compile, FlashAttention, nice/ugly numbers)
* hyperparameter tuning (AdamW, gradient clipping, learning rate scheduling, gradient accumulation)
* training on Internet-scale datasets (FineWeb EDU)
* evaluation and benchmarking (using HellaSwag)
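A piece of arithmetic that ties several lecture-9 topics together is the effective batch size: gradient accumulation and DDP both multiply the tokens processed per optimizer step. The specific numbers below are illustrative, not the exact settings of the GPT-2 run.

```python
# Tokens per optimizer step when combining micro-batches, gradient
# accumulation, and data parallelism across GPUs (illustrative numbers).
micro_batch = 16      # sequences per GPU per forward/backward pass
seq_len = 1024        # tokens per sequence
accum_steps = 4       # gradients summed over this many micro-batches
num_gpus = 8          # DDP world size

tokens_per_step = micro_batch * seq_len * accum_steps * num_gpus
print(tokens_per_step)
```

This is why gradient accumulation matters: it lets a large "nice number" batch size fit on limited GPU memory by splitting it across sequential micro-batches.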
