
machine-learning

machine learning from the ground up

| Folder | Project | Status |
|---|---|---|
| /DeepSeek | DeepSeek R1 https://www.youtube.com/watch?v=XMnxKGVnEUc&t=20s, tiny R1 https://github.com/Jiayi-Pan/TinyZero | Not Started |
| /Vision | R1 https://www.youtube.com/watch?v=XMnxKGVnEUc&t=20s, tiny R1 https://github.com/Jiayi-Pan/TinyZero | Not Started |
| /Llama2 | Llama2 from scratch https://www.youtube.com/watch?v=oM4VmoabDAI&pp=0gcJCUUJAYcqIYzv | Not Started |
| /stable-diffusion | Coding stable diffusion from scratch https://www.youtube.com/watch?v=ZBKpAp_6TGI | Not Started |
| /flash-attention | Triton Flash Attention kernel https://youtu.be/zy8ChVd_oTM?si=5106nYOB4sj_X5RQ | Not Started |
| /SFT | Supervised Fine-Tuning - InstructGPT https://arxiv.org/abs/2203.02155 | Not Started |
| /GPT-2 | GPT-2, replicating GPT-2 from scratch following karpathy/nanoGPT but with my own tokenizer | Completed |
| /BERT | BERT, replicating the BERT paper (without additional guidance) using my own tokenizer | Completed |
| /llm-tokenizer | Tokenizers, LLM tokenizers following karpathy/minbpe | Completed |
| /mingpt | MiniGPT, implementing transformers from 'Attention Is All You Need' following karpathy/minGPT | Completed |
| /makemore | Makemore, simple language models following karpathy/makemore | Completed |
| /micrograd | Micrograd, implementation of backpropagation following karpathy/micrograd | Completed |
| /zero-shot-retrieval | Zero-Shot LLM Retrieval, submissions to the Kaggle VMware Zero-Shot Retrieval competition | Completed |
| /personalized-fashion-recommendations | SparseNN Recommender System, submissions to the Kaggle H&M Recommender System competition | Completed |
| /algorithms | Codeforces contests and LeetCode Hard design questions | 🟠 In Progress |

machine-learning/GPT-2

Replicated GPT-2 from scratch following karpathy/nanoGPT but with my own tokenizer, matching the loss and eval performance of the Hugging Face GPT-2 model. This is an example of improving token/s throughput for a given model size by optimizing PyTorch operations. Read more in my GPT-2 training notes.
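Below is a minimal sketch of the kind of PyTorch-level throughput levers the notes refer to (TF32 matmuls, bfloat16 autocast, torch.compile, fused AdamW). The tiny stand-in model and random batch are illustrative only, not the actual training setup:

```python
# Sketch of common PyTorch throughput optimizations for a fixed model size.
import torch
import torch.nn as nn
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.set_float32_matmul_precision('high')      # allow TF32 matmuls on Ampere+ GPUs

vocab, d_model, seq_len, batch = 50304, 256, 128, 8   # 50304 = GPT-2 vocab padded to a multiple of 64
model = nn.Sequential(                          # stand-in for a GPT-2-style block stack
    nn.Embedding(vocab, d_model),
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    nn.Linear(d_model, vocab),
).to(device)
model = torch.compile(model)                    # kernel fusion, less Python overhead
opt = torch.optim.AdamW(model.parameters(), lr=6e-4, fused=(device == 'cuda'))

x = torch.randint(vocab, (batch, seq_len), device=device)   # fake token ids
y = torch.randint(vocab, (batch, seq_len), device=device)
opt.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, dtype=torch.bfloat16):   # mixed precision
    logits = model(x)
    loss = F.cross_entropy(logits.view(-1, vocab), y.view(-1))
loss.backward()
opt.step()
print(f'loss {loss.item():.3f}')
```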

machine-learning/BERT

Implementation of the BERT paper (without additional guidance) using my own tokenizer. This is an example of pretraining on the Next Sentence Prediction (NSP) and Masked Language Modelling (MLM) tasks.
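As a hedged sketch of what MLM pretraining data looks like, here is the standard BERT-style corruption rule (select 15% of tokens for prediction; of those, 80% become [MASK], 10% a random token, 10% stay unchanged). The mask_tokens helper and the token ids are illustrative, not this repo's code:

```python
import torch

def mask_tokens(input_ids, mask_id, vocab_size, mlm_prob=0.15):
    input_ids = input_ids.clone()
    labels = input_ids.clone()
    masked = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~masked] = -100                       # loss is computed on masked positions only

    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_id                 # 80% of masked positions -> [MASK]

    rand = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[rand] = torch.randint(vocab_size, input_ids.shape)[rand]  # 10% -> random token
    return input_ids, labels                     # remaining 10% stay unchanged

ids = torch.randint(5, 100, (2, 16))             # fake batch of token ids
corrupted, labels = mask_tokens(ids, mask_id=103, vocab_size=100)   # 103: illustrative [MASK] id
```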

machine-learning/MiniGPT

Implementation of the "Attention Is All You Need" transformer architecture with minimal PyTorch APIs, similar to karpathy/minGPT. This is the next-word-prediction cross-entropy loss achieved on the Shakespeare dataset with different baselines.

*the number of parameters is wrong and should be in the range of millions. Re-running the best model with Karpathy's hyperparameters achieved a validation loss of 1.66. This is an example generation of Shakespeare-like text with the Transformer@3k parameters.
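For reference, here is a minimal sketch of the masked (causal) scaled dot-product attention at the core of that architecture, written with basic PyTorch ops; single head, with illustrative shapes, not the repo's exact implementation:

```python
import math
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model); w_*: (d_model, d_head) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))   # scaled dot products
    mask = torch.tril(torch.ones(x.size(1), x.size(1), dtype=torch.bool))
    att = att.masked_fill(~mask, float('-inf'))               # block attention to future tokens
    return F.softmax(att, dim=-1) @ v                         # weighted sum of values

x = torch.randn(2, 8, 32)                                     # fake activations
w_q, w_k, w_v = (torch.randn(32, 16) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)                 # (2, 8, 16)
```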

machine-learning/Makemore

Replicated Karpathy's makemore repo from his Zero to Hero NN course: an implementation of name-generation language models (bigram, MLP, RNN, and others) in plain PyTorch. This is the performance I was able to reproduce independently across the architectures covered in the course.
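As an illustration of the simplest model in that lineup, here is a hedged sketch of a count-based bigram name generator in the style of makemore; the tiny word list stands in for the real names dataset:

```python
import torch

words = ['emma', 'olivia', 'ava', 'isabella', 'sophia']   # placeholder dataset
chars = sorted(set(''.join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi['.'] = 0                                             # '.' marks start and end of a name
itos = {i: c for c, i in stoi.items()}

N = torch.zeros(len(stoi), len(stoi))
for w in words:
    seq = ['.'] + list(w) + ['.']
    for a, b in zip(seq, seq[1:]):
        N[stoi[a], stoi[b]] += 1                          # count character transitions

P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)            # smoothed row-normalized probabilities

g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], 1, generator=g).item()  # sample next character
    if ix == 0:
        break
    out.append(itos[ix])
print(''.join(out))
```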

Here are some interesting histograms from a hyperparameter search on a simple language model.

machine-learning/Micrograd

Python-only implementation of neural networks: playing with my own implementation of Karpathy's micrograd. Some interesting results (a minimal autograd sketch follows the list):

  • make_moons_30_Jan_A.ipynb - a small MLP is able to optimize the loss function, but it only learns a linear function; I was not able to make the model learn a non-linearity.
  • make_moons_30_Jan_B.ipynb - a small MLP with more techniques is able to learn the non-linear function from scikit-learn's make_moons. The circles function is half learned but not completely.
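Here is a minimal sketch of the scalar reverse-mode autodiff idea behind micrograd; class and method names mirror Karpathy's API but are illustrative rather than this repo's exact code:

```python
import math

class Value:
    """A scalar that remembers how it was computed, for backprop."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad               # d(a+b)/da = 1
            other.grad += out.grad              # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d tanh(x)/dx = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        topo, visited = [], set()                # topological order of the compute graph
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):                 # chain rule, from output back to inputs
            v._backward()

a, b = Value(2.0), Value(-3.0)
y = (a * b + a).tanh()                           # y = tanh(a*b + a)
y.backward()
print(a.grad, b.grad)
```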

machine-learning/Zero Shot LLM Retrieval

Using the VMware docs corpus (30M documents) from Kaggle to implement an end-to-end retrieval system using LLM encoders and generative models. The picture below is the TensorBoard view of the 12 stacked transformer blocks from https://huggingface.co/intfloat/e5-small-v2 used for text embedding.
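A hedged sketch of how such an encoder is typically used for embedding: mean-pool the last hidden state under the attention mask, then compare by cosine similarity (e5 models expect 'query: '/'passage: ' prefixes). The example texts are made up, not from the competition corpus:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained('intfloat/e5-small-v2')
model = AutoModel.from_pretrained('intfloat/e5-small-v2')

texts = ['query: how to resize a vSphere VM disk',
         'passage: To change the disk size of a virtual machine, open the VM settings...']
batch = tok(texts, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**batch).last_hidden_state        # (batch, seq_len, 384)

mask = batch['attention_mask'].unsqueeze(-1)         # zero out padding positions
emb = (hidden * mask).sum(1) / mask.sum(1)           # mean pooling over real tokens
emb = F.normalize(emb, dim=-1)
print((emb[0] @ emb[1]).item())                      # cosine similarity, query vs passage
```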

machine-learning/SparseNN Recommender System

Using the H&M fashion recommendations dataset to build a multi-stage ranking recommender system for 10M users and 100k fashion articles: https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations

  • personalized-fashion-recommendation-2-Feb-B.ipynb - TTSN model for candidate retrieval, trained only on categorical features with a customer tower and an article tower, improving recall@1000 from 1% to 10%. I will probably need to bring recall higher before moving on to the ranking stage (a two-tower sketch follows below).
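Assuming TTSN refers to a two-tower sparse NN, here is a minimal sketch of the candidate-retrieval setup: each tower embeds categorical ids, an in-batch sampled softmax trains the towers, and top-k dot-product search produces candidates for recall@1000. All names and sizes are illustrative, not the notebook's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    def __init__(self, n_customers, n_articles, dim=64):
        super().__init__()
        self.customer = nn.Embedding(n_customers, dim)   # customer tower (one categorical feature)
        self.article = nn.Embedding(n_articles, dim)     # article tower (one categorical feature)

    def forward(self, cust_ids, art_ids):
        u = F.normalize(self.customer(cust_ids), dim=-1)
        v = F.normalize(self.article(art_ids), dim=-1)
        return u, v

model = TwoTower(n_customers=10_000, n_articles=1_000)
cust = torch.randint(10_000, (32,))                      # fake (customer, purchased article) pairs
art = torch.randint(1_000, (32,))
u, v = model(cust, art)

# In-batch sampled softmax: each row's positive is the diagonal entry.
logits = u @ v.T / 0.05                                  # temperature-scaled similarities
loss = F.cross_entropy(logits, torch.arange(32))

# Retrieval: score one user against all articles, take top-k for recall@k.
all_v = F.normalize(model.article.weight, dim=-1)
candidates = (u[:1] @ all_v.T).topk(1000).indices        # candidate set for recall@1000
```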

Other Resources
