This project explores multiple deep learning techniques across computer vision and natural language processing. The objectives include:
- Implementing and training LeNet-5, a classic convolutional neural network.
- Calculating the total number of trainable parameters in the model.
- Experimenting with different training configurations by modifying batch size, learning rate, and number of epochs.
- Training nanoGPT on the Shakespeare dataset for character-level text generation.
- Fine-tuning DistilGPT2 on custom text data and generating text using various decoding strategies.
This project contains three major components:
- Implementation and training of CNNs using PyTorch on the CIFAR-100 dataset.
- Character-level language modeling using nanoGPT trained on Shakespeare data.
- Supervised fine-tuning of DistilGPT2 on a custom dataset, followed by text generation.
Training may take up to one hour depending on hardware setup.
This project uses:
- Python 3
- numpy
- torch
- torchvision
- tqdm
A Conda environment setup is recommended:
    conda create -n "dl-project" pytorch torchvision torchaudio anaconda::tqdm cpuonly -c pytorch
    conda activate dl-project
The CNN models in this project are trained on the CIFAR-100 dataset:
- 100 classes
- 600 images per class
- Image size: 32×32
- 500 training images + 100 test images per class
Helper scripts are provided to download and prepare the dataset automatically.
LeNet-5 is implemented in PyTorch with the following layers:
- Conv2d → ReLU → MaxPool
- Conv2d → ReLU → MaxPool
- Flatten
- Fully connected (256) → ReLU
- Fully connected (128) → ReLU
- Fully connected (100)
The forward pass returns:
- model output
- a dictionary containing intermediate feature map shapes from each stage
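The layer stack and the two return values can be sketched as follows. This is a minimal sketch, not the project's actual code: the 5×5 kernels, the 6 and 16 output channels, the 2×2 pooling, and the dictionary keys are all assumptions borrowed from the classic LeNet-5 design, since the source only gives the layer order and the fully connected sizes.

```python
import torch
import torch.nn as nn


class LeNet5(nn.Module):
    """LeNet-5-style CNN for 3x32x32 inputs such as CIFAR-100 images."""

    def __init__(self, num_classes: int = 100):
        super().__init__()
        # Assumed hyperparameters: 5x5 kernels, 6 and 16 channels, 2x2 pooling.
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(16 * 5 * 5, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, num_classes)

    def forward(self, x):
        shapes = {}  # hypothetical key names, one entry per stage
        x = self.pool(self.relu(self.conv1(x)))
        shapes["stage1"] = list(x.shape)   # [N, 6, 14, 14]
        x = self.pool(self.relu(self.conv2(x)))
        shapes["stage2"] = list(x.shape)   # [N, 16, 5, 5]
        x = self.flatten(x)
        shapes["flatten"] = list(x.shape)  # [N, 400]
        x = self.relu(self.fc1(x))
        shapes["fc1"] = list(x.shape)      # [N, 256]
        x = self.relu(self.fc2(x))
        shapes["fc2"] = list(x.shape)      # [N, 128]
        x = self.fc3(x)
        shapes["fc3"] = list(x.shape)      # [N, num_classes]
        return x, shapes
```

Under these assumptions, a batch of shape `[N, 3, 32, 32]` shrinks to `[N, 16, 5, 5]` before flattening, which is where the 400 input features of the first fully connected layer come from.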
A function computes the total number of trainable parameters (in units of millions) using model.named_parameters().
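One possible shape for such a function is shown below. The function name is hypothetical, and filtering on `requires_grad` is an assumption about what "trainable" means here.

```python
import torch.nn as nn


def count_trainable_params_millions(model: nn.Module) -> float:
    """Return the number of trainable parameters, in millions."""
    # named_parameters() yields (name, tensor) pairs; numel() counts elements.
    total = sum(
        p.numel() for _, p in model.named_parameters() if p.requires_grad
    )
    return total / 1e6


# Stand-in model for illustration: 10 * 5 weights + 5 biases = 55 parameters.
toy = nn.Linear(10, 5)
toy_millions = count_trainable_params_millions(toy)
```

Frozen parameters (those with `requires_grad=False`) are excluded, so the count changes if layers are frozen before calling the function.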
LeNet-5 is trained under multiple configurations, including:
- Default settings
- Batch sizes: 8, 16
- Learning rates: 0.05, 0.01
- Epoch counts: 20, 5
Each configuration produces a trained model and validation accuracy. Results are stored in results.txt.
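The sketch below shows one way the list of runs could be built by changing a single setting at a time from the defaults. The default values, the helper names, and the line format for results.txt are all hypothetical, because the source does not state them. It is also an assumption that settings are varied one at a time rather than in combination.

```python
# Hypothetical defaults; the source does not give the actual values.
DEFAULTS = {"batch_size": 32, "lr": 0.001, "epochs": 10}

# Values listed in this README for each setting.
VARIATIONS = {
    "batch_size": [8, 16],
    "lr": [0.05, 0.01],
    "epochs": [20, 5],
}


def build_configs(defaults, variations):
    """Return the default config plus one config per single-setting change."""
    configs = [dict(defaults)]
    for key, values in variations.items():
        for value in values:
            cfg = dict(defaults)
            cfg[key] = value
            configs.append(cfg)
    return configs


def format_result(cfg, val_acc):
    """Format one line for results.txt (hypothetical layout)."""
    return (
        f"batch_size={cfg['batch_size']} lr={cfg['lr']} "
        f"epochs={cfg['epochs']} val_acc={val_acc:.4f}"
    )


configs = build_configs(DEFAULTS, VARIATIONS)
```

With the values above this yields seven runs: the default configuration plus two variants each for batch size, learning rate, and epoch count.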
This project includes a lightweight GPT implementation trained on a corpus of Shakespeare's works (the shakespeare_char dataset in nanoGPT).
- Use the nanoGPT repository.
- Prepare the dataset using:
    python data/shakespeare_char/prepare.py
A smaller transformer is trained with:
- 4 layers
- 4 attention heads
- Embedding size 128
- Block size 64
- Batch size 12
- 2000 training iterations
Training parameters may be adjusted for experimentation.
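nanoGPT reads its settings from plain Python config files, so the model size listed above could be written as a small override file. This is a sketch: the file name `config/train_shakespeare_small.py` is hypothetical. The `out_dir`, `dataset`, and `device` values are inferred from the commands in this README, and matching `lr_decay_iters` to `max_iters` is an assumption.

```python
# config/train_shakespeare_small.py (hypothetical file name)
# Variable names follow nanoGPT's train.py; values come from this README.
out_dir = "out-shakespeare-char"   # inferred from the sample.py command
dataset = "shakespeare_char"       # inferred from the prepare.py path

n_layer = 4        # transformer layers
n_head = 4         # attention heads per layer
n_embd = 128       # embedding size
block_size = 64    # context length in characters
batch_size = 12

max_iters = 2000
lr_decay_iters = 2000  # assumption: decay the learning rate over the full run

device = "cpu"     # assumption, matching the CPU-only environment
# On CPU, nanoGPT's README also passes --compile=False on the command line.
```

A file like this would be passed to training as `python train.py config/train_shakespeare_small.py --compile=False`. The embedding size must be divisible by the number of heads, which holds here (128 / 4 = 32 dimensions per head).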
Generate Shakespeare-like text:
    python sample.py --out_dir=out-shakespeare-char --device=cpu
Generated samples are saved in generated_nanogpt.txt.
This project fine-tunes DistilGPT2 on a custom dataset built from WikiText sources.
- Generate dataset:
    python make_data_csv.py
- Train on CPU:
    python distilgpt2_sft_cpu.py --data data.csv --mode train
- Implement decoding control and text generation:
    python distilgpt2_sft_cpu.py --mode gen
Generated text is stored in distilgpt2.txt.
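The source does not say which decoding strategies the script supports. As an illustration only, the sketch below shows three common ones (greedy, top-k, and top-p sampling with temperature) applied to a single step's logits, using just the standard library. All function names are hypothetical and this is not the code in distilgpt2_sft_cpu.py.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Pick the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample among the k highest-scoring tokens only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]


def sample_top_p(logits, p, temperature=1.0, rng=random):
    """Sample from the smallest set of tokens whose probability mass reaches p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Greedy decoding is deterministic, while top-k and top-p trade some of that determinism for variety. Setting `k=1` in `sample_top_k` reduces it to greedy decoding.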