📘 Deep Learning Experiments: CNNs, nanoGPT, and DistilGPT2

1. Project Goals

This project explores multiple deep learning techniques across computer vision and natural language processing. The objectives include:

Implementing and training LeNet-5, a classic convolutional neural network.
Calculating the total number of trainable parameters in the model.
Experimenting with different training configurations by modifying batch size, learning rate, and number of epochs.
Training nanoGPT on the Shakespeare dataset for character-level text generation.
Fine-tuning DistilGPT2 on custom text data and generating text using various decoding strategies.

2. Summary

This project contains three major components:

Implementation and training of CNNs using PyTorch on the CIFAR-100 dataset.
Character-level language modeling using nanoGPT trained on Shakespeare data.
Supervised fine-tuning of DistilGPT2 on a custom dataset, followed by text generation.

Training may take up to one hour depending on hardware setup.

3. Environment Setup

This project uses:

Python 3
numpy
torch
torchvision
tqdm

A Conda environment setup is recommended:

```
conda create -n "dl-project" pytorch torchvision torchaudio anaconda::tqdm cpuonly -c pytorch
conda activate dl-project
```

4. Dataset: CIFAR-100

The CNN models in this project are trained on the CIFAR-100 dataset:

100 classes
600 images per class
Image size: 32×32
500 training images + 100 test images per class

Helper scripts are provided to download and prepare the dataset automatically.

5. Model Implementation

5.1 LeNet-5

Implemented using PyTorch with the following layers:

Conv2d → ReLU → MaxPool
Conv2d → ReLU → MaxPool
Flatten
Fully connected (256) → ReLU
Fully connected (128) → ReLU
Fully connected (100)

The forward pass returns:

model output
a dictionary containing intermediate feature map shapes from each stage
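
The feature map shapes can be worked out by hand before running the model. The sketch below traces a 3×32×32 CIFAR-100 image through the layers listed above. The kernel size (5×5), channel counts (6 and 16), 2×2 pooling, and absence of padding are assumptions taken from the classic LeNet-5 design and are not confirmed by this repository; `conv2d_out` and `lenet5_shapes` are illustrative names.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_shapes(in_channels=3, height=32, width=32):
    """Trace (channels, height, width) through an assumed LeNet-5 layout."""
    shapes = {"input": (in_channels, height, width)}
    h, w = height, width

    # Stage 1: assumed 5x5 conv with 6 output channels, then 2x2 max pooling.
    h, w = conv2d_out(h, 5), conv2d_out(w, 5)
    shapes["conv1"] = (6, h, w)
    h, w = conv2d_out(h, 2, stride=2), conv2d_out(w, 2, stride=2)
    shapes["pool1"] = (6, h, w)

    # Stage 2: assumed 5x5 conv with 16 output channels, then 2x2 max pooling.
    h, w = conv2d_out(h, 5), conv2d_out(w, 5)
    shapes["conv2"] = (16, h, w)
    h, w = conv2d_out(h, 2, stride=2), conv2d_out(w, 2, stride=2)
    shapes["pool2"] = (16, h, w)

    # Flatten, then the three fully connected layers from the list above.
    shapes["flatten"] = (16 * h * w,)
    shapes["fc1"] = (256,)
    shapes["fc2"] = (128,)
    shapes["fc3"] = (100,)
    return shapes


for stage, shape in lenet5_shapes().items():
    print(f"{stage:8s} {shape}")
```

Under these assumptions the flattened vector has 16 × 5 × 5 = 400 features, which would be the input size of the first fully connected layer.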

5.2 Parameter Counting

A function computes the total number of trainable parameters (in units of millions) using model.named_parameters().
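
A minimal sketch of such a counter is shown below. It only relies on `named_parameters()`, `requires_grad`, and `numel()`, which PyTorch modules and tensors provide; the function name is illustrative and may differ from the one in this repository. The hand count that follows assumes the classic LeNet-5 sizes (5×5 kernels, 6 and 16 channels, 400 flattened features), which this README does not confirm.

```python
def count_trainable_parameters_millions(model):
    """Total number of trainable parameters of `model`, in millions."""
    total = 0
    for _name, param in model.named_parameters():
        if param.requires_grad:  # frozen parameters are not counted
            total += param.numel()
    return total / 1e6


def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a square-kernel convolution."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Hand count for the assumed layout, useful as a cross-check.
hand_count = (
    conv_params(3, 6, 5)
    + conv_params(6, 16, 5)
    + linear_params(400, 256)
    + linear_params(256, 128)
    + linear_params(128, 100)
)
print(hand_count, hand_count / 1e6)
```

If the real model uses different kernel sizes or channel counts, the two numbers will disagree, which is a quick way to spot a mismatch between the intended and the implemented architecture.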

6. Training Experiments (CNN)

LeNet-5 is trained under multiple configurations, including:

Default settings
Batch sizes: 8, 16
Learning rates: 0.05, 0.01
Epoch counts: 20, 5

Each configuration produces a trained model and validation accuracy. Results are stored in results.txt.
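
One way to enumerate these runs is to start from the default settings and change a single hyperparameter at a time. The sketch below assumes that one-at-a-time scheme, and the values in `DEFAULTS` are hypothetical placeholders, since the actual defaults are not stated in this README. Only the varied values come from the list above.

```python
# Hypothetical placeholder defaults; replace with the training script's real ones.
DEFAULTS = {"batch_size": 32, "lr": 0.001, "epochs": 10}

# Values listed in this section.
VARIATIONS = {
    "batch_size": [8, 16],
    "lr": [0.05, 0.01],
    "epochs": [20, 5],
}


def build_configs(defaults, variations):
    """Return the default config plus one config per single-parameter change."""
    configs = [dict(defaults)]
    for key, values in variations.items():
        for value in values:
            cfg = dict(defaults)
            cfg[key] = value  # change exactly one hyperparameter
            configs.append(cfg)
    return configs


for cfg in build_configs(DEFAULTS, VARIATIONS):
    print(cfg)
```

This yields seven runs (one default plus six variations), so the effect of each hyperparameter can be compared against the same baseline.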

7. Training nanoGPT on Shakespeare

This project trains nanoGPT, a lightweight GPT implementation, on the tiny Shakespeare dataset at the character level.

Setup

Clone the nanoGPT repository.
Prepare the character-level dataset using:
```
python data/shakespeare_char/prepare.py
```
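
Character-level preparation amounts to building a vocabulary of the distinct characters and mapping text to integer ids. The sketch below illustrates the idea on a short stand-in string; it is not the script itself, and the 90/10 train/validation split is an assumption about how the data is divided.

```python
text = "To be, or not to be"  # stand-in for the full dataset text

# Vocabulary: every distinct character, in sorted order.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> id
itos = {i: ch for ch, i in stoi.items()}      # id -> character


def encode(s):
    """Map a string to a list of integer ids."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of integer ids back to a string."""
    return "".join(itos[i] for i in ids)


ids = encode(text)
split = int(0.9 * len(ids))  # assumed 90% train, 10% validation
train_ids, val_ids = ids[:split], ids[split:]

print(len(chars), "distinct characters")
print(decode(ids) == text)
```

Because the vocabulary is just the set of characters, it stays very small compared with a subword tokenizer, which keeps the embedding table tiny for the reduced model trained here.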

Training

A smaller transformer is trained with:

4 layers
4 attention heads
Embedding size 128
Block size 64
Batch size 12
2000 training iterations

Training parameters may be adjusted for experimentation.
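
nanoGPT reads its settings from a Python config file whose variables override the defaults in `train.py`. The settings above could be written as follows. The file name `config/train_shakespeare_small.py` is hypothetical, and `device`, `compile`, and `lr_decay_iters` are assumptions added for a CPU run; they are not taken from this README.

```python
# config/train_shakespeare_small.py (hypothetical file name)

out_dir = "out-shakespeare-char"  # matches the directory used for sampling
dataset = "shakespeare_char"

# Model size from the list above.
n_layer = 4
n_head = 4
n_embd = 128
block_size = 64  # context length in characters

# Optimization settings from the list above.
batch_size = 12
max_iters = 2000

# Assumptions for a CPU-only run, not stated in this README.
lr_decay_iters = 2000  # assumed equal to max_iters
device = "cpu"
compile = False        # assumed: skip torch.compile on CPU
```

With such a file in place, training would be started with `python train.py config/train_shakespeare_small.py`. Note that `n_embd` must be divisible by `n_head`; here each head works with 128 / 4 = 32 dimensions.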

Inference

Generate Shakespeare-like text:

```
python sample.py --out_dir=out-shakespeare-char --device=cpu
```

Generated samples are saved in generated_nanogpt.txt.

8. Fine-Tuning DistilGPT2 (NLP)

This project fine-tunes DistilGPT2 on a custom dataset built from WikiText sources.

Steps:

Generate dataset:
```
python make_data_csv.py
```

Train on CPU:

```
python distilgpt2_sft_cpu.py --data data.csv --mode train
```

Generate text with decoding controls:
```
python distilgpt2_sft_cpu.py --mode gen
```

Generated text is stored in distilgpt2.txt.
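
To make the idea of decoding control concrete, the sketch below shows three common strategies applied to a single vector of next-token logits: greedy selection, temperature scaling, and top-k sampling. It is a pure-Python illustration with made-up logits, not the code in `distilgpt2_sft_cpu.py`, and which strategies that script actually exposes is not stated here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Always pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample among the k highest-scoring token ids only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return rng.choices(top, weights=probs, k=1)[0]


logits = [0.1, 2.0, -1.0, 1.5]  # made-up scores for a 4-token vocabulary
print("greedy:", greedy(logits))
print("top-2 sample:", sample_top_k(logits, k=2, temperature=0.8))
```

Greedy decoding is deterministic and tends to repeat itself, while sampling with a moderate temperature and a small `k` trades some coherence for variety. Setting `k=1` reduces top-k sampling to greedy decoding.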
