The goal of this project is to help bridge communication gaps for people with hearing or speech impairments by building an image-based ASL recognition system. Using deep learning, the model can classify hand gestures representing letters of the ASL alphabet.
- Overview
- Thesis Reference
- Dataset
- Model Architecture
- Implemented Notebooks
- Model Comparison
- Results
- Installation
- How to Run
- Key Insights
- Future Improvements
- About the Author
- Repository Structure
This project implements Convolutional Neural Networks (CNNs) to recognize American Sign Language (ASL) alphabet letters from images.
It uses data augmentation, regularization, and transfer learning to build robust models capable of accurately classifying hand gestures.
The project was conducted as part of my Bachelor Thesis in Artificial Intelligence. It explores computer vision for accessibility, with the aim of enhancing communication for individuals with hearing and speech impairments.
This repository contains two main approaches:
- A Custom CNN built from scratch with data augmentation and regularization.
- A Transfer Learning solution using VGG-16 fine-tuning.
Both approaches and their evaluation are discussed in my bachelor thesis (link and details below).
Bachelor Thesis (PDF) — Full report, methodology, and results:
🔗 Drive link (public):
Thesis
Base Dataset: ASL Alphabet Dataset on Kaggle
The original dataset consists of labeled RGB images of hand gestures representing:
- 26 letters (A–Z)
- 3 special classes: space, delete, and nothing
- Training images: ~87,000
- Test images: ~29,000
- Each image: 200x200 pixels, RGB
Custom Modifications: To enhance model compatibility and reduce preprocessing overhead, the dataset was modified as follows:
- Resized all images to 64×64 pixels for faster CNN training.
- Cleaned and standardized folder structure for easier loading.
- Generated a subset of unique images per class for faster visualization and validation.
- Applied ImageDataGenerator for real-time augmentation (rotation, zoom, flipping, shifting).
Final structure:
/asl_alphabet_train
    /A
    /B
    ...
    /Z
    /space
    /delete
    /nothing
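With one folder per class, the labels can be derived from the folder names. The following is a minimal sketch of that idea, not code from the notebooks: the helper name `index_dataset` and the accepted file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical helper, not part of the original notebooks.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def index_dataset(root):
    """Index a one-folder-per-class dataset.

    Returns (class_names, samples), where class_names is the sorted list of
    sub-folder names and samples is a list of (image_path, class_index) pairs.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(class_names)}

    samples = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            # Skip anything that is not an image file.
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_to_index[name]))
    return class_names, samples
```

Called on the training folder, `class_names` would be expected to hold 29 entries (A to Z plus the three special classes), and `len(samples)` gives a quick check of the image count.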
Frameworks & Libraries Used
- TensorFlow / Keras
- NumPy, Pandas, Matplotlib, Seaborn
- OpenCV
- scikit-learn
Model Design
- Custom Sequential CNN with:
- Multiple convolutional layers (ReLU activation)
- MaxPooling for feature reduction
- Dropout + L2 regularization to reduce overfitting
- Batch Normalization for stable training
- Fully connected Dense layers for classification
- Optimizer: Adam
- Loss: Categorical Crossentropy
- Metrics: Accuracy, F1-score, Precision
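To make the loss concrete, here is a small pure-Python sketch of softmax followed by categorical cross-entropy. It is an illustration of the formula only (Keras computes this internally); the function names and the toy scores are assumptions.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(
        t * math.log(max(p, eps))  # clip to avoid log(0)
        for t, p in zip(one_hot_target, probabilities)
    )


# Toy example with 3 classes instead of 29 (scores are made up).
probs = softmax([2.0, 1.0, 0.1])
loss = categorical_crossentropy([1, 0, 0], probs)

# A model that predicts all 29 classes as equally likely has loss ln(29).
uniform_loss = categorical_crossentropy([1] + [0] * 28, softmax([0.0] * 29))
```

The uniform case gives a useful reference point: a 29-class model that has learned nothing starts at a loss of about ln(29) ≈ 3.37, so validation losses well below that indicate real learning.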
Data Augmentation:
- Random rotation
- Horizontal flipping
- Zoom and width/height shift
Implemented using `ImageDataGenerator` to improve generalization.
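The augmentations listed above could be configured roughly as follows. This is a sketch under assumptions: the numeric ranges and the helper name `make_train_generator` are illustrative and are not taken from the notebooks.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All numeric ranges below are assumed values for illustration.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=15,       # random rotation, in degrees
    zoom_range=0.1,          # random zoom in or out by up to 10%
    width_shift_range=0.1,   # horizontal shift, as a fraction of width
    height_shift_range=0.1,  # vertical shift, as a fraction of height
    horizontal_flip=True,    # random left-right flip
)


def make_train_generator(data_dir, image_size=(64, 64), batch_size=32):
    """Hypothetical helper: stream augmented batches from class folders."""
    return train_datagen.flow_from_directory(
        data_dir,
        target_size=image_size,    # images are resized on load
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels for categorical crossentropy
    )
```

The transformations are applied on the fly each epoch, so the model rarely sees exactly the same image twice.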
A custom-designed Sequential CNN built from scratch using TensorFlow/Keras.
Focused on lightweight design, regularization, and high accuracy through data augmentation.
Layers Overview
- Convolutional + ReLU
- MaxPooling
- Dropout + L2 Regularization
- Batch Normalization
- Dense Output Layer (Softmax, 29 classes)
Training Setup
- Optimizer: Adam
- Loss: Categorical Crossentropy
- Epochs: 30
- Image Size: 64×64×3
- Batch Size: 32
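A network following the layer overview and training setup above might look like the sketch below. The filter counts, dense width, dropout rate, and L2 strength are assumptions chosen for illustration; the thesis notebooks may use different values.

```python
from tensorflow.keras import layers, models, regularizers

NUM_CLASSES = 29           # A-Z plus space, delete, nothing
INPUT_SHAPE = (64, 64, 3)  # resized RGB images


def build_custom_cnn(l2_strength=1e-4, dropout_rate=0.3):
    """Small Sequential CNN; all layer sizes here are assumed values."""
    reg = regularizers.l2(l2_strength)
    model = models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        # Block 1: convolution + batch norm + pooling
        layers.Conv2D(32, 3, padding="same", activation="relu",
                      kernel_regularizer=reg),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        # Block 2
        layers.Conv2D(64, 3, padding="same", activation="relu",
                      kernel_regularizer=reg),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        # Block 3
        layers.Conv2D(128, 3, padding="same", activation="relu",
                      kernel_regularizer=reg),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        # Classifier head
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dropout(dropout_rate),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_custom_cnn().summary()` prints the layer shapes and parameter counts, which is a quick way to compare a candidate design against the model size reported in the comparison table.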
Frameworks Used: TensorFlow, Keras, NumPy, Pandas, OpenCV, Matplotlib, Seaborn, scikit-learn
- Built using Keras Sequential API
- Includes heavy use of data augmentation
- Trained on resized ASL dataset
- Reached 96% validation accuracy against 98% training accuracy, indicating only slight overfitting
📊 This notebook demonstrates understanding of CNN fundamentals and image preprocessing pipelines.
- Uses a VGG-16 network pre-trained on ImageNet
- Top layers fine-tuned for ASL classification
- Retains the convolutional base to leverage pre-learned visual features
- Employs the same preprocessing pipeline (augmentation, resizing, normalization)
📊 This notebook demonstrates the use of transfer learning for improved generalization and reduced training time.
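A transfer-learning setup of this kind can be sketched as follows. The head size, dropout rate, learning rate, and the choice to unfreeze only the last convolutional block are assumptions for illustration; the function name `build_vgg16_classifier` is hypothetical.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16


def build_vgg16_classifier(weights="imagenet", num_classes=29,
                           input_shape=(64, 64, 3),
                           fine_tune_last_block=False):
    """VGG-16 convolutional base with a new classification head."""
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)

    # Freeze the base; optionally leave the last conv block trainable.
    for layer in base.layers:
        layer.trainable = (
            fine_tune_last_block and layer.name.startswith("block5")
        )

    inputs = layers.Input(shape=input_shape)
    x = base(inputs)
    x = layers.Flatten()(x)
    x = layers.Dense(256, activation="relu")(x)  # assumed head width
    x = layers.Dropout(0.5)(x)                   # assumed dropout rate
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=optimizers.Adam(learning_rate=1e-4),  # assumed low LR
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

A common pattern is to train the new head with the base fully frozen first, then rebuild or recompile with `fine_tune_last_block=True` and continue training at a low learning rate.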
| Feature | Custom CNN | VGG-16 Transfer Learning |
|---|---|---|
| Model Type | Sequential (built from scratch) | Pre-trained (Transfer Learning) |
| Parameters | ~1.2M | ~15M |
| Training Time | Faster (due to fewer layers) | Slower (heavier model) |
| Accuracy | 96–98% | 97–99% |
| F1-Score | ~0.95 | ~0.97 |
| Overfitting | Slight (mitigated with dropout) | Minimal due to pre-trained base |
| Use Case | Lightweight, deployable model | High accuracy for research & production |
| Complexity | Lower | Higher (fine-tuning required) |
📌 Both models achieved high accuracy, but VGG-16 performed slightly better on held-out data (98% vs. 96% validation accuracy), which illustrates the benefit of transfer learning.
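As a sanity check on the ~15M figure in the table, the size of the standard VGG-16 convolutional base can be worked out by hand. The sketch below counts weights and biases for its 13 convolutional layers (3×3 kernels); it covers the base only and makes no assumption about the classification head used in the notebook.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one 2D convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (in_channels, out_channels) for the 13 conv layers of VGG-16
VGG16_CONVS = [
    (3, 64), (64, 64),                    # block 1
    (64, 128), (128, 128),                # block 2
    (128, 256), (256, 256), (256, 256),   # block 3
    (256, 512), (512, 512), (512, 512),   # block 4
    (512, 512), (512, 512), (512, 512),   # block 5
]

vgg16_base_params = sum(conv2d_params(i, o) for i, o in VGG16_CONVS)
print(f"{vgg16_base_params:,}")  # 14,714,688
```

Most of these parameters sit in blocks 4 and 5, which is why freezing the base cuts the number of trainable weights so sharply.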
| Metric | Custom CNN | VGG-16 |
|---|---|---|
| Training Accuracy | 98% | 99% |
| Validation Accuracy | 96% | 98% |
| F1-Score | 0.95 | 0.97 |
| Loss (Validation) | 0.18 | 0.09 |
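For reference, the sketch below shows how accuracy, precision, and F1-score can be computed from predicted and true class indices. It uses macro averaging (the unweighted mean over classes), which is an assumption: the averaging mode behind the reported scores is not stated here. The function name and toy labels are illustrative.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision and F1 over all classes."""
    precisions, f1s = [], []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        f1s.append(f1)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy example with 3 classes (labels are made up).
scores = macro_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

In practice, `sklearn.metrics.classification_report` produces the same per-class and averaged figures directly from the model's predictions.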
You can install the required packages manually:

```bash
pip install tensorflow keras numpy pandas opencv-python matplotlib seaborn scikit-learn
```
1. Clone this repository:
   git clone https://github.com/your-username/asl-cnn.git
   cd asl-cnn
2. Open the notebook:
   jupyter notebook custom-cnn-using-data-augmentation.ipynb
3. Run all cells to:
   - Load and preprocess the dataset
   - Train the CNN model
   - Evaluate results and visualize performance
- Transfer learning (VGG-16) yields slightly better generalization.
- The custom CNN provides a balance between performance and computational efficiency.
- Augmentation and normalization were key to achieving stable training.
- Both models recognize ASL alphabet gestures with high accuracy (96% and 98% validation accuracy, respectively).
- Extend to real-time ASL recognition (video streams).
- Experiment with other architectures (ResNet50, EfficientNet).
- Deploy via a web app or mobile app using TensorFlow Lite.
- Build a multi-language sign recognition model.
Muhammad Magdy Sobhy
- AI & Deep Learning Enthusiast | Computer Vision Researcher
📫 Links:
Passionate about building AI systems that enhance accessibility and human–computer interaction.