This repository presents a biomedical engineering project focused on blood smear microscopy image classification using image processing and deep learning.
The project classifies white blood cell microscopy images into four categories and also supports a secondary binary classification task based on nuclear morphology.
This project is intended for educational and portfolio purposes. It is not a medical diagnostic tool and should not be used for clinical decision-making.
The objective of this project is to support automated blood cell analysis through biomedical image processing and convolutional neural networks.
Multiclass task (four white blood cell types):
- Eosinophil
- Lymphocyte
- Monocyte
- Neutrophil

Binary task (nuclear morphology):
- Mononuclear cells
- Polynuclear cells
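
The relationship between the two tasks can be sketched as a label mapping. This README does not state which cell types fall into which binary group, so the grouping below is an assumption based on standard hematology (lymphocytes and monocytes have a single-lobed nucleus; eosinophils and neutrophils have multi-lobed nuclei). The names `NUCLEAR_GROUP` and `to_nuclear_label` are hypothetical and are not taken from the repository code.

```python
# Four multiclass labels, matching the dataset folder names.
MULTICLASS_LABELS = ["EOSINOPHIL", "LYMPHOCYTE", "MONOCYTE", "NEUTROPHIL"]

# Assumed grouping for the binary task (not confirmed by the repository).
NUCLEAR_GROUP = {
    "EOSINOPHIL": "POLYNUCLEAR",
    "LYMPHOCYTE": "MONONUCLEAR",
    "MONOCYTE": "MONONUCLEAR",
    "NEUTROPHIL": "POLYNUCLEAR",
}


def to_nuclear_label(cell_type: str) -> str:
    """Map a four-class label to its assumed binary nuclear-morphology label."""
    return NUCLEAR_GROUP[cell_type.upper()]
```

With a mapping like this, the same image folders could serve both tasks, and only the label encoding would change.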
This project is relevant to biomedical engineering because it addresses automated microscopy image interpretation, a task associated with hematology support systems, digital pathology, and AI-assisted clinical workflows.
This project uses the Blood Cell Images dataset by Paul Mooney, available on Kaggle.
The dataset contains approximately 12,500 augmented blood cell images distributed across four white blood cell classes:
- EOSINOPHIL
- LYMPHOCYTE
- MONOCYTE
- NEUTROPHIL
Expected local dataset structure:
data/
└── dataset2-master/
    └── dataset2-master/
        └── images/
            ├── TRAIN/
            │   ├── EOSINOPHIL/
            │   ├── LYMPHOCYTE/
            │   ├── MONOCYTE/
            │   └── NEUTROPHIL/
            └── TEST/
                ├── EOSINOPHIL/
                ├── LYMPHOCYTE/
                ├── MONOCYTE/
                └── NEUTROPHIL/
The full dataset is not included in this repository. Please download it from the original dataset source and place it locally inside the data/ folder.
The dataset structure was verified with the following image counts:
| Split | Eosinophil | Lymphocyte | Monocyte | Neutrophil | Total |
|---|---|---|---|---|---|
| Train | 2497 | 2483 | 2478 | 2499 | 9957 |
| Test | 623 | 620 | 620 | 624 | 2487 |
| Total | 3120 | 3103 | 3098 | 3123 | 12444 |
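
Counts like these can be reproduced with a short directory walk. The sketch below is a minimal standard-library version and assumes the `TRAIN/` and `TEST/` layout described above; the function name `count_images` and the accepted file extensions are assumptions, and the repository's own `src/check_dataset.py` may work differently.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust to the actual dataset).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def count_images(images_dir):
    """Return {split: {class_name: image_count}} for a split/class folder layout."""
    counts = {}
    for split_dir in sorted(Path(images_dir).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Calling `count_images("data/dataset2-master/dataset2-master/images")` on a local copy of the dataset should return one dictionary per split, which can then be compared against the table.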
The implemented workflow includes:
- Loading microscopy images from the dataset folders.
- Resizing images to 80 x 60 pixels.
- Converting images from BGR to RGB.
- Normalizing pixel values to the range [0, 1].
- Encoding labels for multiclass or binary classification.
- Training a convolutional neural network using TensorFlow/Keras.
- Evaluating performance with accuracy, loss curves, classification reports, and confusion matrices.
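
The preprocessing steps above can be sketched as follows. This is a NumPy-only illustration rather than the repository's `src/data_loader.py`: the nearest-neighbour `resize_nearest` helper stands in for an OpenCV call such as `cv2.resize`, and reading "80 x 60" as width x height is an assumption. All function names here are hypothetical.

```python
import numpy as np

# Assumes "80 x 60" means width x height; swap the values if the project uses the reverse.
TARGET_WIDTH, TARGET_HEIGHT = 80, 60


def resize_nearest(image, width, height):
    """Nearest-neighbour resize of an (H, W, C) array; a stand-in for cv2.resize."""
    src_h, src_w = image.shape[:2]
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return image[rows[:, None], cols]


def preprocess(bgr_image):
    """Resize, convert BGR -> RGB, and scale pixel values to [0, 1]."""
    resized = resize_nearest(bgr_image, TARGET_WIDTH, TARGET_HEIGHT)
    rgb = resized[..., ::-1]  # reverse the channel axis: BGR -> RGB
    return rgb.astype(np.float32) / 255.0


def one_hot(label_indices, num_classes):
    """Encode integer class indices as one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(label_indices)]
```

The channel reversal matters because OpenCV loads images in BGR order, while most plotting and deep learning tools expect RGB.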
The CNN model includes:
- Convolutional layers
- Batch normalization
- Max pooling
- Dropout regularization
- Dense classification layers
- Softmax output
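
A network built from the layer types listed above might look like the sketch below. The filter counts, dropout rates, dense layer size, optimizer, and the `(60, 80, 3)` input shape are illustrative assumptions, and `build_cnn` is a hypothetical name; the actual architecture is defined in `src/model.py` and may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_cnn(input_shape=(60, 80, 3), num_classes=4):
    """Small CNN using the listed layer types; all sizes are illustrative."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        # Block 1: convolution, batch normalization, pooling, dropout.
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        # Block 2: same pattern with more filters.
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),
        # Dense classification head with softmax output.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

For the binary nuclear morphology task, the same sketch could be reused with `num_classes=2`.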
The model is implemented in:
src/model.py
Repository structure:

.
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── assets/
│   ├── figures/
│   └── results/
├── data/
│   └── README.md
├── docs/
│   ├── dataset_notes.md
│   ├── methodology.md
│   └── project_summary.md
├── models/
├── notebooks/
├── outputs/
│   └── README.md
└── src/
    ├── README.md
    ├── __init__.py
    ├── check_dataset.py
    ├── config.py
    ├── data_loader.py
    ├── dataset_utils.py
    ├── main.py
    ├── model.py
    ├── predict.py
    ├── smoke_test.py
    ├── train.py
    └── visualization.py
Clone the repository and install the required dependencies:
pip install -r requirements.txt

If OpenCV fails on Windows, reinstall it with:

pip uninstall opencv-python opencv-contrib-python -y
pip install opencv-python

Check the dataset structure:

python src/check_dataset.py --data-dir data/dataset2-master

Run a quick smoke test on a few samples per class:

python src/smoke_test.py --data-dir data/dataset2-master --task multiclass --samples-per-class 5

Train the multiclass model:

python src/main.py --data-dir data/dataset2-master --task multiclass --epochs 20

Train the binary nuclear morphology model:

python src/main.py --data-dir data/dataset2-master --task nuclear --epochs 20

Run inference on a single image with a trained model:

python src/predict.py --model outputs/trained_model.keras --image path/to/image.jpeg

After training, the project generates:
outputs/
├── accuracy_curve.png
├── loss_curve.png
├── confusion_matrix.png
├── classification_report.txt
├── dataset_summary.json
├── training_logs.npy
└── trained_model.keras
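
To make the confusion matrix and classification report outputs easier to interpret, the sketch below shows how both can be derived from true and predicted class indices. It is a dependency-free illustration with hypothetical function names; the project itself may rely on a library such as scikit-learn for these outputs.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_precision_recall(matrix):
    """Return a list of (precision, recall) tuples, one per class."""
    n = len(matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results.append((precision, recall))
    return results
```

Off-diagonal entries show which cell types the model confuses with each other, which overall accuracy alone does not reveal.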

Skills demonstrated:
- Biomedical image processing
- Microscopy image analysis
- Deep learning with CNNs
- TensorFlow/Keras model training
- Image preprocessing with OpenCV
- Model evaluation with confusion matrices and classification reports
- Healthcare-oriented AI workflow

Current status:
- Dataset structure verified.
- Image loading and preprocessing pipeline verified.
- CNN training pipeline implemented.
- Output generation configured for learning curves, classification reports, and confusion matrices.

Possible future improvements:
- Add Grad-CAM or saliency map visualizations.
- Compare CNN results with transfer learning architectures.
- Add a lightweight web demo for image inference.
- Improve model reproducibility with fixed seeds and experiment tracking.
- Evaluate model performance on external blood smear datasets.
This repository is licensed under the MIT License.
Dataset rights belong to the original dataset creators. Please refer to the original dataset page for usage rights and redistribution conditions.


