This repository contains our solution to the 3-MD-4040 2026 ZooCAM Challenge, a private Kaggle competition organized within the Deep Learning course at CentraleSupélec.
Our team Accord Italie France achieved 1st place on the final leaderboard.
The goal of the challenge was to classify plankton organisms from images captured by a ZooCAM imaging system.
Automatic plankton classification is a difficult task due to the large variability in morphology and imaging conditions. Modern imaging devices generate extremely large datasets that require automated analysis using machine learning and deep learning techniques.
The dataset provided for the challenge contained:
- 1,215,213 images in total
- ~1,093,000 training images
- 86 classes
The test labels were not available to participants.
The dataset was highly imbalanced:
- Largest class: ~300,000 images
- Smallest class: 73 images
To address this, we used:
- WeightedRandomSampler
- sampling weight controlled by α ∈ [0.25, 0.5]
This helped reduce the dominance of the largest classes during training.
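A minimal sketch of how such a sampler can be built, assuming α acts as an exponent on the inverse class frequency (the project's exact weighting formula may differ): α = 0 keeps the natural class distribution, α = 1 fully rebalances, and values in [0.25, 0.5] are a compromise.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def make_sampler(labels, alpha=0.5):
    """Build a sampler whose per-sample weight is count(class)**(-alpha),
    so rarer classes are drawn more often during training."""
    labels = torch.as_tensor(labels)
    counts = torch.bincount(labels).float()
    class_weights = counts.pow(-alpha)        # rarer classes get larger weights
    sample_weights = class_weights[labels]    # one weight per training sample
    return WeightedRandomSampler(sample_weights,
                                 num_samples=len(labels),
                                 replacement=True)
```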
Images had extremely heterogeneous resolutions:
- minimum: 5 × 5
- maximum: 1288 × 1288
Images were therefore rescaled to a fixed resolution during preprocessing to allow batch training with CNN architectures.
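A minimal rescaling sketch using bilinear interpolation; the 224 × 224 target used here is an assumption, not necessarily the resolution chosen in the project.

```python
import torch
import torch.nn.functional as F

def to_fixed_size(img: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Rescale a (C, H, W) image tensor of arbitrary resolution to
    (C, size, size) so images can be stacked into training batches."""
    return F.interpolate(img.unsqueeze(0), size=(size, size),
                         mode="bilinear", align_corners=False).squeeze(0)
```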
Submissions were evaluated using Macro F1-score, which gives equal importance to each class, including rare plankton species.
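For reference, macro F1 can be computed from scratch as below; this is essentially what scikit-learn's `f1_score(..., average="macro")` computes.

```python
def macro_f1(y_true, y_pred):
    """Macro F1: per-class F1 scores averaged with equal weight per class,
    so rare classes count as much as dominant ones."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```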
Competition statistics:
- 39 participants
- 14 teams
- 639 total submissions
Our team achieved:
| Leaderboard | Score |
|---|---|
| Public leaderboard | 0.80506 |
| Private leaderboard | 0.79164 |
🏆 Final rank: 1st place
Notably, we achieved this result with only 15 submissions, indicating strong offline validation and careful experimentation.
Our solution is based on training multiple convolutional neural networks from scratch and combining them through a weighted ensemble.
We trained the following architectures:
- ResNet50
- EfficientNet-B3
- ConvNeXt-Tiny
All models were trained from scratch on the ZooCAM dataset.
Training setup:
- Loss: CrossEntropyLoss
- Label smoothing between 0.05 and 0.1
- WeightedRandomSampler for class imbalance
- Strong data augmentation
- Validation-based model selection
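The loss setup above can be sketched as follows, using a label smoothing of 0.1 (the upper end of the stated range) and the dataset's 86 classes:

```python
import torch
import torch.nn as nn

# Label smoothing between 0.05 and 0.1 was used; 0.1 shown here.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 86)              # batch of 8, 86 plankton classes
targets = torch.randint(0, 86, (8,))
loss = criterion(logits, targets)        # scalar training loss
```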
Each model's predictions were refined with Test-Time Augmentation (TTA): outputs are averaged over several augmented versions of each image, yielding more robust predictions.
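A minimal TTA sketch, assuming flip-based augmentations (the project's actual TTA transform set may differ):

```python
import torch

@torch.no_grad()
def tta_predict(model, x):
    """Average softmax outputs over several augmented views of a batch:
    identity, horizontal flip, and vertical flip."""
    views = [x, torch.flip(x, dims=[-1]), torch.flip(x, dims=[-2])]
    probs = torch.stack([model(v).softmax(dim=1) for v in views])
    return probs.mean(dim=0)   # averaged class probabilities
```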
The final prediction is obtained through a weighted ensemble of three models.
For each sample we compute the weighted sum of the logits produced by:
- ResNet50
- EfficientNet-B3
- ConvNeXt-Tiny
The final class prediction is obtained after applying softmax to the aggregated logits.
This ensemble strategy significantly improved performance compared to individual models.
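The aggregation step can be sketched as follows; the per-model weights are assumed to be scalars tuned on validation data:

```python
import torch

def ensemble_predict(logits_list, weights):
    """Weighted sum of per-model logits, followed by softmax and argmax."""
    stacked = torch.stack(logits_list)              # (n_models, batch, n_classes)
    w = torch.tensor(weights, dtype=stacked.dtype).view(-1, 1, 1)
    combined = (w * stacked).sum(dim=0)             # aggregated logits
    probs = combined.softmax(dim=1)
    return probs.argmax(dim=1), probs
```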
The repository is organized as follows:
- analysis/
- outputs/
- src/torchtmpl/
- config-*.yaml
The `analysis/` folder contains exploratory analysis of the dataset, carried out to better understand its characteristics before training.
The analyses include:
- computation of dataset mean and standard deviation (images are grayscale)
- image size distribution analysis
- class distribution analysis to highlight the strong class imbalance
These analyses helped guide preprocessing, sampling strategies, and training design.
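The mean/std computation can be done in a streaming fashion; this sketch assumes batches of grayscale tensors (in practice, the heterogeneous image sizes require resizing or per-image iteration first):

```python
import torch

def dataset_mean_std(loader):
    """Streaming mean/std over batches of grayscale images shaped
    (B, 1, H, W), avoiding loading the whole dataset into memory."""
    n, s, s2 = 0, 0.0, 0.0
    for batch, *_ in loader:
        batch = batch.double()
        n += batch.numel()
        s += batch.sum().item()
        s2 += batch.pow(2).sum().item()
    mean = s / n
    std = (s2 / n - mean ** 2) ** 0.5
    return mean, std
```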
The `outputs/` folder contains the CSV prediction files generated during inference and used for Kaggle submissions.
Each file corresponds to the predictions produced by a specific model or ensemble configuration.
The `src/torchtmpl/` directory contains the main training and inference framework used in the project.
Model definitions: implementations of all the deep learning architectures tested during the competition, including the final models used in the ensemble:
- ResNet50
- EfficientNet-B3
- ConvNeXt-Tiny
The data pipeline handles:
- dataset creation
- dataloaders
- preprocessing
- data augmentation transforms
Entry points for running the training and testing pipelines.
These scripts load the configuration files, initialize models, and start training or inference.
Utilities for training optimization:
- loss functions
- optimizers
- learning rate schedulers
Training utilities including:
- single epoch training loop
- validation/testing routines
- model checkpoint saving
- logging utilities
- Test Time Augmentation (TTA) implementation
Implements the ensemble strategy used for the final submission.
The final prediction is obtained by computing a weighted sum of the logits produced by multiple models before applying softmax.
The `config-*.yaml` files describe the training and testing setups.
Each model has its own configuration file specifying:
- model architecture
- hyperparameters
- optimizer and scheduler
- training settings
- inference parameters
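For illustration, a model's configuration might resemble the sketch below; every field name and value here is hypothetical and does not reflect the project's actual schema (see the real `config-*.yaml` files for that):

```yaml
# Hypothetical sketch, not the project's actual schema
model:
  name: resnet50
  num_classes: 86
data:
  image_size: 224        # assumed fixed input resolution
  sampler_alpha: 0.5     # exponent for WeightedRandomSampler weights
train:
  loss: CrossEntropyLoss
  label_smoothing: 0.1
  optimizer: adamw
  scheduler: cosine
```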
https://www.youtube.com/@GiorgioBono
For local experimentation, start by setting up the environment:

```shell
python3 -m virtualenv venv
source venv/bin/activate
python -m pip install .
```

Then edit the YAML configuration file and launch a training run:

```shell
python -m torchtmpl.main config-file.yaml train
```

And for testing:

```shell
python -m torchtmpl.main config-file.yaml test
```
Training and validation metrics were tracked using Weights & Biases (wandb).
This allowed us to:
- monitor training dynamics
- compare models
- track hyperparameter experiments
- analyze validation performance
Some example training and validation curves are shown below.