64 changes: 64 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -0,0 +1,64 @@
name: Bug Report
description: Problems with MegaDetector-Classifier
labels: [bug]
body:
- type: markdown
attributes:
value: |
Thank you for submitting a Bug Report!

- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) to see if a similar bug report already exists.
options:
- label: >
I have searched the MegaDetector-Classifier [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) and found no similar bug report.
required: true

- type: textarea
attributes:
label: Bug
description: Provide console output with error messages and/or screenshots of the bug.
placeholder: |
💡 ProTip! Include as much information as possible (error messages, screenshots, logs, tracebacks, etc.) to receive the most helpful response.
validations:
required: true

- type: textarea
attributes:
label: Environment
description: Please specify the software and hardware you used to produce the bug.
placeholder: |
- PytorchWildlife: 1.3.0
- OS: Ubuntu 22.04
- Python: 3.10.0
- CUDA: 12.1 (or CPU)
validations:
required: false

- type: textarea
attributes:
label: Minimal Reproducible Example
description: >
Please provide the smallest piece of code that reproduces the bug. Community members call this a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example).
placeholder: |
```
# Code to reproduce your issue here
```
validations:
required: false

- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?

- type: checkboxes
attributes:
label: Are you willing to submit a PR?
description: >
(Optional) We encourage you to submit a [Pull Request](https://github.com/microsoft/MegaDetector-Classifier/pulls) (PR) to help contribute to MegaDetector-Classifier for everyone, especially if you have a good understanding of how to implement a fix or feature.
options:
- label: Yes, I'd like to help by submitting a PR!
48 changes: 48 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.yml
@@ -0,0 +1,48 @@
name: Feature Request
description: Suggest an enhancement for MegaDetector-Classifier
labels: [enhancement]
body:
- type: markdown
attributes:
value: |
Thank you for submitting a MegaDetector-Classifier Feature Request!

- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) to see if a similar feature request already exists.
options:
- label: >
I have searched the MegaDetector-Classifier [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) and found no similar feature request.
required: true

- type: textarea
attributes:
label: Description
description: A short description of your feature.
placeholder: |
What new feature would you like to see in MegaDetector-Classifier?
validations:
required: true

- type: textarea
attributes:
label: Use case
description: |
Describe the use case of your feature request. It will help us understand and prioritize the feature request.
placeholder: |
How would this feature be used, and who would use it?

- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?

- type: checkboxes
attributes:
label: Are you willing to submit a PR?
description: >
(Optional) We encourage you to submit a [Pull Request](https://github.com/microsoft/MegaDetector-Classifier/pulls) (PR) to help contribute to MegaDetector-Classifier for everyone, especially if you have a good understanding of how to implement a fix or feature.
options:
- label: Yes, I'd like to help by submitting a PR!
32 changes: 32 additions & 0 deletions .github/ISSUE_TEMPLATE/question.yml
@@ -0,0 +1,32 @@
name: Question
description: Ask a MegaDetector-Classifier question
labels: [question]
body:
- type: markdown
attributes:
value: |
Thank you for asking a general question!

- type: checkboxes
attributes:
label: Search before asking
description: >
Please search the [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) to see if a similar question already exists.
options:
- label: >
I have searched the MegaDetector-Classifier [issues](https://github.com/microsoft/MegaDetector-Classifier/issues) and found no similar question.
required: true

- type: textarea
attributes:
label: Question
description: What is your question?
placeholder: |
💡 ProTip! Include as much information as possible to receive the most helpful response.
validations:
required: true

- type: textarea
attributes:
label: Additional
description: Anything else you would like to share?
28 changes: 28 additions & 0 deletions .github/workflows/deploy-docs.yml
@@ -0,0 +1,28 @@
name: Deploy MkDocs site

on:
push:
branches:
- main
paths:
- 'docs/**'
- 'mkdocs.yml'
- 'docs-requirements.txt'

jobs:
deploy:
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v4

- uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Install MkDocs dependencies
run: pip install -r docs-requirements.txt

- name: Deploy to GitHub Pages
run: mkdocs gh-deploy --force
24 changes: 24 additions & 0 deletions .gitignore
@@ -0,0 +1,24 @@
__pycache__/
*.py[cod]
*$py.class
*.egg-info/
dist/
build/
*.egg
.eggs/
*.so
.env
.venv
env/
venv/
.tox/
.coverage
htmlcov/
.pytest_cache/
*.log
.DS_Store
Thumbs.db
Brewfile
site/
archive/
*.code-workspace
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 Microsoft

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
165 changes: 162 additions & 3 deletions README.md
@@ -1,3 +1,162 @@
# Repository setup required :wave:

Please visit the website URL :point_right: for this repository to complete the setup of this repository and configure access controls.
# MegaDetector-Classifier

Microsoft AI for Good Lab's open-source classification fine-tuning tool — train custom species classifiers on your own camera-trap datasets and deploy them through PyTorch-Wildlife.

[![License](https://img.shields.io/github/license/microsoft/MegaDetector-Classifier)](https://github.com/microsoft/MegaDetector-Classifier/blob/main/LICENSE)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue.svg)](https://www.python.org/downloads/)
[![PyTorch-Wildlife](https://img.shields.io/badge/PyTorch--Wildlife-ecosystem-green.svg)](https://github.com/microsoft/Biodiversity)

MegaDetector-Classifier is part of the [microsoft/Biodiversity](https://github.com/microsoft/Biodiversity) ecosystem and is powered by the [PyTorch-Wildlife](https://github.com/microsoft/PytorchWildlife) framework. It is free, open-source, and available under the MIT license.

## Part of the Biodiversity Ecosystem

MegaDetector-Classifier is one tool in a larger open-source ecosystem from the Microsoft AI for Good Lab. Each project lives in its own repository, with the [microsoft/Biodiversity](https://github.com/microsoft/Biodiversity) umbrella tying them together.

| Repository | Description |
|---|---|
| [microsoft/Biodiversity](https://github.com/microsoft/Biodiversity) | The umbrella repository — documentation hub for the AI for Good Lab's biodiversity work |
| [microsoft/MegaDetector](https://github.com/microsoft/MegaDetector) | Animal, human, and vehicle detection for camera-trap images |
| [microsoft/PytorchWildlife](https://github.com/microsoft/PytorchWildlife) | The collaborative deep learning framework that hosts MegaDetector, species classifiers, and demo notebooks |
| [microsoft/MegaDetector-Acoustic](https://github.com/microsoft/MegaDetector-Acoustic) | Bioacoustic AI for audio-based wildlife detection and classification |
| [microsoft/MegaDetector-Classifier](https://github.com/microsoft/MegaDetector-Classifier) | **This repo** — classification fine-tuning for camera-trap species identification |
| [microsoft/SPARROW](https://github.com/microsoft/SPARROW) | Solar-Powered Acoustic and Remote Recording Observation Watch — AI-enabled edge device for field recording |
| [SPARROW-Studio](https://github.com/microsoft/Biodiversity/tree/main/SPARROW-Studio) | Desktop application wrapping all AI for Good Lab models in a graphical interface |

## Overview

MegaDetector-Classifier is a training toolkit for fine-tuning ResNet-based species classifiers on custom camera-trap image datasets. The output weights integrate directly with the [PyTorch-Wildlife](https://github.com/microsoft/PytorchWildlife) framework, making it straightforward to deploy a classifier trained on your own data.

**Key capabilities:**
- ResNet-18 and ResNet-50 classifier training using PyTorch Lightning
- Three data-splitting strategies designed for camera-trap realities: random, location-based, and sequence-based
- YAML-based configuration — no code changes required for most use cases
- Demo data included for immediate testing without your own dataset

**Designed for:**
- Conservation practitioners adapting existing classifiers to new geographic regions
- Researchers adding new species to the PyTorch-Wildlife model zoo
- Projects running MegaDetector detection upstream and needing a matched classifier downstream

## Installation

### Using pip

```bash
git clone https://github.com/microsoft/MegaDetector-Classifier
cd MegaDetector-Classifier
pip install -r requirements.txt
```

### Using conda

```bash
git clone https://github.com/microsoft/MegaDetector-Classifier
cd MegaDetector-Classifier
conda env create -f environment.yaml
conda activate PT_Finetuning
```

**Requirements:** Python 3.9+

## Quick Start

1. Configure `configs/config.yaml` — set `dataset_root`, `annotation_dir`, `num_classes`, and `split_type`
2. Run training:

```bash
python main.py
```

Output weights are saved to the `weights/` directory and can be loaded directly into PyTorch-Wildlife.

## Data Preparation

### Data Structure

Images should be stored in a single flat directory (no nested subdirectories). An annotations CSV file (`annotation_example.csv` in the layout below), placed outside the images directory, maps each image to its class:

```plaintext
MegaDetector-Classifier/
├── data/
│ ├── imgs/ # All images stored here (flat)
│ └── annotation_example.csv # Annotations file
└── configs/config.yaml
```
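
Before training, it can help to confirm that the images directory really is flat. The following is a minimal sketch using only the standard library; `find_nested_dirs` is a hypothetical helper for illustration and is not part of this repository.

```python
from pathlib import Path


def find_nested_dirs(image_dir):
    """Return the subdirectories found directly inside image_dir.

    An empty list means the directory is flat, as the layout above expects.
    (Hypothetical helper, not part of MegaDetector-Classifier.)
    """
    root = Path(image_dir)
    return sorted(entry for entry in root.iterdir() if entry.is_dir())
```

For example, `find_nested_dirs("data/imgs")` returns an empty list when every image sits directly in `data/imgs/`.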

### Annotation File Format

The CSV must contain three columns:

| Column | Description | Example |
|---|---|---|
| `path` | Relative path to the image | `imgs/leopard_001.jpg` |
| `classification` | Integer class ID | `0` |
| `label` | Human-readable class name | `leopard` |
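
A quick check of the annotation file can catch formatting problems before a training run. The sketch below uses only the standard library and is not the loader used by this repository; `validate_annotations` is a hypothetical helper, and the rule that each class ID maps to exactly one label is an assumption made for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("path", "classification", "label")


def validate_annotations(csv_file):
    """Return a list of problems found in an open annotation CSV file.

    Hypothetical helper, not part of MegaDetector-Classifier.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    label_for_id = {}
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        raw_id = row["classification"] or ""
        try:
            class_id = int(raw_id)
        except ValueError:
            problems.append(f"line {line_number}: class ID {raw_id!r} is not an integer")
            continue
        # Assumption: every class ID should map to exactly one label.
        first_label = label_for_id.setdefault(class_id, row["label"])
        if first_label != row["label"]:
            problems.append(
                f"line {line_number}: class {class_id} is labeled both "
                f"{first_label!r} and {row['label']!r}"
            )
    return problems


if __name__ == "__main__":
    sample = io.StringIO(
        "path,classification,label\n"
        "imgs/leopard_001.jpg,0,leopard\n"
    )
    print(validate_annotations(sample))
```

To check a file on disk, open it with `open(csv_path, newline="")` and pass the file object to `validate_annotations`.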

### Data Splitting

MegaDetector-Classifier supports three splitting strategies, selected via `split_type` in `config.yaml`:

| Strategy | When to use | Extra column required |
|---|---|---|
| `random` | Balanced class distribution; not recommended for camera-trap bursts | None |
| `location` | Keeps all images from one camera location in the same split | `Location` |
| `sequence` | Groups burst images within 30-second windows before splitting | `Photo_time` (YYYY-MM-DD HH:MM:SS) |

> **Camera-trap note:** Random splitting is not recommended because burst images of the same animal can appear in both training and validation sets, causing artificially high validation accuracy. Use `location` or `sequence` splitting instead.
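
To illustrate why grouped splitting avoids this leakage, here is a minimal standard-library sketch of the idea. It is not the code in `src/utils/data_splitting.py`; `assign_sequences` and `split_by_group` are hypothetical names, and the rule that a new sequence starts when the gap to the previous image exceeds 30 seconds is an assumed reading of "30-second windows".

```python
import random
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the Photo_time column format


def assign_sequences(rows, max_gap_seconds=30):
    """Map each image path to a sequence ID.

    Assumption: a new sequence starts whenever the gap to the previous
    image (in time order) is longer than max_gap_seconds.
    """
    timed = sorted(
        (datetime.strptime(row["Photo_time"], TIME_FORMAT), row["path"])
        for row in rows
    )
    sequence_of = {}
    sequence_id = -1
    previous_time = None
    for taken_at, path in timed:
        gap_too_long = (
            previous_time is None
            or (taken_at - previous_time).total_seconds() > max_gap_seconds
        )
        if gap_too_long:
            sequence_id += 1
        sequence_of[path] = sequence_id
        previous_time = taken_at
    return sequence_of


def split_by_group(group_of, val_fraction=0.2, seed=0):
    """Split items into (train, val) so that no group appears in both."""
    groups = sorted(set(group_of.values()))
    random.Random(seed).shuffle(groups)
    # At least one group goes to validation, so a single group ends up all in val.
    n_val = max(1, round(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [item for item, group in group_of.items() if group not in val_groups]
    val = [item for item, group in group_of.items() if group in val_groups]
    return train, val
```

The same `split_by_group` function covers the `location` idea as well: pass a mapping from each image path to its `Location` value instead of a sequence ID. Because whole groups are assigned to one side, burst images of the same animal cannot land in both training and validation sets.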

### Demo Data

Download demo data to test the pipeline without your own dataset:

```bash
# Download and extract
wget https://zenodo.org/records/15376499/files/demo_data_clf.zip
unzip demo_data_clf.zip -d data/
```

Then set `dataset_root: ./data/imgs` in `configs/config.yaml` and run `python main.py`.

## Repository Structure

```
MegaDetector-Classifier/
├── main.py # Training entry point
├── requirements.txt # pip dependencies
├── environment.yaml # conda environment
├── configs/
│ └── config.yaml # Training configuration
└── src/
├── algorithms/
│ └── plain.py # Training algorithm (PyTorch Lightning)
├── datasets/
│ └── custom.py # Custom dataset loader
├── models/
│ └── plain_resnet.py # ResNet-18/50 classifier
└── utils/
├── batch_detection_cropping.py # MegaDetector crop integration
├── data_splitting.py # Random / location / sequence splits
└── utils.py # Shared utilities
```

## Citation

If you use MegaDetector-Classifier in your research, please cite the PyTorch-Wildlife paper:

```bibtex
@misc{hernandez2024pytorchwildlife,
title={Pytorch-Wildlife: A Collaborative Deep Learning Framework for Conservation},
author={Andres Hernandez and Zhongqi Miao and Luisa Vargas and Sara Beery and Rahul Dodhia and Juan Lavista},
year={2024},
eprint={2405.12930},
archivePrefix={arXiv},
}
```

You can also cite this software directly using the [`citation.cff`](citation.cff) file in this repository.

## Contributing

Issues, feature requests, and pull requests are welcome at [microsoft/MegaDetector-Classifier/issues](https://github.com/microsoft/MegaDetector-Classifier/issues).

For framework-level changes (PyTorch-Wildlife API, models, datasets), see [microsoft/PytorchWildlife](https://github.com/microsoft/PytorchWildlife). For ecosystem-wide questions, see the [microsoft/Biodiversity](https://github.com/microsoft/Biodiversity) umbrella.