Automatically generate bounding box or oriented box labels for object detection datasets using Grounding DINO and SAM2.
```bash
python3 -m venv .venv
source .venv/bin/activate

# 1. Install dependencies
pip install -r requirements.txt

# 2. Clone and install SAM2
git clone https://github.com/facebookresearch/segment-anything-2.git
cd segment-anything-2 && pip install -e . && cd ..
```

```bash
python auto_label.py \
  --input ./output \
  --output ./yolo_dataset \
  --prompts prompts.yaml \
  --bbox-format yolo \
  --sample-rate 10
```
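For context, the standard YOLO detection label format stores one object per line as `class_id x_center y_center width height`, with all coordinates normalized by the image size. The sketch below shows that conversion from a pixel-space box. `to_yolo_line` is a hypothetical helper written for illustration, not a function taken from `auto_label.py`, whose internals may differ.

```python
def to_yolo_line(class_id, box_xyxy, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box to a YOLO label line."""
    x_min, y_min, x_max, y_max = box_xyxy
    # YOLO stores the box center and size, each divided by the image dimensions.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 320x240 box centered in a 640x480 image.
print(to_yolo_line(0, (160, 120, 480, 360), img_w=640, img_h=480))
# prints: 0 0.500000 0.500000 0.500000 0.500000
```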
For YOLO OBB (oriented bounding box) labels, use:

```bash
python auto_label.py \
  --input ./output \
  --output ./yolo_dataset_obb \
  --prompts prompts.yaml \
  --bbox-format obb \
  --sample-rate 10
```
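The YOLO OBB label format stores each object as `class_id x1 y1 x2 y2 x3 y3 x4 y4`, where the four corner points are normalized by the image size. As a rough illustration, the sketch below builds such a line from a rotated rectangle. `to_obb_line`, its parameter names, and the angle convention (degrees, rotation about the box center) are assumptions made for this example; how `auto_label.py` derives its rotated boxes is not shown here.

```python
import math


def to_obb_line(class_id, cx, cy, w, h, angle_deg, img_w, img_h):
    """Turn a rotated rectangle into a YOLO OBB label line (four normalized corners)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    coords = []
    # Corner offsets from the center before rotation, listed in order around the box.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        coords += [x / img_w, y / img_h]
    return f"{class_id} " + " ".join(f"{c:.6f}" for c in coords)


print(to_obb_line(1, cx=320, cy=240, w=200, h=100, angle_deg=0, img_w=640, img_h=480))
```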
- Use `--sample-rate N` to process every Nth image (default: 10; use 1 for all images)

```bash
yolo detect train data=yolo_dataset/dataset.yaml model=yolov8n.pt epochs=100 imgsz=640
```
---
## Extracting Frames from MCAP Files
```bash
python extract_data.py
```

By default, the script reads from `./data` and writes to `./output`. Override with environment variables:

```bash
DATA_ROOT=./my_data OUTPUT_ROOT=./my_output python extract_data.py
```

Input:

```
data/
├── class_name_1/
│   └── recording.mcap
└── class_name_2/
    └── recording.mcap
```

Output:

```
output/
├── class_name_1/
│   └── rgb/
└── class_name_2/
    └── rgb/
```

The output is ready for `auto_label.py --input ./output`.
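The mapping from the input layout to the output layout can be sketched as follows. This is only an illustration of the directory convention and the `DATA_ROOT` / `OUTPUT_ROOT` defaults described above; `plan_extraction` is a hypothetical helper, and the actual frame decoding in `extract_data.py` is not reproduced.

```python
import os
from pathlib import Path


def plan_extraction(data_root=None, output_root=None):
    """Map each <class>/<recording>.mcap under data_root to its <class>/rgb output folder."""
    # Fall back to the environment variables, then to the documented defaults.
    data_root = Path(data_root or os.environ.get("DATA_ROOT", "./data"))
    output_root = Path(output_root or os.environ.get("OUTPUT_ROOT", "./output"))
    plan = []
    for mcap_path in sorted(data_root.glob("*/*.mcap")):
        class_name = mcap_path.parent.name  # the folder name doubles as the class name
        plan.append((mcap_path, output_root / class_name / "rgb"))
    return plan


for src, dst in plan_extraction():
    print(f"{src} -> {dst}")
```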
| File | Description |
|---|---|
| `extract_data.py` | Extract RGB/depth frames from MCAP files. |
| `auto_label.py` | Main auto-labeling script. Generates YOLO or OBB labels from images. |
| `prompts.yaml` | Text prompts corresponding to each class. Required for Grounding DINO. |
| `view_dataset.py` | Optional visualization tool for verifying labels. |
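Because every class needs a text prompt, a quick consistency check before labeling can save a wasted run. The sketch below assumes the prompts are available as a mapping from class name to prompt text; the real layout of `prompts.yaml` may differ, and both `missing_prompts` and the example prompt strings are hypothetical.

```python
from pathlib import Path


def missing_prompts(input_root, prompts):
    """Return class folders under input_root that have no entry in the prompts mapping."""
    class_names = sorted(p.name for p in Path(input_root).iterdir() if p.is_dir())
    return [name for name in class_names if name not in prompts]


# Hypothetical prompts shaped as {class_name: text_prompt}; not taken from the repository.
example_prompts = {"class_name_1": "a red mug", "class_name_2": "a cardboard box"}

if Path("./output").is_dir():
    print("Classes without a prompt:", missing_prompts("./output", example_prompts))
```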
```bash
python auto_label.py \
  --input ./output \
  --output ./yolo_dataset \
  --prompts prompts.yaml \
  --bbox-format yolo
```

| Option | Description |
|---|---|
| `--bbox-format yolo` | Axis-aligned YOLO boxes |
| `--bbox-format obb` | Rotated oriented bounding boxes |
| `--device cuda` | Use GPU if available (defaults to CPU) |
After running `auto_label.py`:

```
yolo_dataset/
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
└── dataset.yaml
```
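The `train/` and `val/` folders imply that the labeled images are partitioned into two sets. One common way to do this is a seeded shuffle followed by a fixed-fraction cut, sketched below. The 20% validation fraction, the seed, and the `split_train_val` name are assumptions for illustration; the split actually used by `auto_label.py` is not documented here.

```python
import random


def split_train_val(image_names, val_fraction=0.2, seed=0):
    """Shuffle deterministically and split file names into (train, val) lists."""
    names = sorted(image_names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    n_val = round(len(names) * val_fraction)
    return names[n_val:], names[:n_val]


train, val = split_train_val([f"frame_{i:04d}.jpg" for i in range(10)])
print(len(train), "train /", len(val), "val")
# prints: 8 train / 2 val
```

Each image and its label file would then be placed under the matching `images/` and `labels/` subfolder so that the two trees stay in sync.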
```bash
python view_dataset.py --data ./yolo_dataset --prompts ./prompts.yaml
```

Zero-shot bounding boxes appear to be around 80% accurate for `obb` and 70% for `yolo_obb`, and accuracy depends directly on the quality of your prompts. Either way, this GUI visualizer lets you both view the data labeled by the auto-labeler and fix the bounding boxes.

A Jupyter notebook is provided as an example; it can be uploaded to Google Colab along with a zip of the dataset generated by `auto_label.py`. Alternatively, the model can be trained locally with the following command:
```bash
yolo detect train \
  data=yolo_dataset/dataset.yaml \
  model=yolov8n.pt \
  epochs=100 \
  imgsz=640
```

| Model | Description |
|---|---|
| `yolov8n.pt` | Nano (fastest) |
| `yolov8s.pt` | Small |
| `yolov8m.pt` | Medium |
| `yolov8l.pt` | Large |
| `yolov8x.pt` | Extra Large (most accurate) |
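The same training run can also be started from Python through the Ultralytics package, which may be more convenient inside a notebook. This is a minimal sketch that assumes `ultralytics` is installed and mirrors the CLI arguments above; `train_yolo` is a hypothetical wrapper, not part of this repository.

```python
def train_yolo(data_yaml="yolo_dataset/dataset.yaml", weights="yolov8n.pt",
               epochs=100, imgsz=640):
    """Train a YOLOv8 detector with the same settings as the CLI command."""
    # Imported lazily so this module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # swap in yolov8s.pt, yolov8m.pt, etc. for larger models
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)


# Example call (downloads the weights on first use, then starts training):
# train_yolo()
```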
- Ensure `prompts.yaml` matches all class names in your dataset
- By default, every 10th image is processed; adjust `files[::10]` in `auto_label.py` if needed
- GPU acceleration is highly recommended
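The every-Nth-image sampling mentioned above comes down to a slice with a step. The sketch below shows the idea on a sorted file list. `sample_images` and the set of accepted image suffixes are assumptions for illustration, not the code in `auto_label.py`.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the extracted frames.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sample_images(image_dir, sample_rate=10):
    """Return every `sample_rate`-th image in sorted order (1 keeps all images)."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    files = sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
    return files[::sample_rate]  # same slicing idea as files[::10]


# Example: sample_images("./output/class_name_1/rgb", sample_rate=10)
```

Sorting before slicing keeps the selection stable across runs, since directory listing order is not guaranteed.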

