```
 _____   ____  ____   ______      ________ _______
|  __ \ / __ \|  _ \ / __ \ \    / /  ____|__   __|
| |__) | |  | | |_) | |  | \ \  / /| |__     | |
|  _  /| |  | |  _ <| |  | |\ \/ / |  __|    | |
| | \ \| |__| | |_) | |__| | \  /  | |____   | |
|_|  \_\\____/|____/ \____/   \/   |______|  |_|
```
vet your robot datasets — before you waste the training run
You spent an evening teleoperating a robot. Before you spend a GPU-day training on those episodes, spend 30 seconds making sure they aren't lying to you. This is what a lying dataset looks like:
```console
$ robovet doctor ./my_dataset
FAIL  DATA-104   1 episode where metadata 'length' disagrees with the parquet
                 row count — the classic signature of a corrupted episode map.
FAIL  STATS-302  1 stat block disagrees with the actual data — every training
                 run normalizes with these numbers.
WARN  TIME-202   Loading this dataset requires tolerance_s ≥ 7.7e-03
                 (77× the default). Worst: episode 2, 7.29 ms off the grid.
FAIL  META-502   Σ episode lengths = 1086 but info.json total_frames = 1037 —
                 the metadata contradicts itself before a single file is read.

5 fail · 4 warn · 23 pass
UNSAFE TO TRAIN — fix the FAILs first. (exit code 1 — CI-gate it)
```
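The TIME-202 figure is the kind of number you can sanity-check by hand. Below is a minimal sketch, not robovet's implementation. It assumes the loader compares each within-episode gap between consecutive timestamps against `1/fps`, and that LeRobot's default `tolerance_s` is 1e-4 s (which matches the 77× ratio above). `min_tolerance` is a hypothetical helper; robovet reports distance off the frame grid and may add a safety margin, so its numbers need not match this one exactly.

```python
from typing import Dict, List, Tuple

DEFAULT_TOLERANCE_S = 1e-4  # assumed: LeRobotDataset's default tolerance_s


def min_tolerance(episodes: Dict[int, List[float]], fps: float) -> Tuple[float, int]:
    """Smallest tolerance_s that passes, plus the episode that forces it.

    Assumes the check is |t[i+1] - t[i] - 1/fps| <= tolerance_s for consecutive
    frames within each episode; gaps across episode boundaries are ignored.
    Returns (0.0, -1) only if every gap is exactly one period.
    """
    period = 1.0 / fps
    worst_err, worst_ep = 0.0, -1
    for ep, ts in episodes.items():
        for a, b in zip(ts, ts[1:]):
            err = abs((b - a) - period)
            if err > worst_err:
                worst_err, worst_ep = err, ep
    return worst_err, worst_ep


# Usage: one clean episode, one with a single frame stamped 5 ms late.
fps = 30
clean = [i / fps for i in range(90)]
jittery = list(clean)
jittery[45] += 0.005
tol, ep = min_tolerance({0: clean, 1: jittery}, fps)
print(f"tolerance_s >= {tol:.1e} ({tol / DEFAULT_TOLERANCE_S:.0f}x default), worst: episode {ep}")
# → tolerance_s >= 5.0e-03 (50x default), worst: episode 1
```

A single late frame distorts two gaps (one too long, one too short), which is why one bad timestamp is enough to force the whole dataset's tolerance up.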
```bash
pip install "robovet[video]"
robovet demo ./demo            # builds a fake dataset with 10 real-world defects
robovet doctor ./demo          # catches all of them, tells you which episode, exits 1
robovet fix ./demo --apply     # repairs the metadata problems (.bak backups)
robovet doctor ./demo          # the metadata FAILs are gone
```

Want to see what healthy looks like? `robovet demo ./d --clean` builds the
same dataset with zero defects. There's a v3 flavor too: `robovet demo ./d3 --v3`.
① You just finished recording. Run `robovet doctor ./my_task`. Green
means train. Red means robovet tells you exactly which episodes are broken
and why, in plain English, with the upstream issue number each defect
reproduces. Most metadata problems are one `robovet fix ./my_task --apply`
away (it backs everything up as `.bak` first).
② You found a dataset on the Hub and don't want to download 4 GB to find out it's broken.
```bash
pip install "robovet[hub]"
robovet doctor hf://lerobot/svla_so100_pickplace
```

This pulls only the `meta/` folder (usually under 1 MB) and cross-checks
the dataset's own ledger: does the episode↔frame index math add up, do the
counters match, are the per-episode stats stale, do the video time windows
fit. The nastiest corruption class (lerobot#2401) is visible from metadata
alone. To be clear about what this can't see: values, timestamps and video
decoding still need the files, so a remote pass says `META CLEAN`, never
`CLEAN`. (`--meta-only` also works on local paths when you want a one-second
pre-check.)
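The ledger cross-check is plain arithmetic over episode records. Here is a minimal sketch under stated assumptions: each record carries a `length` plus a half-open global frame range whose field names (`from_index`, `to_index`) are hypothetical, and the counters `total_frames` and `total_episodes` come from `meta/info.json`. Real v2.x and v3 layouts store the ranges differently, so treat `check_ledger` as an illustration of a META-502-style check, not robovet's code.

```python
from typing import List


def check_ledger(episodes: List[dict], info: dict) -> List[str]:
    """Cross-check an episode ledger against itself and the info.json counters.

    Each episode dict is assumed (hypothetical layout) to look like
    {"episode_index": int, "length": int, "from_index": int, "to_index": int},
    where [from_index, to_index) is the episode's global frame range.
    """
    problems: List[str] = []
    expected_start = 0
    for ep in sorted(episodes, key=lambda e: e["episode_index"]):
        idx = ep["episode_index"]
        span = ep["to_index"] - ep["from_index"]
        if span != ep["length"]:  # the record disagrees with itself
            problems.append(f"episode {idx}: length={ep['length']} but index range holds {span} frames")
        if ep["from_index"] != expected_start:  # gap or overlap with the previous episode
            problems.append(f"episode {idx}: starts at frame {ep['from_index']}, expected {expected_start}")
        expected_start = ep["to_index"]

    total = sum(ep["length"] for ep in episodes)
    if total != info["total_frames"]:
        problems.append(f"sum of episode lengths = {total} but total_frames = {info['total_frames']}")
    if len(episodes) != info["total_episodes"]:
        problems.append(f"{len(episodes)} episode records but total_episodes = {info['total_episodes']}")
    return problems


ledger = [
    {"episode_index": 0, "length": 100, "from_index": 0, "to_index": 100},
    {"episode_index": 1, "length": 120, "from_index": 100, "to_index": 220},
    {"episode_index": 2, "length": 90, "from_index": 230, "to_index": 320},  # 10-frame hole
]
for p in check_ledger(ledger, {"total_frames": 300, "total_episodes": 3}):
    print(p)
# → episode 2: starts at frame 230, expected 220
# → sum of episode lengths = 310 but total_frames = 300
```

None of this touches parquet or video, which is why this class of corruption is visible from `meta/` alone.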
③ You want bad data blocked before it reaches your team's training runs.
`robovet doctor` exits 1 on any FAIL, so CI can gate dataset merges the same
way Codecov gates coverage:

```yaml
name: robovet
on: [push, pull_request]
jobs:
  vet:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install "robovet[video]"
      - run: robovet doctor ./datasets/my_task   # FAIL blocks the merge
```

④ You want to drop your worst episodes before training.
```bash
robovet score ./my_task --worst 10        # the 10 episodes to look at first
robovet score ./my_task --csv scores.csv
```

Every episode gets a 0–100 score from cheap, fast signals computed in one pass: jerky motion, long idle stretches, gripper chatter, weird durations, saturated actions, exact duplicates. It's a triage list, not a judge: look at the flagged episodes yourself before deleting anything. (The 2026 curation papers rinse, Demo-SCORE and QoQ all argue for exactly this kind of cheap smoothness-first pass before any expensive policy-based filtering.)
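To make "cheap, fast signals" concrete, here is a hedged sketch of two of them for a single action channel: root-mean-square jerk (the third finite difference of position) and the fraction of idle steps, folded into a 0–100 score. `triage_score`, the `jerk_ref` scale, the idle threshold and the 60/40 weighting are all illustrative assumptions, not robovet's formula.

```python
from typing import List, Sequence


def jerk(x: Sequence[float], dt: float) -> List[float]:
    """Third forward difference of a 1-D trajectory, scaled by 1/dt**3."""
    return [
        (x[i + 3] - 3 * x[i + 2] + 3 * x[i + 1] - x[i]) / dt**3
        for i in range(len(x) - 3)
    ]


def idle_fraction(x: Sequence[float], eps: float) -> float:
    """Fraction of steps where the signal moves by less than eps."""
    steps = [abs(b - a) for a, b in zip(x, x[1:])]
    return sum(s < eps for s in steps) / max(len(steps), 1)


def triage_score(x: Sequence[float], fps: float,
                 jerk_ref: float = 50.0, idle_eps: float = 1e-3) -> float:
    """0-100, higher is cleaner. jerk_ref, idle_eps and the weights are illustrative."""
    j = jerk(x, 1.0 / fps)
    rms_jerk = (sum(v * v for v in j) / len(j)) ** 0.5 if j else 0.0
    smooth = 1.0 / (1.0 + rms_jerk / jerk_ref)  # 1.0 for zero jerk, falls toward 0 as jerk grows
    active = 1.0 - idle_fraction(x, idle_eps)   # 1.0 if the signal never stalls
    return round(100 * (0.6 * smooth + 0.4 * active), 1)


fps = 30
sweep = [0.01 * i for i in range(60)]                               # constant velocity
chatter = [0.01 * i + (0.05 if i % 2 else 0.0) for i in range(60)]  # zig-zag on top
parked = [0.3] * 60                                                 # never moves
for name, sig in (("sweep", sweep), ("chatter", chatter), ("parked", parked)):
    print(name, triage_score(sig, fps))
```

On this toy input the sweep scores about 100, the parked arm 60 (all of it from smoothness), and the zig-zag falls lowest because its jerk term explodes. Both signals need only neighbouring samples, which is what keeps a pass like this cheap.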
And when something goes wrong mid-training, start here:
| You hit | Look at | You get |
|---|---|---|
| `ValueError: timestamps … tolerance_s` on load | TIME-202 | the exact minimal `tolerance_s`, and which episode is worst |
| wrong frames / `IndexError` after a v2→v3 conversion | DATA-104/105 + META-501 | which episodes' ledgers lie, cross-checked three ways |
| TorchCodec/AV1 decode errors | VIDEO-403 | per-camera codec tiers and what to re-encode |
| `loss=NaN` out of nowhere | DATA-107 + STATS-302 | NaN/Inf locations and stale normalization stats |
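Both halves of the last row are easy to sketch: locate non-finite values, then recompute per-dimension statistics and compare them with what is stored. The row-of-floats layout, the function names and the 1% relative tolerance below are assumptions for illustration; the sketch skips NaN/Inf when recomputing and uses the population standard deviation.

```python
import math
from typing import List, Sequence, Tuple


def find_nonfinite(rows: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """(frame, dim) of every NaN or +/-Inf entry."""
    return [
        (f, d)
        for f, row in enumerate(rows)
        for d, v in enumerate(row)
        if not math.isfinite(v)
    ]


def stale_stats(rows: Sequence[Sequence[float]], stored_mean: Sequence[float],
                stored_std: Sequence[float], rel_tol: float = 0.01) -> List[str]:
    """Compare stored per-dim mean/std with values recomputed from the finite data."""
    issues = []
    for d in range(len(stored_mean)):
        col = [row[d] for row in rows if math.isfinite(row[d])]
        if not col:
            issues.append(f"dim {d}: no finite values")
            continue
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))  # population std
        for name, stored, actual in (("mean", stored_mean[d], mean), ("std", stored_std[d], std)):
            if not math.isclose(stored, actual, rel_tol=rel_tol, abs_tol=1e-8):
                issues.append(f"dim {d}: stored {name}={stored:.4g}, data says {actual:.4g}")
    return issues


rows = [[0.0, 1.0], [2.0, float("nan")], [4.0, 3.0]]
print(find_nonfinite(rows))                                        # [(1, 1)]
print(stale_stats(rows, stored_mean=[2.0, 0.0], stored_std=[1.633, 1.0]))
# ['dim 1: stored mean=0, data says 2']
```

A single stale mean is enough to shift every normalized input the policy sees, which is why these two checks are worth running together when the loss goes NaN.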
Robot learning's bottleneck moved from models to data, and the data is
quietly broken. An April 2026 audit of 10 popular open robot datasets found
floating-point drift that breaks video decoding after ~45 episodes, a
v2.1→v3.0 conversion bug that silently scrambles which frames belong to
which episode (training "works" — on jumbled sequences), and datasets that
only load with `tolerance_s` cranked to 100× the default. Hugging Face's own
cleanup of community datasets found that 111 of 240 failed validation, and
that pipeline is internal; you can't run it on yours. Meanwhile everyone
agrees a well-curated 500-demo fine-tune beats a sloppy one 10× the size.
The missing piece is tooling, and that's what this is.
Every check maps to a documented, real-world failure — the lerobot issue numbers are right there in the table below.
| Group | Catches | Maps to |
|---|---|---|
| STRUCT-0xx | missing/invalid metadata, dangling episodes, orphan files | lerobot#761 (no validator for hand-rolled conversions) |
| DATA-1xx | episode↔frame mapping corruption, schema drift, NaN/Inf, dead dims | lerobot#2401 (silent v2.1→v3.0 corruption) |
| TIME-2xx | off-grid timestamps with the exact `tolerance_s` you'd need, non-monotonic time, cumulative FP drift | lerobot#933, lerobot#3177 |
| STATS-3xx | stored normalization stats that disagree with the data, broken quantile stats (q01/q99) | HF docs warning; phospho repair post; lerobot#2189 |
| META-5xx | the dataset's ledger contradicting itself; works without downloading the data | lerobot#2401 class, caught from metadata alone |
| VIDEO-4xx | video/parquet frame-count desync (including per-episode windows inside shared v3 files), codec tiers (h264 ✓ / AV1 info, since it's lerobot's own default / mpeg4-hevc warn), fps mismatch | Correll-lab postmortem; phospho notes |
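For v3 datasets, where many episodes share one video file, the per-episode desync check reduces to comparing each episode's time window with its parquet row count. A sketch assuming hypothetical `from_ts`/`to_ts` fields giving the episode's window in seconds, an expected frame count of round(duration × fps), and a one-frame allowance; none of these names or thresholds come from robovet.

```python
from typing import List


def window_desync(episodes: List[dict], fps: float, max_off: int = 1) -> List[str]:
    """Flag episodes whose video window doesn't match their parquet row count.

    Each dict is assumed (hypothetical layout) to be
    {"episode_index": int, "length": int, "from_ts": float, "to_ts": float},
    with [from_ts, to_ts) the episode's span in seconds inside a shared video file.
    """
    flagged = []
    for ep in episodes:
        video_frames = round((ep["to_ts"] - ep["from_ts"]) * fps)
        off = video_frames - ep["length"]
        if abs(off) > max_off:
            flagged.append(
                f"episode {ep['episode_index']}: video window holds {video_frames} frames, "
                f"parquet has {ep['length']} ({off:+d})"
            )
    return flagged


eps = [
    {"episode_index": 0, "length": 300, "from_ts": 0.0, "to_ts": 10.0},
    {"episode_index": 1, "length": 300, "from_ts": 10.0, "to_ts": 19.0},  # window 30 frames short
]
print(window_desync(eps, fps=30))
# ['episode 1: video window holds 270 frames, parquet has 300 (-30)']
```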
`robovet fix` is dry-run by default. With `--apply` it only rewrites
metadata: episode lengths, normalization stats, `info.json` counters. It backs
up every file it touches as `.bak`, it never modifies parquet or video
payloads, and it preserves everything it doesn't understand: your quantile
keys, image-stat blocks, episode tags. A repair tool must never be the thing
that deletes your data, and the test suite enforces every one of these
promises. Frame surgery (trimming desynced tails, re-gridding timestamps) is
planned under the same rules.
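The repair rules above can be illustrated for a single JSON metadata file such as `info.json`. `patch_json_metadata` is a hypothetical helper, not robovet's implementation: it is a dry run unless `apply=True`, merges only the keys it is given so unknown keys survive, copies the original to `<name>.bak` before writing, and swaps the new file in with `os.replace` so readers see either the old file or the new one, never a partial write.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def patch_json_metadata(path: Path, updates: dict, apply: bool = False) -> dict:
    """Return the patched document; write it to disk only when apply=True.

    Unknown keys are preserved because we merge into the parsed dict instead
    of rebuilding it from a schema.
    """
    doc = json.loads(path.read_text())
    patched = {**doc, **updates}       # shallow merge: every other key is kept
    if not apply or patched == doc:
        return patched                 # dry run (the default) or nothing to change

    shutil.copy2(path, path.with_name(path.name + ".bak"))  # back up before touching
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(patched, f, indent=4)
        os.replace(tmp, path)          # atomic rename when tmp and target share a filesystem
    except BaseException:
        os.unlink(tmp)                 # don't leave a stray temp file behind
        raise
    return patched


with tempfile.TemporaryDirectory() as d:
    info = Path(d) / "info.json"
    info.write_text(json.dumps({"total_frames": 1037, "fps": 30, "custom_tag": "keep-me"}))
    patch_json_metadata(info, {"total_frames": 1086})              # dry run: file untouched
    patch_json_metadata(info, {"total_frames": 1086}, apply=True)
    print(json.loads(info.read_text()))
    # {'total_frames': 1086, 'fps': 30, 'custom_tag': 'keep-me'}
    print(json.loads((Path(d) / "info.json.bak").read_text())["total_frames"])
    # 1037
```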
- v2.0/v2.1 and v3.x are both fully supported for diagnosis (each has its
  own fixture and tests; v3 gets per-episode video alignment inside shared
  files plus per-episode stats checks). `fix` currently repairs v2.x
  metadata; v3 stats regeneration is planned.
- robovet doesn't merge, split or delete episodes; `lerobot` does that
  natively now. This tool does what the official stack doesn't: deep
  validation, metadata repair, and quality triage.
- Local-first. Your data never leaves your disk.
```python
from robovet import load_dataset, run_doctor, score_dataset

ds = load_dataset("./my_dataset")
rep = run_doctor(ds)                   # rep.exit_code, rep.results, rep.counts
sc = score_dataset(ds, scan=rep.scan)  # reuses the same single IO pass
```

Apache-2.0. Issues and broken-dataset war stories are very welcome: if your dataset breaks in a way robovet doesn't catch, that's exactly the bug report we want.