A small project scaffold and pipeline for working with Endo-FM (foundation model for endoscopy video analysis).
This repository provides a lightweight orchestration around the Endo-FM codebase to:
- extract temporal frames around annotated timestamps (automatic cropping),
- filter out low-quality frames (blurred, noisy, empty), and
- run inference with a pretrained Endo-FM model to produce features or predictions.
The intent is to produce a clean set of artifacts your application can consume (images, QC report, model outputs) and a single JSON summary that points to these artifacts.
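As an illustration of the first stage, the sketch below maps an annotated timestamp to the frame indices around it. It is not taken from automatic_croping.py: the function name frame_window is hypothetical, and it assumes the window is symmetric (plus or minus window_sec around the timestamp) and that the video's frame rate and frame count are already known.

```python
def frame_window(timestamp_sec, fps, window_sec, total_frames):
    """Return frame indices within +/- window_sec of timestamp_sec.

    Assumption: a symmetric window, clamped to the valid frame range.
    The real pipeline script may define the window differently.
    """
    if fps <= 0 or total_frames <= 0:
        return []
    # Convert the window edges from seconds to frame indices.
    first = max(0, round((timestamp_sec - window_sec) * fps))
    last = min(total_frames - 1, round((timestamp_sec + window_sec) * fps))
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A 10 s clip at 5 fps, annotated at t = 2.0 s, with a 1 s window.
    print(frame_window(2.0, fps=5, window_sec=1, total_frames=50))
```

Clamping to the first and last frame keeps timestamps near the start or end of a video from producing invalid indices.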
- Top-level code/ contains pipeline scripts: automatic_croping.py, quality_control_of_images.py, model.py, and the orchestration script main.py.
- The official Endo-FM implementation is included under code/Endo-FM/ (submodule/clone). It contains model definitions, configs, and utilities.
- A Windows-friendly conda file, code/Endo-FM/environment-windows.yaml, is provided; for training or full reproducibility we recommend WSL2 or a Linux environment.

Directory layout:
- code/: pipeline scripts and the Endo-FM code (under code/Endo-FM/).
- inputs/: place raw uploads or videos here (tracked via .gitkeep).
- outputs/: pipeline outputs and run artifacts (created per run).
- docs/: project documentation and usage notes.

- Prepare environment
  - Best: use WSL2 (Ubuntu) or a Linux server for compatibility with training scripts and bash helpers.
  - For Windows-only use or testing, create the provided Windows environment:

        cd code\Endo-FM
        conda env create -f environment-windows.yaml
        conda activate endofm-windows

- Place the pretrained checkpoint
  - Download your Endo-FM checkpoint and save it to code/Endo-FM/checkpoints/.
  - Example path used by the pipeline: code/Endo-FM/checkpoints/endofm.pth.
- Run the pipeline

      python code\main.py --video inputs\case1.mp4 --timestamps inputs\case1_timestamps.txt --out-root outputs\run1 --window-sec 1 --checkpoint code\Endo-FM\checkpoints\endofm.pth --device cpu --move-rejected

  The script executes three stages (cropping → QC → model inference) and writes pipeline_summary.json into the --out-root directory. That summary is the single file your app should read to locate outputs.
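
If an application needs to launch a run itself, the command above can be assembled as an argument list, which sidesteps shell-specific quoting and line-continuation differences between Windows and Linux. This is a minimal sketch: the flags are the ones shown in the command above, while the helper names build_pipeline_command and run_pipeline are hypothetical, and it assumes the process is started from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline_command(video, timestamps, out_root, checkpoint,
                           window_sec=1, device="cpu", move_rejected=False):
    """Assemble the argument list for code/main.py (flags as in the README)."""
    cmd = [
        sys.executable, str(Path("code") / "main.py"),
        "--video", str(video),
        "--timestamps", str(timestamps),
        "--out-root", str(out_root),
        "--window-sec", str(window_sec),
        "--checkpoint", str(checkpoint),
        "--device", device,
    ]
    if move_rejected:
        cmd.append("--move-rejected")
    return cmd


def run_pipeline(**kwargs):
    """Run the pipeline and return the expected path of pipeline_summary.json."""
    checkpoint = Path(kwargs["checkpoint"])
    # Fail early with a clear message instead of partway through a run.
    if not checkpoint.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {checkpoint}")
    subprocess.run(build_pipeline_command(**kwargs), check=True)
    return Path(kwargs["out_root"]) / "pipeline_summary.json"
```

With check=True, a non-zero exit code from main.py raises subprocess.CalledProcessError, so the caller does not go on to read a summary from a failed run.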
After a run you will find (example outputs/run1):
- frames/: extracted original and cropped images per timestamp
- frames/crop_results.json: metadata with frame indices, crop rectangles, and saved filenames
- qc_report.json: per-group QC report listing accepted/rejected frames and reasons
- model_output.json: model features or logits for each group
- rejected/: (optional) moved rejected images
- pipeline_summary.json: single-source-of-truth JSON pointing to the above files
Read pipeline_summary.json to locate and serve artifacts to your application.
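
A consuming application might read the summary along the following lines. The schema of pipeline_summary.json is not documented here, so this sketch makes loud assumptions: that the summary is a flat JSON object whose values are path strings, that relative paths are relative to the run directory, and that keys such as "qc_report" exist. The helper names load_summary and resolve_artifact are hypothetical; check an actual summary file before relying on any of this.

```python
import json
from pathlib import Path


def load_summary(out_root):
    """Load pipeline_summary.json from a run directory."""
    summary_path = Path(out_root) / "pipeline_summary.json"
    with summary_path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resolve_artifact(summary, key, out_root):
    """Return the path stored under `key`, or None if absent or not on disk.

    Assumption: values are path strings; relative paths are resolved
    against the run directory. The real schema may differ.
    """
    value = summary.get(key)
    if not isinstance(value, str):
        return None
    path = Path(value)
    if not path.is_absolute():
        path = Path(out_root) / path
    return path if path.exists() else None
```

Returning None for missing entries lets the application treat optional artifacts, such as the rejected/ folder, as absent without raising.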
- The Endo-FM repository under code/Endo-FM/ is the original project; use its environment.yaml for reproduction on Linux, or the provided environment-windows.yaml for Windows convenience.
- The pipeline scripts favor subprocess-based orchestration to avoid import/path surprises with the Endo-FM code. If you'd like tighter integration (in-process inference), we can adapt model.py to be importable.
- Thresholds used by QC are conservative defaults; tune them in code/quality_control_of_images.py or pass CLI overrides where supported.
- Always verify that the checkpoint you downloaded matches the model architecture/config; mismatches may still load but yield unexpected outputs.
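
To make the idea of a tunable QC threshold concrete, here is a sketch of one common blur heuristic, the variance of the Laplacian. It is an illustration only: the heuristics actually used in quality_control_of_images.py are not described in this README, the function names are hypothetical, and the default threshold of 100.0 is an arbitrary placeholder. The image is assumed to be a grayscale frame given as a list of rows of pixel intensities.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    Low values mean few sharp edges, which usually indicates a blurred
    or empty frame.
    """
    if not gray or not gray[0]:
        return 0.0
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(gray, threshold=100.0):
    """Flag a frame as blurry when its Laplacian variance is below threshold.

    The threshold is a hypothetical default; tune it on real frames.
    """
    return laplacian_variance(gray) < threshold
```

Raising the threshold rejects more frames and lowering it accepts more, so a reasonable starting point is to score a handful of known-good and known-bad frames and pick a value between the two groups.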
- Add a small docs/USAGE-APP.md describing how an external app should read pipeline_summary.json and stream artifacts.
- Convert the pipeline to run in-process (no subprocess calls) for lower latency.
- Add unit tests for the QC heuristics and a small end-to-end smoke test using a short sample video.
Licensed for internal hackathon use.