diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 36109b1d..6c1c2766 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -46,6 +46,10 @@ To contribute your code to the library, you'll want to [fork the repo](https://d
 
 Detailed development instructions, such as how to run the tests, are available in [DEVELOPMENT.md](DEVELOPMENT.md).
 
+### Documentation and example images
+
+You can help grow the **Issue Examples Gallery** in the docs by adding compelling example images. Images are stored in the [cleanlab/assets](https://github.com/cleanlab/assets/tree/master/cleanvision/example_issue_images) repo (not in CleanVision). Add images there and update `docs/source/issue_examples.rst` with new ``.. figure::`` blocks and captions (dataset, label). See the "Adding more examples" section on that page for details.
+
 ---
 
 If you have any questions about contributing to CleanVision, feel free to
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8c70e80b..b3f8a05a 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -139,9 +139,10 @@ Additional Resources
    :name: _tutorials
 
    How to Use CleanVision
-   tutorials/torchvision_dataset.ipynb
-   tutorials/huggingface_dataset.ipynb
-   Frequently Asked Questions
+   tutorials/torchvision_dataset.ipynb
+   tutorials/huggingface_dataset.ipynb
+   Frequently Asked Questions
+   Issue Examples Gallery
 
 .. _api-reference:
 .. toctree::
diff --git a/docs/source/issue_examples.rst b/docs/source/issue_examples.rst
new file mode 100644
index 00000000..a96f4abd
--- /dev/null
+++ b/docs/source/issue_examples.rst
@@ -0,0 +1,127 @@
+.. image:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanlab/cleanvision_logo_open_source_transparent.png
+   :width: 800
+   :alt: CleanVision
+
+Issue Examples Gallery
+======================================
+
+This page showcases a **continuously growing** list of compelling examples for each type of image issue detected by CleanVision. Images are organized by issue type; each caption includes dataset, label, and other metadata where available.
+
+**Image storage:** Example images are stored in the `cleanlab/assets `_ repository (not in the CleanVision repo). To add more examples, open a PR there with new images—e.g. from running CleanVision on new datasets, `Kaggle `_, or posts like `CleanVision for image data cleaning `_.
+
+.. contents::
+   :local:
+   :backlinks: none
+
+Blurry
+------
+
+Images that are out of focus or have motion blur, making it difficult to extract meaningful features.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/blurry.png
+   :alt: Blurry image example
+   :width: 300
+
+   Example of blurry image detected by CleanVision. Add more examples from your datasets via the assets repo.
+
+Dark
+----
+
+Images that are underexposed or have very low brightness, making it difficult to discern content.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/dark.jpg
+   :alt: Dark image example
+   :width: 300
+
+   Example of dark/underexposed image. Dataset/label can be added in caption when contributing new images.
+
+Light
+-----
+
+Images that are overexposed or too bright, causing loss of detail in bright areas.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/light.jpg
+   :alt: Light image example
+   :width: 300
+
+   Example of light/overexposed image. Add compelling examples with dataset and label in caption.
+
+Grayscale
+---------
+
+Images that lack color information. May be intentionally grayscale or incorrectly converted.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/grayscale.jpg
+   :alt: Grayscale image example
+   :width: 300
+
+   Example of grayscale image. Contribute more with metadata (dataset, label).
+
+Low Information
+---------------
+
+Images that contain very little visual information—e.g. simple graphics, blank images, or non-photographic content.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/low_information.png
+   :alt: Low information image example
+   :width: 300
+
+   Example of low-information image. Add examples from Caltech-256, CUB-200-2011, CIFAR-10, etc. with captions.
+
+Odd Aspect Ratio
+----------------
+
+Images with unusual width-to-height ratios that may cause issues in standard model training pipelines.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_aspect_ratio.jpg
+   :alt: Odd aspect ratio example
+   :width: 300
+
+   Example of odd aspect ratio. Include dataset and label when adding more.
+
+Odd Size
+--------
+
+Images with unusual dimensions that may cause processing issues.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png
+   :alt: Odd size image example
+   :width: 300
+
+   Example of odd size image. Add more with metadata in caption.
+
+Exact Duplicates
+----------------
+
+Images that are pixel-identical copies of each other.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/exact_duplicates.png
+   :alt: Exact duplicate pair example
+   :width: 400
+
+   Example pair of exact duplicates. When adding separate image pairs, use captions like: Dataset: Caltech-256 | Label: class_name.
+
+Near Duplicates
+---------------
+
+Images that are highly similar but not pixel-identical—e.g. different lighting, crops, or slight edits.
+
+.. figure:: https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/near_duplicates.png
+   :alt: Near duplicate pair example
+   :width: 400
+
+   Example pair of near duplicates. Add pairs with Dataset/Label (e.g. CUB-200-2011, Food-101) in captions.
+
+Adding more examples
+--------------------
+
+To grow this gallery:
+
+1. **Store images** in `cleanlab/assets `_ under ``cleanvision/example_issue_images/``. Use descriptive names (e.g. ``blurry_food101_burrito.jpg``, ``dark_2.jpg``).
+2. **Update this file** (``docs/source/issue_examples.rst``) in the CleanVision repo: add a ``.. figure::`` block pointing to the raw URL ``https://raw.githubusercontent.com/cleanlab/assets/master/cleanvision/example_issue_images/`` and a caption with dataset, label, and any other metadata.
+3. **Sources for images:** Run CleanVision on public datasets (e.g. Food-101, CIFAR-10, Caltech-256, CUB-200-2011), search `Kaggle `_, or use examples from blog posts and tutorials. We only need the **most compelling** examples per issue type, not every issue from every dataset.
+
+.. raw:: html
+
+
+   Star CleanVision
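
Note on step 2 of "Adding more examples": the ``.. figure::`` block a contributor adds follows the same shape as the existing gallery entries, so it can be generated rather than typed by hand. The sketch below is only an illustration and is not part of this PR. The helper name `figure_block` and its parameters are hypothetical. The base URL and the `Dataset: ... | Label: ...` caption format are taken from the page itself. The `burrito` label is an assumption read off the example filename.

```python
# Hypothetical helper (not part of CleanVision): renders one gallery entry
# in the same layout the existing figure blocks in issue_examples.rst use.

# Raw-URL prefix quoted in the "Adding more examples" section.
ASSETS_BASE = (
    "https://raw.githubusercontent.com/cleanlab/assets/master/"
    "cleanvision/example_issue_images/"
)


def figure_block(filename, alt, dataset, label, width=300):
    """Return reStructuredText for a single figure with a metadata caption."""
    lines = [
        f".. figure:: {ASSETS_BASE}{filename}",
        f"   :alt: {alt}",
        f"   :width: {width}",
        "",  # a blank line separates the directive options from the caption
        f"   Dataset: {dataset} | Label: {label}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Filename from the page's naming example; dataset and label are assumed.
    print(
        figure_block(
            "blurry_food101_burrito.jpg",
            "Blurry image example",
            "Food-101",
            "burrito",
        )
    )
```

The output can be pasted under the matching issue heading once the image has been merged into the assets repo. The existing duplicate-pair entries use a width of 400, which can be passed as `width=400`.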