
qapyq

AI-assisted media curator for large image/video datasets. Streamlined captioning, cropping, masking for LoRA/diffusion training workflows.




Screenshot of qapyq with its 5 windows open.

Edit captions quickly with drag-and-drop support · Select one-of-many · Apply sorting and filtering rules

Quick cropping · Image comparison · Draw masks manually or apply automatic detection and segmentation

Transform tags using conditional rules · Multi-Edit and Focus Mode

Features

  • 🧿 Media Viewer: Display and navigate images and videos

    • Quick-starting desktop application built with Qt
    • Runs smoothly with a million files
    • Modular interface that lets you place windows on different monitors
    • Open multiple tabs
    • Zoom/pan and fullscreen mode
    • Gallery with thumbnails and captions ?
    • Semantic image sorting with text prompts ?
    • Compare two images ?
    • Measure size, area and pixel distances ?
    • Slideshow ?
  • 🎨 Image/Mask Editor: Prepare media for training

    • Crop and save parts of images ?
    • Scale images, optionally using AI upscale models ?
    • Crop and scale videos, trimmed to exact frame count
    • Dynamic save paths with template variables ?
    • Manually edit masks with multiple layers ?
    • Generate masks with AI models ?
    • Record masking operations into macros ?
    • VAE-encode images and check their latent representation ?
  • 📜 Captioning: Describe media with text

    • Edit captions manually with drag-and-drop support ?
    • Save multiple captions in per-media JSON files ?
    • Multi-Edit Mode: Edit captions across multiple files simultaneously ?
    • Focus Mode: Add the same tags to many files quickly ?
    • Tag grouping, merging, sorting, filtering and replacement rules ?
    • Colored text highlighting
    • Autocomplete with tags from your groups and CSV files ?
    • CLIP Token Counter ?
    • Automated captioning with support for grounding ?
    • Dynamic prompts with templates and text transformations ?
    • Multi-turn conversations with VLMs ?
    • Further refinement with LLMs
  • 📐 Stats/Filters: Summarize your data and get an overview

    • List all tags, media resolutions, masked regions, or size of concept folders ?
    • Filter media and create subsets ?
    • Combine and chain filters
    • Export the summaries as CSV
  • 🚀 Batch Processing: Process whole folders at once

    • Flexible batch captioning, tagging and transformation ?
    • Batch scaling of images
    • Batch masking with user-defined macros
    • Batch cropping of images using your macros
    • Copy, move and rename files, create symlinks, ZIP captions for backups
  • 🔮 AI Assistance:

    • Support for state-of-the-art captioning and masking models
    • Model and sampling settings, GPU acceleration with CPU offload support
    • On-the-fly NF4 and INT8 quantization
    • Run inference locally and/or on multiple remote machines over SSH ?
    • Separate inference subprocess isolates potential crashes and allows complete VRAM cleanup
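The "dynamic save paths with template variables" feature listed above can be illustrated with a small sketch. The variable names used here ({path}, {name}, {ext}) and the helper function are hypothetical and only demonstrate the idea; qapyq's actual template syntax may differ.

```python
from pathlib import Path

def expand_save_path(template: str, source: Path) -> Path:
    """Fill a save-path template from a source file.
    Variable names are illustrative, not qapyq's real syntax."""
    values = {
        "path": str(source.parent),        # directory of the source file
        "name": source.stem,               # filename without extension
        "ext": source.suffix.lstrip("."),  # extension without the dot
    }
    return Path(template.format(**values))

print(expand_save_path("{path}/crops/{name}_crop.{ext}", Path("/data/set1/img001.jpg")))
# → /data/set1/crops/img001_crop.jpg
```

A scheme like this lets one save rule route crops from many source folders into per-folder subdirectories without manual renaming.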

Supported Models

These are the supported architectures with links to the original models.
Find more specialized finetuned models on huggingface.co.

Setup

  1. Download this repository or clone it with git:
    • git clone https://github.com/FennelFetish/qapyq.git
  2. Run setup.sh on Linux, setup.bat on Windows.
    • Packages are installed into a virtual environment.

The setup script will ask you which components to install:

  • On Linux, it lets you choose between installed Python versions.
  • FlashAttention is optional for most models but recommended for speed.
  • You can choose to install only the GUI and media processing packages without AI assistance.
  • When installing on a headless server for remote inference, you can choose to install only the backend.
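On Linux, the steps above amount to:

```shell
git clone https://github.com/FennelFetish/qapyq.git
cd qapyq
./setup.sh    # on Windows: setup.bat
```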

If the setup scripts didn't work for you but you got it running manually, please raise an issue and share your solution.

Dependencies

Requires Python 3.10 or later

  • Python 3.14 can run the GUI, but it may cause setup issues and slow inference.
    • Consider installing a lower Python version (3.12 offers the widest compatibility).

External Dependencies

  • To run GGUF models with llama-cpp-python you may need to install CUDA runtime libraries
    (e.g. libcudart12, libcublas12 from Ubuntu's package manager).
  • For exporting videos you'll need ffmpeg added to your PATH environment variable.
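On Ubuntu, for example, both dependencies can come from the package manager (the package names below are examples and may vary by distro and CUDA version); the last line checks whether ffmpeg is visible on PATH:

```shell
# CUDA runtime libraries for llama-cpp-python (Ubuntu example, names may vary):
#   sudo apt install libcudart12 libcublas12
# ffmpeg for video export:
#   sudo apt install ffmpeg

# Verify that ffmpeg is reachable on PATH:
command -v ffmpeg >/dev/null 2>&1 && echo "ffmpeg found" || echo "ffmpeg missing on PATH"
```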

Compute Platform

During setup, select the compute platform that matches your system. In combination with the Python version, the platform affects the version and availability of prebuilt wheels:

| Platform | torch | onnxruntime 1 | llama-cpp-python 2 | flash_attn 3 |
|----------|-------|---------------|--------------------|--------------|
| CUDA 12.6 | 2.11 | onnxruntime-gpu, Python 3.10 - 3.14 | for CUDA 12.4, Linux: Python 3.10 - 3.14 | Python 3.10 - 3.14 |
| CUDA 12.8 | 2.11 | onnxruntime-gpu, Python 3.10 - 3.14 | for CUDA 12.4, Linux: Python 3.10 - 3.14 | Python 3.10 - 3.14 |
| CUDA 13.0 | 2.11 | onnxruntime-gpu (nightly), Python 3.11 - 3.14 | for CUDA 12.4, Linux: Python 3.10 - 3.14 | Python 3.10 - 3.14 |
| ROCm 6.4 | 2.9 | onnxruntime-rocm, Python 3.10 and 3.12 | 🚫 | 🚫 |
| ROCm 7.2 | 2.11 | onnxruntime-migraphx, Python 3.10 and 3.12 | 🚫 | 🚫 |
| CPU | 2.11 | onnxruntime, Python 3.10 - 3.14 | 0.3.19, Python 3.10 - 3.13 | 🚫 |

1 For running WD/PixAI tagging models, YOLO detection and semantic sorting
2 For running GGUF models
3 Improves inference speed

Startup

  • Linux: run.sh
  • Windows: run.bat or run-console.bat

You can open files or folders directly in qapyq by associating the file types with the respective run script in your OS. For shortcuts, icons are available in the qapyq/res folder.

Update

If you cloned the repository with git, simply use git pull to update.
If you downloaded the repository as a zip archive, download it again and replace the installed files.

To update the installed packages in the virtual environment, run the setup script again.

New dependencies may be added over time. If the program fails to start or crashes after an update, run the setup script again to install the missing packages.

User Guide

More information is available in the Wiki.
Use the page index on the right side to navigate and find topics.
Or click on the ? in the feature list above.

How to:

If you have questions, please ask in the Discussions.
