Easy Reinforcement Learning for Diffusion and Flow-Matching Models
We have added the latest FLUX.2-klein series! Follow these commands to get started:
```shell
# Clone the repo with submodule `diffusers`
git clone --recursive https://github.com/X-GenGroup/Flow-Factory.git
cd Flow-Factory
# Fetch the source code of `diffusers==0.37.0.dev`
git submodule update --init --recursive
# Install `diffusers==0.37.0.dev`
cd diffusers
pip install -e .
# Install Flow-Factory
cd ..
pip install -e .
```

Supported models:

| Task | Model | Model Size | Model Type |
|---|---|---|---|
| Text-to-Image | FLUX.1-dev | 13B | flux1 |
| | Z-Image-Turbo | 12B | z-image |
| | Qwen-Image | 20B | qwen-image |
| | Qwen-Image-2512 | 20B | qwen-image |
| Image-to-Image | FLUX.1-Kontext-dev | 13B | flux1-kontext |
| Image(s)-to-Image | Qwen-Image-Edit-2509 | 20B | qwen-image-edit-plus |
| | Qwen-Image-Edit-2511 | 20B | qwen-image-edit-plus |
| Text-to-Image & Image(s)-to-Image | FLUX.2-dev | 30B | flux2 |
| | FLUX.2-klein-4B | 4B | flux2-klein |
| | FLUX.2-klein-9B | 9B | flux2-klein |
| | FLUX.2-klein-base-4B | 4B | flux2-klein |
| | FLUX.2-klein-base-9B | 9B | flux2-klein |
| Text-to-Video | Wan2.1-T2V-1.3B | 1.3B | wan2_t2v |
| | Wan2.1-T2V-14B | 14B | wan2_t2v |
| | Wan2.2-T2V-A14B | A14B | wan2_t2v |
| Image-to-Video | Wan2.1-I2V-14B-480P | 14B | wan2_i2v |
| | Wan2.1-I2V-14B-720P | 14B | wan2_i2v |
| | Wan2.2-I2V-A14B | A14B | wan2_i2v |

Supported algorithms:

| Algorithm | trainer_type |
|---|---|
| GRPO | grpo |
| GRPO-Guard | grpo-guard |
| GDPO | gdpo |
| DiffusionNFT | nft |
See Algorithm Guidance for more information.
```shell
git clone https://github.com/Jayce-Ping/Flow-Factory.git
cd Flow-Factory
pip install -e .
```

Optional dependencies, such as deepspeed, are also available. Install them with:

```shell
pip install -e .[deepspeed]
```

To use Weights & Biases or SwanLab to log experimental results, install the extra dependencies via `pip install -e .[wandb]` or `pip install -e .[swanlab]`.
After installation, set corresponding arguments in the config file:
```yaml
run_name: null # Run name (auto: {model_type}_{finetune_type}_{timestamp})
project: "Flow-Factory" # Project name for logging
logging_backend: "wandb" # Options: wandb, swanlab, none
```

These trackers allow you to visualize both training samples and metric curves online.
Start training with the following simple command:
```shell
ff-train examples/grpo/lora/flux.yaml
```

The unified dataset structure is:
```
dataset
├── train.txt / train.jsonl
├── test.txt / test.jsonl   (optional)
├── images/                 (optional)
│   ├── image1.png
│   └── ...
└── videos/                 (optional)
    ├── video1.mp4
    └── ...
```
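The layout above can be scaffolded in a few lines. Below is a sketch using only the Python standard library; the directory and file names follow the structure shown, while the temporary root and sample prompt are purely illustrative:

```python
from pathlib import Path
import tempfile

def init_dataset(root: Path) -> None:
    """Create the dataset skeleton shown above."""
    (root / "images").mkdir(parents=True, exist_ok=True)  # optional: conditioning images
    (root / "videos").mkdir(parents=True, exist_ok=True)  # optional: conditioning videos
    (root / "train.jsonl").write_text('{"prompt": "A hill in a sunset."}\n', encoding="utf-8")

root = Path(tempfile.mkdtemp()) / "dataset"
init_dataset(root)
print(sorted(p.name for p in root.iterdir()))  # → ['images', 'train.jsonl', 'videos']
```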
For text-to-image and text-to-video tasks, the only required input is the prompt in plain-text format. Use train.txt and test.txt (optional) with the following format:

```
A hill in a sunset.
An astronaut riding a horse on Mars.
```

Example: dataset/pickscore
Each line represents a single text prompt. Alternatively, you can use train.jsonl and test.jsonl in the following format:

```json
{"prompt": "A hill in a sunset."}
{"prompt": "An astronaut riding a horse on Mars."}
```

Example: dataset/t2is
negative_prompt is also supported:

```json
{"prompt": "A hill in a sunset.", "negative_prompt": "low quality, blurry, distorted, poorly drawn"}
{"prompt": "An astronaut riding a horse on Mars.", "negative_prompt": "low quality, blurry, distorted, poorly drawn"}
```

Example: dataset/t2is_neg
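The .jsonl files are plain JSON Lines (one JSON object per line), so they are easy to generate programmatically. A minimal sketch with Python's standard json module; the prompts, negative prompts, and temporary file path here are placeholders:

```python
import json
import os
import tempfile

# Placeholder records; in practice these come from your own prompt data.
records = [
    {"prompt": "A hill in a sunset.", "negative_prompt": "low quality, blurry"},
    {"prompt": "An astronaut riding a horse on Mars.", "negative_prompt": "low quality, blurry"},
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Each line round-trips back to the original dict:
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded == records)  # → True
```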
For tasks involving conditioning images, use train.jsonl and test.jsonl in the following format:

```json
{"prompt": "A hill in a sunset.", "image": "path/to/image1.png"}
{"prompt": "An astronaut riding a horse on Mars.", "image": "path/to/image2.png"}
```

Example: dataset/sharegpt4o_image_mini
The default root directory for images is dataset_dir/images, and for videos it is dataset_dir/videos. You can override these locations by setting the image_dir and video_dir variables in the config file:

```yaml
data:
  dataset_dir: "path/to/dataset"
  image_dir: "path/to/image_dir" # (defaults to "{dataset_dir}/images")
  video_dir: "path/to/video_dir" # (defaults to "{dataset_dir}/videos")
```

For models like FLUX.2-dev and Qwen-Image-Edit-2511 that can accept multiple images as conditions, use the images key with a list of image paths:
```json
{"prompt": "A hill in a sunset.", "images": ["path/to/condition_image_1_1.png", "path/to/condition_image_1_2.png"]}
{"prompt": "An astronaut riding a horse on Mars.", "images": ["path/to/condition_image_2_1.png", "path/to/condition_image_2_2.png"]}
```

Similarly, conditioning videos can be specified with the video key (a single path) or the videos key (a list of paths):

```json
{"prompt": "A hill in a sunset.", "video": "path/to/video1.mp4"}
{"prompt": "An astronaut riding a horse on Mars.", "videos": ["path/to/video2.mp4", "path/to/video3.mp4"]}
```

Flow-Factory provides a flexible reward model system that supports both built-in and custom reward models for reinforcement learning.
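Given the key conventions in the examples above, a small validator can catch malformed records before training starts. This is a sketch in plain Python, not part of Flow-Factory; the rule that each record carries a prompt plus at most one media key is an assumption inferred from the examples:

```python
import json

# Assumed convention (from the examples above): each record has "prompt",
# plus optionally ONE media key: "image"/"video" (a string) or
# "images"/"videos" (a list of strings).
SINGLE_KEYS = {"image", "video"}
LIST_KEYS = {"images", "videos"}

def validate_record(rec: dict) -> list:
    """Return a list of problems found in one .jsonl record (empty = OK)."""
    problems = []
    if not isinstance(rec.get("prompt"), str) or not rec["prompt"].strip():
        problems.append("missing or empty 'prompt'")
    media = (SINGLE_KEYS | LIST_KEYS) & rec.keys()
    if len(media) > 1:
        problems.append("conflicting media keys: %s" % sorted(media))
    for key in media:
        value = rec[key]
        if key in SINGLE_KEYS and not isinstance(value, str):
            problems.append("'%s' should be a single path string" % key)
        if key in LIST_KEYS and not (isinstance(value, list)
                                     and all(isinstance(p, str) for p in value)):
            problems.append("'%s' should be a list of path strings" % key)
    return problems

line = '{"prompt": "A hill in a sunset.", "images": ["a.png", "b.png"]}'
print(validate_record(json.loads(line)))  # → []
```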
Flow-Factory supports three types of reward models:
- Pointwise Reward: computes an independent score for each sample (e.g., aesthetic quality, text-image alignment).
- Pairwise Reward: computes rewards from pairwise comparisons within a group. This is a special case of the groupwise reward below.
- Groupwise Reward: computes rewards that require all samples in a group (e.g., ranking-based scores or pairwise comparisons).
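To make the distinction concrete, here is a minimal sketch in plain Python (not Flow-Factory's actual interface) contrasting a pointwise reward with a groupwise, ranking-based reward in the spirit of PickScore_Rank:

```python
def pointwise_reward(scores):
    """Pointwise: each sample's reward depends only on its own score."""
    return list(scores)

def groupwise_rank_reward(scores):
    """Groupwise: each sample's reward is its normalized rank within the
    group, so it depends on every other sample's score too."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, ascending by score
    rewards = [0.0] * n
    for rank, i in enumerate(order):
        rewards[i] = rank / (n - 1) if n > 1 else 0.0
    return rewards

print(groupwise_rank_reward([0.2, 0.9, 0.5]))  # → [0.0, 1.0, 0.5]
```

Note that the pointwise reward is unchanged if other group members are added or removed, while the rank-based reward reshuffles whenever any group member's score changes.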
The following reward models are pre-registered and ready to use:
| Name | Type | Description | Reference |
|---|---|---|---|
| PickScore | Pointwise | CLIP-based aesthetic scoring model | PickScore |
| PickScore_Rank | Groupwise | Ranking-based reward using PickScore | PickScore |
| CLIP | Pointwise | Image-text cosine similarity | CLIP |
Simply specify the reward model name in your config file:
```yaml
rewards:
  name: "aesthetic" # Alias for this reward model
  reward_model: "PickScore" # Reward model type or a path like 'my_package.rewards.CustomReward'
  batch_size: 16
  device: "cuda"
  dtype: bfloat16
```

Refer to Rewards Guidance for more information about advanced usage, such as creating a custom reward model.
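The 'my_package.rewards.CustomReward' path accepted by reward_model suggests custom rewards are loaded by dotted import path. As a rough sketch of what such a module might contain, here is a toy pointwise reward; the class name matches the dotted path above, but the constructor arguments, call signature, and scoring rule are all assumptions for illustration. Check the Rewards Guidance for the real base class and interface:

```python
# my_package/rewards.py — hypothetical module matching the dotted path above.

class CustomReward:
    """Sketch of a pointwise reward: scores each (image, prompt) pair
    independently. The __init__/__call__ signatures are assumptions;
    Flow-Factory's real interface may differ."""

    def __init__(self, device: str = "cpu"):
        self.device = device  # where a real scoring model would live

    def __call__(self, images, prompts):
        # Toy scoring rule: reward longer prompts, capped at 1.0.
        # A real implementation would run a learned model over the images.
        return [min(len(p) / 100.0, 1.0) for p in prompts]

reward = CustomReward()
print(reward([None, None], ["a hill", "an astronaut riding a horse on Mars"]))
```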
This repository is built on diffusers, accelerate, and peft. We thank them for their contributions to the community!


