Unified Thinker: A General Reasoning Modular Core for Image Generation

Sashuai Zhou¹,²*, Qiang Zhou²*, Jijin Hu²*, Hanqing Yang²*, Yue Cao³, Junpeng Ma⁴,
Yinchao Ma², Jun Song²†, Tiezheng Ge², Cheng Yu², Bo Zheng², Zhou Zhao¹†

¹Zhejiang University ²Alibaba Group ³Nanjing University ⁴Fudan University
*Equal contribution †Corresponding authors

Unified Thinker is a task-agnostic reasoning core for general image generation. It decouples a trainable Thinker (MLLM) from an image Generator (e.g., diffusion models), enabling executable planning that bridges the persistent reasoning–execution gap in reasoning-driven image generation and editing.

News

The paper is now available on arXiv: https://arxiv.org/abs/2601.03127
[Planned] Code, checkpoints, and HieraReason-40K will be released soon. Stay tuned.

Highlights

Decoupled Thinker–Generator design: upgrade reasoning without retraining the entire generator.
Unified planning format across T2I (creation) and I2I (edit-only modification).
HieraReason-40K: hierarchical reasoning traces + executable enhanced prompts for cold start.
Dual-phase RL with generator-in-the-loop to align plans with actual visual outcomes.
Cross-generator transfer: Thinker can be plugged into different diffusion backbones.

Project Status

This repository currently serves as the project homepage. The following items are planned for release and are not yet available:

Training & inference code
Model checkpoints (Thinker / Generator adapters)
HieraReason-40K data & processing scripts
Reproduction scripts for benchmarks

If you would like to be notified when releases happen, please watch this repo.

Method Overview

Thinker (MLLM)
Input: instruction (+ optional reference image)
Output: structured reasoning trace + executable visual specification (enhanced prompt)

Generator (Diffusion model)
Input: enhanced prompt/spec (+ optional reference image for editing)
Output: final image
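
Because the code has not been released yet, the following is only a minimal Python sketch of how the decoupled Thinker–Generator interface described above could be wired together. Every name in it (`Plan`, `Thinker`, `Generator`, `EchoThinker`, `DummyGenerator`, `run_pipeline`) is hypothetical and is not the project's API; the two stub classes stand in for a real MLLM and a real diffusion model.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical container for the Thinker's two outputs."""
    reasoning_trace: str   # structured reasoning trace
    enhanced_prompt: str   # executable visual specification


class Thinker(Protocol):
    """Anything that turns an instruction into a Plan (assumed interface)."""
    def plan(self, instruction: str,
             reference_image: Optional[bytes] = None) -> Plan: ...


class Generator(Protocol):
    """Anything that renders an enhanced prompt to image bytes (assumed interface)."""
    def generate(self, enhanced_prompt: str,
                 reference_image: Optional[bytes] = None) -> bytes: ...


class EchoThinker:
    """Stub Thinker: a real one would call an MLLM here."""
    def plan(self, instruction: str,
             reference_image: Optional[bytes] = None) -> Plan:
        # T2I when no reference image is given, I2I (editing) otherwise.
        mode = "edit" if reference_image is not None else "create"
        return Plan(
            reasoning_trace=f"[{mode}] analyze: {instruction}",
            enhanced_prompt=f"{mode}: {instruction}",
        )


class DummyGenerator:
    """Stub Generator: returns the prompt as bytes instead of an image."""
    def generate(self, enhanced_prompt: str,
                 reference_image: Optional[bytes] = None) -> bytes:
        return enhanced_prompt.encode("utf-8")


def run_pipeline(thinker: Thinker, generator: Generator, instruction: str,
                 reference_image: Optional[bytes] = None) -> Tuple[Plan, bytes]:
    """Plan first, then execute; the Generator only ever sees the enhanced prompt."""
    plan = thinker.plan(instruction, reference_image)
    image = generator.generate(plan.enhanced_prompt, reference_image)
    return plan, image
```

In this sketch, swapping the diffusion backbone amounts to passing a different object that satisfies the `Generator` protocol, which is one way to read the cross-generator transfer claim.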

Training:

Stage 1 — Joint Supervised Fine-Tuning
- Teach the Thinker the planning interface using HieraReason-40K
- Align Generator to the enhanced prompts
Stage 2 — Dual-Phase Reinforcement Learning
- Phase 2.1 (Thinker RL): select plans that yield better images under constraint-based rewards
- Phase 2.2 (Generator RL): improve execution fidelity with stochastic rollouts + relative advantages
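
This README does not give the reward or advantage formulas, so the sketch below illustrates the two Stage 2 ideas under explicit assumptions: (a) a constraint-based reward taken to be the fraction of satisfied checks, and (b) relative advantages obtained by normalizing rewards within a group of stochastic rollouts for the same prompt, as is common in group-relative policy optimization methods. The function names, the boolean-check representation, and the `eps` constant are illustrative and may differ from the actual method.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def constraint_reward(checks: Sequence[bool]) -> float:
    """Assumed reward: fraction of constraints the generated image satisfies."""
    if not checks:
        return 0.0
    return sum(checks) / len(checks)


def relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one non-empty group of rollouts.

    Each rollout is scored relative to its siblings, so no separate
    value network is needed (an assumption, not the confirmed recipe).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def best_plan_index(rewards: Sequence[float]) -> int:
    """Index of the candidate plan whose image earned the highest reward."""
    return max(range(len(rewards)), key=rewards.__getitem__)
```

For example, three rollouts that satisfy 2/2, 1/2, and 0/2 constraints receive rewards 1.0, 0.5, and 0.0; the first gets a positive advantage, the last a negative one, and the middle rollout sits at zero.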

Citation

If you find this work useful, please cite:

@misc{zhou2026unifiedthinker,
      title={Unified Thinker: A General Reasoning Modular Core for Image Generation}, 
      author={Sashuai Zhou and Qiang Zhou and Jijin Hu and Hanqing Yang and Yue Cao and Junpeng Ma and Yinchao Ma and Jun Song and Tiezheng Ge and Cheng Yu and Bo Zheng and Zhou Zhao},
      year={2026},
      eprint={2601.03127},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.03127}, 
}
