Unified Thinker: A General Reasoning Modular Core for Image Generation

Sashuai Zhou1,2*, Qiang Zhou2*, Jijin Hu2*, Hanqing Yang2*, Yue Cao3, Junpeng Ma4,
Yinchao Ma2, Jun Song2†, Tiezheng Ge2, Cheng Yu2, Bo Zheng2, Zhou Zhao1†

1Zhejiang University    2Alibaba Group    3Nanjing University    4Fudan University
* Equal contribution   † Corresponding authors

arXiv: https://arxiv.org/abs/2601.03127 | Models: coming soon | Data: coming soon

Unified Thinker is a task-agnostic reasoning core for general image generation. It decouples a trainable Thinker (MLLM) from an image Generator (e.g., diffusion models), enabling executable planning that bridges the persistent reasoning–execution gap in reasoning-driven image generation and editing.

[Figure: pipeline]

News

  • Paper is now available.
  • [Planned] Code / checkpoints / HieraReason-40K will be released soon. Stay tuned.

Highlights

  • Decoupled Thinker–Generator design: upgrade reasoning without retraining the entire generator.
  • Unified planning format across T2I (creation) and I2I (edit-only modification).
  • HieraReason-40K: hierarchical reasoning traces + executable enhanced prompts for cold start.
  • Dual-phase RL with generator-in-the-loop to align plans with actual visual outcomes.
  • Cross-generator transfer: Thinker can be plugged into different diffusion backbones.

Project Status

This repository currently serves as the project homepage. The following items are planned but not yet released:

  • Training & inference code
  • Model checkpoints (Thinker / Generator adapters)
  • HieraReason-40K data & processing scripts
  • Reproduction scripts for benchmarks

If you would like to be notified when releases happen, please watch this repo.


Method Overview

Thinker (MLLM)
Input: instruction (+ optional reference image)
Output: structured reasoning trace + executable visual specification (enhanced prompt)

Generator (Diffusion model)
Input: enhanced prompt/spec (+ optional reference image for editing)
Output: final image
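
Since the code is not yet released, the following is only a minimal sketch of how the Thinker/Generator interface described above could be wired together. All names here (`Plan`, `Thinker`, `Generator`, `run_pipeline`, `EchoThinker`, `FakeGenerator`) are hypothetical and do not reflect the actual API; the toy classes stand in for a real MLLM and diffusion model.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


@dataclass
class Plan:
    """Hypothetical container for the Thinker's output."""
    reasoning_trace: str   # structured reasoning steps
    enhanced_prompt: str   # executable visual specification


class Thinker(Protocol):
    """Assumed interface: instruction (+ optional image) -> Plan."""
    def plan(self, instruction: str,
             reference_image: Optional[bytes] = None) -> Plan: ...


class Generator(Protocol):
    """Assumed interface: enhanced prompt (+ optional image) -> image bytes."""
    def render(self, enhanced_prompt: str,
               reference_image: Optional[bytes] = None) -> bytes: ...


def run_pipeline(thinker: Thinker, generator: Generator, instruction: str,
                 reference_image: Optional[bytes] = None) -> Tuple[Plan, bytes]:
    """Plan first, then execute; the Generator only sees the enhanced prompt."""
    plan = thinker.plan(instruction, reference_image)
    image = generator.render(plan.enhanced_prompt, reference_image)
    return plan, image


class EchoThinker:
    """Toy stand-in for an MLLM: tags the task type and pads the prompt."""
    def plan(self, instruction, reference_image=None):
        mode = "edit" if reference_image is not None else "create"
        return Plan(
            reasoning_trace=f"[{mode}] interpret: {instruction}",
            enhanced_prompt=f"{instruction}, detailed, coherent composition",
        )


class FakeGenerator:
    """Toy stand-in for a diffusion model: returns the prompt as bytes."""
    def render(self, enhanced_prompt, reference_image=None):
        return enhanced_prompt.encode("utf-8")


if __name__ == "__main__":
    plan, image = run_pipeline(EchoThinker(), FakeGenerator(), "a red cube")
    print(plan.reasoning_trace)
    print(len(image), "bytes")
```

Because `run_pipeline` depends only on the two interfaces, swapping in a different generator backbone would not require changing the Thinker, which is the point of the decoupled design.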

Training:

  1. Stage 1 — Joint Supervised Fine-Tuning
    • Teach the Thinker the planning interface using HieraReason-40K
    • Align Generator to the enhanced prompts
  2. Stage 2 — Dual-Phase Reinforcement Learning
    • Phase 2.1 (Thinker RL): select plans that yield better images under constraint-based rewards
    • Phase 2.2 (Generator RL): improve execution fidelity with stochastic rollouts + relative advantages
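
The README does not state the exact reward or advantage formulas used in Stage 2. As an illustration only, the sketch below assumes a group-normalized advantage (each rollout's reward minus the group mean, divided by the group standard deviation), a common choice in group-relative policy optimization, and a simple argmax for picking the best-scoring plan. The function names and the `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-normalized advantages for one set of stochastic rollouts.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps).
    Rollouts that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the rollout group
    return [(r - mu) / (sigma + eps) for r in rewards]


def best_plan_index(rewards: List[float]) -> int:
    """Index of the candidate plan whose generated image scored highest."""
    return max(range(len(rewards)), key=rewards.__getitem__)


if __name__ == "__main__":
    # Hypothetical rewards for three rollouts of the same prompt.
    rollout_rewards = [1.0, 2.0, 3.0]
    print(relative_advantages(rollout_rewards))
    print(best_plan_index([0.2, 0.9, 0.5]))
```

When every rollout in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is why stochastic (diverse) rollouts matter in this kind of setup.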

Citation

If you find this work useful, please cite:

@misc{zhou2026unifiedthinker,
      title={Unified Thinker: A General Reasoning Modular Core for Image Generation}, 
      author={Sashuai Zhou and Qiang Zhou and Jijin Hu and Hanqing Yang and Yue Cao and Junpeng Ma and Yinchao Ma and Jun Song and Tiezheng Ge and Cheng Yu and Bo Zheng and Zhou Zhao},
      year={2026},
      eprint={2601.03127},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.03127}, 
}
