Zhipei Xu*, Xuanyu Zhang*, Xing Zhou, Jian Zhang
School of Electronic and Computer Engineering, Peking University
💡 We also have other Copyright Protection projects that may interest you ✨.
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models [ICLR 2025]
Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking [CVPR 2025]
Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, Jian Zhang
- [2025.05.21] 🔥 We have released AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection. The work addresses human-centric video forgery detection and introduces the FakeHumanVid dataset and the AvatarShield framework. Check out the paper. The code and dataset are coming soon.
The rapid advancement of Artificial Intelligence Generated Content (AIGC) technologies, particularly in video generation, has led to unprecedented creative capabilities but also increased threats to information integrity, identity security, and public trust. Existing detection methods, while effective in general scenarios, lack robust solutions for human-centric videos, which pose greater risks due to their realism and potential for legal and ethical misuse. Moreover, current detection approaches often suffer from poor generalization, limited scalability, and reliance on labor-intensive supervised fine-tuning. To address these challenges, we propose AvatarShield, the first interpretable MLLM-based framework for detecting human-centric fake videos, enhanced via Group Relative Policy Optimization (GRPO). With our carefully designed accuracy detection reward and temporal compensation reward, training avoids high-cost text annotation data while still enabling precise temporal modeling and forgery detection. We also design a dual-encoder architecture that combines high-level semantic reasoning with low-level artifact amplification to guide the MLLM toward effective forgery detection. We further collect FakeHumanVid, a large-scale human-centric video benchmark that includes synthesis methods guided by pose, audio, and text inputs, enabling rigorous evaluation of detection methods in real-world scenes. Extensive experiments show that AvatarShield significantly outperforms existing approaches in both in-domain and cross-domain detection, setting a new standard for human-centric video forensics.
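
For readers unfamiliar with GRPO, the sketch below illustrates only the generic group-relative advantage step: several responses are sampled for the same video, each is scored by a reward, and the scores are normalized within that group. It is not the AvatarShield implementation, whose code is not yet released. The function names, the binary `accuracy_reward`, and the use of population standard deviation are assumptions made for illustration, and the temporal compensation reward is not modeled because its definition is not given here.

```python
from statistics import mean, pstdev


def accuracy_reward(predicted: str, label: str) -> float:
    """Hypothetical binary reward: 1.0 when the verdict matches the label."""
    return 1.0 if predicted.strip().lower() == label.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group (zero mean, unit scale).

    Assumption: population standard deviation is used; some GRPO
    implementations use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled verdicts for one video whose ground-truth label is "fake".
verdicts = ["fake", "real", "fake", "fake"]
rewards = [accuracy_reward(v, "fake") for v in verdicts]
advantages = group_relative_advantages(rewards)
```

Responses that beat the group average receive a positive advantage and the rest a negative one, so no separately trained value model or text annotation is needed to rank them.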
@article{xu2025avatarshield,
title={AvatarShield: Visual Reinforcement Learning for Human-Centric Video Forgery Detection},
author={Xu, Zhipei and Zhang, Xuanyu and Zhou, Xing and Zhang, Jian},
journal={arXiv preprint arXiv:2505.15173},
year={2025}
}