🚀 ParseFixer @ CVPR 2026 DataMFM Document Parsing Challenge

Official Implementation of "ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction"

LeKai Yu¹, Hao Liu¹, Kun Wang¹, Zhiran Li¹, Ruping Cao¹, Fan Liu², Yupeng Hu¹

¹ Shandong University ² Southeast University

📌 Introduction

This repository contains the official implementation of ParseFixer, our solution for DataMFM Challenge Track 1: Document Parsing.

The task requires recovering document-level Markdown files from document page images while preserving visible text, tables, formulas, layout structure, and natural reading order.

ParseFixer is an agentic document parsing framework that combines backbone parsing with selective multimodal correction. Instead of regenerating every page with a single large multimodal model, ParseFixer first uses a stable full-page parser to produce initial Markdown outputs, then repairs only unreliable pages or local elements through a verify-and-rollback correction process: a correction is kept only if it passes verification, and the original parser output is restored otherwise.

This repo includes:

full-page backbone parsing scripts;
Markdown normalization and post-processing utilities;
page-level and element-level selective correction pipeline;
table/formula repair and merge utilities;
document-level aggregation and submission packaging scripts;
prompt templates used for page, table, and formula correction.

📰 News

[2026/6] Initial release of the repository.

✨ Highlights

Stable full-page parsing. ParseFixer uses MinerU2.5 Pro as the full-page backbone parser to recover text, tables, formulas, and reading order from document page images.
Selective multimodal correction. Instead of rewriting all pages, ParseFixer repairs only high-risk pages or defective local regions.
Page-level and element-level routing. The Agentic Selective Correction (ASC) stage performs page-level screening and then routes local blocks to table repair, formula repair, or keep actions.
Submission-ready aggregation. Page-level Markdown files are merged into document-level Markdown files and packed into a flat submission.zip structure.
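
The element-level routing described above can be pictured with a minimal sketch. This is not the repository's implementation: the `Block` type, its `kind` labels, and the two format checks are hypothetical stand-ins chosen only to show how a block could be sent to table repair, formula repair, or keep.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    kind: str      # hypothetical labels: "text", "table", or "formula"
    content: str   # HTML for tables, LaTeX for formulas, Markdown otherwise


def table_is_malformed(html: str) -> bool:
    """Cheap format check (assumption): table/row/cell tags must pair up."""
    if "<table" not in html:
        return True
    for tag in ("table", "tr", "td"):
        opened = len(re.findall(rf"<{tag}\b", html))
        closed = len(re.findall(rf"</{tag}>", html))
        if opened != closed:
            return True
    return False


def formula_is_invalid(latex: str) -> bool:
    """Cheap format check (assumption): non-empty body with balanced braces."""
    if not latex.strip():
        return True
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return True
        i += 1
    return depth != 0


def route(block: Block) -> str:
    """Return one of "table_repair", "formula_repair", or "keep"."""
    if block.kind == "table" and table_is_malformed(block.content):
        return "table_repair"
    if block.kind == "formula" and formula_is_invalid(block.content):
        return "formula_repair"
    return "keep"
```

Blocks that pass their format check fall through to keep, which matches the idea of leaving reliable parser output untouched.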

🧠 Method Overview

ParseFixer consists of three main stages:

Full-Page Backbone Parsing (FBP).
MinerU2.5 Pro parses each document page image and generates initial page-level Markdown outputs together with layout-aware parsing information.
Agentic Selective Correction (ASC).
ASC diagnoses parsing failures using page-level quality checks and element-level format checks. It selectively repairs abnormal pages, malformed tables, and invalid or missing formulas.
Fusion and Submission.
Verified corrections are inserted back with minimal changes. The final page-level Markdown files are normalized, aggregated according to page order, and zipped into the official submission format.
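
The verify-and-rollback idea behind the correction and fusion stages can be sketched as follows. The function names, the callable signatures, and the length-ratio verifier are assumptions made for illustration; the checks actually used by ParseFixer are not specified here.

```python
from typing import Callable, Optional


def correct_with_rollback(
    original: str,
    repair: Callable[[str], Optional[str]],
    verify: Callable[[str, str], bool],
) -> str:
    """Return the repaired text only if it passes verification; else keep the original."""
    try:
        candidate = repair(original)
    except Exception:
        return original  # repair backend failed: roll back
    if candidate is None or not verify(original, candidate):
        return original  # no candidate, or candidate rejected: roll back
    return candidate


def length_ratio_ok(original: str, candidate: str,
                    low: float = 0.5, high: float = 2.0) -> bool:
    """Hypothetical verifier: reject candidates that shrink or grow implausibly."""
    if not original:
        return bool(candidate)
    ratio = len(candidate) / len(original)
    return low <= ratio <= high
```

Because every failure path returns `original`, a broken or unavailable repair model can never make the output worse than the backbone parse.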

📊 Results

Leaderboard Result

Our team, zed, achieved an Overall score of 61.78 and ranked 3rd in DataMFM Challenge Track 1: Document Parsing.

Rank	Team	Text ED	Table TEDS	Formula CDM	Reading Order	Overall
1	Zhiheng	0.16	82.42	19.3	75.44	65.37
2	durgasandeep	0.17	85.20	15.73	64.74	62.08
3	zed (Ours)	0.10	80.55	0.94	75.48	61.78
4	dennis	0.13	81.82	0.41	72.92	60.49
5	cdefg	0.16	80.77	2.71	70.91	59.64
6	anmspro	0.18	82.03	0.51	72.62	59.34
7	ytttttt	0.16	91.05	0.62	60.74	59.01
8	sig	0.18	84.99	0.41	63.31	57.74
9	Wind_Rain_Tower	0.15	59.56	0.58	61.59	51.62
10	HHHHHHHHHH	0.27	65.47	0.65	53.80	48.26

⚙️ Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/CVPRW26-ParseFixer.git
cd CVPRW26-ParseFixer

2. Create environment

Our environment builds on the official OmniDocBench environment. Please follow the environment setup instructions in the OmniDocBench repository first.

A typical setup is:

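# Step 1 left you inside CVPRW26-ParseFixer; clone OmniDocBench next to it
# so that the later `cd ../CVPRW26-ParseFixer` resolves.
cd ..
git clone https://github.com/opendatalab/OmniDocBench.git
cd OmniDocBench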

conda create -n parsefixer python=3.10 -y
conda activate parsefixer

pip install -e .

Then install the additional dependencies required by this repository:

cd ../CVPRW26-ParseFixer
pip install -r requirements.txt

Note: Because the official evaluation computes formula and table metrics, we recommend following the OmniDocBench environment configuration as closely as possible. If environment conflicts occur, consult the official OmniDocBench repository first.

3. Install / prepare external parsers

ParseFixer uses MinerU2.5 Pro as the full-page document parsing backbone. Please download the model weights from Hugging Face:

https://huggingface.co/opendatalab/MinerU2.5-Pro-2604-1.2B

If you use closed-source multimodal correction modules, please configure the corresponding API keys locally.

export GOOGLE_API_KEY="your_gemini_key"
export OPENAI_API_KEY="your_openai_key"

💽 Data Preparation

The official DataMFM Track 1 dataset can be downloaded from the following link:

Download DataMFM Track 1 Dataset

After downloading and extracting the dataset, organize the page images into one folder per document, so that each folder contains all page images of a single document.

Expected structure:

data/
└── images/
    ├── <document_uuid_1>/
    │   ├── page_001.jpg
    │   ├── page_002.jpg
    │   └── ...
    ├── <document_uuid_2>/
    │   ├── page_001.jpg
    │   ├── page_002.jpg
    │   └── ...
    └── ...
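
To check that a local copy matches the layout above, a small helper along these lines may be useful. It is a sketch rather than part of the repository: `list_documents` and the accepted image suffixes are assumptions, and page order relies on the zero-padded file names shown in the tree.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: pages are stored as common raster formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_documents(image_root: str) -> Dict[str, List[Path]]:
    """Map each document folder name to its page images in reading order."""
    docs: Dict[str, List[Path]] = {}
    for doc_dir in sorted(Path(image_root).iterdir()):
        if not doc_dir.is_dir():
            continue
        # Zero-padded names (page_001, page_002, ...) sort correctly as strings.
        pages = sorted(
            p for p in doc_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if pages:
            docs[doc_dir.name] = pages
    return docs
```

For example, `len(list_documents("data/images"))` gives the number of non-empty document folders found.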

The final submission should contain one Markdown file for each document:

submission.zip
├── <document_uuid_1>.md
├── <document_uuid_2>.md
└── ...
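
The flat archive above can be produced with the standard library alone. The sketch below assumes a hypothetical intermediate layout of `<page_md_root>/<document_uuid>/page_001.md` for the page-level Markdown; the directory names that ParseFixer actually writes may differ, and `build_submission` is not a function from this repository.

```python
import zipfile
from pathlib import Path


def build_submission(page_md_root: str, zip_path: str) -> int:
    """Merge per-page Markdown into one file per document and zip them flat.

    Returns the number of documents written to the archive.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc_dir in sorted(Path(page_md_root).iterdir()):
            if not doc_dir.is_dir():
                continue
            pages = sorted(doc_dir.glob("*.md"))  # zero-padded names keep page order
            if not pages:
                continue
            merged = "\n\n".join(
                p.read_text(encoding="utf-8").strip() for p in pages
            )
            # Archive names carry no directory component, so the zip stays flat.
            zf.writestr(f"{doc_dir.name}.md", merged + "\n")
            count += 1
    return count
```

Writing entries with `writestr` and a bare file name avoids accidentally embedding local directory paths in the archive.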

⚡ Inference

export GOOGLE_API_KEY="your_gemini_key"
export OPENAI_API_KEY="your_openai_key"

python run_parsefixer.py \
  --image-root /path/to/datasets/DataMFM/images \
  --mineru-model-path /path/to/models/MinerU2.5-Pro-2604-1.2B \
  --out-root /path/to/outputs/parsefixer/run_full \
  --page-model-provider gemini \
  --table-model-provider gemini \
  --formula-model-provider openai \
  --resume true

For a dry run without external repair models, set providers to none:

python run_parsefixer.py \
  --image-root /path/to/images \
  --mineru-model-path /path/to/MinerU2.5-Pro-2604-1.2B \
  --out-root /path/to/outputs/parsefixer/run_none \
  --page-model-provider none \
  --table-model-provider none \
  --formula-model-provider none

In this mode, FBP and deterministic formula repair still run, but external fallback repair for pages, tables, and formulas is skipped, so the MinerU output is kept wherever a correction would otherwise have been attempted.
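
Before a long run, it can help to confirm that the exported keys match the selected providers. The following preflight check is a sketch, not part of `run_parsefixer.py`: the provider-to-variable mapping is inferred from the commands above, and `missing_keys` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Assumption: "gemini" reads GOOGLE_API_KEY and "openai" reads OPENAI_API_KEY,
# as suggested by the export lines above; "none" needs no key.
PROVIDER_ENV = {"gemini": "GOOGLE_API_KEY", "openai": "OPENAI_API_KEY"}


def missing_keys(providers: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the environment variables that are required but unset or empty."""
    env = os.environ if env is None else env
    needed = {PROVIDER_ENV[p] for p in providers if p in PROVIDER_ENV}
    return sorted(name for name in needed if not env.get(name))


if __name__ == "__main__":
    # Page, table, and formula providers from the full-run command above.
    absent = missing_keys(["gemini", "gemini", "openai"])
    if absent:
        print("Missing API keys:", ", ".join(absent))
```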

🧩 Prompt Templates

ParseFixer uses strict output-only prompts for controlled correction:

Page Repair Prompt: full-page image to Markdown under strict source-faithfulness constraints.
Table Repair Prompt: cropped table image to exactly one valid HTML <table>...</table> block.
Formula Repair Prompt: cropped formula region to raw LaTeX only.

All prompts follow the same principles:

output only the required format;
do not output explanations or analysis;
preserve visible source content faithfully;
do not summarize, translate, polish, or hallucinate;
reject unsupported additions through verification and rollback.
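
As an illustration of the last principle, the table prompt's "exactly one valid HTML table block" requirement can be checked mechanically before a repair is accepted. The sketch below uses only Python's standard `html.parser`; the class and function names are hypothetical, and the repository's own verifier may apply different or stricter rules.

```python
from html.parser import HTMLParser


class _TableCounter(HTMLParser):
    """Tracks table nesting and any content found outside a table."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0
        self.top_level_tables = 0
        self.outside_content = False
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth == 0:
                self.top_level_tables += 1
            self.depth += 1
        elif self.depth == 0:
            self.outside_content = True  # a tag outside any table

    def handle_endtag(self, tag):
        if tag == "table":
            if self.depth == 0:
                self.balanced = False  # closing tag without an opener
            else:
                self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.outside_content = True  # explanations, code fences, etc.


def is_single_table(output: str) -> bool:
    """True if the output is one balanced <table>...</table> block and nothing else."""
    parser = _TableCounter()
    parser.feed(output)
    parser.close()
    return (
        parser.top_level_tables == 1
        and parser.depth == 0
        and parser.balanced
        and not parser.outside_content
    )
```

An output that fails this check would be rejected and the original table kept, in line with the rollback behavior described earlier.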

💾 Models and External Resources

ParseFixer may use the following external resources:

Resource	Usage	Required
MinerU2.5 Pro	Full-page backbone parsing and localized crop-level re-parsing	Yes
Gemini 2.5 Pro	Page-level re-parsing and fallback table correction	Optional
GPT-5.5	Fallback formula correction	Optional
DataMFM Track 1 dataset	Official challenge evaluation data	Yes

No additional manually annotated document parsing labels are used beyond the released challenge resources.

📚 Citation

If you find this project useful for your research, please consider citing:

@article{yu2026parsefixer,
  title={ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction},
  author={Yu, LeKai and Liu, Hao and Wang, Kun and Li, Zhiran and Cao, Ruping and Liu, Fan and Hu, Yupeng},
  journal={arXiv preprint arXiv:2606.11977},
  year={2026}
}

📬 Contact

If you have any questions, feel free to contact:

LeKai Yu: kyleyue70@gmail.com
Hao Liu: liuh90210@gmail.com

🤝 Acknowledgement

The dataset preparation and organization in this repository follow OmniDocBench.
Thanks to all collaborators and contributors of this project.

📄 License

This project is released under the Apache 2.0 License.
