iLearn-Lab/CVPRW26-ParseFixer


🚀 ParseFixer @ CVPR 2026 DataMFM Document Parsing Challenge

Official Implementation of "ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction"


LeKai Yu¹, Hao Liu¹, Kun Wang¹, Zhiran Li¹, Ruping Cao¹, Fan Liu², Yupeng Hu¹

¹ Shandong University, ² Southeast University


📌 Introduction

This repository contains the official implementation of ParseFixer, our solution for DataMFM Challenge Track 1: Document Parsing.

The task requires recovering document-level Markdown files from document page images while preserving visible text, tables, formulas, layout structure, and natural reading order.

ParseFixer is an agentic document parsing framework that combines backbone parsing with selective multimodal correction. Instead of regenerating every page with a single large multimodal model, ParseFixer first uses a stable full-page parser to produce initial Markdown outputs, then repairs only unreliable pages or local elements through a verify-and-rollback correction process.

This repo includes:

  • full-page backbone parsing scripts;
  • Markdown normalization and post-processing utilities;
  • page-level and element-level selective correction pipeline;
  • table/formula repair and merge utilities;
  • document-level aggregation and submission packaging scripts;
  • prompt templates used for page, table, and formula correction.

📰 News

  • [2026/6] Initial release of the repository.

✨ Highlights

  • Stable full-page parsing. ParseFixer uses MinerU2.5 Pro as the full-page backbone parser to recover text, tables, formulas, and reading order from document page images.
  • Selective multimodal correction. Instead of rewriting all pages, ParseFixer repairs only high-risk pages or defective local regions.
  • Page-level and element-level routing. Agentic Selective Correction (ASC) performs page-level screening, then routes each local block to table repair, formula repair, or a keep action.
  • Submission-ready aggregation. Page-level Markdown files are merged into document-level Markdown files and packed into a flat submission.zip structure.
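
The element-level routing described above can be illustrated with a minimal Python sketch. The `Block` structure, the `route_block` helper, and the specific validity checks (tag counts, balanced braces) are assumptions made for illustration; they are not the repository's actual ASC rules.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A local element of a parsed page (hypothetical structure)."""
    kind: str      # "text", "table", or "formula"
    content: str


def table_looks_valid(html: str) -> bool:
    # Assumed check: exactly one <table> element and matching row tag counts.
    return (
        html.count("<table") == 1
        and html.count("</table>") == 1
        and html.count("<tr") == html.count("</tr>")
    )


def formula_looks_valid(latex: str) -> bool:
    # Assumed check: non-empty with balanced braces.
    # Simplification: escaped braces such as \{ are counted like plain braces.
    depth = 0
    for ch in latex:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return bool(latex.strip()) and depth == 0


def route_block(block: Block) -> str:
    """Return "table_repair", "formula_repair", or "keep" for one block."""
    if block.kind == "table" and not table_looks_valid(block.content):
        return "table_repair"
    if block.kind == "formula" and not formula_looks_valid(block.content):
        return "formula_repair"
    return "keep"
```

Blocks that pass their format check are left untouched, which matches the idea of repairing only defective regions rather than rewriting the page.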

🧠 Method Overview

[Figure: ParseFixer pipeline]

ParseFixer consists of three main stages:

  1. Full-Page Backbone Parsing (FBP).
    MinerU2.5 Pro parses each document page image and generates initial page-level Markdown outputs together with layout-aware parsing information.

  2. Agentic Selective Correction (ASC).
    ASC diagnoses parsing failures using page-level quality checks and element-level format checks. It selectively repairs abnormal pages, malformed tables, and invalid or missing formulas.

  3. Fusion and Submission.
    Verified corrections are inserted back with minimal changes. The final page-level Markdown files are normalized, aggregated according to page order, and zipped into the official submission format.
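
The verify-and-rollback idea behind ASC and fusion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `correct_with_rollback` and `length_ratio_verify` are hypothetical names, and the length-ratio check is a stand-in for whatever verification the repository actually performs.

```python
from typing import Callable, Optional


def correct_with_rollback(
    original: str,
    repair: Callable[[str], Optional[str]],
    verify: Callable[[str, str], bool],
) -> str:
    """Return the repaired text only if it passes verification; otherwise keep the original."""
    try:
        candidate = repair(original)
    except Exception:
        # A failed repair call rolls back to the backbone output.
        return original
    if candidate is None or not verify(original, candidate):
        return original
    return candidate


def length_ratio_verify(original: str, candidate: str, max_ratio: float = 1.5) -> bool:
    # Assumed check: reject empty output and output that grows far beyond
    # the original, which may indicate unsupported additions.
    if not candidate.strip():
        return False
    return len(candidate) <= max_ratio * max(len(original), 1)
```

Because the original text is the default return value, a repair model can only change the output when its candidate is explicitly accepted.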


📊 Results

Leaderboard Result

Our team zed achieved an Overall score of 61.78 and ranked 3rd in DataMFM Challenge Track 1: Document Parsing.

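| Rank | Team | Text ED | Table TEDS | Formula CDM | Reading Order | Overall |
|------|------|---------|------------|-------------|---------------|---------|
| 1 | Zhiheng | 0.16 | 82.42 | 19.3 | 75.44 | 65.37 |
| 2 | durgasandeep | 0.17 | 85.20 | 15.73 | 64.74 | 62.08 |
| 3 | zed (Ours) | 0.10 | 80.55 | 0.94 | 75.48 | 61.78 |
| 4 | dennis | 0.13 | 81.82 | 0.41 | 72.92 | 60.49 |
| 5 | cdefg | 0.16 | 80.77 | 2.71 | 70.91 | 59.64 |
| 6 | anmspro | 0.18 | 82.03 | 0.51 | 72.62 | 59.34 |
| 7 | ytttttt | 0.16 | 91.05 | 0.62 | 60.74 | 59.01 |
| 8 | sig | 0.18 | 84.99 | 0.41 | 63.31 | 57.74 |
| 9 | Wind_Rain_Tower | 0.15 | 59.56 | 0.58 | 61.59 | 51.62 |
| 10 | HHHHHHHHHH | 0.27 | 65.47 | 0.65 | 53.80 | 48.26 |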

βš™οΈ Installation

1. Clone the repository

git clone https://github.com/iLearn-Lab/CVPRW26-ParseFixer.git
cd CVPRW26-ParseFixer

2. Create environment

Our environment builds on the official OmniDocBench environment. Please follow the environment setup instructions in the OmniDocBench repository first.

A typical setup is:

git clone https://github.com/opendatalab/OmniDocBench.git
cd OmniDocBench

conda create -n parsefixer python=3.10 -y
conda activate parsefixer

pip install -e .

Then install the additional dependencies required by this repository:

cd ../CVPRW26-ParseFixer
pip install -r requirements.txt

Note: Because the official evaluation computes document parsing metrics for formulas and tables, we recommend following the OmniDocBench environment configuration as closely as possible. If environment conflicts occur, consult the official OmniDocBench repository first.

3. Install / prepare external parsers

ParseFixer uses MinerU2.5 Pro as the full-page document parsing backbone. Please download the model weights from Hugging Face:

https://huggingface.co/opendatalab/MinerU2.5-Pro-2604-1.2B

If you use closed-source multimodal correction modules, please configure the corresponding API keys locally.

export GOOGLE_API_KEY="your_gemini_key"
export OPENAI_API_KEY="your_openai_key"
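
Before a long run, it can help to confirm that the keys for the selected providers are actually set. The sketch below is an assumption-based helper, not part of the repository: `missing_api_keys` is a hypothetical name, and the mapping simply pairs the provider values `gemini` and `openai` with the two environment variables shown above.

```python
import os
from typing import Iterable, List, Mapping

# Assumed mapping from provider name to the environment variable it needs.
REQUIRED_KEY = {"gemini": "GOOGLE_API_KEY", "openai": "OPENAI_API_KEY"}


def missing_api_keys(
    providers: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """List the environment variables that are required but unset or empty."""
    missing: List[str] = []
    for provider in providers:
        key = REQUIRED_KEY.get(provider)  # providers such as "none" need no key
        if key and not env.get(key) and key not in missing:
            missing.append(key)
    return missing
```

For example, `missing_api_keys(["gemini", "gemini", "openai"])` reports each unset variable once, in the order the providers were given.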

💽 Data Preparation

The official DataMFM Track 1 dataset can be downloaded from the following link:

Download DataMFM Track 1 Dataset

After downloading and extracting the dataset, organize the page images into one folder per document, so that each folder contains all page images of a single document.

Expected structure:

data/
└── images/
    ├── <document_uuid_1>/
    │   ├── page_001.jpg
    │   ├── page_002.jpg
    │   └── ...
    ├── <document_uuid_2>/
    │   ├── page_001.jpg
    │   ├── page_002.jpg
    │   └── ...
    └── ...
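
A quick way to sanity-check this layout is to enumerate the document folders and their pages. The following Python sketch assumes zero-padded page names as in the tree above, so that lexicographic sorting matches page order; `list_documents` and the accepted suffix set are illustrative assumptions.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: the tree above shows only .jpg; other suffixes are accepted for convenience.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_documents(image_root: Path) -> Dict[str, List[Path]]:
    """Map each document folder name to its page images in sorted order."""
    documents: Dict[str, List[Path]] = {}
    for doc_dir in sorted(p for p in image_root.iterdir() if p.is_dir()):
        pages = sorted(
            f for f in doc_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        if pages:  # skip folders that contain no page images
            documents[doc_dir.name] = pages
    return documents
```

Calling `list_documents(Path("data/images"))` returns one entry per non-empty document folder, which can be compared against the expected number of documents.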

The final submission should contain one Markdown file for each document:

submission.zip
├── <document_uuid_1>.md
├── <document_uuid_2>.md
└── ...
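
Aggregating page-level Markdown into this flat archive can be sketched with the standard `zipfile` module. This is a minimal illustration, not the repository's packaging script: `build_submission` is a hypothetical helper, and joining pages with a blank line is an assumed separator.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def build_submission(pages_by_doc: Dict[str, List[str]], zip_path: Path) -> None:
    """Join page-level Markdown in page order and write one .md per document."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for doc_id, pages in sorted(pages_by_doc.items()):
            # Assumed separator: one blank line between consecutive pages.
            document_md = "\n\n".join(page.strip() for page in pages) + "\n"
            # The archive name has no directory component, so the zip stays flat.
            zf.writestr(f"{doc_id}.md", document_md)
```

The caller is expected to pass each document's pages already sorted by page number.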

⚑ Inference

export GOOGLE_API_KEY="your_gemini_key"
export OPENAI_API_KEY="your_openai_key"

python run_parsefixer.py \
  --image-root /path/to/datasets/DataMFM/images \
  --mineru-model-path /path/to/models/MinerU2.5-Pro-2604-1.2B \
  --out-root /path/to/outputs/parsefixer/run_full \
  --page-model-provider gemini \
  --table-model-provider gemini \
  --formula-model-provider openai \
  --resume true

For a dry run without external repair models, set providers to none:

python run_parsefixer.py \
  --image-root /path/to/images \
  --mineru-model-path /path/to/MinerU2.5-Pro-2604-1.2B \
  --out-root /path/to/outputs/parsefixer/run_none \
  --page-model-provider none \
  --table-model-provider none \
  --formula-model-provider none

In this mode, FBP and deterministic formula repair still run, but external fallback repair for pages, tables, and formulas is skipped, and rollback keeps the MinerU output.


🧩 Prompt Templates

ParseFixer uses strict output-only prompts for controlled correction:

  • Page Repair Prompt: full-page image to Markdown under strict source-faithfulness constraints.
  • Table Repair Prompt: cropped table image to exactly one valid HTML <table>...</table> block.
  • Formula Repair Prompt: cropped formula region to raw LaTeX only.

All prompts follow the same principles:

  • output only the required format;
  • do not output explanations or analysis;
  • preserve visible source content faithfully;
  • do not summarize, translate, polish, or hallucinate;
  • reject unsupported additions through verification and rollback.
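
The output-only constraints above lend themselves to simple format checks before a correction is accepted. The sketch below is an assumption-based illustration: the function names are hypothetical, nested tables are rejected for simplicity, and treating `$` delimiters as a violation of "raw LaTeX only" is an assumed interpretation.

```python
import re

# Whole output must be a <table>...</table> block, ignoring surrounding whitespace.
TABLE_RE = re.compile(r"\A\s*<table\b.*</table>\s*\Z", re.DOTALL | re.IGNORECASE)


def is_single_table_block(output: str) -> bool:
    """True if the output is exactly one HTML table with no extra text."""
    opening_tags = re.findall(r"<table\b", output, re.IGNORECASE)
    return bool(TABLE_RE.match(output)) and len(opening_tags) == 1


def is_raw_latex(output: str) -> bool:
    """True if the output is non-empty and carries no fences or $ delimiters."""
    text = output.strip()
    if not text or "```" in text:
        return False
    # Assumed rule: raw LaTeX should not be wrapped in math delimiters.
    # Simplification: output ending in an escaped \$ is also rejected.
    if text.startswith("$") or text.endswith("$"):
        return False
    return True
```

A candidate that fails either check would be discarded in favor of the original output, consistent with the rollback principle in the last bullet.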

💾 Models and External Resources

ParseFixer may use the following external resources:

| Resource | Usage | Required |
|----------|-------|----------|
| MinerU2.5 Pro | Full-page backbone parsing and localized crop-level re-parsing | Yes |
| Gemini 2.5 Pro | Page-level re-parsing and fallback table correction | Optional |
| GPT-5.5 | Fallback formula correction | Optional |
| DataMFM Track 1 dataset | Official challenge evaluation data | Yes |

No additional manually annotated document parsing labels are used beyond the released challenge resources.


📚 Citation

If you find this project useful for your research, please consider citing:

@article{yu2026parsefixer,
  title={ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction},
  author={Yu, LeKai and Liu, Hao and Wang, Kun and Li, Zhiran and Cao, Ruping and Liu, Fan and Hu, Yupeng},
  journal={arXiv preprint arXiv:2606.11977},
  year={2026}
}

📬 Contact

If you have any questions, feel free to contact:

  • LeKai Yu: kyleyue70@gmail.com
  • Hao Liu: liuh90210@gmail.com

🤝 Acknowledgement

  • The dataset preparation and organization in this repository follow OmniDocBench.
  • Thanks to all collaborators and contributors of this project.

📄 License

This project is released under the Apache 2.0 License.

