Official Implementation of "ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction"
LeKai Yu1, Hao Liu1, Kun Wang1, Zhiran Li1, Ruping Cao1, Fan Liu2, Yupeng Hu1
1 Shandong University 2 Southeast University
This repository contains the official implementation of ParseFixer, our solution for DataMFM Challenge Track 1: Document Parsing.
The task requires recovering document-level Markdown files from document page images while preserving visible text, tables, formulas, layout structure, and natural reading order.
ParseFixer is an agentic document parsing framework that combines backbone parsing with selective multimodal correction. Instead of regenerating every page with a single large multimodal model, ParseFixer first uses a stable full-page parser to produce initial Markdown outputs and then repairs only unreliable pages or local elements through a verify-and-rollback correction process.
This repo includes:
- full-page backbone parsing scripts;
- Markdown normalization and post-processing utilities;
- page-level and element-level selective correction pipeline;
- table/formula repair and merge utilities;
- document-level aggregation and submission packaging scripts;
- prompt templates used for page, table, and formula correction.
- [2026/6] Initial release of the repository.
- Stable full-page parsing. ParseFixer uses MinerU2.5 Pro as the full-page backbone parser to recover text, tables, formulas, and reading order from document page images.
- Selective multimodal correction. Instead of rewriting all pages, ParseFixer repairs only high-risk pages or defective local regions.
- Page-level and element-level routing. ASC performs page-level screening and then routes local blocks to table repair, formula repair, or keep actions.
- Submission-ready aggregation. Page-level Markdown files are merged into document-level Markdown files and packed into a flat `submission.zip` structure.
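The page-level and element-level routing described above can be pictured with a small sketch. This is an illustration only, not the repository's implementation: the `Block` dataclass, the `route_page` function, the `defective` flag (standing in for whatever format check is actually used), and the action names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str        # "text", "table", or "formula"
    defective: bool  # outcome of a hypothetical element-level format check


def route_page(page_high_risk: bool, blocks: List[Block]) -> List[str]:
    """Return a single page-level action, or one action per local block."""
    # Page-level screening comes first: a high-risk page is repaired as a whole.
    if page_high_risk:
        return ["page_repair"]
    actions = []
    for block in blocks:
        if block.defective and block.kind == "table":
            actions.append("table_repair")
        elif block.defective and block.kind == "formula":
            actions.append("formula_repair")
        else:
            # Healthy blocks, and block types without a repair route, are kept.
            actions.append("keep")
    return actions
```

In this sketch only tables and formulas have a dedicated repair route, so a defective text block is still kept unless the whole page is flagged.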
ParseFixer consists of three main stages:
- Full-Page Backbone Parsing (FBP). MinerU2.5 Pro parses each document page image and generates initial page-level Markdown outputs together with layout-aware parsing information.
- Agentic Selective Correction (ASC). ASC diagnoses parsing failures using page-level quality checks and element-level format checks. It selectively repairs abnormal pages, malformed tables, and invalid or missing formulas.
- Fusion and Submission. Verified corrections are inserted back with minimal changes. The final page-level Markdown files are normalized, aggregated according to page order, and zipped into the official submission format.
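The verify-and-rollback idea behind the correction stage can be sketched as a small wrapper. This is a minimal sketch under stated assumptions, not the actual ParseFixer code: `repair_with_rollback`, `repair`, and `verify` are hypothetical names, and the real verification criteria are not specified here.

```python
from typing import Callable, Optional


def repair_with_rollback(
    original: str,
    repair: Callable[[str], Optional[str]],
    verify: Callable[[str, str], bool],
) -> str:
    """Return the repaired text only if it passes verification.

    `repair` and `verify` are placeholders for a correction model call and
    a verification check; both are assumptions made for this sketch.
    """
    try:
        candidate = repair(original)
    except Exception:
        # The repair call failed: roll back to the backbone output.
        return original
    if candidate is None or not verify(original, candidate):
        # The candidate was rejected: roll back to the backbone output.
        return original
    return candidate
```

For example, `repair_with_rollback(text, call_model, check)` returns `text` unchanged whenever `call_model` raises or `check` rejects the candidate, so a failed correction does not replace the backbone result.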
Our team zed achieved an Overall score of 61.78 and ranked 3rd in DataMFM Challenge Track 1: Document Parsing.
| Rank | Team | Text ED | Table TEDS | Formula CDM | Reading Order | Overall |
|---|---|---|---|---|---|---|
| 1 | Zhiheng | 0.16 | 82.42 | 19.3 | 75.44 | 65.37 |
| 2 | durgasandeep | 0.17 | 85.20 | 15.73 | 64.74 | 62.08 |
| 3 | zed (Ours) | 0.10 | 80.55 | 0.94 | 75.48 | 61.78 |
| 4 | dennis | 0.13 | 81.82 | 0.41 | 72.92 | 60.49 |
| 5 | cdefg | 0.16 | 80.77 | 2.71 | 70.91 | 59.64 |
| 6 | anmspro | 0.18 | 82.03 | 0.51 | 72.62 | 59.34 |
| 7 | ytttttt | 0.16 | 91.05 | 0.62 | 60.74 | 59.01 |
| 8 | sig | 0.18 | 84.99 | 0.41 | 63.31 | 57.74 |
| 9 | Wind_Rain_Tower | 0.15 | 59.56 | 0.58 | 61.59 | 51.62 |
| 10 | HHHHHHHHHH | 0.27 | 65.47 | 0.65 | 53.80 | 48.26 |

    git clone https://github.com/iLearn-Lab/CVPRW26-ParseFixer.git
    cd CVPRW26-ParseFixer

Our environment is built on the official OmniDocBench environment. Please first follow the environment setup instructions in the OmniDocBench repository.
A typical setup is:

    git clone https://github.com/opendatalab/OmniDocBench.git
    cd OmniDocBench
    conda create -n parsefixer python=3.10 -y
    conda activate parsefixer
    pip install -e .

Then install the additional dependencies required by this repository:

    cd ../CVPRW26-ParseFixer
    pip install -r requirements.txt

Note: Since the official evaluation involves document parsing metrics such as formula and table evaluation, we recommend following the OmniDocBench environment configuration as closely as possible. If environment conflicts occur, please refer to the official OmniDocBench repository first.
ParseFixer uses MinerU2.5 Pro as the full-page document parsing backbone. Please download the model weights from Hugging Face:
https://huggingface.co/opendatalab/MinerU2.5-Pro-2604-1.2B
If you use closed-source multimodal correction modules, please configure the corresponding API keys locally.

    export GOOGLE_API_KEY="your_gemini_key"
    export OPENAI_API_KEY="your_openai_key"

The official DataMFM Track 1 dataset can be downloaded from the following link:
Download DataMFM Track 1 Dataset
After downloading and extracting the dataset, please organize the page images as document folders, where each folder contains all page images of one document.
Expected structure:

    data/
    └── images/
        ├── <document_uuid_1>/
        │   ├── page_001.jpg
        │   ├── page_002.jpg
        │   └── ...
        ├── <document_uuid_2>/
        │   ├── page_001.jpg
        │   ├── page_002.jpg
        │   └── ...
        └── ...

The final submission should contain one Markdown file for each document:

    submission.zip
    ├── <document_uuid_1>.md
    ├── <document_uuid_2>.md
    └── ...

To run the full pipeline with external repair models:

    export GOOGLE_API_KEY="your_gemini_key"
    export OPENAI_API_KEY="your_openai_key"
    python run_parsefixer.py \
      --image-root /path/to/datasets/DataMFM/images \
      --mineru-model-path /path/to/models/MinerU2.5-Pro-2604-1.2B \
      --out-root /path/to/outputs/parsefixer/run_full \
      --page-model-provider gemini \
      --table-model-provider gemini \
      --formula-model-provider openai \
      --resume true

For a dry run without external repair models, set providers to none:

    python run_parsefixer.py \
      --image-root /path/to/images \
      --mineru-model-path /path/to/MinerU2.5-Pro-2604-1.2B \
      --out-root /path/to/outputs/parsefixer/run_none \
      --page-model-provider none \
      --table-model-provider none \
      --formula-model-provider none

In this mode, FBP still runs and deterministic formula repair still applies, but external fallback repair for pages, tables, and formulas is skipped, so rollback keeps the MinerU output.
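The final aggregation step, in which page-level Markdown files are merged per document and packed into a flat `submission.zip`, can be sketched as follows. This is a hedged illustration rather than the repository's packaging script: the function names, the `page_*.md` file naming, and the one-folder-per-document layout of intermediate outputs are assumptions. The sketch also assumes zero-padded page numbers, so that lexicographic order matches page order.

```python
import zipfile
from pathlib import Path


def aggregate_document(page_dir: Path) -> str:
    """Concatenate page-level Markdown files in page order.

    Assumes hypothetical names like page_001.md, page_002.md, ...
    """
    pages = sorted(page_dir.glob("page_*.md"))
    return "\n\n".join(p.read_text(encoding="utf-8").strip() for p in pages)


def pack_submission(pages_root: Path, zip_path: Path) -> None:
    """Write one <document_uuid>.md per document at the top level of the zip."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for doc_dir in sorted(d for d in pages_root.iterdir() if d.is_dir()):
            # No sub-directories inside the archive: the layout stays flat.
            zf.writestr(f"{doc_dir.name}.md", aggregate_document(doc_dir))
```

Writing entries with `writestr` and a bare file name keeps the archive flat regardless of where the intermediate files live on disk.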
ParseFixer uses strict output-only prompts for controlled correction:
- Page Repair Prompt: full-page image to Markdown under strict source-faithfulness constraints.
- Table Repair Prompt: cropped table image to exactly one valid HTML `<table>...</table>` block.
- Formula Repair Prompt: cropped formula region to raw LaTeX only.
All prompts follow the same principles:
- output only the required format;
- do not output explanations or analysis;
- preserve visible source content faithfully;
- do not summarize, translate, polish, or hallucinate;
- reject unsupported additions through verification and rollback.
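The output-only constraints above lend themselves to simple format checks before a correction is accepted. The following is a minimal sketch of such checks, not the verifier used in the repository: `is_single_table` and `is_raw_latex` are hypothetical helpers, and the specific rules (no surrounding text, no `$` delimiters, balanced braces) are assumptions about what "exactly one table block" and "raw LaTeX only" could mean in practice.

```python
import re

TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)
FENCE = "`" * 3  # Markdown code-fence marker, built indirectly on purpose


def is_single_table(output: str) -> bool:
    """True if the output is one <table>...</table> block and nothing else.

    Nested tables are rejected by this simple pattern.
    """
    text = output.strip()
    matches = TABLE_RE.findall(text)
    return len(matches) == 1 and matches[0] == text


def is_raw_latex(output: str) -> bool:
    """True if the output looks like bare LaTeX with balanced braces."""
    text = output.strip()
    if not text or FENCE in text or text.startswith("$") or text.endswith("$"):
        return False
    depth = 0
    escaped = False
    for ch in text:
        if escaped:
            escaped = False          # skip the character after a backslash
        elif ch == "\\":
            escaped = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

Checks like these only confirm the output format. They say nothing about whether the content is faithful to the source image.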
ParseFixer may use the following external resources:
| Resource | Usage | Required |
|---|---|---|
| MinerU2.5 Pro | Full-page backbone parsing and localized crop-level re-parsing | Yes |
| Gemini 2.5 Pro | Page-level re-parsing and fallback table correction | Optional |
| GPT-5.5 | Fallback formula correction | Optional |
| DataMFM Track 1 dataset | Official challenge evaluation data | Yes |
No additional manually annotated document parsing labels are used beyond the released challenge resources.
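Because the closed-source correction models are optional, it can help to check up front which API keys a given provider configuration needs. The sketch below is an assumption-laden illustration, not part of `run_parsefixer.py`: the `missing_keys` helper is hypothetical, and the provider-to-variable mapping is inferred from the `export` commands and provider names shown earlier in this README.

```python
import os
from typing import Dict, Iterable, List, Mapping

# Inferred from the export commands in this README; "none" needs no key.
PROVIDER_ENV: Dict[str, str] = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def missing_keys(
    providers: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the environment variables that are required but unset."""
    missing: List[str] = []
    for provider in providers:
        var = PROVIDER_ENV.get(provider)
        if var and not env.get(var) and var not in missing:
            missing.append(var)
    return missing
```

For instance, `missing_keys(["gemini", "gemini", "openai"])` lists whichever of the two variables is not set in the current environment, while a configuration with all providers set to `none` yields an empty list.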
If you find this project useful for your research, please consider citing:

    @article{yu2026parsefixer,
      title={ParseFixer: An Agentic Framework for Document Parsing via Selective Multimodal Correction},
      author={Yu, LeKai and Liu, Hao and Wang, Kun and Li, Zhiran and Cao, Ruping and Liu, Fan and Hu, Yupeng},
      journal={arXiv preprint arXiv:2606.11977},
      year={2026}
    }

If you have any questions, feel free to contact:
- LeKai Yu: kyleyue70@gmail.com
- Hao Liu: liuh90210@gmail.com
- The dataset preparation and organization in this repository follow OmniDocBench.
- Thanks to all collaborators and contributors of this project.
This project is released under the Apache 2.0 License.
