DeQA-Doc: Document Image Quality Assessment with Pseudo-Label Scaling

Fork of Junjie-Gao19/DeQA-Doc -- the VQualA 2025 DIQA Challenge Championship solution -- extended with a confidence-weighted pseudo-labeling pipeline for scaling beyond the 5,000 human-labeled DIQA-5000 training set.

What This Fork Adds

The original DeQA-Doc trains on only 5,000 human-labeled document images. This fork adds infrastructure to scale to 100K+ images via pseudo-labeling with multi-signal uncertainty quantification:

  • 4-signal uncertainty fusion -- Mahalanobis OOD distance, cross-model JSD (SigLIP2 vs DeQA), aleatoric variance, and prediction entropy
  • Tiered acceptance -- auto-accept, low-weight, VLM veto, hard-reject decisions per sample
  • Cross-model validation -- detects where SigLIP2-IQA-Base-86M disagrees with DeQA public models
  • Active learning -- BALD-based sample selection for efficient human annotation
  • Validation safeguards -- bootstrap CI, harm checks, distribution drift monitoring

All pseudo-labels are written in the existing DeQA training JSON format, so no training code changes are required.
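
The tiered acceptance step listed above can be pictured as a small decision function over the four uncertainty signals. The sketch below is illustrative only: the `Signals` container, the threshold values, and the rule of counting how many signals exceed their soft limit are all assumptions made for this example, not the logic in `fusion.py`.

```python
from dataclasses import dataclass, fields


@dataclass
class Signals:
    ood_distance: float     # Mahalanobis distance in embedding space
    cross_model_jsd: float  # JSD between SigLIP2 and DeQA predictions
    aleatoric_var: float    # predicted variance of the quality score
    entropy: float          # entropy of the 5-level distribution


# Hypothetical thresholds, chosen only to make the example concrete.
SOFT_LIMITS = Signals(ood_distance=3.0, cross_model_jsd=0.05,
                      aleatoric_var=0.10, entropy=0.8)
HARD_LIMITS = Signals(ood_distance=6.0, cross_model_jsd=0.30,
                      aleatoric_var=0.50, entropy=1.5)


def assign_tier(s: Signals) -> str:
    """Map one sample's signals to an acceptance tier (assumed rule)."""
    names = [f.name for f in fields(Signals)]
    # Any signal beyond its hard limit rejects the sample outright.
    if any(getattr(s, n) > getattr(HARD_LIMITS, n) for n in names):
        return "hard_reject"
    # Otherwise count how many signals exceed their soft limit.
    n_over = sum(getattr(s, n) > getattr(SOFT_LIMITS, n) for n in names)
    if n_over == 0:
        return "auto_accept"
    if n_over == 1:
        return "low_weight"
    return "vlm_veto"  # several warning signs: defer to the Tier-2 VLM check
```

A real pipeline would calibrate the thresholds on held-out data, possibly per quality dimension, rather than hard-coding them.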

Uncertainty Pipeline (DeQA-Score/src/uncertainty/)

| Module | Purpose |
|---|---|
| ood_wrapper.py | Mahalanobis-distance OOD detector from SigLIP2 embeddings |
| gaussian_to_discrete.py | Convert SigLIP2 (mu, sigma-sq) to 5-level quality probabilities |
| discrete_metrics.py | JSD, KL divergence, entropy, BALD for discrete distributions |
| cross_validator.py | Compare SigLIP2 vs DeQA model predictions |
| fusion.py | 4-signal uncertainty fusion with per-dimension thresholds |
| vlm_validator.py | Tier-2 VLM veto via Qwen3-VL-8B (OpenRouter) |
| pseudo_label.py | End-to-end pipeline orchestrator |
| format_training_data.py | Convert to SingleDataset-compatible training JSON |
| active_learning.py | BALD-based annotation queue generation |
| validation.py | Bootstrap CI, harm checks, distribution drift |
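
To make the Gaussian-to-discrete conversion and the cross-model JSD concrete, here is a minimal sketch using only the standard library. It assumes a 1 to 5 score scale with level boundaries at 1.5, 2.5, 3.5, and 4.5; those boundaries and the function names are assumptions for illustration and may differ from what `gaussian_to_discrete.py` and `discrete_metrics.py` actually do.

```python
import math

LEVELS = ("bad", "poor", "fair", "good", "excellent")
EDGES = (1.5, 2.5, 3.5, 4.5)  # assumed bin boundaries on a 1..5 scale


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gaussian_to_discrete(mu: float, var: float) -> list[float]:
    """Integrate N(mu, var) over the five level bins (open-ended tails)."""
    sigma = math.sqrt(max(var, 1e-12))  # guard against zero variance
    cdf = [0.0] + [normal_cdf(e, mu, sigma) for e in EDGES] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(LEVELS))]


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def kl(p: list[float], q: list[float]) -> float:
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)


def jsd(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in nats, bounded above by ln 2."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Example: a Gaussian prediction from one model versus a sharper,
# higher-scoring prediction standing in for a second model.
student = gaussian_to_discrete(mu=3.0, var=0.25)
teacher = gaussian_to_discrete(mu=4.5, var=0.25)
disagreement = jsd(student, teacher)
```

Because both models are reduced to the same five-level distribution, their disagreement can be measured directly with JSD even though one predicts a Gaussian and the other predicts level probabilities.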

CLI Scripts

# Run full pseudo-labeling pipeline
python scripts/run_pseudo_label.py \
    --siglip2-results predictions/siglip2_iqa.json \
    --embeddings embeddings/unlabeled.npy \
    --ood-params embeddings/ood_params_4400.npz \
    --deqa-specialist predictions/specialist_labels.jsonl \
    --output-dir Data-DeQA-Score/pseudo/ \
    --per-dimension

# Select samples for human annotation
python scripts/run_active_learning.py \
    --pseudo-label-dir Data-DeQA-Score/pseudo/ \
    --sacred-test-ids artifacts/sacred_test_ids.json \
    --output-queue artifacts/annotation_queue.json \
    --k 1000

# Validate OOD detector calibration
python scripts/validate_ood_checkpoint.py \
    --ood-params embeddings/ood_params_4400.npz \
    --test-embeddings embeddings/diqa5000_test_all.npy
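
For readers unfamiliar with BALD, the sample selection behind `run_active_learning.py` can be sketched as follows. BALD scores a sample by the mutual information between its prediction and the model, computed as the entropy of the averaged prediction minus the average entropy of the individual predictions. The sketch assumes each image has several 5-level distributions (for example from an ensemble or repeated stochastic passes) and that the sacred test IDs are simply excluded from the queue; both are assumptions, and the function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def bald_score(member_probs):
    """BALD = H[mean prediction] - mean of per-member entropies."""
    n = len(member_probs)
    mean_p = [sum(col) / n for col in zip(*member_probs)]
    return entropy(mean_p) - sum(entropy(p) for p in member_probs) / n


def select_top_k(candidates, k, excluded_ids=frozenset()):
    """Return the k image IDs with the highest BALD score.

    candidates:   dict mapping image ID -> list of probability vectors
    excluded_ids: IDs that must never be queued (e.g. a held-out test set)
    """
    scored = [
        (bald_score(probs), image_id)
        for image_id, probs in candidates.items()
        if image_id not in excluded_ids
    ]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:k]]
```

Samples on which the members individually feel confident but disagree with each other score highest, which is where a human label is most informative.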

Research Results (results/)

  • tier1_ood_detector/ -- Mahalanobis OOD detector methodology and calibration
  • vlm_model_selection/ -- VLM model comparison for Tier-2 cross-validation (selected Qwen3-VL-8B)
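
As background for the Tier-1 detector, a Mahalanobis OOD check on embeddings can be sketched in a few lines of NumPy. This is a generic illustration, not the code in `ood_wrapper.py`: the shrinkage term, the 99th-percentile threshold, and the synthetic 8-dimensional data are assumptions, and the layout of the real `ood_params_*.npz` file is not shown here.

```python
import numpy as np


def fit_ood_params(train_emb: np.ndarray, shrinkage: float = 1e-3):
    """Estimate mean and inverse covariance from in-distribution embeddings."""
    mean = train_emb.mean(axis=0)
    cov = np.cov(train_emb, rowvar=False)
    cov += shrinkage * np.eye(cov.shape[0])  # keep the matrix invertible
    return mean, np.linalg.inv(cov)


def mahalanobis(emb: np.ndarray, mean: np.ndarray, precision: np.ndarray) -> np.ndarray:
    """Row-wise distance sqrt((x - mean)^T P (x - mean))."""
    diff = emb - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))


# Synthetic demonstration: in-distribution data versus a shifted cluster.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 8))
mean, precision = fit_ood_params(train)

# Calibrate the threshold as a high percentile of in-distribution distances.
threshold = np.percentile(mahalanobis(train, mean, precision), 99)

shifted = rng.normal(loc=5.0, size=(10, 8))
is_ood = mahalanobis(shifted, mean, precision) > threshold
```

Calibrating the threshold on held-out in-distribution embeddings, rather than the training set itself, gives a less optimistic estimate of the false-positive rate.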

See research/INDEX.md for the full research index, experiment registry, hypothesis backlog, and data inventory.

Free Model Monitor (results/vlm_teacher_eval/full_eval/check_free_models.py)

Daily check for free vision models on OpenRouter. Queries the /models API for :free models with image input, diffs against previous runs to detect newly appeared or removed models, and optionally auto-launches DIQA-5000 evaluation for new models.

cd DeQA-Score
export PYTHONPATH=./:$PYTHONPATH
SCRIPT=../results/vlm_teacher_eval/full_eval/check_free_models.py
.venv/bin/python "$SCRIPT"          # dry run
.venv/bin/python "$SCRIPT" --run    # evaluate new/incomplete
.venv/bin/python "$SCRIPT" --json   # machine-readable output
.venv/bin/python "$SCRIPT" --all    # include completed models

State is tracked in free_models_state.json so successive runs can report additions and removals.
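
The run-to-run diffing described here amounts to comparing two sets of model IDs and then overwriting the saved state. A minimal sketch follows; the `"models"` key and both function names are hypothetical, and the actual schema of `free_models_state.json` may differ.

```python
import json
from pathlib import Path


def diff_models(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Report which model IDs appeared or disappeared since the last run."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


def update_state(state_path: Path, current_ids: list[str]) -> dict[str, list[str]]:
    """Diff against the saved state, then persist the current model list."""
    previous: set[str] = set()
    if state_path.exists():
        # "models" is an assumed key, used only for this illustration.
        previous = set(json.loads(state_path.read_text())["models"])
    changes = diff_models(previous, set(current_ids))
    state_path.write_text(json.dumps({"models": sorted(current_ids)}, indent=2))
    return changes
```

On the first run there is no state file, so every model is reported as added; later runs report only the changes.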

Technical Report Series (research/papers/)

A 10-paper arXiv-style technical report series documenting the full research program. Each paper includes a figure generation script, a living research agenda, and a 5-model consensus peer review. The series is generated from the comprehensive evaluation in results/vlm_teacher_eval/full_eval/VLM_TEACHER_EVALUATION.md.

| # | Title | Words | Focus |
|---|---|---|---|
| 0 | VQualA 2025 DIQA Challenge: A Competition Analysis | 3,939 | Competition landscape, all 7 team solutions |
| 1 | VLM Benchmark for Document Image Quality Assessment | 6,232 | 7 VLMs vs human MOS on DIQA-5000 |
| 2 | Cross-Domain Generalization of VLM Quality Assessors | 5,407 | 13 OOD categories, ID vs OOD performance |
| 3 | Prompt Engineering for VLM-Based Quality Assessment | 3,501 | 7-arm prompt comparison, regression to mean |
| 4 | Embedding-Space OOD Detection for Document Quality Pipelines | 4,980 | Mahalanobis distance, AUROC 0.9963 |
| 5 | Off-the-Shelf NR-IQA Models on Document Images | 3,011 | 26 NR-IQA models, domain gap analysis |
| 6 | DeQA Quality Scores Predict OCR Accuracy | 5,017 | 1,200 images x 4 OCR engines |
| 7 | Iterative Pseudo-Labeling Pipeline for Domain Expansion | 6,079 | End-to-end pipeline design (capstone) |
| 8 | Training SigLIP2-IQA-Base: A Lightweight Document IQA Model | 3,720 | 86M-param student, MainScore 0.886 |
| 9 | Training HyperIQA++: CNN Fine-Tuning for Document IQA | 3,124 | CNN baseline, generalization gap analysis |

# Generate all paper figures
python research/papers/generate_all.py

# Generate figures for specific papers
python research/papers/generate_all.py --paper 1 6 8

License: CC BY-SA 4.0, Copyright 2025 Byron Williams

Original DeQA-Doc

Paper: DeQA-Doc: Adapting DeQA-Score to Document Image Quality Assessment

Authors: Junjie Gao, Runze Liu, Yingzhe Peng, Shujian Yang, Jin Zhang, Kai Yang, Zhiyuan You

Achievement: Championship in VQualA 2025 DIQA Challenge

The system predicts quality scores across three dimensions -- overall quality, sharpness, and color fidelity -- using discrete quality levels (excellent/good/fair/poor/bad) with soft-label distribution learning.
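
One common way to turn a five-level distribution into a single score is to take its expectation over numeric level values. The sketch below assumes the levels map to 5 (excellent) through 1 (bad) and that the model exposes one logit per level word; both are assumptions for illustration, and the function names are hypothetical.

```python
import math

# Assumed numeric value for each quality level.
LEVEL_SCORES = {"excellent": 5.0, "good": 4.0, "fair": 3.0, "poor": 2.0, "bad": 1.0}


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Normalize per-level logits into probabilities (numerically stable)."""
    peak = max(logits.values())
    exps = {level: math.exp(value - peak) for level, value in logits.items()}
    total = sum(exps.values())
    return {level: e / total for level, e in exps.items()}


def expected_score(probs: dict[str, float]) -> float:
    """Probability-weighted average of the level values."""
    return sum(LEVEL_SCORES[level] * p for level, p in probs.items())


# Example: a prediction that leans toward "good" with some mass on "fair".
probs = softmax({"excellent": 0.5, "good": 2.0, "fair": 1.0, "poor": -1.0, "bad": -2.0})
score = expected_score(probs)
```

Training against a soft label means the target is itself a distribution over the five levels rather than a single one-hot level, so neighboring levels share probability mass when the human score falls between them.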

Two model backends:

  • mPLUG-Owl2-7B -- trained via the DeQA-Score codebase in DeQA-Score/
  • Qwen2.5-VL-7B -- trained via LLaMA-Factory with patched files in Llamafactory/

Installation

cd DeQA-Score
pip install -e .          # inference only
pip install -e ".[train]" # training
pip install -e ".[dev]"   # development (pytest, ruff)

Pre-trained Models

Training and Inference

# mPLUG-Owl2
sh scripts/train.sh           # full fine-tuning
sh scripts/train_lora.sh      # LoRA fine-tuning
sh scripts/infer.sh            # inference
sh scripts/diqa_eval.sh        # format results for DIQA evaluation

# Qwen2.5-VL (requires LLaMA-Factory installation)
llamafactory-cli train examples/train_full/qwen2.5_vl_diqa_sft.yaml
sh scripts/infer_qwen.sh

Testing

cd DeQA-Score
.venv/bin/python -m pytest tests/uncertainty/ -v  # 122 tests

Acknowledgements

  • DeQA-Score -- the foundation this work builds on
  • DeQA-Doc -- the original DIQA adaptation and challenge winner

Citation

If you use this work, please cite the original DeQA-Doc paper:

@inproceedings{deqadoc,
  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
  year={2025},
}
