OCR Workbench is a ready-to-use framework for easily comparing popular OCR libraries in Python. It abstracts away the individual setup and usage details of each library, allowing you to focus on evaluating results on your data rather than spending time implementing each method yourself.
Simply provide a collection of PDF files and compare all libraries using a single script.
Currently, the following libraries are supported:
- Docling with Tesseract, EasyOCR, RapidOCR, surya or Granite
- MinerU
- Marker
- Azure Document Intelligence
- LightOnOCR-2-1B
- Chandra OCR 2
This selection focuses on benchmarking open source libraries against proprietary Azure Document Intelligence.
Features
- Running experiments with the above libraries via a single script
- Automatic conversion of all PDF files provided in data/input into markdown using all methods
- Hardware acceleration using CPU, MPS and CUDA
- Time and CPU memory tracking for each method
Using our script, we produced OCR outputs for four example documents and compared them in terms of OCR quality as well as resource consumption.
The following publicly available PDFs were used:
- Information About Coca-Cola Volume Growth
- Handwriting Sample from NIST Special Database 19 (the sample image was saved as a PDF file)
- 2020 Annual Report Midwest Food Bank
- RKI: Epidemiologisches Bulletin (German)
Speed is measured on a MacBook Air M4 for CPU and MPS, and on an NVIDIA RTX 5090 GPU (32 GB VRAM) for CUDA.
Memory usage is measured only once for CPU.
OCR quality is subjectively graded and compared based on the markdown output stored in data/output/<ocr-method>/<file-name>.md.
The following table summarizes the comparison of all methods on all PDFs. For each document, it shows the extraction quality (Excellent, Very good, Good, Medium, Poor) and the GPU extraction time in seconds. The last column shows the cost per page when running on an NVIDIA RTX 5090 GPU rented on runpod.io for 89 ct/hour; Document Intelligence runs in the Azure cloud, so its time and cost do not refer to this GPU. We did not carry out any specific runtime optimizations for any method.
| OCR-Library | Coca-Cola | NIST Handwriting | Midwest Food Bank | RKI Bulletin (German) | Cost / page |
|---|---|---|---|---|---|
| LightOnOCR-2-1B | 🏆 Excellent (192s) | 🏆 Excellent (23s) | 🏆 Excellent (191s) | 🏆 Excellent (416s) | 0.5 ct |
| Chandra OCR 2 | 🏆 Excellent (376s) | 🏆 Excellent (120s) | 🏆 Excellent (476s) | 🏆 Excellent (863s) | 1.4 ct |
| Document Intelligence | 🟢 Very good (8s) | 🟢 Good (5s) | 🟢 Very good (14s) | 🏆 Excellent (10s) | 1 ct |
| Docling - suryaocr | 🟢 Very good (31s) | 🔴 Poor (8s) | 🟢 Good (49s) | 🟢 Very good (270s) | 0.17 ct |
| Docling - RapidOCR | 🟢 Good (12s) | 🔴 Poor (4s) | 🟢 Good (28s) | 🟢 Very good (52s) | 0.06 ct |
| MinerU | 🟢 Good (42s) | 🔴 Poor (16s) | 🟢 Good (60s) | 🟢 Very good (88s) | 0.17 ct |
| marker | 🟢 Good (29s) | 🔴 Poor (5s) | 🟢 Good (35s) | 🟡 Medium (143s) | 0.11 ct |
| Docling - Granite | 🔴 Poor (343s) | 🟡 Medium (108s) | 🔴 Poor (171s) | 🟢 Good (597s) | 1.16 ct |
| Docling - EasyOCR | 🟡 Medium (37s) | 🔴 Poor (7s) | 🟡 Medium (41s) | 🟢 Very good (95s) | 0.11 ct |
| Docling - Tesseract | 🔴 Poor (32s) | 🔴 Poor (10s) | 🔴 Poor (50s) | 🟢 Good (101s) | 0.13 ct |
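For the self-hosted methods, the cost column depends on the GPU rental price and the processing time. The following is a minimal sketch, assuming cost per page = hourly rate × processing time / number of pages; the exact formula and the page counts behind the table are not stated here, so the inputs below are hypothetical, and the helper name cost_per_page is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cost per page in cents for a rented GPU, assuming
# cost = hourly rate * processing time / number of pages.
cost_per_page() {
  local rate_ct_per_hour=$1  # hourly GPU price in cents, e.g. 89 on runpod.io
  local seconds=$2           # processing time in seconds
  local pages=$3             # pages processed in that time
  awk -v r="$rate_ct_per_hour" -v s="$seconds" -v p="$pages" \
    'BEGIN { printf "%.2f\n", r * s / 3600 / p }'
}

# Example with hypothetical inputs: one hour of GPU time for 100 pages.
cost_per_page 89 3600 100   # prints 0.89
```

The same formula shows why fast methods are cheap on a rented GPU: halving the processing time halves the cost per page.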
We can see that the open-weights models LightOnOCR-2-1B and Chandra OCR 2 yield the best results. For LightOnOCR-2-1B this is particularly impressive: at 0.5 ct per page it costs half as much as Azure Document Intelligence (1 ct per page), so an open-weights model can be used to save costs without sacrificing extraction quality. The proprietary Azure Document Intelligence yields the second-best results and is the fastest overall, at least when compared against a single NVIDIA RTX 5090 GPU. Docling with RapidOCR was the fastest open source alternative and therefore the cheapest method in our experiments.
Especially when extraction speed matters less (e.g., in offline batch processing), open source methods can be a much cheaper alternative. Moreover, more powerful GPUs (H100 or better) could close the speed gap to Document Intelligence, and inference frameworks like vLLM or optimized kernels like FlashAttention could further reduce inference time.
Per-document details of the qualitative evaluation, including CPU, MPS and GPU timings and CPU memory usage, are given in the qualitative evaluation details section.
Scanned documents are essentially collections of images stored in PDF format. While easy to view and share, their text content is not machine-readable by default. Extracting this text requires specialized machine learning techniques. Optical Character Recognition (OCR) converts text in images into structured, machine-readable data that can be reliably used for tasks such as document search, analysis, and automated processing.
In this repository, we compare open source OCR engines against proprietary ones. We also include approaches based on vision-language models (VLMs).
Since the dependencies of different OCR libraries can conflict, we use a separate Python environment for each OCR library, each managed with uv. Make sure to install uv before moving on.
Set up the respective environment using:
```bash
cd <environment-name>
uv sync
```
where <environment-name> is one of docling_environment, marker_environment, mineru_environment, azure_environment, lighton_environment.
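To set up several environments in one go, the two commands above can be wrapped in a loop. This is a minimal sketch; the function name sync_environments is hypothetical, and it simply runs uv sync in each directory you pass to it, skipping directories that do not exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `uv sync` in each environment directory given.
# Each sync runs in a subshell, so the caller's working directory is unchanged.
sync_environments() {
  local env_dir
  for env_dir in "$@"; do
    if [ ! -d "$env_dir" ]; then
      echo "skipping missing directory: $env_dir" >&2
      continue
    fi
    echo "==> $env_dir"
    # Stop at the first environment whose dependencies fail to resolve.
    (cd "$env_dir" && uv sync) || return 1
  done
}

# Usage from the repository root:
# sync_environments docling_environment marker_environment mineru_environment \
#   azure_environment lighton_environment
```

Stopping at the first failure keeps a broken dependency resolution from scrolling out of view behind the output of later environments.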
Docling with tesseract
To run Docling with Tesseract, Tesseract must be installed:
```bash
# Ubuntu:
sudo apt install tesseract-ocr-all
# Via Homebrew on macOS:
brew install tesseract
brew install tesseract-lang
```
Additionally, the path to the Tesseract data directory (tessdata) must be set in docling_environment/config.json.
See https://tesseract-ocr.github.io/tessdoc/Installation.html for details.
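If you are unsure where the data directory is, Tesseract 5 reports it on the first line of tesseract --list-langs. The sketch below extracts that path. It assumes the Tesseract 5 output format (older versions may omit the path, in which case nothing is printed); the helper names are hypothetical, and the key under which the path is stored in docling_environment/config.json is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. Assumes Tesseract 5.x, whose first output line looks like:
#   List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (2):
parse_tessdata_dir() {
  # Read tesseract output on stdin and print the quoted path from the first line.
  head -n 1 | sed -n 's/^List of available languages in "\(.*\)".*/\1/p'
}

find_tessdata_dir() {
  # Some Tesseract versions print this list to stderr, so merge both streams.
  tesseract --list-langs 2>&1 | parse_tessdata_dir
}

# Usage: find_tessdata_dir
# Then copy the printed path into docling_environment/config.json.
```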
If you do not wish to use tesseract, simply remove it from ocr_engines in docling_environment/config.json.
Azure Document Intelligence
In order to use Azure Document Intelligence, you need to set up an account on Microsoft Azure and create a Document Intelligence resource. Then place the endpoint URL and API key in docint_environment/config.json.
Place some PDF files to be parsed in data/input.
Then run:
```bash
bash run_ocr_experiments.sh -a <accelerator> -e <environment>
```
where <accelerator> is one of cpu, mps, cuda and <environment> is one of docling, marker, mineru, docint.
By default, the script processes all PDF files in data/input.
To run a single experiment instead, run:
```bash
bash run_ocr_experiments.sh -i <input-file> -a <accelerator> -e <environment>
```
The markdown output is stored in data/output/<ocr-method>/<file-name>.md.
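To compare several environments on the same accelerator, the script can be called once per environment. This is a minimal sketch that uses only the -a and -e flags documented above; the function name run_all_environments is hypothetical. A failing environment is reported and skipped, so the remaining ones still run, and the function returns a non-zero status if any run failed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run run_ocr_experiments.sh once per environment
# for a single accelerator (cpu, mps or cuda).
run_all_environments() {
  local accelerator=$1
  shift
  local env failed=0
  for env in "$@"; do
    echo "==> $env ($accelerator)"
    if ! bash run_ocr_experiments.sh -a "$accelerator" -e "$env"; then
      echo "failed: $env" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage from the repository root:
# run_all_environments cuda docling marker mineru docint
```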
To visualize CPU memory usage over time, run:
```bash
source <environment>/.venv/bin/activate
mprof plot data/output/<ocr-method>/<file-name>_mem_cpu.dat
```
For almost every environment, there is an additional config.json file with preconfigured defaults, which you can adjust to your needs.
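The evaluation tables report a single CPU memory figure per method. Assuming memory_profiler's plain-text .dat format, where each sample is a line of the form MEM <memory in MiB> <timestamp>, the peak value can be read from the log without plotting. This is a sketch only; the helper name peak_memory_mib is hypothetical, and whether the reported figures were obtained this way is not stated here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the peak memory (MiB) recorded in an mprof .dat file.
# Only "MEM <MiB> <timestamp>" lines are considered; other lines (e.g. CMDLINE)
# are ignored. Prints nothing if the file contains no samples.
peak_memory_mib() {
  awk '$1 == "MEM" && (max == "" || $2 + 0 > max) { max = $2 + 0 }
       END { if (max != "") printf "%.1f\n", max }' "$1"
}

# Usage:
# peak_memory_mib data/output/<ocr-method>/<file-name>_mem_cpu.dat
```

Divide the result by 1024 to compare it with the GB values in the tables.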
Qualitative evaluation details

Information About Coca-Cola Volume Growth

| OCR-Library | Extraction Quality | Speed [seconds] | CPU Memory Usage |
|---|---|---|---|
| Docling - Tesseract | Poor (misses text, confuses table entries, doesn't read checkboxes correctly) | CPU: 34, MPS: 34, GPU: 32 | 3 GB |
| Docling - EasyOCR | Medium (reads text well, confuses some table entries, doesn't read checkboxes correctly) | CPU: 129, MPS: 52, GPU: 37 | 11.4 GB |
| Docling - RapidOCR | Good (reads text well, gets table entries correct, doesn't read checkboxes correctly) | CPU: 161, MPS: 160, GPU: 12 | 6.4 GB |
| Docling - suryaocr | Very good (reads text well, gets table entries correct, gets most checkboxes correct) | CPU: 369, MPS: 337, GPU: 31 | 3.7 GB |
| Docling - Granite | Poor (misses a large share of the text, gets table entries correct, misses checkboxes) | CPU: 1564, MPS: 313, GPU: 343 | 5.8 GB |
| marker | Good (reads text well, confuses some table entries, gets most checkboxes correct) | CPU: 229, MPS: 212, GPU: 29 | 11.8 GB |
| MinerU | Good (reads text well, gets table entries correct, doesn't read checkboxes correctly) | CPU: 160, MPS: 50, GPU: 42 | 4.3 GB |
| Document Intelligence | Very good (reads text well, gets table entries correct, gets some checkboxes correct) | 8 | 70 MB (processing happens in cloud) |
| LightOnOCR-2-1B | Excellent (reads text well, gets table entries correct, gets all checkboxes correct) | CPU: 1828, MPS: 1993, GPU: 192 | 15.7 GB |
| Chandra OCR 2 | Excellent (reads text well, gets table entries correct, gets all checkboxes correct) | CPU: n/a, MPS: n/a, GPU: 376 | OOM on MacBook Air M4 |
Handwriting Sample from NIST Special Database 19

| OCR-Library | Extraction Quality | Speed [seconds] | CPU Memory Usage |
|---|---|---|---|
| Docling - Tesseract | Poor (mistakes most text for images) | CPU: 3, MPS: 4, GPU: 10 | 1.4 GB |
| Docling - EasyOCR | Poor (mistakes most text for images) | CPU: 12, MPS: 6, GPU: 7 | 11.2 GB |
| Docling - RapidOCR | Poor (mistakes most text for images) | CPU: 18, MPS: 16, GPU: 4 | 2.9 GB |
| Docling - suryaocr | Poor (mistakes most text for images) | CPU: 48, MPS: 46, GPU: 8 | 1.8 GB |
| Docling - Granite | Medium (recognizes around half the text correctly) | CPU: 164, MPS: 7, GPU: 108 | 1.5 GB |
| marker | Poor (mistakes half of the form for an image, reads the remaining text well) | CPU: 31, MPS: 39, GPU: 5 | 7.8 GB |
| MinerU | Poor (misses text and makes mistakes, does not align captions with content well) | CPU: 25, MPS: 16, GPU: 16 | 4.6 GB |
| Document Intelligence | Good (reads all handwriting and text well, does not align captions with content well) | 5 | 46 MB (processing happens in cloud) |
| LightOnOCR-2-1B | Excellent (reads all handwriting and text well, aligns all form contents perfectly) | CPU: 170, MPS: 168, GPU: 23 | 12.4 GB |
| Chandra OCR 2 | Excellent (reads all handwriting and text well, aligns all form contents perfectly) | CPU: n/a, MPS: n/a, GPU: 120 | OOM on MacBook Air M4 |
2020 Annual Report Midwest Food Bank

| OCR-Library | Extraction Quality | Speed [seconds] | CPU Memory Usage |
|---|---|---|---|
| Docling - Tesseract | Poor (reads text well, mistakes table of contents for an image, gets double column layout mostly correct, mistakes tables for images) | CPU: 30, MPS: 29, GPU: 50 | 8 GB |
| Docling - EasyOCR | Medium (reads text well, gets table of contents mostly correct, gets double column layout mostly correct, gets tables mostly correct) | CPU: 166, MPS: 54, GPU: 41 | 13 GB |
| Docling - RapidOCR | Good (reads text well, gets table of contents mostly correct, gets double column layout mostly correct, gets tables correct) | CPU: 227, MPS: 200, GPU: 28 | 6.5 GB |
| Docling - suryaocr | Good (reads text well, gets table of contents mostly correct, gets double column layout mostly correct, gets tables correct) | CPU: 370, MPS: 358, GPU: 49 | 7.8 GB |
| Docling - Granite | Poor (reads text well but often omits parts of it, gets table of contents correct, gets double column layout sometimes correct, misses tables) | CPU: 838, MPS: 127, GPU: 171 | 1.8 GB |
| marker | Good (reads text well, misses page numbers in table of contents, gets double column layout mostly correct, gets table entries correct) | CPU: 193, MPS: 168, GPU: 35 | 11.2 GB |
| MinerU | Good (reads text well, gets table of contents mostly correct, gets double column layout mostly correct, gets tables correct) | CPU: 263, MPS: 130, GPU: 60 | 4.4 GB |
| Document Intelligence | Very good (reads text well, gets table of contents correct, gets double column layout mostly correct, gets table entries correct) | 14 | 64 MB (processing happens in cloud) |
| LightOnOCR-2-1B | Excellent (reads text well, gets table of contents correct, gets double column layout correct, gets table entries correct) | CPU: 2440, MPS: 1881, GPU: 191 | 15 GB |
| Chandra OCR 2 | Excellent (reads text well, gets table of contents correct, gets double column layout correct, gets table entries correct) | CPU: n/a, MPS: n/a, GPU: 476 | OOM on MacBook Air M4 |
RKI: Epidemiologisches Bulletin (German)

| OCR-Library | Extraction Quality | Speed [seconds] | CPU Memory Usage |
|---|---|---|---|
| Docling - Tesseract | Good (reads text well, gets table of contents correct, does not structure tables well) | CPU: 70, MPS: 68, GPU: 101 | 7.1 GB |
| Docling - EasyOCR | Very good (reads text well, gets table of contents correct, gets table entries correct) | CPU: 241, MPS: 108, GPU: 95 | 14 GB |
| Docling - RapidOCR | Very good (reads text well, gets table of contents correct, gets table entries correct) | CPU: 542, MPS: 511, GPU: 52 | 6.4 GB |
| Docling - suryaocr | Very good (reads text well, gets table of contents correct, gets table entries correct) | CPU: 712, MPS: 704, GPU: 270 | 8.2 GB |
| Docling - Granite | Good (reads text well, gets table of contents correct, misses some tables) | CPU: 153, MPS: 152, GPU: 597 | 1.7 GB |
| marker | Medium (reads text well, gets table of contents correct, mixes up table entries) | CPU: 479, MPS: 617, GPU: 143 | 12.1 GB |
| MinerU | Very good (reads text well, gets table of contents partially correct, gets table entries correct) | CPU: 544, MPS: 156, GPU: 88 | 4.8 GB |
| Document Intelligence | Excellent (reads text well, gets table of contents correct, gets table entries correct) | 10 | 96 MB (processing happens in cloud) |
| LightOnOCR-2-1B | Excellent (reads text well, gets table of contents correct, gets table entries correct) | CPU: 4187, MPS: 3987, GPU: 416 | 14.9 GB |
| Chandra OCR 2 | Excellent (reads text well, gets table of contents correct, gets table entries correct) | CPU: n/a, MPS: n/a, GPU: 863 | OOM on MacBook Air M4 |