OCR Studio is a React + FastAPI app for image/PDF OCR, with a CLI for
terminal-based OCR. It uses GLM-OCR via mlx-vlm on Apple Silicon.
- GLM-OCR prompt modes: plain_ocr, table, formula
- Image OCR and multi-page PDF OCR
- Output formats for PDF: json, markdown, html, docx
- CLI for running OCR directly from the terminal (no server required)
- Web UI with React frontend and FastAPI backend
- Apple Silicon friendly (no CUDA/NVIDIA requirement)
Two ways to use OCR Studio:
```
CLI: ./ocr-studio image|pdf --> mlx-vlm (in-process)
Web: React UI --> FastAPI API --> mlx_vlm.server (port 8080)
```
A single virtualenv (.venv) holds both mlx-vlm and the backend/CLI
dependencies. The install script handles everything:
```bash
./scripts/install-mlx-vlm.sh install
```

This creates .venv and installs mlx-vlm, PyMuPDF, python-docx,
markdown, Pillow, and other required packages.
The CLI loads the model in-process — no separate server to start:
```bash
# OCR a single image
./ocr-studio image photo.png

# OCR with table mode
./ocr-studio image scan.png --mode table

# Batch OCR multiple images to a directory
./ocr-studio image *.png --output results/

# OCR a PDF to markdown
./ocr-studio pdf document.pdf --format markdown --output result.md

# OCR a PDF to DOCX
./ocr-studio pdf document.pdf --format docx --output result.docx

# Quiet mode: only OCR output, no progress on stderr
./ocr-studio -q image photo.png > result.txt
```

See CLI Reference below for full details.
The .venv environment from step 1 already includes most dependencies.
Install the remaining backend packages into it:
```bash
.venv/bin/python -m pip install -r backend/requirements.txt
```

Configure the app:

```bash
cp .env.example .env
```

Start all services:

```bash
./scripts/start-local.sh start
```

Check status:

```bash
./scripts/start-local.sh status
```

Frontend: http://localhost:3000
Backend: http://localhost:8000
Backend env keys:

- GLM_OCR_API_URL (default http://localhost:8080/chat/completions)
- GLM_OCR_MODEL (default mlx-community/GLM-OCR-bf16)
- GLM_OCR_API_KEY (optional)
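A minimal sketch of how a backend might read these keys with their documented defaults. The key names and default values come from above; the helper function itself is illustrative, not OCR Studio's actual code:

```python
import os


def load_glm_ocr_config(env=os.environ):
    """Read GLM OCR proxy settings, falling back to the documented defaults."""
    return {
        "api_url": env.get("GLM_OCR_API_URL", "http://localhost:8080/chat/completions"),
        "model": env.get("GLM_OCR_MODEL", "mlx-community/GLM-OCR-bf16"),
        # GLM_OCR_API_KEY is optional: None means no Authorization header is sent.
        "api_key": env.get("GLM_OCR_API_KEY") or None,
    }


# With an empty environment, all documented defaults apply
print(load_glm_ocr_config(env={})["model"])  # mlx-community/GLM-OCR-bf16
```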
```bash
./ocr-studio <command> [options]
```
| Global Flag | Description |
|---|---|
| -q, --quiet | Suppress progress messages on stderr |
```bash
./ocr-studio image <files...> [--mode MODE] [--model MODEL] [--output PATH]
```
| Flag | Description | Default |
|---|---|---|
| files | One or more image files (positional) | required |
| --mode | plain_ocr, table, or formula | plain_ocr |
| --model | HuggingFace model ID | mlx-community/GLM-OCR-bf16 |
| --output | File or directory (stdout if omitted) | stdout |
When --output is a directory, each input file produces a .txt file in that
directory. When it is a file path, all results are written to that single file.
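The directory-vs-file behavior can be sketched as a small path-mapping helper. This is illustrative only (the function name and return shape are assumptions, not the CLI's actual internals):

```python
from pathlib import Path


def resolve_output_paths(inputs, output=None):
    """Map input image paths to destinations per the documented rules:
    - no --output: every result goes to stdout
    - --output is a directory: one .txt per input, named after the input
    - --output is a file path: all results written to that single file
    """
    if output is None:
        return {name: "<stdout>" for name in inputs}
    out = Path(output)
    if out.is_dir() or str(output).endswith("/"):
        return {name: str(out / (Path(name).stem + ".txt")) for name in inputs}
    return {name: str(out) for name in inputs}


# Batch mode: each image maps to its own .txt inside results/
print(resolve_output_paths(["a.png", "b.png"], "results/"))
```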
```bash
./ocr-studio pdf <file> [--mode MODE] [--format FMT] [--model MODEL] [--output PATH] [--dpi DPI]
```
| Flag | Description | Default |
|---|---|---|
| file | PDF file (positional) | required |
| --mode | plain_ocr, table, or formula | plain_ocr |
| --format | json, markdown, html, or docx | markdown |
| --model | HuggingFace model ID | mlx-community/GLM-OCR-bf16 |
| --output | Output file (stdout if omitted; required for docx) | stdout |
| --dpi | PDF rendering resolution | 144 |
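For intuition about --dpi: PDF pages are measured in points (72 points per inch), so rasterizing at a given DPI scales the page by dpi/72. A quick sketch of the arithmetic (not OCR Studio's rendering code):

```python
def render_size(width_pt, height_pt, dpi=144):
    """Pixel dimensions when rasterizing a PDF page at a given DPI.
    PDF page geometry is in points: 1 pt = 1/72 inch."""
    scale = dpi / 72
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 in)
print(render_size(612, 792))           # default 144 dpi -> (1224, 1584)
print(render_size(612, 792, dpi=300))  # higher DPI -> larger, slower pages
```

Higher DPI gives the model more detail on small text at the cost of memory and per-page time.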
.env.example:

```bash
# API
API_HOST=0.0.0.0
API_PORT=8000

# Frontend
FRONTEND_PORT=3000

# GLM OCR proxy
GLM_OCR_API_URL=http://localhost:8080/chat/completions
GLM_OCR_MODEL=mlx-community/GLM-OCR-bf16
GLM_OCR_API_KEY=

# Upload
MAX_UPLOAD_SIZE_MB=100
```

Frontend proxy target (optional):

- By default Vite proxies /api to http://localhost:8000
- Override with VITE_PROXY_TARGET if the backend is elsewhere
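A limit like MAX_UPLOAD_SIZE_MB is typically enforced with a byte-count check before processing. A minimal sketch (the function name is an assumption, not the backend's actual handler):

```python
def check_upload_size(num_bytes, max_mb=100):
    """Reject uploads larger than the configured MAX_UPLOAD_SIZE_MB."""
    limit = max_mb * 1024 * 1024
    if num_bytes > limit:
        raise ValueError(f"upload of {num_bytes} bytes exceeds {max_mb} MB limit")
    return num_bytes


# A 5 MB upload passes under the 100 MB default
check_upload_size(5 * 1024 * 1024)
```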
Form fields:

- image (required)
- mode (plain_ocr|table|formula)

Response:

```json
{
  "success": true,
  "text": "...",
  "raw_text": "...",
  "image_dims": { "w": 1024, "h": 768 },
  "metadata": { "mode": "plain_ocr" }
}
```

Form fields:

- pdf_file (required)
- mode (plain_ocr|table|formula)
- output_format (json|markdown|html|docx)
- dpi (optional int)
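The image OCR response shown above can be modeled client-side with a small dataclass. This is a sketch based only on the example payload; the backend's own schema may carry additional fields:

```python
import json
from dataclasses import dataclass


@dataclass
class OCRResponse:
    success: bool
    text: str        # post-processed OCR text
    raw_text: str    # raw model output before post-processing
    image_dims: dict
    metadata: dict

    @classmethod
    def from_json(cls, payload: str) -> "OCRResponse":
        return cls(**json.loads(payload))


resp = OCRResponse.from_json(
    '{"success": true, "text": "hello", "raw_text": "hello", '
    '"image_dims": {"w": 1024, "h": 768}, "metadata": {"mode": "plain_ocr"}}'
)
print(resp.text, resp.image_dims["w"])  # hello 1024
```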
Backend sanity check:

```bash
python3 -m py_compile backend/*.py
python3 -m py_compile cli/ocr_cli.py
```

Backend tests (if the virtualenv is set up):

```bash
.venv/bin/python -m pytest backend/tests -q
```

Frontend build:

```bash
cd frontend
npm install
npm run build
```

Full local smoke test:

```bash
./scripts/smoke-test-local.sh
```

Start/stop local app services:

```bash
./scripts/start-local.sh check
./scripts/start-local.sh start
./scripts/start-local.sh status
./scripts/start-local.sh stop
```

Install mlx-vlm and CLI dependencies into .venv:

```bash
./scripts/install-mlx-vlm.sh install
./scripts/install-mlx-vlm.sh check
```

- CLI: the first run is slow. The model loads into memory on each invocation (~10-30s); subsequent image/page processing is fast. For batch work, pass multiple files in one command to amortize the load time.
- Web: the first MLX request can be slow due to warm-up/shader compilation.
- During model warm-up, OCR requests may briefly return 503. Retry after a few seconds.
- If the OCR endpoint is unavailable, /health still responds but OCR calls return upstream errors.
- Large bundle warnings from Vite are non-blocking for the production build.
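The 503-during-warm-up note suggests a simple client-side retry loop. A generic sketch, not part of OCR Studio itself:

```python
import time


def retry(fn, attempts=5, delay=2.0, retry_on=(Exception,)):
    """Call fn, retrying up to `attempts` times on the given exceptions
    (e.g. an HTTP 503 surfaced as an error while the model warms up)."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(delay)


# Example: a flaky call that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("503 Service Unavailable")
    return "ok"

print(retry(flaky, delay=0.01))  # ok
```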
Licensed under MIT. See LICENSE.