PaddleOCR-VL-1.6 (0.9B) serverless worker for RunPod. Supports document parsing: OCR, table recognition, formula recognition, chart recognition, text spotting, and seal recognition.
runpod-paddleocr/
├── .runpod/
│   ├── hub.json                        # RunPod Hub metadata & config
│   └── tests.json                      # Hub test cases
├── Dockerfile                          # Docker build instructions (CUDA 12.6)
├── requirements.txt                    # Python dependencies
├── .dockerignore                       # Build exclusions
├── handler.py                          # RunPod serverless handler
├── test_input.json                     # Local testing fixture
├── README.md                           # This file
└── docs/
    └── PADDLEOCR_SERVERLESS_PLAN.md    # Architecture & design plan
- Docker
- Docker Hub account (or any container registry)
- RunPod account
- NVIDIA GPU (16GB+ VRAM recommended, 24GB+ for large documents)
# Build image
docker build --platform linux/amd64 -t yourdockerhub/runpod-paddleocr:v1 .
# Push to registry
docker push yourdockerhub/runpod-paddleocr:v1

- Go to the RunPod Hub and search for PaddleOCR-VL-1.6
- Click Deploy and configure your endpoint settings
- Choose a preset or customize environment variables
- Click Deploy Endpoint. RunPod handles the build and deployment automatically
# Build image
docker build --platform linux/amd64 -t yourdockerhub/runpod-paddleocr:v1 .
# Push to registry
docker push yourdockerhub/runpod-paddleocr:v1

Then in the RunPod Console:
- Go to Serverless → New Endpoint
- Click Import from Docker Registry
- Enter: docker.io/yourdockerhub/runpod-paddleocr:v1
- Configure:
  - GPU: L4 / A5000 / 3090 / A6000 (24GB+ recommended)
  - Container Disk: 20 GB
  - Execution Timeout: 300 seconds
  - Active Workers: 0 (or 1 for zero cold start)
  - FlashBoot: Enabled
- Add any environment variables as needed
- Click Deploy Endpoint
{
  "input": {
    "image": "https://example.com/document.png",
    "tasks": "ocr,table,formula",
    "output_format": "markdown"
  }
}

By default, image blocks (photos, embedded images) are skipped and referenced as image placeholders. To extract all text, including content inside images:
{
  "input": {
    "pdf": "https://example.com/document.pdf",
    "tasks": "auto",
    "output_format": "markdown",
    "use_ocr_for_image_block": true,
    "format_block_content": true
  }
}

Add "use_seal_recognition": true if the document contains stamps or seals.

Alternatively, set these as environment variables on your endpoint for permanent defaults:

USE_OCR_FOR_IMAGE_BLOCK=true
FORMAT_BLOCK_CONTENT=true
USE_SEAL_RECOGNITION=true
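The precedence described above (a per-job value wins over the endpoint's environment default, which wins over the built-in default) can be sketched as follows. This is a minimal illustration, not the actual code in handler.py: the names `FLAG_ENV_NAMES`, `FLAG_DEFAULTS`, and `resolve_flags` are hypothetical, and the set of strings accepted as "true" is an assumption.

```python
import os

# Hypothetical mapping from per-job input keys to endpoint environment variables.
FLAG_ENV_NAMES = {
    "use_ocr_for_image_block": "USE_OCR_FOR_IMAGE_BLOCK",
    "format_block_content": "FORMAT_BLOCK_CONTENT",
    "use_seal_recognition": "USE_SEAL_RECOGNITION",
}

# Documented defaults, used when neither the job nor the environment sets a value.
FLAG_DEFAULTS = {
    "use_ocr_for_image_block": False,
    "format_block_content": True,
    "use_seal_recognition": False,
}


def _parse_bool(text):
    """Interpret an environment string as a boolean (accepted spellings are an assumption)."""
    return text.strip().lower() in {"1", "true", "yes", "on"}


def resolve_flags(job_input, env=None):
    """Return the effective flags: job input first, then environment, then default."""
    env = os.environ if env is None else env
    resolved = {}
    for key, env_name in FLAG_ENV_NAMES.items():
        if key in job_input:
            resolved[key] = bool(job_input[key])
        elif env_name in env:
            resolved[key] = _parse_bool(env[env_name])
        else:
            resolved[key] = FLAG_DEFAULTS[key]
    return resolved
```

For example, with USE_SEAL_RECOGNITION=false set on the endpoint, a job that sends "use_seal_recognition": true would still run with seal recognition enabled under this scheme.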
| Parameter | Type | Default | Description |
|---|---|---|---|
| image | string | - | URL or base64 data URI of an image |
| images | string[] | - | Array of image URLs/base64 |
| pdf | string | - | URL or base64 data URI of a PDF |
| tasks | string | auto | Comma-separated tasks: ocr, table, formula, chart, spotting, seal, or auto |
| output_format | string | json | Output format: json, markdown, or both |
| max_new_tokens | int | 512 | Maximum generation tokens |
| use_ocr_for_image_block | bool | false | Extract text from image blocks instead of skipping them |
| format_block_content | bool | true | Format block content as Markdown (vs raw output) |
| use_seal_recognition | bool | false | Enable seal/stamp text recognition |
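Since image and pdf accept a base64 data URI as well as a URL, a local file can be sent without hosting it anywhere. The sketch below shows one way to build such a payload. The helper names `to_data_uri` and `build_payload` are illustrative and not part of this repository, and it assumes the handler accepts the standard `data:<mime>;base64,<data>` form.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_payload(path: str, tasks: str = "auto", output_format: str = "json") -> dict:
    """Build a request body for a local image or PDF file."""
    file_path = Path(path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    mime_type = mime_type or "application/octet-stream"
    # PDFs go under the "pdf" key; everything else is treated as an image.
    key = "pdf" if mime_type == "application/pdf" else "image"
    return {
        "input": {
            key: to_data_uri(file_path.read_bytes(), mime_type),
            "tasks": tasks,
            "output_format": output_format,
        }
    }
```

Base64 inflates the payload by roughly a third, so a URL is likely the better choice for large files.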
{
  "status": "completed",
  "results": [
    {
      "source": "https://example.com/document.png",
      "pages": [
        {
          "page": 0,
          "json": { ... },
          "markdown": "# Document Title\n\n...",
          "source": "https://example.com/document.png",
          "source_type": "image"
        }
      ]
    }
  ],
  "total_sources": 1,
  "pipeline_version": "v1.6"
}

All configuration is done via environment variables. Set them in the RunPod console when creating or editing your endpoint.
| Variable | Default | Description |
|---|---|---|
| PIPELINE_VERSION | v1.6 | PaddleOCR-VL pipeline version (v1.5, v1.6) |
| DEFAULT_TASKS | auto | Default task list: ocr, table, formula, chart, spotting, seal, auto |
| OUTPUT_FORMAT | json | Default output format: json, markdown, both |
| MAX_NEW_TOKENS | 512 | Maximum tokens for text generation |
| HF_TOKEN | - | HuggingFace token for gated/private models |
| MODEL_CACHE_DIR | - | Custom model cache path (e.g., /runpod-volume/huggingface-cache) |
| USE_OCR_FOR_IMAGE_BLOCK | false | Extract text from image blocks (overridable per-job) |
| FORMAT_BLOCK_CONTENT | true | Format block content as Markdown (overridable per-job) |
| USE_SEAL_RECOGNITION | false | Enable seal/stamp recognition (overridable per-job or via seal task) |

| Variable | Default | Description |
|---|---|---|
| CONCURRENT_WORKERS | 1 | Number of concurrent requests per worker. Increase for small/rapid OCR jobs. Monitor GPU memory. |
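DEFAULT_TASKS and the per-job tasks field both take a comma-separated string. The following sketch shows one way a client could validate such a string before sending it. The function `parse_tasks` is hypothetical, and two of its behaviours are assumptions rather than documented handler behaviour: an empty string falls back to auto, and auto takes over when mixed with other task names.

```python
# Task names listed in the configuration table above.
VALID_TASKS = ("ocr", "table", "formula", "chart", "spotting", "seal")


def parse_tasks(raw: str) -> list[str]:
    """Split a comma-separated task string into a validated, de-duplicated list."""
    names = [part.strip().lower() for part in raw.split(",") if part.strip()]
    # Assumption: an empty list or any mention of "auto" means automatic task selection.
    if not names or "auto" in names:
        return ["auto"]
    unknown = [name for name in names if name not in VALID_TASKS]
    if unknown:
        raise ValueError(f"Unknown task(s): {', '.join(unknown)}")
    # dict.fromkeys keeps the first occurrence of each name, preserving order.
    return list(dict.fromkeys(names))
```

Rejecting unknown names on the client side surfaces typos such as "tables" before a GPU worker is started for the job.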
# Test with test_input.json
python handler.py
# Test with custom input
python handler.py --test_input '{"input": {"image": "https://example.com/doc.png"}}'

import json

import requests

# Synchronous endpoint: the call blocks until the job finishes.
url = "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync"
headers = {
    "Authorization": "Bearer YOUR_RUNPOD_API_KEY",
    "Content-Type": "application/json",
}
payload = {
    "input": {
        "image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png",
        "tasks": "ocr,table,formula",
        "output_format": "markdown",
    }
}

# Client-side timeout chosen to match the 300-second execution timeout suggested above.
resp = requests.post(url, headers=headers, json=payload, timeout=300)
resp.raise_for_status()  # fail loudly on HTTP errors such as a rejected API key
result = resp.json()
print(json.dumps(result, indent=2))

To use RunPod's model caching for faster cold starts:
- When creating your endpoint, set Model to PaddlePaddle/PaddleOCR-VL-1.6
- Set the HF_TOKEN env variable if required
- The handler auto-detects cached models at /runpod-volume/huggingface-cache/
| GPU | VRAM | Suitability |
|---|---|---|
| L4 / A5000 / RTX 3090 | 24 GB | Good for most documents |
| A6000 / A40 | 48 GB | Heavy multi-page documents |
| A100 | 80 GB | Batch processing |
- Results are returned inline (max 10 MB for /run, 20 MB for /runsync)
- For large outputs, configure S3 environment variables in the RunPod console
- Temporary files are cleaned up automatically after each job
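
Unlike /runsync, the /run route returns a job id right away, so the client has to poll for the result. A minimal sketch of that loop follows, assuming RunPod's /run and /status/{job_id} routes and its usual terminal status names. The helpers `wait_for_job` and `run_async` are illustrative and not part of this repository, and the sketch uses only the standard library so it runs without installing requests.

```python
import json
import time
import urllib.request

API_BASE = "https://api.runpod.ai/v2"
# Statuses after which the job will not change any more (assumed from RunPod's job states).
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(fetch_status, interval=2.0, max_attempts=150, sleep=time.sleep):
    """Call fetch_status() until the job reaches a terminal status, then return it."""
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError(f"Job did not finish after {max_attempts} status checks")


def _request_json(url, api_key, body=None):
    """Send a JSON request (POST when a body is given, otherwise GET) and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_async(endpoint_id, api_key, payload):
    """Submit a job to /run and poll /status/{job_id} until it finishes."""
    job_id = _request_json(f"{API_BASE}/{endpoint_id}/run", api_key, payload)["id"]
    status_url = f"{API_BASE}/{endpoint_id}/status/{job_id}"
    return wait_for_job(lambda: _request_json(status_url, api_key))
```

Passing the status fetcher and the sleep function in as arguments keeps the polling loop independent of the network, which makes it easy to exercise with canned responses.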