RunPod PaddleOCR-VL Serverless Worker

PaddleOCR-VL-1.6 (0.9B) serverless worker for RunPod. Supports document parsing: OCR, table recognition, formula recognition, chart recognition, text spotting, and seal recognition.

Project Structure

runpod-paddleocr/
├── .runpod/
│   ├── hub.json            # RunPod Hub metadata & config
│   └── tests.json          # Hub test cases
├── Dockerfile              # Docker build instructions (CUDA 12.6)
├── requirements.txt        # Python dependencies
├── .dockerignore           # Build exclusions
├── handler.py              # RunPod serverless handler
├── test_input.json         # Local testing fixture
├── README.md               # This file
└── docs/
    └── PADDLEOCR_SERVERLESS_PLAN.md  # Architecture & design plan

Quick Start

Prerequisites

- Docker
- Docker Hub account (or any container registry)
- RunPod account
- NVIDIA GPU (16GB+ VRAM recommended, 24GB+ for large documents)

Build & Deploy

# Build image
docker build --platform linux/amd64 -t yourdockerhub/runpod-paddleocr:v1 .

# Push to registry
docker push yourdockerhub/runpod-paddleocr:v1

Deploy from RunPod Hub (Recommended)

1. Go to the RunPod Hub and search for PaddleOCR-VL-1.6
2. Click Deploy and configure your endpoint settings
3. Choose a preset or customize environment variables
4. Click Deploy Endpoint. RunPod handles the build and deployment automatically

Manual Deploy from Docker Registry

# Build image
docker build --platform linux/amd64 -t yourdockerhub/runpod-paddleocr:v1 .

# Push to registry
docker push yourdockerhub/runpod-paddleocr:v1

Then in the RunPod Console:

1. Go to Serverless → New Endpoint
2. Click Import from Docker Registry
3. Enter: docker.io/yourdockerhub/runpod-paddleocr:v1
4. Configure:
   - GPU: L4 / A5000 / 3090 / A6000 (24GB+ recommended)
   - Container Disk: 20 GB
   - Execution Timeout: 300 seconds
   - Active Workers: 0 (or 1 for zero cold start)
   - FlashBoot: Enabled
5. Add any environment variables as needed
6. Click Deploy Endpoint

API Usage

Request Format

{
  "input": {
    "image": "https://example.com/document.png",
    "tasks": "ocr,table,formula",
    "output_format": "markdown"
  }
}

Full Scan (Extract All Text)

By default, image blocks (photos, embedded images) are skipped and replaced with image placeholders. To extract all text, including text inside images:

{
  "input": {
    "pdf": "https://example.com/document.pdf",
    "tasks": "auto",
    "output_format": "markdown",
    "use_ocr_for_image_block": true,
    "format_block_content": true
  }
}

Add "use_seal_recognition": true if the document contains stamps or seals.

Alternatively, set these as environment variables on your endpoint for permanent defaults:

USE_OCR_FOR_IMAGE_BLOCK=true
FORMAT_BLOCK_CONTENT=true
USE_SEAL_RECOGNITION=true

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `image` | string | - | URL or base64 data URI of an image |
| `images` | string[] | - | Array of image URLs/base64 |
| `pdf` | string | - | URL or base64 data URI of a PDF |
| `tasks` | string | `auto` | Comma-separated tasks: `ocr`, `table`, `formula`, `chart`, `spotting`, `seal`, or `auto` |
| `output_format` | string | `json` | Output format: `json`, `markdown`, or `both` |
| `max_new_tokens` | int | `512` | Maximum generation tokens |
| `use_ocr_for_image_block` | bool | `false` | Extract text from image blocks instead of skipping them |
| `format_block_content` | bool | `true` | Format block content as Markdown (vs raw output) |
| `use_seal_recognition` | bool | `false` | Enable seal/stamp text recognition |
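
The `image` and `pdf` parameters accept a base64 data URI as well as a URL, which is useful for local files. The following is a minimal sketch of building such a payload; `to_data_uri` and `build_payload` are hypothetical helper names, not part of this worker, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as a data URI: data:<mime>;base64,<payload>."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_payload(path: str, tasks: str = "auto", output_format: str = "markdown") -> dict:
    """Choose the `pdf` or `image` field based on the file extension."""
    key = "pdf" if path.lower().endswith(".pdf") else "image"
    return {
        "input": {
            key: to_data_uri(path),
            "tasks": tasks,
            "output_format": output_format,
        }
    }
```

Base64 inflates the request by roughly a third, so a URL is likely the better choice for large documents.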

Response Format

{
  "status": "completed",
  "results": [
    {
      "source": "https://example.com/document.png",
      "pages": [
        {
          "page": 0,
          "json": { ... },
          "markdown": "# Document Title\n\n...",
          "source": "https://example.com/document.png",
          "source_type": "image"
        }
      ]
    }
  ],
  "total_sources": 1,
  "pipeline_version": "v1.6"
}
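
To turn a response of this shape into a single document, a client can walk `results` and `pages` and join the per-page `markdown` fields. The sketch below assumes the structure shown above; `collect_markdown` is a hypothetical helper. RunPod normally wraps the handler's return value in an `output` field, so the helper accepts either the wrapped or the unwrapped form.

```python
def collect_markdown(response: dict) -> str:
    """Join the `markdown` field of every page, in order, across all sources."""
    # Accept both the raw handler result and RunPod's {"output": {...}} wrapper.
    body = response.get("output", response)
    if not isinstance(body, dict):
        return ""
    chunks = []
    for result in body.get("results", []):
        for page in result.get("pages", []):
            markdown = page.get("markdown")
            if markdown:  # absent when output_format is "json"
                chunks.append(markdown)
    return "\n\n".join(chunks)
```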

Environment Variables

All configuration is done via environment variables. Set them in the RunPod console when creating/editing your endpoint.

Model Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `PIPELINE_VERSION` | `v1.6` | PaddleOCR-VL pipeline version (`v1.5`, `v1.6`) |
| `DEFAULT_TASKS` | `auto` | Default task list: `ocr`, `table`, `formula`, `chart`, `spotting`, `seal`, `auto` |
| `OUTPUT_FORMAT` | `json` | Default output format: `json`, `markdown`, `both` |
| `MAX_NEW_TOKENS` | `512` | Maximum tokens for text generation |
| `HF_TOKEN` | - | HuggingFace token for gated/private models |
| `MODEL_CACHE_DIR` | - | Custom model cache path (e.g., `/runpod-volume/huggingface-cache`) |
| `USE_OCR_FOR_IMAGE_BLOCK` | `false` | Extract text from image blocks (overridable per-job) |
| `FORMAT_BLOCK_CONTENT` | `true` | Format block content as Markdown (overridable per-job) |
| `USE_SEAL_RECOGNITION` | `false` | Enable seal/stamp recognition (overridable per-job or via `seal` task) |
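
The "overridable per-job" behavior can be pictured as a simple precedence rule: a value in the job input wins, otherwise the environment variable applies, otherwise the documented default. The sketch below illustrates that rule only; `env_bool` and `resolve_options` are hypothetical names, the accepted truthy strings are an assumption, and the actual logic in handler.py may differ (for example, in how the `seal` task enables seal recognition).

```python
import os

# Assumption: these strings count as "true" in an environment variable.
_TRUTHY = {"1", "true", "yes", "on"}

# Option name -> (environment variable, documented default)
OPTION_DEFAULTS = {
    "use_ocr_for_image_block": ("USE_OCR_FOR_IMAGE_BLOCK", False),
    "format_block_content": ("FORMAT_BLOCK_CONTENT", True),
    "use_seal_recognition": ("USE_SEAL_RECOGNITION", False),
}


def env_bool(name: str, default: bool) -> bool:
    """Read a boolean environment variable, falling back to `default` if unset."""
    raw = os.environ.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY


def resolve_options(job_input: dict) -> dict:
    """Per-job value first, then the environment variable, then the default."""
    resolved = {}
    for key, (env_name, default) in OPTION_DEFAULTS.items():
        if key in job_input:
            resolved[key] = bool(job_input[key])
        else:
            resolved[key] = env_bool(env_name, default)
    return resolved
```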

Performance

| Variable | Default | Description |
| --- | --- | --- |
| `CONCURRENT_WORKERS` | `1` | Number of concurrent requests per worker. Increase for small/rapid OCR jobs. Monitor GPU memory. |
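
For context, the RunPod Python SDK lets an async handler process several jobs at once through a `concurrency_modifier` callback passed to `runpod.serverless.start`. The sketch below shows one way `CONCURRENT_WORKERS` could feed that callback; whether this worker wires it up exactly this way is an assumption, and `max_concurrency` is a hypothetical helper.

```python
import os


def max_concurrency(default: int = 1) -> int:
    """Parse CONCURRENT_WORKERS, falling back to `default` on missing or bad values."""
    raw = os.environ.get("CONCURRENT_WORKERS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return max(1, value)  # never drop below one job at a time


def concurrency_modifier(current_concurrency: int) -> int:
    """Callback shape expected by the SDK: returns the desired concurrency level."""
    return max_concurrency()


# With the RunPod SDK installed and an async handler defined, this would be used as:
#   runpod.serverless.start({"handler": handler,
#                            "concurrency_modifier": concurrency_modifier})
```

All concurrent jobs share a single GPU, so each increment adds to peak VRAM use.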

Local Testing

# Test with test_input.json
python handler.py

# Test with custom input
python handler.py --test_input '{"input": {"image": "https://example.com/doc.png"}}'

Example Client (Python)

import requests
import json

url = "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync"
headers = {
    "Authorization": "Bearer YOUR_RUNPOD_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "input": {
        "image": "https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png",
        "tasks": "ocr,table,formula",
        "output_format": "markdown"
    }
}

resp = requests.post(url, headers=headers, json=payload)
result = resp.json()
print(json.dumps(result, indent=2))
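
For documents that take longer than a synchronous request comfortably allows, RunPod's asynchronous `/run` endpoint returns a job ID that can be polled via `/status/{job_id}`. The following is a rough sketch using only the standard library; the helper names are hypothetical, and the polling interval and timeout are arbitrary choices rather than recommended values.

```python
import json
import time
import urllib.request

API_BASE = "https://api.runpod.ai/v2"
# Job states after which polling should stop.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def _request(method: str, url: str, api_key: str, body=None) -> dict:
    """Send an authenticated JSON request and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_job(endpoint_id: str, api_key: str, payload: dict) -> str:
    """Queue a job on /run and return its ID."""
    return _request("POST", f"{API_BASE}/{endpoint_id}/run", api_key, payload)["id"]


def fetch_status(endpoint_id: str, api_key: str, job_id: str) -> dict:
    """Fetch the current state of a queued or running job."""
    return _request("GET", f"{API_BASE}/{endpoint_id}/status/{job_id}", api_key)


def wait_for_job(fetch, interval: float = 2.0, timeout: float = 300.0, sleep=time.sleep) -> dict:
    """Call `fetch()` until the job reaches a terminal state or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        if status.get("status") in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

A typical call sequence would be `job_id = submit_job(endpoint_id, api_key, payload)` followed by `wait_for_job(lambda: fetch_status(endpoint_id, api_key, job_id))`; on `COMPLETED`, the handler's result sits under the `output` key.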

Model Caching (RunPod)

To use RunPod's model caching for faster cold starts:

1. When creating your endpoint, set Model to PaddlePaddle/PaddleOCR-VL-1.6
2. Set the HF_TOKEN env variable if required
3. The handler auto-detects cached models at /runpod-volume/huggingface-cache/
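
To illustrate what auto-detection might involve, the sketch below looks for a Hugging Face style snapshot directory (`models--<org>--<name>/snapshots/<revision>`) under the cache root. This is an assumption about the cache layout, not a description of handler.py; `find_cached_snapshot` is a hypothetical helper, and it checks both the root and a `hub/` subdirectory because the exact nesting is not stated here.

```python
from pathlib import Path
from typing import Optional


def find_cached_snapshot(
    repo_id: str,
    cache_root: str = "/runpod-volume/huggingface-cache",
) -> Optional[Path]:
    """Return a cached snapshot directory for `repo_id`, or None if not found."""
    # Hugging Face cache folders encode "org/name" as "models--org--name".
    folder = "models--" + repo_id.replace("/", "--")
    root = Path(cache_root)
    for base in (root / "hub", root):
        snapshots = base / folder / "snapshots"
        if snapshots.is_dir():
            candidates = sorted(p for p in snapshots.iterdir() if p.is_dir())
            if candidates:
                return candidates[0]
    return None
```

A `None` result would mean the model has to be downloaded at startup, which is where `HF_TOKEN` and `MODEL_CACHE_DIR` come into play.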

Recommended GPU Tiers

| GPU | VRAM | Suitability |
| --- | --- | --- |
| L4 / A5000 / RTX 3090 | 24 GB | Good for most documents |
| A6000 / A40 | 48 GB | Heavy multi-page documents |
| A100 | 80 GB | Batch processing |

Notes

- Results are returned inline (max 10 MB for /run, 20 MB for /runsync)
- For large outputs, configure S3 environment variables in the RunPod console
- Temporary files are cleaned up automatically after each job
