Add use_llm support to marker_server for OpenWebUI integration #944
Closed
bankjaneo wants to merge 12 commits into datalab-to:master from
Conversation
- Add Dockerfile.gpu for NVIDIA CUDA-enabled deployment
- Add Dockerfile.cpu for lightweight CPU-only deployment
- Add docker-compose.yml with both GPU and CPU service configurations
- Add docker-entrypoint.sh with comprehensive environment variable support
- Add .dockerignore to optimize build context
- Add README-DOCKER.md with complete documentation

Features:
- Persistent model caching via volume mounts
- Configurable LLM services (OpenAI, Gemini, Claude, Azure, Ollama)
- OpenWebUI integration with configurable options
- Health checks and automatic restarts
- Support for all processing options (force OCR, paginate, output format, etc.)
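As a rough sketch of what such a compose file might contain (service names, variable names, and the mount path are illustrative assumptions based on this PR's description, not the exact file contents):

```yaml
services:
  marker-gpu:
    build:
      dockerfile: Dockerfile.gpu
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}   # optional LLM credentials
    volumes:
      - ./datalab:/root/.cache/datalab     # persistent model cache
    restart: unless-stopped

  marker-cpu:
    build:
      dockerfile: Dockerfile.cpu
    volumes:
      - ./datalab:/root/.cache/datalab
    restart: unless-stopped
```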
…11BSiezjtC46DyRTT5HPYCv Add Docker support with GPU and CPU variants
Update package name from libgdk-pixbuf2.0-0 to libgdk-pixbuf-2.0-0 in the CPU Dockerfile. The old package name is not available in Debian Trixie (used by python:3.11-slim base image), causing build failures.
Change package name from libgdk-pixbuf2.0-0 to libgdk-pixbuf-2.0-0 in the GPU Dockerfile to match the CPU Dockerfile and ensure consistency across both build configurations.
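The fix in the two commits above amounts to a one-line package rename in each Dockerfile; a sketch of the corrected install step (surrounding flags are assumptions):

```dockerfile
# Corrected package name for Debian Trixie (the python:3.11-slim base);
# the older "libgdk-pixbuf2.0-0" name is no longer available there.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libgdk-pixbuf-2.0-0 \
    && rm -rf /var/lib/apt/lists/*
```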
…encies-01Vo4EvYC9FWfzBtbsxNvVx8 Claude/fix cpu dockerfile dependencies 01 vo4 ev yc9 f wfz btbsx nv vx8
Previously, the CPU-only Dockerfile was downloading the default PyTorch wheel, which includes CUDA libraries and nvidia-* packages, unnecessarily increasing image size and build time. This change explicitly installs PyTorch from the CPU-specific wheel index before the main marker installation, preventing pip from downloading the CUDA-enabled version.

Benefits:
- Smaller image size (hundreds of MBs saved)
- Faster build time
- No unnecessary nvidia packages in CPU-only builds
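The install order described above might look like this in Dockerfile.cpu (a sketch; the real file's layout may differ):

```dockerfile
# Install CPU-only PyTorch wheels first so the later marker install
# does not pull the much larger CUDA-enabled default wheels.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir marker-pdf
```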
Updated Dockerfile.gpu to support Blackwell-architecture GPUs (RTX 5060 Ti and other RTX 50-series), which require CUDA 12.8 and sm_120 compute capability.

Changes:
- Upgrade base image from CUDA 12.1.0 to CUDA 12.8.0
- Install PyTorch with cu128 wheels explicitly before the main installation
- Ensures compatibility with newer GPUs like the RTX 5060 Ti

Without these changes, RTX 50-series GPUs would fail with "CUDA capability sm_120 is not compatible" errors.
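A sketch of the corresponding Dockerfile.gpu changes (the exact base-image tag is an assumption):

```dockerfile
# CUDA 12.8 base image plus cu128 PyTorch wheels for sm_120
# (RTX 50-series / Blackwell) compute capability.
FROM nvidia/cuda:12.8.0-runtime-ubuntu22.04
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cu128
```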
…LCLvF5FA8Rak29nXf4vhbr Claude/incomplete description 01 lc lv f5 fa8 rak29n xf4vhbr
…ere's what changed:

**Summary of changes:**

1. **GPU service (marker-gpu)**: Replaced named volumes with a bind mount:
   - Removed: `marker-models:/root/.cache/huggingface` and `marker-torch:/root/.cache/torch`
   - Added: `./datalab:/root/.cache/datalab`
2. **CPU service (marker-cpu)**: Applied the same fix:
   - Removed: `marker-models-cpu:/root/.cache/huggingface` and `marker-torch-cpu:/root/.cache/torch`
   - Added: `./datalab:/root/.cache/datalab`
3. **Cleaned up volumes section**: Removed all unused named volume definitions (`marker-models`, `marker-torch`, `marker-models-cpu`, `marker-torch-cpu`)

**Why this fixes the issue:**

The previous configuration used named Docker volumes, which store data in Docker's managed volume storage (typically under `/var/lib/docker/volumes/`). The change to a bind mount (`./datalab:/root/.cache/datalab`) ensures that downloaded model files are saved directly to your local `./datalab` directory, making them easily accessible and persistent across container restarts.
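The resulting service entries can be sketched as follows (abridged; only the volume lines are shown, other keys omitted):

```yaml
services:
  marker-gpu:
    volumes:
      - ./datalab:/root/.cache/datalab   # bind mount replacing the named volumes
  marker-cpu:
    volumes:
      - ./datalab:/root/.cache/datalab
```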
…cleaner and more correct - only bind mounts are defined where needed in each service's volume configuration.
Fix cache folder mapping in docker-compose.yml (vibe-kanban)
- Add `use_llm` field to the `CommonParams` model to accept the LLM toggle from OpenWebUI
- Update the `/marker/upload` endpoint to accept a `use_llm` form parameter
- LLM service configuration (API keys, models) is read from environment variables
- When `use_llm=True`, `ConfigParser.get_llm_service()` returns the configured LLM service
- Supports all LLM providers: Gemini, OpenAI, Claude, Ollama, Vertex AI, Azure OpenAI

This enables OpenWebUI's "Use LLM" toggle to work with self-hosted marker, providing higher-quality processing when LLM API credentials are configured in the docker-compose environment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
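A minimal sketch of the shape of this change, using a plain dataclass in place of the server's actual request model (field names other than `use_llm` are assumptions, not marker's real code):

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CommonParams:
    """Illustrative stand-in for the server's request parameters."""
    filepath: Optional[str] = None
    force_ocr: bool = False
    paginate_output: bool = False
    output_format: str = "markdown"
    use_llm: bool = False  # new field: enables LLM-assisted processing


def build_config(params: CommonParams) -> dict:
    # Turn request params into the config dict handed to the converter,
    # dropping unset (None) values.
    return {k: v for k, v in asdict(params).items() if v is not None}
```

The key point is the defaulted boolean: existing clients that never send the field keep the old behavior, while OpenWebUI's toggle simply sets it to true.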
Summary
Add `use_llm` support to the marker_server API to enable OpenWebUI's "Use LLM" toggle for enhanced PDF processing with Large Language Models.

Changes
- Add a `use_llm: bool = False` field to the `CommonParams` model in `marker/scripts/server.py`
- Update the `/marker/upload` endpoint to accept a `use_llm` form parameter

How It Works
When OpenWebUI sends `"use_llm": true` in the request:

1. The server passes the `use_llm` flag through the conversion pipeline
2. `ConfigParser.get_llm_service()` checks whether `use_llm=True` and returns the configured LLM service

Supported LLM Providers

- Gemini
- OpenAI
- Claude
- Ollama
- Vertex AI
- Azure OpenAI
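A minimal sketch of how provider selection from environment variables might work; this is illustrative only, not marker's actual `ConfigParser.get_llm_service()` implementation, and the provider ordering is an assumption:

```python
import os
from typing import Optional

# Hypothetical mapping from credential env var to a provider name;
# the real ConfigParser resolves actual LLM service classes.
_PROVIDERS = [
    ("GEMINI_API_KEY", "gemini"),
    ("OPENAI_API_KEY", "openai"),
    ("CLAUDE_API_KEY", "claude"),
    ("OLLAMA_BASE_URL", "ollama"),
]


def pick_llm_service(use_llm: bool) -> Optional[str]:
    """Return the first provider with credentials present, or None.

    When use_llm is False, no service is selected and marker falls
    back to its non-LLM pipeline.
    """
    if not use_llm:
        return None
    for env_var, name in _PROVIDERS:
        if os.environ.get(env_var):
            return name
    return None
```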
Environment Configuration
Users configure LLM credentials in their `docker-compose.yml`. Other providers use:

- `GEMINI_API_KEY`, `GEMINI_MODEL_NAME`
- `CLAUDE_API_KEY`, `CLAUDE_MODEL_NAME`
- `OLLAMA_BASE_URL`, `OLLAMA_MODEL`

Testing
The feature can be tested:

- via `/marker/upload` with a `use_llm=true` form parameter
- via `/marker` with `{"use_llm": true, ...}`

🤖 Generated with Claude Code