
Kreuzberg.dev is a fast, polyglot document intelligence engine with a Rust core. It extracts structured data from 56+ document formats using streaming parsers and built-in OCR. Designed for RAG pipelines, batch workloads, and production deployments.


kreuzberg-dev/.github


🌉 Kreuzberg

The fastest Document Intelligence engine for RAG Developers: Open-Source and Cloud


Kreuzberg is a polyglot document intelligence framework with a fast Rust core. We build tools that help developers extract, process, and understand documents at scale: PDFs, Office files, images, archives, emails, and more, across 50+ formats.

We're setting out to make document intelligence faster, cheaper, and more ecological.

What is Kreuzberg

1. Kreuzberg (Open Source, MIT Licensed)

A polyglot document intelligence engine

  • ✓ Rust core
  • ✓ Bindings for Python, TypeScript/Node.js, Ruby, Go, Java, C#, PHP, Elixir
  • ✓ OCR with table extraction
  • ✓ Streaming parsers for multi-GB files
  • ✓ Built-in chunking + embeddings for RAG
  • ✓ CLI, REST API, Docker, MCP server
  • Read more: https://kreuzberg.dev/
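
To make the "chunking for RAG" item concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a generic illustration, not Kreuzberg's API: the function name `chunk_text` and its parameters `max_chars` and `overlap` are hypothetical, and the engine's own chunker may work differently.

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of at most max_chars.

    Consecutive windows share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the two chunks.
    NOTE: hypothetical helper for illustration, not a Kreuzberg function.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")

    step = max_chars - overlap  # how far each new window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be passed to an embedding model and stored in a vector index. Overlap trades some index size for better recall at chunk boundaries.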

2. Kreuzberg Cloud (Coming Soon)

A fully managed document intelligence API. Same engine, zero setup.

Planned features:

  • Hosted REST API
  • Async jobs + webhooks
  • Built-in chunking for RAG pipelines
  • Premium OCR backends
  • Usage dashboard & analytics
  • Simple pay-as-you-go pricing

3. html-to-markdown library

High-performance HTML → Markdown conversion powered by Rust. It ships as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, all with identical rendering behaviour.

Why Devs Choose Kreuzberg

  • Truly polyglot: Python, Rust, JS, Ruby, Go, Java, C#, PHP, Elixir.
  • High throughput: Optimized for batch workloads and multi-GB documents.
  • Memory efficient: Streaming architecture keeps RAM usage constant.
  • Flexible deployment: Use as a library, CLI, Docker image, or REST API.
  • MIT license: Safe for enterprise, commercial use, and closed-source products.
  • Built for RAG: Native chunking + embeddings with full customization.

🌍 Community

Join our dev community, ask questions, and share what you're building.

🔧 Contribution Guide

Contributions are welcome! We follow a simple workflow:

  1. Open an issue to propose changes
  2. Submit a PR
  3. Maintainers review and merge

Please see CONTRIBUTING.md in the respective repos for detailed guidelines. The main Kreuzberg repo is at https://github.com/kreuzberg-dev/kreuzberg

📜 License

All open-source code is MIT licensed. It's permissive, enterprise-safe, and commercial-friendly.

❤️ Maintainers

Built with love in the heart of Kreuzberg, Berlin's creative and gritty district.
