# document-chunking

Here are 14 public repositories matching this topic...

Open-source toolkit for RAG chunking: convert Markdown, validate documents, visualize and optimize chunking strategies, and enrich results for LLM applications.

  • Updated May 27, 2026
  • Python

A lightweight Python library for metadata-rich document chunking in Retrieval-Augmented Generation (RAG) workflows. It leverages Azure AI Document Intelligence to enhance chunking by retaining hierarchical structure, page numbers, and bounding boxes for seamless integration with PDF viewers.

  • Updated Jan 11, 2025
  • Python

KChunker is a lightweight, ultra-fast document parsing and chunking engine designed for RAG systems. It structures native and scanned PDFs, Excel files, Word documents, and email trails by preserving layout hierarchy, extracting tables, and generating dense vector embeddings for local search databases (ChromaDB and FAISS).

  • Updated May 22, 2026
  • Python
