weberankit/DocAiBackend

🌲 Agentic RAG System

Vectorless • Reasoning-Based • Human-Like Retrieval



📌 Overview

A scalable, agentic document-intelligence system built on top of a locally deployed PageIndex instance. It parses long documents into a hierarchical index and answers questions with reasoning-driven retrieval instead of vector similarity search.


🧠 What This Project Actually Does

| ❌ Most RAG Systems | ✅ This System |
| --- | --- |
| Chunk documents | Builds a structured tree index (like a table of contents) |
| Store embeddings | Uses LLM reasoning to navigate the tree |
| Retrieve by similarity | Fetches only what is needed, when needed |

⚡ Core Idea

Instead of asking:

"Which chunk is similar?"

We ask:

"Where should I look, and why?"


🏗️ System Architecture

1. User Layer

  • Upload document (PDF, long text)
  • Ask questions via chat

2. Validation Layer

  • File validation
  • Security checks
  • Format normalization

3. Async Processing (Queue System)

Uploads go into a queue and are processed in the background:

  • Parsing — handled by locally deployed PageIndex
  • Structuring
  • Index generation
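The queue stage can be sketched as follows. This is a minimal in-memory stand-in for BullMQ/RabbitMQ, not the repo's actual worker code; the job shape and stage comments are illustrative assumptions.

```typescript
// Minimal sketch of the async processing stage. An in-memory array stands in
// for BullMQ/RabbitMQ; job fields and stage names are illustrative.
type UploadJob = { docId: string; s3Key: string };

const queue: UploadJob[] = [];
const indexed: string[] = [];

function enqueueUpload(job: UploadJob): void {
  // The upload endpoint returns immediately; heavy work happens later.
  queue.push(job);
}

function processNext(): boolean {
  const job = queue.shift();
  if (!job) return false;
  // 1. Parsing — would call the locally deployed PageIndex instance
  // 2. Structuring — build the tree from parsed sections
  // 3. Index generation — persist tree nodes to MongoDB
  indexed.push(job.docId);
  return true;
}

enqueueUpload({ docId: "doc-1", s3Key: "raw/doc-1.pdf" });
processNext();
```

A real worker would pull from a durable queue so a crash mid-parse does not lose the upload.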

4. 🌲 PageIndex Tree Generation (Locally Deployed)

Documents are parsed and indexed by a self-hosted PageIndex instance, which converts them into a hierarchical tree structure:

```json
{
  "title": "Section",
  "summary": "...",
  "nodes": []
}
```

Storage:

  • Tree Nodes → MongoDB
  • Raw Pages → S3
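The node shape above can be modeled as a recursive TypeScript interface. The `pageRefs` field linking a node to its raw pages in S3 is an assumption added for illustration; only `title`, `summary`, and `nodes` appear in the JSON above.

```typescript
// Sketch of the tree-node shape shown above.
interface TreeNode {
  title: string;        // section heading
  summary: string;      // summary used to guide navigation decisions
  nodes: TreeNode[];    // child sections (empty for leaf nodes)
  pageRefs?: string[];  // assumed: S3 keys of the raw pages this node covers
}

const root: TreeNode = {
  title: "Section",
  summary: "...",
  nodes: [],
};
```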

5. 🧠 Agentic Query System (LLM Reasoning)

Step 1: Intent Understanding

LLM decides:

  • Is this a simple question?
  • Does it require document reasoning?

Step 2: 🌲 Tree Navigation (Core Innovation)

Instead of vector search:

  • Traverse tree like a human
  • Section → Subsection → Page
  • Use summaries to guide decisions
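The traversal above can be sketched like this. A keyword-overlap scorer stands in for the LLM's "where should I look, and why?" decision; in the real system the model would pick a child after reading the summaries.

```typescript
// Sketch of reasoning-guided tree traversal. overlap() is a stand-in for
// an LLM choosing the most promising child from its summary.
interface NavNode { title: string; summary: string; nodes: NavNode[] }

function overlap(text: string, query: string): number {
  const words = new Set(text.toLowerCase().split(/\W+/));
  return query.toLowerCase().split(/\W+/).filter(w => words.has(w)).length;
}

function navigate(node: NavNode, query: string, path: string[] = []): string[] {
  path.push(node.title);
  // Score children by summary relevance and descend into the best one.
  const scored = node.nodes
    .map(child => ({ child, score: overlap(child.summary, query) }))
    .sort((a, b) => b.score - a.score);
  if (scored.length === 0 || scored[0].score === 0) return path; // stop: leaf or no signal
  return navigate(scored[0].child, query, path);
}

const tree: NavNode = {
  title: "Contract", summary: "full agreement",
  nodes: [
    { title: "Termination", summary: "ending the contract, notice periods", nodes: [] },
    { title: "Payment", summary: "fees, invoicing schedule", nodes: [] },
  ],
};
```

The returned path doubles as the explanation of *why* a page was fetched, which is what makes retrieval traceable.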

Step 3: ⚡ Smart Retrieval Strategy

| Scenario | Action |
| --- | --- |
| Simple query | Answer directly |
| Node-level sufficient | Fetch structured nodes |
| Deep reasoning needed | Fetch raw pages from S3 |
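The table above maps to a three-way branch. In the real system an LLM classifies the scenario; here a discriminated union just makes the branches explicit, and the action names are illustrative.

```typescript
// Sketch of the smart retrieval strategy. An LLM would pick the scenario;
// the action strings are illustrative labels, not the repo's API.
type Scenario = "simple" | "node-sufficient" | "deep-reasoning";

function retrievalAction(scenario: Scenario): string {
  switch (scenario) {
    case "simple":          return "answer-directly";  // no retrieval at all
    case "node-sufficient": return "fetch-nodes";      // structured nodes from MongoDB
    case "deep-reasoning":  return "fetch-raw-pages";  // raw pages from S3
  }
}
```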

Step 4: 🔍 Cross-Node Reasoning

  • Combine multiple nodes
  • Use cross-page context
  • Perform multi-step reasoning

6. Response Generation

Relevant Nodes + Raw Context → LLM → Final Answer
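The final step can be sketched as prompt assembly: the selected nodes and raw pages are concatenated into one context block for the LLM. The prompt layout below is an assumption for illustration, not the repo's actual template.

```typescript
// Sketch of response generation: nodes + raw context → one LLM prompt.
// The section headers and ordering are assumed, not taken from the repo.
interface Retrieved { title: string; content: string }

function buildPrompt(question: string, nodes: Retrieved[], rawPages: string[]): string {
  const nodeCtx = nodes.map(n => `## ${n.title}\n${n.content}`).join("\n\n");
  const pageCtx = rawPages.join("\n\n");
  return [
    "Answer using only the context below.",
    "### Tree nodes", nodeCtx,
    "### Raw pages", pageCtx,
    "### Question", question,
  ].join("\n\n");
}
```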

🚀 Why This Is Different

| ❌ Traditional RAG Problems | ✅ This System Solves That |
| --- | --- |
| Chunking breaks context | No chunking |
| Embeddings miss true relevance | No vector DB; reasoning-based retrieval |
| Hard to explain retrieval | Explainable (traceable path in the tree) |
| Expensive at scale | Human-like navigation fetches only what is needed |

🧩 Key Features

  • 🌲 Tree-Based Indexing
  • 🧠 LLM as Decision Engine
  • ⚡ Adaptive Data Fetching
  • 🔄 Cross-Page Reasoning
  • 📦 Scalable Processing

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Node.js / Express |
| Document Parser | PageIndex (locally deployed) |
| Queue | BullMQ / RabbitMQ |
| Storage | S3 (raw documents), MongoDB (tree index) |
| LLM | OpenAI / local models |
| Architecture | Agentic workflow |

📊 Conceptual Flow

User Upload → Queue → PageIndex (Local) → Tree Index Creation
                                                    ↓
                  User Query → LLM Reasoning → Tree Traversal
                                                    ↓
                                        Fetch Nodes / Raw Pages
                                                    ↓
                                              Final Answer

🔬 Inspiration & Core Dependency

  • PageIndex — used as the local document parsing engine
  • Agentic retrieval systems
  • Human expert document navigation patterns

💡 Positioning

| ❌ This is NOT | ✅ This IS |
| --- | --- |
| A chatbot | 🧠 A reasoning-first retrieval system |
| A simple RAG pipeline | Built for long, complex documents |
