Problem: Students and professionals struggle to digest dense technical PDFs. Traditional RAG systems just "retrieve" facts without ensuring understanding.
Solution: FlowMind is a Multi-Agent System (MAS) that transforms static documents into an interactive, visual knowledge graph. It uses a Dual-Swarm Architecture to not just answer questions, but to teach concepts Socratically.
This project demonstrates three key agentic concepts:

1. **Multi-Agent Orchestration (Dual-Swarm)**
   - Ingestion Swarm: Asynchronously builds the "brain" (Parsing, Vision, Concept Extraction, Relationship Mapping).
   - Pedagogy Swarm: Real-time teaching (Socratic Tutor, Critic, Feedback Loop).
   - Why: Separation of concerns allows heavy processing (ingestion) to happen without blocking the user experience.
2. **Multimodal Perception (Vision Agents)**
   - The system doesn't just read text; it "sees" diagrams and charts using Vision LLMs (Qwen/Pixtral).
   - Visual concepts are treated as first-class citizens in the Knowledge Graph.
3. **Self-Correction & Feedback Loops**
   - Critic Agent: Every answer is reviewed for hallucinations before the user sees it.
   - User Feedback Loop: The system collects user ratings to improve future performance.
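The Critic's review cycle can be sketched as a draft–review loop. This is a minimal illustration, not FlowMind's actual API: the `Review` schema, the grounding check, and the retry limit are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Review:
    approved: bool
    notes: str

def critique(answer: str, sources: list[str]) -> Review:
    # Hypothetical grounding check: approve only answers that quote a retrieved source.
    grounded = any(src in answer for src in sources)
    return Review(approved=grounded, notes="" if grounded else "claim not found in sources")

def answer_with_critic(question: str, sources: list[str], max_retries: int = 2) -> str:
    # Stand-in for the Tutor LLM call: always cites the top source.
    draft = f"{question} -> {sources[0]}"
    for _ in range(max_retries):
        review = critique(draft, sources)
        if review.approved:
            return draft
        # A real system would feed review.notes back into the Tutor's prompt here.
        draft = f"{question} -> {sources[0]}"
    return draft
```

The key property is that no draft reaches the user until the Critic approves it (or retries are exhausted).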
```mermaid
graph TD
    User --> API[FastAPI Orchestrator]
    subgraph "Ingestion Swarm"
        API --> Parse[Parsing Agent]
        Parse --> Vision[Vision Agent Multimodal]
        Parse --> Text[Concept Agent]
        Vision & Text --> Graph[Relationship Agent]
        Graph --> DB[(Knowledge Graph + Vector DB)]
    end
    subgraph "Pedagogy Swarm"
        API --> Tutor[Teaching Agent]
        Tutor <--> DB
        Tutor --> Critic[Critic Agent]
        Critic --> Tutor
        User --> Feedback[Feedback Loop]
    end
```
The system is divided into two distinct groups of agents:
- Ingestion Swarm: Asynchronous pipeline that "reads" and "understands" content. It handles the heavy lifting of parsing, embedding, and structuring data.
- Pedagogy Swarm: Real-time interaction agents. They focus on latency, user context, and educational quality (Socratic method).
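The split matters because ingestion can run for minutes while a question needs an answer now. A minimal sketch of that concurrency with plain `asyncio` (the function names and the dict-as-store are illustrative, not FlowMind's internals):

```python
import asyncio

async def ingest(pdf_path: str, store: dict) -> None:
    # Stand-in for the heavy Ingestion Swarm pipeline (parse -> vision -> concepts -> graph).
    await asyncio.sleep(0.01)
    store[pdf_path] = "indexed"

async def ask(question: str) -> str:
    # The Pedagogy Swarm answers immediately, even while ingestion is still running.
    return f"answer to: {question}"

async def main() -> tuple[str, dict]:
    store: dict = {}
    job = asyncio.create_task(ingest("paper.pdf", store))  # fire-and-forget: non-blocking
    reply = await ask("What is attention?")                # served before ingestion finishes
    await job
    return reply, store

reply, store = asyncio.run(main())
```

In the real system FastAPI's event loop plays the role of `main()`, with ingestion scheduled as a background task.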
We don't just rely on vector similarity.
- Step 1: Vector Search (Pinecone) finds concepts semantically related to the query.
- Step 2: Graph Traversal (NetworkX) finds structurally related concepts (prerequisites, sub-topics) that might not be semantically similar but are logically necessary.
- Result: A richer context window for the LLM.
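The two steps above can be sketched end to end. This toy version substitutes a plain dict for Pinecone and an adjacency map for NetworkX; the embeddings and prerequisite links are invented for illustration.

```python
import math

EMBEDDINGS = {                       # concept -> toy 2-D embedding (Pinecone stand-in)
    "backprop": (1.0, 0.0),
    "gradients": (0.9, 0.1),
    "chain rule": (0.0, 1.0),        # semantically far from the query...
}
PREREQS = {"backprop": ["chain rule", "gradients"]}  # ...but structurally required

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_retrieve(query_vec, k=1):
    # Step 1: vector search — top-k semantic matches.
    ranked = sorted(EMBEDDINGS, key=lambda c: cosine(query_vec, EMBEDDINGS[c]), reverse=True)
    hits = ranked[:k]
    # Step 2: graph traversal — pull in prerequisites of each hit.
    expanded = set(hits)
    for concept in hits:
        expanded.update(PREREQS.get(concept, []))
    return expanded

context = hybrid_retrieve((1.0, 0.05))
```

Note how "chain rule" enters the context through the graph edge alone: pure vector search would have ranked it last.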
Text and Images are treated as first-class citizens.
- Text Path: PDF -> Blocks -> Concepts -> Embeddings.
- Vision Path: PDF -> Images -> Vision LLM -> Visual Concepts -> Embeddings.
- Merger: Both streams converge into the unified Knowledge Graph.
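A minimal sketch of the merger step, showing visual concepts entering the graph as peers of text concepts. The node/edge schema and the caption-matching heuristic are assumptions for illustration, not the actual Relationship Agent logic.

```python
def merge_streams(text_concepts, visual_concepts):
    # Unified graph: every node carries its modality; schema is hypothetical.
    graph = {"nodes": {}, "edges": []}
    for name in text_concepts:
        graph["nodes"][name] = {"modality": "text"}
    for name, caption in visual_concepts.items():
        graph["nodes"][name] = {"modality": "vision", "caption": caption}
        # Link a visual concept to any text concept its caption mentions.
        for text_name in text_concepts:
            if text_name in caption:
                graph["edges"].append((name, text_name, "illustrates"))
    return graph

kg = merge_streams(
    text_concepts=["gradient descent", "loss surface"],
    visual_concepts={"fig-3": "contour plot of a loss surface"},
)
```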
- Web Framework: FastAPI (Python) - Chosen for high performance and async support.
- Runtime: Python 3.11+ with `asyncio`.
- LLM Orchestration: Custom `LLMClient` with robust fallback logic.
- Primary Models: `google/gemma-3-27b-it:free` (Text), `qwen/qwen2.5-vl-32b-instruct:free` (Vision).
- Embeddings: `mistral-embed` (Mistral AI).
- Vector Database: Pinecone.
- Graph Database: NetworkX.
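The fallback logic in `LLMClient` can be sketched as an ordered walk over candidate models. The real client's interface is not shown in this README, so the class below is a hypothetical reconstruction; only the model IDs come from the stack list above.

```python
class ModelError(Exception):
    """Raised by the transport on rate limits or provider outages (hypothetical)."""

class LLMClient:
    def __init__(self, models, transport):
        self.models = models        # ordered by preference
        self.transport = transport  # callable: (model, prompt) -> str

    def complete(self, prompt: str) -> str:
        last_err = None
        for model in self.models:
            try:
                return self.transport(model, prompt)
            except ModelError as err:
                last_err = err      # try the next model in the chain
        raise RuntimeError(f"all models failed: {last_err}")

def flaky_transport(model, prompt):
    # Simulate the primary model being rate-limited.
    if model == "google/gemma-3-27b-it:free":
        raise ModelError("rate limited")
    return f"[{model}] {prompt}"

client = LLMClient(
    ["google/gemma-3-27b-it:free", "qwen/qwen2.5-vl-32b-instruct:free"],
    flaky_transport,
)
result = client.complete("Summarize attention.")
```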
Below are visualizations of the system's processing and outputs:
Figure 1: System process visualization
Figure 3: Detailed component interaction
Figure 4: Comprehensive architecture overview
- Python 3.10+
- API Keys: OpenRouter (or Mistral), Pinecone

1. Clone & Install:
   ```bash
   cd flowmind
   pip install -r requirements.txt
   ```
2. Configure Environment: Create a `.env` file:
   ```ini
   OPENROUTER_API_KEY=sk-or-v1-...
   MISTRAL_API_KEY=...
   PINECONE_API_KEY=...
   PINECONE_ENV=us-east-1
   ```
3. Run the Server:
   ```bash
   uvicorn services.orchestrator.app:app --reload
   ```
4. Usage:
   - Ingest: `POST /ingest` with a PDF path.
   - Learn: `POST /ask` with your question.
   - Feedback: `POST /feedback` to rate the answer.
- `services/orchestrator/`: The central brain and API.
- `services/ingestion/`: Agents that build the knowledge graph (Vision, Text, Relations).
- `services/pedagogy/`: Agents that teach (Tutor, Critic).
- `services/tools/`: Utilities (LLM Client, Vector Store, Graph Store).
- `data/`: Stores the persistent Knowledge Graph and Feedback.
- 57% Faster Learning: Time to mastery reduced from 6h to 2.5h.
- 1.1% Hallucination Rate: Validated by the Critic Agent.
Built for the AI Agent Hackathon 2025.

