Sliky1/reportpilot-ai

🤖 Multi-Agent Research Report System

An intelligent research report generation platform powered by LangGraph, RAG, and real-time web search — featuring multi-agent review loops, streaming output, and a fully interactive Streamlit UI.


📖 Overview

This system combines RAG (Retrieval-Augmented Generation), live web search, and a LangGraph-orchestrated multi-agent pipeline to automatically generate, review, and iteratively refine professional industry analysis reports.

A report goes through up to 4 rounds of AI review by different expert personas (industry analyst, investor, tech expert, editor). Each reviewer outputs structured JSON feedback. The writer agent performs targeted, diff-based revisions rather than rewriting the entire report — saving tokens and improving quality.
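The structured feedback described above can be read defensively. The following is a minimal sketch, not the project's actual parser: the function name `parse_review` and the choice to default to REJECT when nothing parses are assumptions made for illustration; only the four field names come from this README.

```python
import json
import re

# Field names follow the README; the default values are an assumption.
DEFAULT_REVIEW = {"verdict": "REJECT", "score": 0, "issues": [], "suggestions": []}


def parse_review(raw: str) -> dict:
    """Parse reviewer output into a dict with verdict, score, issues, suggestions.

    Tries strict JSON first, then falls back to extracting the outermost
    {...} span with a regex (useful when the model wraps JSON in prose).
    """
    candidate = raw.strip()
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", candidate, re.DOTALL)
        if match is None:
            return dict(DEFAULT_REVIEW)
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return dict(DEFAULT_REVIEW)
    if not isinstance(data, dict):
        return dict(DEFAULT_REVIEW)
    review = {**DEFAULT_REVIEW, **data}
    review["verdict"] = str(review["verdict"]).upper()
    return review
```

Falling back to REJECT means an unparseable review triggers another revision round instead of silently approving a draft; a real implementation might prefer to re-ask the reviewer.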


✨ Features

Core Pipeline

  • RAG Retrieval — Loads local documents (PDF, TXT, MD, DOCX) into a FAISS vector store; retrieves relevant chunks using MMR (Maximum Marginal Relevance) to maximize diversity and reduce redundancy
  • Live Web Search — Integrates Tavily Search API for real-time information; results are deduplicated by title and truncated to 500 chars per source
  • Multi-Agent Review Loop — 4 expert reviewer personas evaluate each draft and return structured JSON { verdict, score, issues, suggestions }; the pipeline routes back to the writer on REJECT or terminates on PASS
  • Diff-Based Rewriting — On revision rounds, the writer only addresses specific issues flagged by the reviewer instead of regenerating the full report
  • Long Report Summarization — Reports exceeding 1500 characters are compressed to a 300-word summary before being passed to the reviewer, avoiding hard truncation
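To make the MMR step in the list above concrete, here is a small pure-Python sketch of the selection rule. It is illustrative only: the project presumably relies on the vector store's built-in MMR search, and the names `mmr_select` and `lambda_mult`, as well as the toy vectors, are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximum Marginal Relevance.

    Each step picks the candidate maximizing
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, selected)
    so near-duplicates of already selected chunks are penalized.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: documents 0 and 1 are near-duplicates, document 2 is different.
docs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
picked = mmr_select([1.0, 1.0], docs, k=2)
```

With `lambda_mult=1.0` the rule degenerates to plain similarity ranking; lower values trade relevance for diversity.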

Streaming & Animation

  • Throttled Streaming Output — Writer and reviewer nodes stream token-by-token; UI refreshes at 60ms intervals to balance smoothness and performance
  • CSS Animation System — All node status badges, cursors, fade-ins, and review cards are driven by @keyframes for fluid transitions
  • Animated Status Badges — Each pipeline node shows idle / running (pulsing) / complete states with CSS-animated HTML badges
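The throttling idea from the list above can be sketched as a small buffer that redraws at most once per interval. This is an assumption-laden illustration, not the app's code: `ThrottledRenderer` is a hypothetical name, and `render` stands in for whatever callback updates the UI (in Streamlit, possibly the `markdown` method of an `st.empty()` placeholder).

```python
import time


class ThrottledRenderer:
    """Accumulates streamed tokens and redraws at most once per interval."""

    def __init__(self, render, interval_s=0.06, clock=time.monotonic):
        self._render = render        # callback receiving the full text so far
        self._interval = interval_s  # 60 ms, as stated in the README
        self._clock = clock          # injectable for testing
        self._last = None            # time of the last redraw
        self._buffer = []
        self._dirty = False          # True when tokens arrived since last redraw

    def feed(self, token):
        """Add one token; redraw only if the interval has elapsed."""
        self._buffer.append(token)
        self._dirty = True
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._flush(now)

    def close(self):
        """Force a final redraw so trailing tokens are not lost."""
        if self._dirty:
            self._flush(self._clock())

    def _flush(self, now):
        self._render("".join(self._buffer))
        self._last = now
        self._dirty = False
```

Calling `close()` at the end of the stream matters: without it, tokens that arrive inside the last interval would never be displayed.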

Interactive UI

  • Pre-save Draft Editing — Before saving, users can expand an editor panel to manually modify the report content
  • Reviewer Score Dashboard — After generation, a visual score board displays each reviewer's score (1–10), PASS/REJECT verdict, and a progress bar
  • Custom Reviewer Personas — Users can override any of the 4 reviewer names and role descriptions from the sidebar; reset button restores defaults
  • History Search — Sidebar search box filters saved reports by keyword in real-time
  • Draft History Accordion — Previous draft rounds are collapsible, showing the draft content and structured review feedback side by side
  • Multi-format Download — Export the final report as .md, .txt, or .json (with metadata)

Reliability & Stability

  • LLM Auto-Retry — All LLM calls use exponential backoff retry (up to 2 retries: 1s → 2s) to handle transient API errors gracefully
  • Checkpoint Recovery — Generation progress is saved to temp_progress.json after each completed draft/review round; a recovery banner appears on page reload if an unfinished session is detected
  • Structured Output Parsing — Reviewer output is parsed as JSON with a regex fallback, eliminating raw JSON display bugs and unreliable string matching
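The retry policy in the first bullet above (up to 2 retries, waiting 1s and then 2s) could look roughly like the sketch below. The helper name `call_with_retry` and its parameters are hypothetical; the real app may catch narrower exception types.

```python
import time


def call_with_retry(fn, *args, retries=2, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn, retrying on any exception with exponential backoff.

    With the defaults, a failing call is retried after 1s and then 2s;
    the third failure is re-raised to the caller.
    """
    attempt = 0
    while True:
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt >= retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 1.0, then 2.0
            attempt += 1
```

A call site might then read `call_with_retry(llm.invoke, prompt)`, assuming `llm` is the chat model object returned by `init_llm()`.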

🏗️ Architecture

User Input (Topic)
      │
      ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│rag_retriever│────▶│  researcher │────▶│   writer    │────▶│  reviewer   │
│(FAISS MMR)  │     │(Tavily API) │     │(DeepSeek)   │     │(DeepSeek)   │
└─────────────┘     └─────────────┘     └─────────────┘     └──────┬──────┘
                                              ▲                    │
                                              │    REJECT          │ PASS / round>=4
                                              └────────────────────┤
                                                                   ▼
                                                               Final Report

State fields: topic · local_knowledge · research · draft · feedback (formatted string) · feedback_parsed (structured dict) · revision_count
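As a rough illustration of the diagram and state fields above, the shared state and the routing decision after each review could be written as follows. The names `ReportState`, `MAX_ROUNDS`, and `route_after_review`, and the return labels `"writer"` and `"end"`, are assumptions; only the field names and the PASS / round>=4 rule come from this README.

```python
from typing import TypedDict

MAX_ROUNDS = 4  # the diagram stops the loop at round >= 4


class ReportState(TypedDict):
    topic: str
    local_knowledge: str
    research: str
    draft: str
    feedback: str          # formatted string shown in the UI
    feedback_parsed: dict  # structured reviewer output
    revision_count: int


def route_after_review(state: ReportState) -> str:
    """Return the next step: 'writer' for another revision, 'end' to finish."""
    verdict = str(state["feedback_parsed"].get("verdict", "REJECT")).upper()
    if verdict == "PASS" or state["revision_count"] >= MAX_ROUNDS:
        return "end"
    return "writer"
```

In a LangGraph `StateGraph`, a function of this shape is typically registered with `add_conditional_edges` on the reviewer node, with a mapping from the returned labels to the writer node and `END`.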


🚀 Quick Start

1. Clone & Install

git clone https://github.com/your-username/multi-agent-report-system.git
cd multi-agent-report-system
pip install -r requirements.txt

2. Configure Environment

Create a .env file in the project root:

DEEPSEEK_API_KEY=your_deepseek_api_key
TAVILY_API_KEY=your_tavily_api_key
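
A quick way to confirm both keys are visible to the process is sketched below. It is a hypothetical helper, not part of the app; it assumes the .env file has already been loaded into the environment (for example with `load_dotenv()` from python-dotenv, which is listed in the requirements).

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("DEEPSEEK_API_KEY", "TAVILY_API_KEY")


def missing_keys(env=os.environ):
    """Return the required keys that are absent or blank in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```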

3. Run

streamlit run app.py

📦 Requirements

streamlit>=1.32
langchain-deepseek
langchain-core
langchain-community
langchain-tavily
langchain-text-splitters
langgraph>=0.2
faiss-cpu
huggingface-hub
sentence-transformers
python-dotenv
pypdf
docx2txt

🔄 Replacing Default APIs

Replace DeepSeek → Other LLMs

import os

import streamlit as st

# Option A: OpenAI GPT-4o
from langchain_openai import ChatOpenAI

@st.cache_resource
def init_llm():
    return ChatOpenAI(model="gpt-4o", temperature=0.7,
                      api_key=os.getenv("OPENAI_API_KEY"))

# Option B: Anthropic Claude
from langchain_anthropic import ChatAnthropic

@st.cache_resource
def init_llm():
    return ChatAnthropic(model="claude-3-5-sonnet-20241022", temperature=0.7,
                         api_key=os.getenv("ANTHROPIC_API_KEY"))

# Option C: Local Ollama (no API key needed)
from langchain_ollama import ChatOllama

@st.cache_resource
def init_llm():
    return ChatOllama(model="qwen2.5:14b", temperature=0.7)

LLM             Package               Env Variable
--------------  --------------------  --------------------
OpenAI          langchain-openai      OPENAI_API_KEY
Anthropic       langchain-anthropic   ANTHROPIC_API_KEY
Ollama (local)  langchain-ollama      (none required)
Azure OpenAI    langchain-openai      AZURE_OPENAI_API_KEY

Replace Tavily → Other Search APIs

import os

import streamlit as st

# Option A: DuckDuckGo (free, no API key)
from langchain_community.tools import DuckDuckGoSearchRun

@st.cache_resource
def init_search():
    return DuckDuckGoSearchRun()

# Option B: SerpAPI (Google Search)
from langchain_community.utilities import SerpAPIWrapper

@st.cache_resource
def init_search():
    return SerpAPIWrapper(serpapi_api_key=os.getenv("SERPAPI_API_KEY"))

# Option C: Bing Search
from langchain_community.utilities import BingSearchAPIWrapper

@st.cache_resource
def init_search():
    return BingSearchAPIWrapper(bing_subscription_key=os.getenv("BING_API_KEY"))

Note: When using DuckDuckGo, update format_research() since it returns a plain string instead of a list of result dicts.
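
One possible shape for a `format_research()` that accepts both result types is sketched here. The real function's signature and output format are not shown in this README, so everything below is an assumption, including the `title` and `content` keys on each result dict. The deduplication by title and the 500-character cap mirror the behavior described under Features.

```python
def format_research(results, max_chars=500):
    """Format search output as text for the writer prompt.

    Accepts either a plain string (e.g. from DuckDuckGoSearchRun) or a list
    of dicts with 'title' and 'content' keys (assumed shape of Tavily results).
    """
    if isinstance(results, str):
        # A plain string has no per-source structure; pass it through as-is.
        return results.strip()

    seen_titles = set()
    blocks = []
    for item in results:
        title = str(item.get("title", "")).strip()
        key = title.lower()
        if key and key in seen_titles:
            continue  # skip duplicate titles (case-insensitive)
        seen_titles.add(key)
        content = str(item.get("content", ""))[:max_chars]
        blocks.append(f"{title}\n{content}")
    return "\n\n".join(blocks)
```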


Replace Embedding Model

import os

import streamlit as st

# Option A: OpenAI text-embedding-3-small
from langchain_openai import OpenAIEmbeddings

@st.cache_resource
def init_embeddings():
    return OpenAIEmbeddings(model="text-embedding-3-small",
                            api_key=os.getenv("OPENAI_API_KEY"))

# Option B: English-focused local model
from langchain_community.embeddings import HuggingFaceEmbeddings

@st.cache_resource
def init_embeddings():
    return HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

📁 Project Structure

multi-agent-report-system/
├── app.py                  # Main application (single-file)
├── .env                    # API keys (not committed)
├── requirements.txt
├── faiss_index/            # Auto-generated vector store
│   ├── index.faiss
│   └── index.pkl
├── reports/                # Saved reports
│   ├── 20260313_120000_topic.md
│   └── 20260313_120000_topic_full.json
└── temp_progress.json      # Checkpoint file (auto-deleted on save)
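
The checkpoint file at the bottom of the tree could be managed with helpers like the ones below. This is a sketch under assumptions: the function names are hypothetical, and the atomic write through a temporary file is a design choice made here, not something this README states about the app.

```python
import json
import os
import tempfile
from pathlib import Path

CHECKPOINT_PATH = "temp_progress.json"  # file name from the project structure


def save_checkpoint(state: dict, path=CHECKPOINT_PATH):
    """Write state to disk atomically so a crash never leaves a half-written file."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, ensure_ascii=False)
    os.replace(tmp_name, path)  # atomic rename over the old checkpoint


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the saved state, or None if no usable checkpoint exists."""
    path = Path(path)
    if not path.exists():
        return None
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        return None


def clear_checkpoint(path=CHECKPOINT_PATH):
    """Delete the checkpoint, e.g. after the final report has been saved."""
    Path(path).unlink(missing_ok=True)
```

On page load, a non-None result from `load_checkpoint()` would be the signal to show the recovery banner.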

🗺️ Roadmap

Near-term

  • Export to DOCX / PDF
  • Streaming reviewer display with loading state
  • Topic suggestions based on knowledge base content

Mid-term

  • Hybrid retrieval — FAISS vector search + BM25 keyword search
  • Citation tracing — annotate each paragraph with its source document or URL
  • Incremental knowledge base updates — re-index only new or modified files
  • Planner Agent — decompose complex topics into sub-questions for parallel retrieval

Long-term

  • Multi-modal input — parse charts and tables from uploaded images via vision model
  • Async pipeline — parallel writer + researcher nodes (~40% faster generation)
  • User accounts & database — SQLite → PostgreSQL with multi-user isolation
  • Automated quality metrics — ROUGE / BERTScore tracking over time
  • Knowledge graph — entity-relationship reasoning across documents

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add some feature')
  4. Push to the branch (git push origin feature/your-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License.


🙏 Acknowledgements
