Ewendawi/Paper-Background-Agent

Paper BG Agent

Intelligent multi-agent system for academic paper discovery, extraction, and analysis

About

Paper BG Agent is an AI-powered research assistant designed to help researchers understand the intellectual context of academic papers. By analyzing a paper's introduction, related work, and bibliographic references, the system produces structured summaries that explain problem evolution, competing paradigms, research motivations, and gaps in the literature.

Whether you're exploring a new research area, conducting a literature review, or trying to understand the intellectual genealogy of a paper, Paper BG Agent streamlines the process of discovering, importing, extracting, and analyzing academic papers.

Features

  • Paper Discovery: Search across multiple academic databases (Semantic Scholar, arXiv, DOI, OpenAlex) with a single query
  • Content Extraction: Automatically extract Introduction, Related Work, and other key sections from PDFs using GROBID
  • Deep Analysis: AI-powered analysis that reveals problem timelines, competing paradigms, motivations, and research gaps
  • Reference Management: Automatically discover and import cited papers from bibliographies
  • Multi-Modal Input: Import papers via DOI, arXiv ID, direct URL, title search, or keywords
  • Multiple Interfaces: Use via the Web UI, an MCP server, or the A2A protocol
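
As an illustration of the multi-modal input feature, the sketch below classifies a raw user string as a DOI, an arXiv ID, a URL, or a free-text query. The regular expressions and the `detect_input_type` name are assumptions made for this example; the project's own `input_detector.py` may work differently.

```python
import re

# Hypothetical patterns for illustration; not taken from the project source.
DOI_RE = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
ARXIV_RE = re.compile(r"^(?:arxiv:)?(\d{4}\.\d{4,5})(v\d+)?$", re.IGNORECASE)
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


def detect_input_type(text: str) -> str:
    """Classify a user-supplied string as 'doi', 'arxiv', 'url', or 'query'."""
    value = text.strip()
    # DOIs are checked before generic URLs so that doi.org links count as DOIs.
    if DOI_RE.match(value):
        return "doi"
    if ARXIV_RE.match(value):
        return "arxiv"
    if URL_RE.match(value):
        return "url"
    # Titles and keyword strings both fall through to a search query.
    return "query"
```

Titles and keywords are not distinguished here; both would be passed to search as a plain query.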

Analysis Output

The analysis agent produces structured outputs including:

  1. Problem Evolution Timeline - How the research problem has evolved
  2. Paradigm Map - Competing approaches and their relationships
  3. Motivation and Implicit Critique - What each paper responds to
  4. Gaps, Silent Assumptions, and Open Questions - Missing perspectives
  5. Glossary - Key concepts explained for newcomers

Note: an example analysis output can be found in the assets/example/ directory.
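
For readers who want to post-process an analysis, the five sections above could be modeled roughly as follows. This is a hypothetical container, not the agent's actual output schema; the field names and types are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass, field


@dataclass
class BackgroundAnalysis:
    """Hypothetical container mirroring the five output sections."""

    problem_timeline: list[str] = field(default_factory=list)
    paradigm_map: dict[str, list[str]] = field(default_factory=dict)
    motivations: dict[str, str] = field(default_factory=dict)
    gaps: list[str] = field(default_factory=list)
    glossary: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```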

Architecture Overview

Paper BG Agent uses a multi-agent orchestrator pattern with four specialized agents: Discovery, Extraction, Reference, and Analysis.

Pattern: A central orchestrator routes each request to a specialized agent based on intent. Agents communicate via shared data stores, not direct calls.

flowchart TD
    User[User Web/MCP/A2A] --> Orchestrator

    subgraph Orch[Paper Analysis Orchestrator]
        Orchestrator[Intent Routing & Session State]
    end

    Orchestrator -->|Route by Intent| DAgent[Discovery Agent]
    Orchestrator -->|Route by Intent| EAgent[Extraction Agent]
    Orchestrator -->|Route by Intent| RAgent[Reference Agent]
    Orchestrator -->|Route by Intent| AAgent[Analysis Agent]

    subgraph Data[Shared Data Stores]
        Library[Library]
        Extractions[Content]
        Analysis[Results]
    end

    DAgent -->|Import Papers| Library
    DAgent -->|Set active_paper_id| Orchestrator

    RAgent -->|Read from Library| Library
    RAgent -->|Add More Papers| Library

    EAgent -->|Read Papers| Library
    EAgent -->|Write Extractions| Extractions
    EAgent -->|Set active_paper_id| Orchestrator

    AAgent -->|Read Extractions| Extractions
    AAgent -->|Write Analysis| Analysis

    style Orch fill:#e1f5fe
    style Data fill:#f3f4f6

Agent Capabilities

| Agent | Purpose | Key Tools |
| --- | --- | --- |
| Discovery Agent | Search and import papers | Online search, DOI/arXiv import, library search |
| Extraction Agent | Parse and extract content | GROBID integration, section extraction |
| Reference Agent | Import cited papers | Bibliography parsing, citation import |
| Analysis Agent | Deep analysis and discussion | Structured analysis |

Quick Start

Prerequisites

  • Python 3.11 or later
  • An LLM API key

Installation

Option 1: Manual

# Create and activate conda environment
conda create -n paper_bg python=3.11
conda activate paper_bg

# Install the package
pip install -e .

# Create a .env file with your API key
echo "GEMINI_API_KEY=your-api-key-here" > .env

GROBID Setup (required for PDF parsing):

# Option A: Run GROBID via Docker (recommended)
docker run -d --name grobid -p 8070:8070 lfoppiano/grobid:0.8.1

# Option B: Install GROBID natively
# Visit: https://grobid.readthedocs.io/en/latest/Install-Grobid/

Configure the GROBID server URL in config/grobid.json if using a custom port or host. When running the app inside Docker, set grobid_server to http://grobid:8070.
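
Before importing PDFs, it can help to confirm that the configured GROBID server is reachable. The sketch below reads the `grobid_server` key from `config/grobid.json` and queries GROBID's `/api/isalive` health endpoint. The helper names and the fallback URL `http://localhost:8070` (matching the Docker command above) are assumptions for this example, and the real config file may contain other keys.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path


def load_grobid_server(
    config_path: str = "config/grobid.json",
    default: str = "http://localhost:8070",
) -> str:
    """Return the configured GROBID base URL, or the default if no config exists."""
    path = Path(config_path)
    if not path.exists():
        return default
    config = json.loads(path.read_text(encoding="utf-8"))
    return str(config.get("grobid_server", default)).rstrip("/")


def is_grobid_alive(server: str, timeout: float = 5.0) -> bool:
    """Return True if the GROBID health endpoint answers with HTTP 200."""
    url = f"{server.rstrip('/')}/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

A typical check would be `is_grobid_alive(load_grobid_server())`, run from the project root.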

Option 2: Docker (Includes GROBID)

# Start all services (MCP server, A2A server, GROBID, ADK Web UI)
docker-compose up -d

Configuration

The system uses a .env file for configuration. Create it in the project root:

# .env file example
GEMINI_API_KEY=your-gemini-api-key-here
DEFAULT_PROVIDER=gemini
DEFAULT_MODEL=gemini/gemini-2.5-flash

Supported Providers:

  • gemini - Google Gemini (native ADK support)
  • others via LiteLLM: openai, anthropic, etc.

You can override defaults via CLI flags:

  • --provider gemini (or openai, anthropic, etc.)
  • --model gemini/gemini-2.5-flash (or other model names)
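
The precedence described above (CLI flags override values from `.env`) could be implemented along these lines. This is a sketch under assumptions: the function names are hypothetical, the fallback values are simply copied from the `.env` example, and the project's `llm_config.py` may resolve settings differently.

```python
import argparse


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_llm_settings(argv: list[str], env: dict[str, str]) -> tuple[str, str]:
    """Return (provider, model): CLI flag first, then .env value, then fallback."""
    # allow_abbrev=False keeps an unrelated flag such as --mode from being
    # treated as an abbreviation of --model.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--provider")
    parser.add_argument("--model")
    args, _unknown = parser.parse_known_args(argv)
    provider = args.provider or env.get("DEFAULT_PROVIDER", "gemini")
    model = args.model or env.get("DEFAULT_MODEL", "gemini/gemini-2.5-flash")
    return provider, model
```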

Usage

MCP Server

Expose Paper BG Agent as an MCP server for use with Claude Desktop, Cline, and other MCP-compatible clients:

# stdio transport (for Claude Desktop, etc.)
paper-bg-agent --mode mcp --transport stdio

# HTTP transport (for web-based MCP clients)
paper-bg-agent --mode mcp --transport streamable-http --port 8071

MCP Tools Available:

  • paper_bg_chat - Chat with the orchestrator
  • paper_bg_combine_extraction_analysis - Combine extraction and analysis in one call
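
To register the stdio server with Claude Desktop, an entry is added under `mcpServers` in its `claude_desktop_config.json`. The snippet below generates such an entry. It assumes `paper-bg-agent` is on the `PATH` seen by the client and that the API key is available to the launched process; the server name `paper-bg` is an arbitrary label chosen for this example.

```python
import json


def claude_desktop_entry(
    command: str = "paper-bg-agent", server_name: str = "paper-bg"
) -> dict:
    """Build an mcpServers entry that launches the server over stdio."""
    return {
        "mcpServers": {
            server_name: {
                "command": command,
                "args": ["--mode", "mcp", "--transport", "stdio"],
            }
        }
    }


# Print JSON that can be merged into claude_desktop_config.json.
print(json.dumps(claude_desktop_entry(), indent=2))
```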

ADK Web UI

Interactive web interface for testing and debugging agents:

# Docker (recommended) - starts automatically on port 8087
docker-compose up -d

# Access at: http://localhost:8087

# Manual start (for development)
adk web 

Features:

  • Web-based chat interface with the Paper Analysis Orchestrator
  • Session management and conversation history
  • Real-time agent testing and debugging
  • Live agent reload for development (--reload flag)

A2A Server (experimental)

Run an Agent-to-Agent protocol server for interoperability with other agent systems:

paper-bg-agent --mode a2a --port 8072

Agent Card Skills:

  • paper.discovery - Search and import papers
  • paper.extraction - Extract content from PDFs
  • paper.analysis - Analyze paper context

Project Structure

paper_bg/
├── paper_analysis_app/
│   ├── agents/              # Multi-agent implementations
│   │   ├── discovery_agent.py      # Paper search and import
│   │   ├── extraction_agent.py     # PDF content extraction
│   │   ├── reference_agent.py      # Citation import
│   │   └── analysis_agent.py       # Deep analysis & Q&A
│   ├── tools/               # Tool implementations for agents
│   ├── cli.py               # CLI entry point
│   ├── agent.py             # Session & orchestrator setup
│   ├── orchestrator.py      # Intent routing
│   ├── llm_config.py        # Multi-provider LLM configuration
│   ├── mcp_server.py        # MCP server implementation
│   └── a2a_server.py        # A2A server implementation
├── modules/                 # Core utility modules
│   ├── grobid.py            # GROBID PDF processing
│   ├── search_client.py     # Academic search client
│   ├── search_paper.py      # Paper search utilities
│   ├── paper_import.py      # PDF import management
│   ├── paper_metadata.py    # Metadata handling
│   └── input_detector.py    # Input type detection
├── config/                  # Configuration files
│   ├── config.json          # Main config (logging, search)
│   └── grobid.json          # GROBID client config
├── .env                     # Environment variables (API keys, provider settings)
├── data/                    # Data directories
│   ├── paper_source/        # Imported PDFs with metadata
│   ├── extractions/         # Extracted paper content
│   ├── analysis/            # Analysis outputs
│   └── temp_*/              # Temporary files
├── prompts/                 # Prompt templates
│   └── background_analysis.md  # Analysis framework
├── tests/                   # Test files
├── Dockerfile               # Docker image
└── docker-compose.yml       # Multi-container setup

Acknowledgments

Built with Google ADK for multi-agent orchestration.
