analystOS πŸ€–πŸ“Š

AI-powered research platform with Web UI + optional Notion automation

A production-ready research automation platform powered by OpenRouter (50+ AI models). Use the interactive web interface for on-demand research, or connect Notion for fully automated workflows.

✨ Two Ways to Use

πŸ–₯️ Web UI (Interactive Research)

1. Open the Streamlit web interface
2. Upload documents, add URLs, or enter a research query
3. Select your AI model (GPT-4, Claude, Gemini, etc.)
4. Get comprehensive research reports instantly
5. Chat with your research using RAG-powered Q&A

πŸ”— Notion Automation (Zero-Touch Mode)

1. Connect your Notion database
2. Add a project to Notion
3. Agent automatically detects the new entry
4. AI researches, scores, and evaluates
5. Full report published back to Notion

Use either mode independently, or both together.


🌟 Key Features

πŸ”— Notion Automation (Zero-Touch Research)

  • Add & Forget: Add projects to Notion, get full research reports automatically
  • Real-Time Monitoring: Watches your Notion database for new entries
  • Auto-Research Pipeline: Triggers deep research on new projects
  • AI Scoring: Automated due diligence scoring and evaluation
  • Direct Publishing: Reports published directly to your Notion pages

πŸ”¬ Interactive Research Suite

  • Multi-Format Document Processing: PDF, DOCX, TXT, Markdown with OCR support
  • DocSend Integration: Automated presentation analysis with stealth browsing
  • Advanced Web Scraping: Firecrawl-powered content extraction with sitemap discovery
  • Deep Research Mode: LangChain's Open Deep Research (ODR) framework integration
  • RAG-Powered Chat: Context-aware Q&A about research content using FAISS vector search
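The RAG chat works by retrieving the most relevant chunks before answering. A toy sketch of that nearest-neighbor retrieval, assuming bag-of-words vectors stand in for the sentence-transformers embeddings the repo actually uses (FAISS performs the same ranking at scale):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (stand-in for real embeddings).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The token unlock schedule spans four years.",
    "The team previously built a payments startup.",
    "Revenue grew 40% quarter over quarter.",
]
print(retrieve("when do tokens unlock", chunks, k=1))
# → ['The token unlock schedule spans four years.']
```

The retrieved chunks are then placed in the model's context so answers stay grounded in the uploaded sources.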

πŸ’° Crypto Intelligence Hub

  • Live Market Data: Real-time cryptocurrency information via CoinGecko MCP
  • Interactive Analysis: AI-powered crypto insights and technical analysis
  • Dynamic Visualizations: Plotly and Altair-based charts and metrics
  • Portfolio Intelligence: Multi-coin comparisons and trend analysis

🧠 Advanced AI Integration

  • Multi-Model Support: GPT-5.2, Claude Opus 4.5, Gemini 3, Qwen, DeepSeek R1
  • OpenRouter API: Unified access to 50+ AI models
  • Free Tier Models: Qwen3, DeepSeek R1T Chimera for cost-effective research
  • Custom Research Prompts: Specialized prompts for due diligence and analysis

πŸ” Entity Extraction (LangExtract)

  • Structured Extraction: Automatically extract people, organizations, funding rounds, metrics, and more
  • Multi-Source Support: Extract from documents, web content, and DocSend presentations
  • Source Grounding: All entities are linked to their source documents
  • Smart Caching: Results are cached to avoid re-extraction on unchanged content
  • AI-Enhanced Reports: Extracted entities are automatically included in research prompts

πŸ—οΈ Architecture

Core Components

β”œβ”€β”€ main.py                     # Streamlit application entry point
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ controllers/            # Application orchestration
β”‚   β”‚   └── app_controller.py   # Main app controller with auth & routing
β”‚   β”œβ”€β”€ pages/                  # Modular page implementations
β”‚   β”‚   β”œβ”€β”€ interactive_research.py    # Document processing & AI analysis
β”‚   β”‚   β”œβ”€β”€ notion_automation.py       # Notion CRM integration
β”‚   β”‚   β”œβ”€β”€ crypto_chatbot.py          # Crypto intelligence interface
β”‚   β”‚   └── voice_cloner_page.py       # Voice synthesis (experimental)
β”‚   β”œβ”€β”€ services/               # External integrations
β”‚   β”‚   β”œβ”€β”€ odr_service.py             # Open Deep Research integration
β”‚   β”‚   β”œβ”€β”€ user_history_service.py    # Session management
β”‚   β”‚   └── crypto_analysis/           # Crypto data services
β”‚   β”œβ”€β”€ core/                   # Core business logic
β”‚   β”‚   β”œβ”€β”€ research_engine.py         # Research automation
β”‚   β”‚   β”œβ”€β”€ rag_utils.py              # Vector search & embeddings
β”‚   β”‚   β”œβ”€β”€ scanner_utils.py          # Web discovery & parsing
β”‚   β”‚   └── docsend_client.py         # DocSend processing
β”‚   └── models/                 # Data models & schemas
β”œβ”€β”€ config/                     # Configuration files
β”‚   β”œβ”€β”€ users.yaml             # User management
β”‚   └── mcp_config.json        # MCP integrations
└── tests/                     # Comprehensive test suite

Technology Stack

  • Backend: Python 3.11+, Streamlit, FastAPI
  • AI/ML: OpenAI, LangChain, FAISS, sentence-transformers
  • Browser Automation: Selenium, Playwright
  • Document Processing: PyMuPDF, python-docx, Tesseract OCR
  • Data: pandas, numpy, Redis (optional)
  • Visualization: Plotly, Altair, Bokeh

πŸš€ Quick Start

1. Installation

Option A: Docker (Recommended)

git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure your API keys in .env
docker-compose up -d

Option B: Local Installation

# Clone repository
git clone <repository-url>
cd ai-research-agent

# Install Python dependencies
pip install -r requirements.txt

# Install browser dependencies
playwright install

# Install system dependencies (macOS)
brew install tesseract

# Install system dependencies (Ubuntu)
sudo apt-get install tesseract-ocr

# Install system dependencies (Windows)
# Download from: https://github.com/UB-Mannheim/tesseract/wiki

2. Configuration

Environment Variables (.env)

# Required: AI Model Access
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Optional: Additional AI Providers
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here

# Optional: Notion Integration
NOTION_TOKEN=your_notion_integration_token

# Optional: Web Scraping
FIRECRAWL_API_URL=http://localhost:3002
FIRECRAWL_API_KEY=your_firecrawl_key

# Optional: Deep Research (ODR)
TAVILY_API_KEY=your_tavily_search_key

# Optional: Caching & Performance
REDIS_URL=redis://localhost:6379

# System Configuration
TESSERACT_CMD=/usr/bin/tesseract  # Adjust for your system

User Management (config/users.yaml)

users:
  admin:
    username: admin
    password_hash: $2b$12$... # Use generate_password.py script
    role: admin
  researcher:
    username: researcher  
    password_hash: $2b$12$...
    role: researcher

3. Run Application

# Local development
streamlit run main.py

# Production
streamlit run main.py --server.port 8501 --server.address 0.0.0.0

Visit http://localhost:8501 and login with your configured credentials.

πŸ“– Usage Guide

Interactive Research Workflow

  1. πŸ” Authentication: Login with username/password
  2. πŸ“‹ Research Query: Define your research question or topic
  3. πŸ“„ Document Upload: Upload PDF, DOCX, or text files (optional)
  4. 🌐 Web Content: Add specific URLs or enable sitemap crawling (optional)
  5. πŸ“Š DocSend Processing: Process presentation decks with email authentication (optional)
  6. πŸ”¬ Research Mode Selection:
    • Classic Mode: Traditional AI analysis of provided sources
    • Deep Research (ODR): Advanced multi-agent research with web search
  7. βš™οΈ Configuration: Adjust research parameters (breadth, depth, tool calls)
  8. πŸ€– AI Analysis: Generate comprehensive research reports
  9. πŸ’¬ Interactive Chat: Ask questions about the research using RAG

Notion Automation Workflow

  1. πŸ”— Notion Setup: Configure Notion integration token
  2. πŸ“Š Database Selection: Choose Notion database to monitor
  3. ⚑ Enable Monitoring: Start real-time database polling
  4. 🎯 Configure Triggers: Set up automated research workflows
  5. πŸ“ˆ AI Scoring: Enable intelligent project evaluation
  6. πŸ“ Report Publishing: Automatic research report generation to Notion

Crypto Intelligence Workflow

  1. πŸ”Œ MCP Connection: Connect to CoinGecko data source
  2. πŸ’° Coin Selection: Choose cryptocurrencies to analyze
  3. πŸ“Š Technical Analysis: Generate interactive charts and metrics
  4. 🧠 AI Insights: Get AI-powered market analysis and predictions
  5. πŸ“ˆ Portfolio Tracking: Monitor multiple coins and trends

πŸ”§ Advanced Configuration

Deep Research (ODR) Setup

Deep Research uses LangChain's Open Deep Research framework for advanced multi-agent research:

# Install ODR dependencies (included in requirements.txt)
pip install langgraph langchain-community langchain-openai langchain-anthropic

# Get Tavily API key for web search
# Visit: https://tavily.com/
export TAVILY_API_KEY=your_tavily_key_here

ODR Parameters for Detailed Reports:

  • Breadth (6-15): Number of concurrent research units
  • Depth (4-8): Research iteration depth
  • Max Tool Calls (8-15): Web searches per iteration

Ultra-Comprehensive Mode: Breadth=10, Depth=6, Tools=12 (10 × 6 × 12 = 720 total search operations)

Entity Extraction (LangExtract) Setup

Entity extraction automatically identifies people, organizations, funding rounds, metrics, and more from your research sources:

# Install langextract (pinned version for API stability)
pip install langextract==0.1.0

# Install libmagic (required system dependency)
# macOS:
brew install libmagic

# Ubuntu/Debian:
sudo apt-get install libmagic1

# Windows:
pip install python-magic-bin  # Includes bundled libmagic

Enable in .env:

LANGEXTRACT_ENABLED=true
LANGEXTRACT_MODEL=openai/gpt-4o
LANGEXTRACT_EXTRACTION_PASSES=2
LANGEXTRACT_MAX_CONCURRENT=3
LANGEXTRACT_MAX_CHUNK_SIZE=50000
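These variables can be read into a typed config at startup. A minimal sketch, assuming the defaults shown above (LangExtractConfig and load_langextract_config are illustrative names, not the repo's actual classes):

```python
import os
from dataclasses import dataclass

@dataclass
class LangExtractConfig:
    enabled: bool
    model: str
    extraction_passes: int
    max_concurrent: int
    max_chunk_size: int

def load_langextract_config() -> LangExtractConfig:
    # Reads the LANGEXTRACT_* variables, falling back to the documented defaults.
    return LangExtractConfig(
        enabled=os.getenv("LANGEXTRACT_ENABLED", "false").lower() == "true",
        model=os.getenv("LANGEXTRACT_MODEL", "openai/gpt-4o"),
        extraction_passes=int(os.getenv("LANGEXTRACT_EXTRACTION_PASSES", "2")),
        max_concurrent=int(os.getenv("LANGEXTRACT_MAX_CONCURRENT", "3")),
        max_chunk_size=int(os.getenv("LANGEXTRACT_MAX_CHUNK_SIZE", "50000")),
    )
```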

Entity Types Extracted:

  • People: Names, titles, organizations, roles
  • Organizations: Companies, investors, partners, their roles
  • Funding: Rounds (Seed, Series A/B/C), amounts, dates
  • Metrics: MAU, revenue, growth rates, valuations
  • Technology: Tech stack, platforms, categories
  • Risk Factors: Identified risks with severity
  • Partnerships: Business alliances and collaborations

Model Configuration

The platform supports multiple AI providers and models:

# Available Models
AI_MODEL_OPTIONS = {
    # OpenAI Models
    "openai/gpt-5.2": "GPT-5.2",
    "openai/gpt-5.2-pro": "GPT-5.2 Pro",
    # Anthropic Models
    "anthropic/claude-sonnet-4.5": "Claude Sonnet 4.5",
    "anthropic/claude-opus-4.5": "Claude Opus 4.5",
    # Google Models
    "google/gemini-3": "Gemini 3",
    "google/gemini-2.5-pro": "Gemini 2.5 Pro",
    "google/gemini-2.5-flash": "Gemini 2.5 Flash",
    # Free Models
    "qwen/qwen3-30b-a3b:free": "Qwen3 30B (Free)",
    "qwen/qwen3-235b-a22b:free": "Qwen3 235B (Free)",
    "tngtech/deepseek-r1t-chimera:free": "DeepSeek R1T Chimera (Free)",
}
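Any of these model IDs goes straight into a request against OpenRouter's OpenAI-compatible endpoint. A minimal stdlib sketch that builds (but deliberately does not send) such a request; build_request is an illustrative helper, not the repo's actual code:

```python
import json
import os
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Constructs a chat-completions request for OpenRouter's API.
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("qwen/qwen3-30b-a3b:free", "Summarize the latest funding round.")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

Switching providers is just a different model string; the request shape is identical.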

Performance Optimization

Redis Caching (Optional)

# Install Redis
docker run -d -p 6379:6379 redis:alpine

# Configure in .env
REDIS_URL=redis://localhost:6379
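With REDIS_URL set, expensive research results can be served cache-aside. A minimal sketch of the pattern; cached_research and the key scheme are illustrative, and any object with redis-style get/set (e.g. redis.Redis(decode_responses=True)) works:

```python
import hashlib

def cached_research(cache, query: str, run_research):
    # cache: a redis.Redis-style object exposing get(key) and set(key, value).
    key = "research:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit  # served from cache, no model call
    result = run_research(query)
    cache.set(key, result)
    return result
```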

Browser Configuration

# Chrome/Chromium (Recommended for DocSend)
CHROME_BINARY=/usr/bin/google-chrome

# Firefox Alternative
FIREFOX_BINARY=/usr/bin/firefox

πŸ” Security & Authentication

User Management

The platform uses role-based access control:

  • Admin: Full system access, user management
  • Researcher: Research features, limited configuration

Password Generation:

python -c "
import bcrypt
password = 'your_password_here'
hashed = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt())
print(hashed.decode('utf-8'))
"

Security Features

  • πŸ” bcrypt Password Hashing: Secure password storage
  • πŸ›‘οΈ Session Management: Secure session tokens with expiration
  • πŸ“ Audit Logging: Comprehensive action tracking
  • 🚧 Rate Limiting: API abuse prevention
  • πŸ”’ Input Validation: Sanitized user inputs
  • πŸ”‘ API Key Management: Environment variable security

πŸ§ͺ Testing

Run Test Suite

# Run all tests
pytest

# Run specific test categories
pytest tests/test_research.py          # Research workflows
pytest tests/test_notion_connection.py # Notion integration
pytest tests/test_firecrawl.py        # Web scraping
pytest tests/test_odr_service.py      # Deep Research
pytest tests/test_browser_fix.py      # DocSend processing

# Run with coverage
pytest --cov=src tests/

Test Configuration

Tests require API credentials for integration testing:

# Add to .env for testing
NOTION_TOKEN=your_test_notion_token
FIRECRAWL_API_KEY=your_test_firecrawl_key
OPENROUTER_API_KEY=your_test_openrouter_key

πŸš€ Deployment

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - NOTION_TOKEN=${NOTION_TOKEN}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./logs:/app/logs
      - ./reports:/app/reports
    depends_on:
      - redis
      
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

AWS EC2 Deployment

# Launch EC2 instance (Ubuntu 20.04+)
# Install Docker and docker-compose
sudo apt update
sudo apt install docker.io docker-compose

# Clone and deploy
git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure .env with your API keys
sudo docker-compose up -d

# Set up monitoring (optional)
bash scripts/setup_monitoring.sh

Production Configuration

# Environment setup
export STREAMLIT_SERVER_PORT=80
export STREAMLIT_SERVER_ADDRESS=0.0.0.0
export STREAMLIT_SERVER_ENABLE_CORS=false
export STREAMLIT_GLOBAL_DEVELOPMENT_MODE=false

# Run with SSL (recommended)
streamlit run main.py \
  --server.port 443 \
  --server.sslCertFile /path/to/cert.pem \
  --server.sslKeyFile /path/to/key.pem

πŸ“Š API Integration Details

OpenRouter Integration

  • 50+ AI Models: GPT, Claude, Gemini, Qwen, Llama, and more
  • Unified API: Single interface for multiple providers
  • Rate Limiting: Configurable requests per hour
  • Fallback Strategy: Automatic model switching on failures

Notion Integration

  • Database Monitoring: Real-time change detection
  • Automated Workflows: Research pipeline triggers
  • Content Publishing: Direct page creation and updates
  • Property Mapping: Custom field synchronization
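The change-detection loop reduces to polling the database and diffing against already-seen page IDs. A minimal sketch, assuming a notion-client-style SDK object (poll_new_pages is an illustrative helper, not the repo's actual code):

```python
def poll_new_pages(client, database_id: str, seen_ids: set[str]) -> list[dict]:
    """Return pages that were not present in previous polls.

    `client` is assumed to follow the notion-client SDK shape: it exposes
    client.databases.query(database_id=...) returning a dict whose
    "results" list holds page objects with an "id" field.
    """
    response = client.databases.query(database_id=database_id)
    new_pages = [p for p in response["results"] if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in new_pages)
    return new_pages
```

Each new page then feeds the research pipeline, and the report is written back via the same client.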

Firecrawl Integration

  • Intelligent Scraping: AI-powered content extraction
  • Sitemap Discovery: Automated URL discovery
  • Batch Processing: Multiple URL handling
  • Rate Limiting: Respectful crawling practices

MCP (Model Context Protocol)

  • CoinGecko Integration: Live cryptocurrency data
  • Extensible Framework: Plugin architecture for new providers
  • Type Safety: Structured data models
  • Real-time Updates: Live market data streams

🀝 Contributing

Development Setup

# Fork and clone repository
git clone https://github.com/yourusername/ai-research-agent.git
cd ai-research-agent

# Create development environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run development server
streamlit run main.py

Code Style

  • Type Hints: Required for all functions
  • Docstrings: Google style documentation
  • Testing: pytest with >80% coverage
  • Linting: flake8, black, isort
  • Security: bandit security scanning

Contribution Guidelines

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support & Documentation

Common Issues & Solutions

DocSend Processing Issues

# Install Chrome/Chromium
# Ubuntu: sudo apt install chromium-browser (Chrome itself requires Google's apt repository)
# macOS: brew install --cask google-chrome
# Windows: Download from google.com/chrome

# Verify Tesseract installation
tesseract --version

Notion Connection Issues

# Verify integration token
curl -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Notion-Version: 2022-06-28" \
     https://api.notion.com/v1/users/me

Memory Issues with Large Documents

# Increase memory limits in config
MAX_DOCUMENT_SIZE = 50 * 1024 * 1024  # 50MB
CHUNK_SIZE = 1000  # Reduce for memory constraints
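The CHUNK_SIZE knob maps to a sliding-window split: smaller chunks mean lower peak memory per embedding batch. A minimal sketch of the idea (chunk_text and the overlap value are illustrative, not the repo's actual function):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Sliding-window chunking; assumes overlap < chunk_size.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```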

πŸ”„ Version History

Current: v2.1.0

  • βœ… Deep Research (ODR) integration
  • βœ… Enhanced crypto intelligence
  • βœ… Improved UI/UX
  • βœ… Advanced authentication
  • βœ… Comprehensive testing

Previous Versions

  • v2.0.0: Notion automation, crypto chatbot
  • v1.5.0: DocSend integration, RAG chat
  • v1.0.0: Basic research and document processing

πŸ™ Acknowledgments

  • LangChain: Open Deep Research framework
  • OpenRouter: Multi-model AI access
  • Streamlit: Web application framework
  • Notion: CRM integration platform
  • Firecrawl: Web scraping service
  • CoinGecko: Cryptocurrency data API

Built with ❀️ for researchers, analysts, and knowledge workers worldwide.

For detailed API documentation, advanced configuration, and deployment guides, visit our comprehensive documentation wiki.
