analystOS πŸ€–πŸ“Š

AI-powered research platform with Web UI + optional Notion automation

A production-ready research automation platform powered by OpenRouter (50+ AI models). Use the interactive web interface for on-demand research, or connect Notion for fully automated workflows.

✨ Two Ways to Use

πŸ–₯️ Web UI (Interactive Research)

1. Open the Streamlit web interface
2. Upload documents, add URLs, or enter a research query
3. Select your AI model (GPT-4, Claude, Gemini, etc.)
4. Get comprehensive research reports instantly
5. Chat with your research using RAG-powered Q&A

πŸ”— Notion Automation (Zero-Touch Mode)

1. Connect your Notion database
2. Add a project to Notion
3. Agent automatically detects the new entry
4. AI researches, scores, and evaluates
5. Full report published back to Notion

Use either mode independently, or both together.


🌟 Key Features

πŸ”— Notion Automation (Zero-Touch Research)

  • Add & Forget: Add projects to Notion, get full research reports automatically
  • Real-Time Monitoring: Watches your Notion database for new entries
  • Auto-Research Pipeline: Triggers deep research on new projects
  • AI Scoring: Automated due diligence scoring and evaluation
  • Direct Publishing: Reports published directly to your Notion pages

πŸ”¬ Interactive Research Suite

  • Multi-Format Document Processing: PDF, DOCX, TXT, Markdown with OCR support
  • DocSend Integration: Automated presentation analysis with stealth browsing
  • Advanced Web Scraping: Firecrawl-powered content extraction with sitemap discovery
  • Deep Research Mode: LangChain's Open Deep Research (ODR) framework integration
  • RAG-Powered Chat: Context-aware Q&A about research content using FAISS vector search
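The RAG chat works by retrieving the most relevant chunks before answering. A toy sketch of that nearest-neighbor retrieval, assuming bag-of-words vectors stand in for the sentence-transformers embeddings the repo actually uses (FAISS performs the same ranking at scale):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words term counts (stand-in for real embeddings).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "The token unlock schedule spans four years.",
    "The team previously built a payments startup.",
    "Revenue grew 40% quarter over quarter.",
]
print(retrieve("when do tokens unlock", chunks, k=1))
# → ['The token unlock schedule spans four years.']
```

The retrieved chunks are then placed in the model's context so answers stay grounded in the uploaded sources.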

πŸ’° Crypto Intelligence Hub

  • Live Market Data: Real-time cryptocurrency information via CoinGecko MCP
  • Interactive Analysis: AI-powered crypto insights and technical analysis
  • Dynamic Visualizations: Plotly and Altair-based charts and metrics
  • Portfolio Intelligence: Multi-coin comparisons and trend analysis

🧠 Advanced AI Integration

  • Multi-Model Support: GPT-5.2, Claude Opus 4.5, Gemini 3, Qwen, DeepSeek R1
  • OpenRouter API: Unified access to 50+ AI models
  • Free Tier Models: Qwen3, DeepSeek R1T Chimera for cost-effective research
  • Custom Research Prompts: Specialized prompts for due diligence and analysis

πŸ” Entity Extraction (LangExtract)

  • Structured Extraction: Automatically extract people, organizations, funding rounds, metrics, and more
  • Multi-Source Support: Extract from documents, web content, and DocSend presentations
  • Source Grounding: All entities are linked to their source documents
  • Smart Caching: Results are cached to avoid re-extraction on unchanged content
  • AI-Enhanced Reports: Extracted entities are automatically included in research prompts

πŸ—οΈ Architecture

Core Components

β”œβ”€β”€ main.py                     # Streamlit application entry point
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ controllers/            # Application orchestration
β”‚   β”‚   └── app_controller.py   # Main app controller with auth & routing
β”‚   β”œβ”€β”€ pages/                  # Modular page implementations
β”‚   β”‚   β”œβ”€β”€ interactive_research.py    # Document processing & AI analysis
β”‚   β”‚   β”œβ”€β”€ notion_automation.py       # Notion CRM integration
β”‚   β”‚   β”œβ”€β”€ crypto_chatbot.py          # Crypto intelligence interface
β”‚   β”‚   └── voice_cloner_page.py       # Voice synthesis (experimental)
β”‚   β”œβ”€β”€ services/               # External integrations
β”‚   β”‚   β”œβ”€β”€ odr_service.py             # Open Deep Research integration
β”‚   β”‚   β”œβ”€β”€ user_history_service.py    # Session management
β”‚   β”‚   └── crypto_analysis/           # Crypto data services
β”‚   β”œβ”€β”€ core/                   # Core business logic
β”‚   β”‚   β”œβ”€β”€ research_engine.py         # Research automation
β”‚   β”‚   β”œβ”€β”€ rag_utils.py              # Vector search & embeddings
β”‚   β”‚   β”œβ”€β”€ scanner_utils.py          # Web discovery & parsing
β”‚   β”‚   └── docsend_client.py         # DocSend processing
β”‚   └── models/                 # Data models & schemas
β”œβ”€β”€ config/                     # Configuration files
β”‚   β”œβ”€β”€ users.yaml             # User management
β”‚   └── mcp_config.json        # MCP integrations
└── tests/                     # Comprehensive test suite

Technology Stack

  • Backend: Python 3.11+, Streamlit, FastAPI
  • AI/ML: OpenAI, LangChain, FAISS, sentence-transformers
  • Browser Automation: Selenium, Playwright
  • Document Processing: PyMuPDF, python-docx, Tesseract OCR
  • Data: pandas, numpy, Redis (optional)
  • Visualization: Plotly, Altair, Bokeh

πŸš€ Quick Start

1. Installation

Option A: Docker (Recommended)

git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure your API keys in .env
docker-compose up -d

Option B: Local Installation

# Clone repository
git clone <repository-url>
cd ai-research-agent

# Install Python dependencies
pip install -r requirements.txt

# Install browser dependencies
playwright install

# Install system dependencies (macOS)
brew install tesseract

# Install system dependencies (Ubuntu)
sudo apt-get install tesseract-ocr

# Install system dependencies (Windows)
# Download from: https://github.com/UB-Mannheim/tesseract/wiki

2. Configuration

Environment Variables (.env)

# Required: AI Model Access
OPENROUTER_API_KEY=your_openrouter_api_key_here

# Optional: Additional AI Providers
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here

# Optional: Notion Integration
NOTION_TOKEN=your_notion_integration_token

# Optional: Web Scraping
FIRECRAWL_API_URL=http://localhost:3002
FIRECRAWL_API_KEY=your_firecrawl_key

# Optional: Deep Research (ODR)
TAVILY_API_KEY=your_tavily_search_key

# Optional: Caching & Performance
REDIS_URL=redis://localhost:6379

# System Configuration
TESSERACT_CMD=/usr/bin/tesseract  # Adjust for your system

User Management (config/users.yaml)

users:
  admin:
    username: admin
    password_hash: $2b$12$... # Use generate_password.py script
    role: admin
  researcher:
    username: researcher  
    password_hash: $2b$12$...
    role: researcher

3. Run Application

# Local development
streamlit run main.py

# Production
streamlit run main.py --server.port 8501 --server.address 0.0.0.0

Visit http://localhost:8501 and login with your configured credentials.

πŸ“– Usage Guide

Interactive Research Workflow

  1. πŸ” Authentication: Login with username/password
  2. πŸ“‹ Research Query: Define your research question or topic
  3. πŸ“„ Document Upload: Upload PDF, DOCX, or text files (optional)
  4. 🌐 Web Content: Add specific URLs or enable sitemap crawling (optional)
  5. πŸ“Š DocSend Processing: Process presentation decks with email authentication (optional)
  6. πŸ”¬ Research Mode Selection:
    • Classic Mode: Traditional AI analysis of provided sources
    • Deep Research (ODR): Advanced multi-agent research with web search
  7. βš™οΈ Configuration: Adjust research parameters (breadth, depth, tool calls)
  8. πŸ€– AI Analysis: Generate comprehensive research reports
  9. πŸ’¬ Interactive Chat: Ask questions about the research using RAG

Notion Automation Workflow

  1. πŸ”— Notion Setup: Configure Notion integration token
  2. πŸ“Š Database Selection: Choose Notion database to monitor
  3. ⚑ Enable Monitoring: Start real-time database polling
  4. 🎯 Configure Triggers: Set up automated research workflows
  5. πŸ“ˆ AI Scoring: Enable intelligent project evaluation
  6. πŸ“ Report Publishing: Automatic research report generation to Notion

Crypto Intelligence Workflow

  1. πŸ”Œ MCP Connection: Connect to CoinGecko data source
  2. πŸ’° Coin Selection: Choose cryptocurrencies to analyze
  3. πŸ“Š Technical Analysis: Generate interactive charts and metrics
  4. 🧠 AI Insights: Get AI-powered market analysis and predictions
  5. πŸ“ˆ Portfolio Tracking: Monitor multiple coins and trends

πŸ”§ Advanced Configuration

Deep Research (ODR) Setup

Deep Research uses LangChain's Open Deep Research framework for advanced multi-agent research:

# Install ODR dependencies (included in requirements.txt)
pip install langgraph langchain-community langchain-openai langchain-anthropic

# Get Tavily API key for web search
# Visit: https://tavily.com/
export TAVILY_API_KEY=your_tavily_key_here

ODR Parameters for Detailed Reports:

  • Breadth (6-15): Number of concurrent research units
  • Depth (4-8): Research iteration depth
  • Max Tool Calls (8-15): Web searches per iteration

Ultra-Comprehensive Mode: Breadth=10, Depth=6, Tools=12 (10 × 6 × 12 = 720 total search operations)

Entity Extraction (LangExtract) Setup

Entity extraction automatically identifies people, organizations, funding rounds, metrics, and more from your research sources:

# Install langextract (pinned version for API stability)
pip install langextract==0.1.0

# Install libmagic (required system dependency)
# macOS:
brew install libmagic

# Ubuntu/Debian:
sudo apt-get install libmagic1

# Windows:
pip install python-magic-bin  # Includes bundled libmagic

Enable in .env:

LANGEXTRACT_ENABLED=true
LANGEXTRACT_MODEL=openai/gpt-4o
LANGEXTRACT_EXTRACTION_PASSES=2
LANGEXTRACT_MAX_CONCURRENT=3
LANGEXTRACT_MAX_CHUNK_SIZE=50000
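These variables can be read into a typed config at startup. A minimal sketch, assuming the defaults shown above (LangExtractConfig and load_langextract_config are illustrative names, not the repo's actual classes):

```python
import os
from dataclasses import dataclass

@dataclass
class LangExtractConfig:
    enabled: bool
    model: str
    extraction_passes: int
    max_concurrent: int
    max_chunk_size: int

def load_langextract_config() -> LangExtractConfig:
    # Reads the LANGEXTRACT_* variables, falling back to the documented defaults.
    return LangExtractConfig(
        enabled=os.getenv("LANGEXTRACT_ENABLED", "false").lower() == "true",
        model=os.getenv("LANGEXTRACT_MODEL", "openai/gpt-4o"),
        extraction_passes=int(os.getenv("LANGEXTRACT_EXTRACTION_PASSES", "2")),
        max_concurrent=int(os.getenv("LANGEXTRACT_MAX_CONCURRENT", "3")),
        max_chunk_size=int(os.getenv("LANGEXTRACT_MAX_CHUNK_SIZE", "50000")),
    )
```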

Entity Types Extracted:

  • People: Names, titles, organizations, roles
  • Organizations: Companies, investors, partners, their roles
  • Funding: Rounds (Seed, Series A/B/C), amounts, dates
  • Metrics: MAU, revenue, growth rates, valuations
  • Technology: Tech stack, platforms, categories
  • Risk Factors: Identified risks with severity
  • Partnerships: Business alliances and collaborations

Model Configuration

The platform supports multiple AI providers and models:

# Available Models
AI_MODEL_OPTIONS = {
    # OpenAI Models
    "openai/gpt-5.2": "GPT-5.2",
    "openai/gpt-5.2-pro": "GPT-5.2 Pro",
    # Anthropic Models
    "anthropic/claude-sonnet-4.5": "Claude Sonnet 4.5",
    "anthropic/claude-opus-4.5": "Claude Opus 4.5",
    # Google Models
    "google/gemini-3": "Gemini 3",
    "google/gemini-2.5-pro": "Gemini 2.5 Pro",
    "google/gemini-2.5-flash": "Gemini 2.5 Flash",
    # Free Models
    "qwen/qwen3-30b-a3b:free": "Qwen3 30B (Free)",
    "qwen/qwen3-235b-a22b:free": "Qwen3 235B (Free)",
    "tngtech/deepseek-r1t-chimera:free": "DeepSeek R1T Chimera (Free)",
}
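Any of these model IDs goes straight into a request against OpenRouter's OpenAI-compatible endpoint. A minimal stdlib sketch that builds (but deliberately does not send) such a request; build_request is an illustrative helper, not the repo's actual code:

```python
import json
import os
import urllib.request

def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Constructs a chat-completions request for OpenRouter's API.
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("qwen/qwen3-30b-a3b:free", "Summarize the latest funding round.")
# urllib.request.urlopen(req) would send it; omitted to keep the sketch offline.
```

Switching providers is just a different model string; the request shape is identical.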

Performance Optimization

Redis Caching (Optional)

# Install Redis
docker run -d -p 6379:6379 redis:alpine

# Configure in .env
REDIS_URL=redis://localhost:6379
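With REDIS_URL set, expensive research results can be served cache-aside. A minimal sketch of the pattern; cached_research and the key scheme are illustrative, and any object with redis-style get/set (e.g. redis.Redis(decode_responses=True)) works:

```python
import hashlib

def cached_research(cache, query: str, run_research):
    # cache: a redis.Redis-style object exposing get(key) and set(key, value).
    key = "research:" + hashlib.sha256(query.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit  # served from cache, no model call
    result = run_research(query)
    cache.set(key, result)
    return result
```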

Browser Configuration

# Chrome/Chromium (Recommended for DocSend)
CHROME_BINARY=/usr/bin/google-chrome

# Firefox Alternative
FIREFOX_BINARY=/usr/bin/firefox

πŸ” Security & Authentication

User Management

The platform uses role-based access control:

  • Admin: Full system access, user management
  • Researcher: Research features, limited configuration

Password Generation:

python -c "
import bcrypt
password = 'your_password_here'
hashed = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt())
print(hashed.decode('utf-8'))
"

Security Features

  • πŸ” bcrypt Password Hashing: Secure password storage
  • πŸ›‘οΈ Session Management: Secure session tokens with expiration
  • πŸ“ Audit Logging: Comprehensive action tracking
  • 🚧 Rate Limiting: API abuse prevention
  • πŸ”’ Input Validation: Sanitized user inputs
  • πŸ”‘ API Key Management: Environment variable security

πŸ§ͺ Testing

Run Test Suite

# Run all tests
pytest

# Run specific test categories
pytest tests/test_research.py          # Research workflows
pytest tests/test_notion_connection.py # Notion integration
pytest tests/test_firecrawl.py        # Web scraping
pytest tests/test_odr_service.py      # Deep Research
pytest tests/test_browser_fix.py      # DocSend processing

# Run with coverage
pytest --cov=src tests/

Test Configuration

Tests require API credentials for integration testing:

# Add to .env for testing
NOTION_TOKEN=your_test_notion_token
FIRECRAWL_API_KEY=your_test_firecrawl_key
OPENROUTER_API_KEY=your_test_openrouter_key

πŸš€ Deployment

Docker Deployment

# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - NOTION_TOKEN=${NOTION_TOKEN}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./logs:/app/logs
      - ./reports:/app/reports
    depends_on:
      - redis
      
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

AWS EC2 Deployment

# Launch EC2 instance (Ubuntu 20.04+)
# Install Docker and docker-compose
sudo apt update
sudo apt install docker.io docker-compose

# Clone and deploy
git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure .env with your API keys
sudo docker-compose up -d

# Set up monitoring (optional)
bash scripts/setup_monitoring.sh

Production Configuration

# Environment setup
export STREAMLIT_SERVER_PORT=80
export STREAMLIT_SERVER_ADDRESS=0.0.0.0
export STREAMLIT_SERVER_ENABLE_CORS=false
export STREAMLIT_GLOBAL_DEVELOPMENT_MODE=false

# Run with SSL (recommended)
streamlit run main.py \
  --server.port 443 \
  --server.sslCertFile /path/to/cert.pem \
  --server.sslKeyFile /path/to/key.pem

πŸ“Š API Integration Details

OpenRouter Integration

  • 50+ AI Models: GPT, Claude, Gemini, Qwen, Llama, and more
  • Unified API: Single interface for multiple providers
  • Rate Limiting: Configurable requests per hour
  • Fallback Strategy: Automatic model switching on failures

Notion Integration

  • Database Monitoring: Real-time change detection
  • Automated Workflows: Research pipeline triggers
  • Content Publishing: Direct page creation and updates
  • Property Mapping: Custom field synchronization
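The change-detection loop reduces to polling the database and diffing against already-seen page IDs. A minimal sketch, assuming a notion-client-style SDK object (poll_new_pages is an illustrative helper, not the repo's actual code):

```python
def poll_new_pages(client, database_id: str, seen_ids: set[str]) -> list[dict]:
    """Return pages that were not present in previous polls.

    `client` is assumed to follow the notion-client SDK shape: it exposes
    client.databases.query(database_id=...) returning a dict whose
    "results" list holds page objects with an "id" field.
    """
    response = client.databases.query(database_id=database_id)
    new_pages = [p for p in response["results"] if p["id"] not in seen_ids]
    seen_ids.update(p["id"] for p in new_pages)
    return new_pages
```

Each new page then feeds the research pipeline, and the report is written back via the same client.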

Firecrawl Integration

  • Intelligent Scraping: AI-powered content extraction
  • Sitemap Discovery: Automated URL discovery
  • Batch Processing: Multiple URL handling
  • Rate Limiting: Respectful crawling practices

MCP (Model Context Protocol)

  • CoinGecko Integration: Live cryptocurrency data
  • Extensible Framework: Plugin architecture for new providers
  • Type Safety: Structured data models
  • Real-time Updates: Live market data streams

🀝 Contributing

Development Setup

# Fork and clone repository
git clone https://github.com/yourusername/ai-research-agent.git
cd ai-research-agent

# Create development environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Install pre-commit hooks
pre-commit install

# Run development server
streamlit run main.py

Code Style

  • Type Hints: Required for all functions
  • Docstrings: Google style documentation
  • Testing: pytest with >80% coverage
  • Linting: flake8, black, isort
  • Security: bandit security scanning

Contribution Guidelines

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support & Documentation

Common Issues & Solutions

DocSend Processing Issues

# Install Chrome/Chromium
# Ubuntu: sudo apt install chromium-browser (Chrome itself requires Google's apt repository)
# macOS: brew install --cask google-chrome
# Windows: Download from google.com/chrome

# Verify Tesseract installation
tesseract --version

Notion Connection Issues

# Verify integration token
curl -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Notion-Version: 2022-06-28" \
     https://api.notion.com/v1/users/me

Memory Issues with Large Documents

# Increase memory limits in config
MAX_DOCUMENT_SIZE = 50 * 1024 * 1024  # 50MB
CHUNK_SIZE = 1000  # Reduce for memory constraints
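The CHUNK_SIZE knob maps to a sliding-window split: smaller chunks mean lower peak memory per embedding batch. A minimal sketch of the idea (chunk_text and the overlap value are illustrative, not the repo's actual function):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    # Sliding-window chunking; assumes overlap < chunk_size.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```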

πŸ”„ Version History

Current: v2.1.0

  • βœ… Deep Research (ODR) integration
  • βœ… Enhanced crypto intelligence
  • βœ… Improved UI/UX
  • βœ… Advanced authentication
  • βœ… Comprehensive testing

Previous Versions

  • v2.0.0: Notion automation, crypto chatbot
  • v1.5.0: DocSend integration, RAG chat
  • v1.0.0: Basic research and document processing

πŸ™ Acknowledgments

  • LangChain: Open Deep Research framework
  • OpenRouter: Multi-model AI access
  • Streamlit: Web application framework
  • Notion: CRM integration platform
  • Firecrawl: Web scraping service
  • CoinGecko: Cryptocurrency data API

Built with ❀️ for researchers, analysts, and knowledge workers worldwide.

For detailed API documentation, advanced configuration, and deployment guides, visit our comprehensive documentation wiki.
