AI-powered research platform with Web UI + optional Notion automation
A production-ready research automation platform powered by OpenRouter (50+ AI models). Use the interactive web interface for on-demand research, or connect Notion for fully automated workflows.
1. Open the Streamlit web interface
2. Upload documents, add URLs, or enter a research query
3. Select your AI model (GPT-4, Claude, Gemini, etc.)
4. Get comprehensive research reports instantly
5. Chat with your research using RAG-powered Q&A
1. Connect your Notion database
2. Add a project to Notion
3. Agent automatically detects the new entry
4. AI researches, scores, and evaluates
5. Full report published back to Notion
Use either mode independently, or both together.
- Add & Forget: Add projects to Notion, get full research reports automatically
- Real-Time Monitoring: Watches your Notion database for new entries
- Auto-Research Pipeline: Triggers deep research on new projects
- AI Scoring: Automated due diligence scoring and evaluation
- Direct Publishing: Reports published directly to your Notion pages
- Multi-Format Document Processing: PDF, DOCX, TXT, Markdown with OCR support
- DocSend Integration: Automated presentation analysis with stealth browsing
- Advanced Web Scraping: Firecrawl-powered content extraction with sitemap discovery
- Deep Research Mode: LangChain's Open Deep Research (ODR) framework integration
- RAG-Powered Chat: Context-aware Q&A about research content using FAISS vector search
- Live Market Data: Real-time cryptocurrency information via CoinGecko MCP
- Interactive Analysis: AI-powered crypto insights and technical analysis
- Dynamic Visualizations: Plotly and Altair-based charts and metrics
- Portfolio Intelligence: Multi-coin comparisons and trend analysis
- Multi-Model Support: GPT-5.2, Claude Opus 4.5, Gemini 3, Qwen, DeepSeek R1
- OpenRouter API: Unified access to 50+ AI models
- Free Tier Models: Qwen3, DeepSeek R1T Chimera for cost-effective research
- Custom Research Prompts: Specialized prompts for due diligence and analysis
- Structured Extraction: Automatically extract people, organizations, funding rounds, metrics, and more
- Multi-Source Support: Extract from documents, web content, and DocSend presentations
- Source Grounding: All entities are linked to their source documents
- Smart Caching: Results are cached to avoid re-extraction on unchanged content
- AI-Enhanced Reports: Extracted entities are automatically included in research prompts
├── main.py                         # Streamlit application entry point
├── src/
│   ├── controllers/                # Application orchestration
│   │   └── app_controller.py       # Main app controller with auth & routing
│   ├── pages/                      # Modular page implementations
│   │   ├── interactive_research.py # Document processing & AI analysis
│   │   ├── notion_automation.py    # Notion CRM integration
│   │   ├── crypto_chatbot.py       # Crypto intelligence interface
│   │   └── voice_cloner_page.py    # Voice synthesis (experimental)
│   ├── services/                   # External integrations
│   │   ├── odr_service.py          # Open Deep Research integration
│   │   ├── user_history_service.py # Session management
│   │   └── crypto_analysis/        # Crypto data services
│   ├── core/                       # Core business logic
│   │   ├── research_engine.py      # Research automation
│   │   ├── rag_utils.py            # Vector search & embeddings
│   │   ├── scanner_utils.py        # Web discovery & parsing
│   │   └── docsend_client.py       # DocSend processing
│   └── models/                     # Data models & schemas
├── config/                         # Configuration files
│   ├── users.yaml                  # User management
│   └── mcp_config.json             # MCP integrations
└── tests/                          # Comprehensive test suite
- Backend: Python 3.11+, Streamlit, FastAPI
- AI/ML: OpenAI, LangChain, FAISS, sentence-transformers
- Browser Automation: Selenium, Playwright
- Document Processing: PyMuPDF, python-docx, Tesseract OCR
- Data: pandas, numpy, Redis (optional)
- Visualization: Plotly, Altair, Bokeh
git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure your API keys in .env
docker-compose up -d
# Clone repository
git clone <repository-url>
cd ai-research-agent
# Install Python dependencies
pip install -r requirements.txt
# Install browser dependencies
playwright install
# Install system dependencies (macOS)
brew install tesseract
# Install system dependencies (Ubuntu)
sudo apt-get install tesseract-ocr
# Install system dependencies (Windows)
# Download from: https://github.com/UB-Mannheim/tesseract/wiki
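With Tesseract installed, OCR support for scanned PDFs follows the usual PyMuPDF + pytesseract recipe. This is an illustrative sketch only, not the platform's actual document pipeline; the file name is a placeholder:

```python
# Sketch: extract text from a PDF, falling back to OCR for image-only pages.
import fitz  # PyMuPDF
import pytesseract
from PIL import Image

doc = fitz.open("deck.pdf")  # placeholder file
pages = []
for page in doc:
    text = page.get_text().strip()
    if not text:  # likely a scanned page -> render it and run OCR
        pix = page.get_pixmap(dpi=200)
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        text = pytesseract.image_to_string(img)
    pages.append(text)
print("\n\n".join(pages))
```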
# Required: AI Model Access
OPENROUTER_API_KEY=your_openrouter_api_key_here
# Optional: Additional AI Providers
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
# Optional: Notion Integration
NOTION_TOKEN=your_notion_integration_token
# Optional: Web Scraping
FIRECRAWL_API_URL=http://localhost:3002
FIRECRAWL_API_KEY=your_firecrawl_key
# Optional: Deep Research (ODR)
TAVILY_API_KEY=your_tavily_search_key
# Optional: Caching & Performance
REDIS_URL=redis://localhost:6379
# System Configuration
TESSERACT_CMD=/usr/bin/tesseract  # Adjust for your system
users:
  admin:
    username: admin
    password_hash: $2b$12$...  # Use generate_password.py script
    role: admin
  researcher:
    username: researcher
    password_hash: $2b$12$...
    role: researcher
# Local development
streamlit run main.py
# Production
streamlit run main.py --server.port 8501 --server.address 0.0.0.0
Visit http://localhost:8501 and log in with your configured credentials.
- Authentication: Login with username/password
- Research Query: Define your research question or topic
- Document Upload: Upload PDF, DOCX, or text files (optional)
- Web Content: Add specific URLs or enable sitemap crawling (optional)
- DocSend Processing: Process presentation decks with email authentication (optional)
- Research Mode Selection:
  - Classic Mode: Traditional AI analysis of provided sources
  - Deep Research (ODR): Advanced multi-agent research with web search
- Configuration: Adjust research parameters (breadth, depth, tool calls)
- AI Analysis: Generate comprehensive research reports
- Interactive Chat: Ask questions about the research using RAG
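The interactive chat step is RAG-based: source chunks are embedded with sentence-transformers and searched with FAISS before each answer. A minimal sketch of that retrieval step, assuming a placeholder embedding model and sample chunks (the platform's real logic lives in rag_utils.py):

```python
# Minimal retrieval sketch: embed text chunks, index them with FAISS,
# and fetch the chunk most similar to a user question.
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

chunks = [
    "Acme raised a $5M seed round in 2023.",
    "The team has 12 engineers based in Berlin.",
]

embeddings = model.encode(chunks, convert_to_numpy=True)  # float32 matrix
index = faiss.IndexFlatL2(embeddings.shape[1])            # exact L2 search
index.add(embeddings)

query = model.encode(["How much funding did Acme raise?"], convert_to_numpy=True)
distances, ids = index.search(query, 1)
print(chunks[ids[0][0]])  # most relevant chunk to feed into the chat prompt
```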
- Notion Setup: Configure Notion integration token
- Database Selection: Choose Notion database to monitor
- Enable Monitoring: Start real-time database polling (see the polling sketch after this list)
- Configure Triggers: Set up automated research workflows
- AI Scoring: Enable intelligent project evaluation
- Report Publishing: Automatic research report generation to Notion
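Conceptually, monitoring means periodically querying the configured database and diffing against entries already seen. A minimal polling sketch against the public Notion API; the database ID is a placeholder, and the platform's own trigger logic lives in notion_automation.py:

```python
# Sketch: poll a Notion database and report pages created since the last check.
import os
import time
import requests

HEADERS = {
    "Authorization": f"Bearer {os.environ['NOTION_TOKEN']}",
    "Notion-Version": "2022-06-28",
}
DATABASE_ID = "your-database-id"  # placeholder

seen = set()
while True:
    resp = requests.post(
        f"https://api.notion.com/v1/databases/{DATABASE_ID}/query",
        headers=HEADERS, json={}, timeout=30,
    )
    resp.raise_for_status()
    for page in resp.json()["results"]:
        if page["id"] not in seen:
            seen.add(page["id"])
            print("New entry detected:", page["id"])  # research pipeline would trigger here
    time.sleep(60)  # poll once a minute
```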
- MCP Connection: Connect to CoinGecko data source
- Coin Selection: Choose cryptocurrencies to analyze
- Technical Analysis: Generate interactive charts and metrics (see the chart sketch after this list)
- AI Insights: Get AI-powered market analysis and predictions
- Portfolio Tracking: Monitor multiple coins and trends
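The chatbot itself reaches CoinGecko through MCP, but the same data is easy to illustrate with CoinGecko's public REST API and Plotly. A rough sketch, independent of the app's MCP integration (coin ID and look-back window are arbitrary):

```python
# Sketch: fetch 30 days of bitcoin prices from CoinGecko and plot them.
import pandas as pd
import plotly.express as px
import requests

url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart"
data = requests.get(url, params={"vs_currency": "usd", "days": 30}, timeout=30).json()

df = pd.DataFrame(data["prices"], columns=["ts_ms", "price_usd"])
df["time"] = pd.to_datetime(df["ts_ms"], unit="ms")

fig = px.line(df, x="time", y="price_usd", title="BTC/USD - last 30 days")
fig.show()  # inside Streamlit this would be st.plotly_chart(fig)
```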
Deep Research uses LangChain's Open Deep Research framework for advanced multi-agent research:
# Install ODR dependencies (included in requirements.txt)
pip install langgraph langchain-community langchain-openai langchain-anthropic
# Get Tavily API key for web search
# Visit: https://tavily.com/
export TAVILY_API_KEY=your_tavily_key_here
ODR Parameters for Detailed Reports:
- Breadth (6-15): Number of concurrent research units
- Depth (4-8): Research iteration depth
- Max Tool Calls (8-15): Web searches per iteration
Ultra-Comprehensive Mode: Breadth=10, Depth=6, Tools=12 (720 total operations)
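The parameters multiply into an overall search budget, which is where the 720 figure above comes from. A quick illustration of the arithmetic (not ODR's internal accounting):

```python
# Rough budget: concurrent research units x iteration depth x searches per iteration.
breadth, depth, max_tool_calls = 10, 6, 12
total_operations = breadth * depth * max_tool_calls
print(total_operations)  # 720 web-search operations in Ultra-Comprehensive mode
```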
Entity extraction automatically identifies people, organizations, funding rounds, metrics, and more from your research sources:
# Install langextract (pinned version for API stability)
pip install langextract==0.1.0
# Install libmagic (required system dependency)
# macOS:
brew install libmagic
# Ubuntu/Debian:
sudo apt-get install libmagic1
# Windows:
pip install python-magic-bin  # Includes bundled libmagic
Enable in .env:
LANGEXTRACT_ENABLED=true
LANGEXTRACT_MODEL=openai/gpt-4o
LANGEXTRACT_EXTRACTION_PASSES=2
LANGEXTRACT_MAX_CONCURRENT=3
LANGEXTRACT_MAX_CHUNK_SIZE=50000
Entity Types Extracted (a representation sketch follows this list):
- People: Names, titles, organizations, roles
- Organizations: Companies, investors, partners, their roles
- Funding: Rounds (Seed, Series A/B/C), amounts, dates
- Metrics: MAU, revenue, growth rates, valuations
- Technology: Tech stack, platforms, categories
- Risk Factors: Identified risks with severity
- Partnerships: Business alliances and collaborations
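Each extraction stays tied to the document it came from ("source grounding"). A hypothetical example of what one source-grounded funding entity might look like, purely to illustrate the shape of the data; the field names are not the platform's actual schema:

```python
# Hypothetical shape of a source-grounded extraction result.
from dataclasses import dataclass

@dataclass
class ExtractedEntity:
    entity_type: str      # "funding", "person", "organization", ...
    attributes: dict      # e.g. {"round": "Series A", "amount_usd": 12_000_000}
    source_document: str  # which upload/URL the entity came from
    source_snippet: str   # the text span that supports it

entity = ExtractedEntity(
    entity_type="funding",
    attributes={"round": "Series A", "amount_usd": 12_000_000, "date": "2024-03"},
    source_document="pitch_deck.pdf",
    source_snippet="closed a $12M Series A in March 2024",
)
```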
The platform supports multiple AI providers and models:
# Available Models
AI_MODEL_OPTIONS = {
    # OpenAI Models
    "openai/gpt-5.2": "GPT-5.2",
    "openai/gpt-5.2-pro": "GPT-5.2 Pro",
    # Anthropic Models
    "anthropic/claude-sonnet-4.5": "Claude Sonnet 4.5",
    "anthropic/claude-opus-4.5": "Claude Opus 4.5",
    # Google Models
    "google/gemini-3": "Gemini 3",
    "google/gemini-2.5-pro": "Gemini 2.5 Pro",
    "google/gemini-2.5-flash": "Gemini 2.5 Flash",
    # Free Models
    "qwen/qwen3-30b-a3b:free": "Qwen3 30B (Free)",
    "qwen/qwen3-235b-a22b:free": "Qwen3 235B (Free)",
    "tngtech/deepseek-r1t-chimera:free": "DeepSeek R1T Chimera (Free)",
}
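OpenRouter exposes an OpenAI-compatible endpoint, so any model key from the table above can be called with the standard OpenAI client. A minimal sketch (the model choice and prompt are examples, not the platform's research prompts):

```python
# Sketch: one chat completion routed through OpenRouter's OpenAI-compatible API.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b:free",  # any key from AI_MODEL_OPTIONS
    messages=[{"role": "user", "content": "Summarize the project in two sentences."}],
)
print(response.choices[0].message.content)
```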
# Install Redis
docker run -d -p 6379:6379 redis:alpine
# Configure in .env
REDIS_URL=redis://localhost:6379
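One way the optional Redis cache can be used is to key results on a hash of the source content, so unchanged documents are never re-analyzed. A minimal sketch of that pattern with redis-py; the key prefix, TTL, and helper are illustrative, not the platform's exact cache layout:

```python
# Sketch: cache expensive analysis results under a hash of the input content.
import hashlib
import os
import redis

r = redis.Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

def cached_analysis(content: str, analyze) -> str:
    key = "research:" + hashlib.sha256(content.encode("utf-8")).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return hit.decode("utf-8")        # unchanged content -> cached result
    result = analyze(content)             # expensive AI call
    r.set(key, result, ex=7 * 24 * 3600)  # keep for a week
    return result
```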
# Chrome/Chromium (Recommended for DocSend)
CHROME_BINARY=/usr/bin/google-chrome
# Firefox Alternative
FIREFOX_BINARY=/usr/bin/firefox
The platform uses role-based access control (a minimal enforcement sketch follows the role list):
- Admin: Full system access, user management
- Researcher: Research features, limited configuration
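Conceptually, a role check is just a capability lookup performed before a page renders. A hypothetical sketch; the capability names are illustrative and not the controller's real permission set:

```python
# Hypothetical role -> capability map consulted before rendering a page.
PERMISSIONS = {
    "admin": {"research", "notion", "crypto", "user_management", "configuration"},
    "researcher": {"research", "notion", "crypto"},
}

def can(role: str, capability: str) -> bool:
    return capability in PERMISSIONS.get(role, set())

assert can("admin", "user_management")
assert not can("researcher", "user_management")
```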
Password Generation:
python -c "
import bcrypt
password = 'your_password_here'
hashed = bcrypt.hashpw(password.encode('utf-8'), bcrypt.gensalt())
print(hashed.decode('utf-8'))
"
- bcrypt Password Hashing: Secure password storage (see the verification sketch below this list)
- Session Management: Secure session tokens with expiration
- Audit Logging: Comprehensive action tracking
- Rate Limiting: API abuse prevention
- Input Validation: Sanitized user inputs
- API Key Management: Environment variable security
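The counterpart to the hash-generation snippet above is verification at login, done with bcrypt.checkpw. A short sketch; in practice the stored hash comes from config/users.yaml rather than being generated inline:

```python
# Sketch: verify a submitted password against a stored bcrypt hash.
import bcrypt

# stored_hash would normally be loaded from config/users.yaml
stored_hash = bcrypt.hashpw(b"your_password_here", bcrypt.gensalt())

assert bcrypt.checkpw(b"your_password_here", stored_hash)   # correct password
assert not bcrypt.checkpw(b"wrong_password", stored_hash)   # rejected
```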
# Run all tests
pytest
# Run specific test categories
pytest tests/test_research.py # Research workflows
pytest tests/test_notion_connection.py # Notion integration
pytest tests/test_firecrawl.py # Web scraping
pytest tests/test_odr_service.py # Deep Research
pytest tests/test_browser_fix.py # DocSend processing
# Run with coverage
pytest --cov=src tests/
Tests require API credentials for integration testing:
# Add to .env for testing
NOTION_TOKEN=your_test_notion_token
FIRECRAWL_API_KEY=your_test_firecrawl_key
OPENROUTER_API_KEY=your_test_openrouter_key
# docker-compose.yml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "8501:8501"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - NOTION_TOKEN=${NOTION_TOKEN}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./logs:/app/logs
      - ./reports:/app/reports
    depends_on:
      - redis
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
# Launch EC2 instance (Ubuntu 20.04+)
# Install Docker and docker-compose
sudo apt update
sudo apt install docker.io docker-compose
# Clone and deploy
git clone <repository-url>
cd ai-research-agent
cp .env.example .env
# Configure .env with your API keys
sudo docker-compose up -d
# Set up monitoring (optional)
bash scripts/setup_monitoring.sh
# Environment setup
export STREAMLIT_SERVER_PORT=80
export STREAMLIT_SERVER_ADDRESS=0.0.0.0
export STREAMLIT_SERVER_ENABLE_CORS=false
export STREAMLIT_GLOBAL_DEVELOPMENT_MODE=false
# Run with SSL (recommended)
streamlit run main.py \
  --server.port 443 \
  --server.sslCertFile /path/to/cert.pem \
  --server.sslKeyFile /path/to/key.pem
- 50+ AI Models: GPT, Claude, Gemini, Qwen, Llama, and more
- Unified API: Single interface for multiple providers
- Rate Limiting: Configurable requests per hour
- Fallback Strategy: Automatic model switching on failures
- Database Monitoring: Real-time change detection
- Automated Workflows: Research pipeline triggers
- Content Publishing: Direct page creation and updates
- Property Mapping: Custom field synchronization
- Intelligent Scraping: AI-powered content extraction
- Sitemap Discovery: Automated URL discovery
- Batch Processing: Multiple URL handling
- Rate Limiting: Respectful crawling practices
- CoinGecko Integration: Live cryptocurrency data
- Extensible Framework: Plugin architecture for new providers
- Type Safety: Structured data models
- Real-time Updates: Live market data streams
# Fork and clone repository
git clone https://github.com/yourusername/ai-research-agent.git
cd ai-research-agent
# Create development environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run development server
streamlit run main.py
- Type Hints: Required for all functions
- Docstrings: Google style documentation
- Testing: pytest with >80% coverage
- Linting: flake8, black, isort
- Security: bandit security scanning
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
# Install Chrome/Chromium
# Ubuntu: sudo apt install google-chrome-stable
# macOS: brew install --cask google-chrome
# Windows: Download from google.com/chrome
# Verify Tesseract installation
tesseract --version
# Verify integration token
curl -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Notion-Version: 2022-06-28" \
  https://api.notion.com/v1/users/me
# Increase memory limits in config
MAX_DOCUMENT_SIZE = 50 * 1024 * 1024 # 50MB
CHUNK_SIZE = 1000  # Reduce for memory constraints
- Deep Research (ODR) integration
- Enhanced crypto intelligence
- Improved UI/UX
- Advanced authentication
- Comprehensive testing
- v2.0.0: Notion automation, crypto chatbot
- v1.5.0: DocSend integration, RAG chat
- v1.0.0: Basic research and document processing
- LangChain: Open Deep Research framework
- OpenRouter: Multi-model AI access
- Streamlit: Web application framework
- Notion: CRM integration platform
- Firecrawl: Web scraping service
- CoinGecko: Cryptocurrency data API
Built with ❤️ for researchers, analysts, and knowledge workers worldwide.
For detailed API documentation, advanced configuration, and deployment guides, visit our comprehensive documentation wiki.