You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
๐ฎ Axion โ AI Research Agent with RAG, Vision & Voice
Axion is a multi-tool AI research agent that combines **Retrieval-Augmented Generation (RAG)** over local PDF documents with **real-time web search**, **computer vision**, **voice output**, and access to multiple external APIs โ all orchestrated through a LangGraph agent running entirely on local LLMs via Ollama.
โจ Key Features
Category
Capability
RAG
Ingest PDFs, chunk & embed them with HuggingFace, store in ChromaDB, query with semantic search
Web Search
Tavily advanced search for general web queries
Vision
Analyze images, capture webcam, detect objects, read text (OCR), compare images โ powered by LLaVA 7B
Voice
Text-to-speech output using Kokoro TTS
Jobs
Real-time job/internship search via JSearch (RapidAPI)
News
Current news articles via Real-Time News Data (RapidAPI)
Research
Academic paper search via Semantic Scholar API
Tech
Hacker News top-story search
Reference
Wikipedia summary lookup
Weather
Live weather data via OpenWeatherMap
Location
Geocoding & place details via Nominatim (OpenStreetMap)
Drop PDF files into the files/ directory. On the next run, the agent will automatically:
Detect new or modified PDFs via MD5 hash comparison
Load and split them into 1000-character chunks (200 overlap)
Embed with all-MiniLM-L6-v2
Store in the local ChromaDB vector store (rag_db/)
5. Run the Agent
Open main.ipynb in Jupyter / VS Code and run all cells. The final cell starts an interactive chat loop:
Axion is ready.
Vision model : llava:7b
Reasoning : qwen3:14b
Type 'quit' to exit.
You: What does the Attention Is All You Need paper propose?
Axion: ...
๐งฐ Available Tools (15 Total)
Document & Knowledge
Tool
Description
ask_rag
Query the local PDF knowledge base (always called first)
reason_deeply
Multi-step reasoning over collected context
Web & Search
Tool
Description
search_web
Tavily advanced internet search (20 results)
wikipedia_search
Wikipedia article summaries
hackernews_search
Search top 50 Hacker News stories by topic
search_papers
Semantic Scholar academic paper search
news_search
Real-time news articles (RapidAPI)
Vision
Tool
Description
analyze_image
Describe/analyze an image file
capture_webcam
Capture and analyze a live webcam frame
detect_objects
Structured object inventory from an image
compare_images
Side-by-side comparison of two images
read_text_in_image
OCR โ extract text from images
Utilities
Tool
Description
get_job
Search internship/job listings (JSearch API)
weather_search
Current weather for any city (OpenWeatherMap)
location_search
Geocoding and place details (Nominatim)
๐ Agent Reasoning Flow
The agent follows a structured decision process for every query:
Check Internal Knowledge โ Always calls ask_rag first to search local documents
Enrich Externally โ Uses web/news/papers tools to verify or expand findings
Match Intent to Tools โ Routes to specialized tools based on query type
Cross-Verify โ For high-stakes claims, runs multiple independent tools
Synthesize & Respond โ Combines all findings; uses reason_deeply for 3+ sources
Voice Output โ Speaks the answer aloud using Kokoro TTS
๐ฌ Example Queries
You: Search for machine learning internships in Bangalore
You: What's the weather in Tokyo?
You: Summarize the Attention Is All You Need paper
You: Analyze this image: C:/photos/chart.png
You: What can you see through the camera?
You: Find recent news about AI regulation
You: Search research papers on transformer architectures
๐ Notes
Incremental Ingestion โ Only new/modified PDFs are re-embedded (tracked via tracked_files.json)
Chat Memory โ Rolling window of the last 10 conversation turns (20 messages)
Fully Local LLMs โ Both reasoning and vision models run on-device via Ollama
Voice โ Kokoro TTS plays audio at 24kHz sample rate
Camera โ Set camera_index in .env to match your webcam device index
About
Axion is a local, privacy-first AI research agent orchestrated via LangGraph. It combines RAG over private PDFs with real-time web search, computer vision, and voice synthesis using Ollama. Axion seamlessly bridges local intelligence and external APIs to provide a secure, multi-modal powerhouse for complex data analysis.