SatyamPrakash09/Axion

🔮 Axion: AI Research Agent with RAG, Vision & Voice

Axion is a multi-tool AI research agent that combines **Retrieval-Augmented Generation (RAG)** over local PDF documents with **real-time web search**, **computer vision**, **voice output**, and access to multiple external APIs, all orchestrated through a LangGraph agent running entirely on local LLMs via Ollama.

✨ Key Features

| Category | Capability |
| --- | --- |
| RAG | Ingest PDFs, chunk and embed them with HuggingFace, store them in ChromaDB, and query with semantic search |
| Web Search | Tavily advanced search for general web queries |
| Vision | Analyze images, capture webcam frames, detect objects, read text (OCR), and compare images, powered by LLaVA 7B |
| Voice | Text-to-speech output using Kokoro TTS |
| Jobs | Real-time job/internship search via JSearch (RapidAPI) |
| News | Current news articles via Real-Time News Data (RapidAPI) |
| Research | Academic paper search via Semantic Scholar API |
| Tech | Hacker News top-story search |
| Reference | Wikipedia summary lookup |
| Weather | Live weather data via OpenWeatherMap |
| Location | Geocoding & place details via Nominatim (OpenStreetMap) |
| Reasoning | Deep multi-step reasoning over gathered context |

๐Ÿ—๏ธ Architecture

User Query
    │
    ▼
┌──────────────────────────────────┐
│        LangGraph Agent           │
│   (Qwen3:14b via Ollama)         │
│                                  │
│  System Prompt + Tool Router     │
│  Rolling Chat History (10 turns) │
└──────────┬───────────────────────┘
           │
     ┌─────┴─────┐
     ▼           ▼
 ┌────────┐  ┌──────────────────────────────────────┐
 │ask_rag │  │        External Tools                │
 │(always │  │  search_web  · get_job · news_search │
 │ first) │  │  search_papers · hackernews_search   │
 │        │  │  wikipedia_search · weather_search   │
 │  PDF   │  │  location_search · analyze_image     │
 │ChromaDB│  │  capture_webcam · detect_objects     │
 │        │  │  compare_images · read_text_in_image │
 └────────┘  │  reason_deeply                       │
             └──────────────────────────────────────┘
                        │
                        ▼
              ┌──────────────────┐
              │  Kokoro TTS      │
              │  Voice Response  │
              └──────────────────┘
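
The tool routing in the diagram can be pictured as a name-to-function registry that the agent consults whenever the model asks for a tool. The following is a minimal sketch in plain Python, not the LangGraph implementation used in the notebook: the `TOOLS` dict, the `register` decorator, the `dispatch` helper, and the stub tool bodies are all hypothetical, and only the tool names come from the diagram.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to the function that implements it.
TOOLS: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return wrap


@register("ask_rag")
def ask_rag(query: str) -> str:
    # Stub standing in for a ChromaDB similarity search.
    return f"[local PDFs] results for: {query}"


@register("search_web")
def search_web(query: str) -> str:
    # Stub standing in for a Tavily search call.
    return f"[web] results for: {query}"


def dispatch(tool_name: str, query: str) -> str:
    """Run the requested tool, or report an unknown name instead of raising."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"unknown tool: {tool_name}"
    return tool(query)
```

Returning a message for an unknown tool name, rather than raising, lets the model see the failure and pick a different tool on its next step.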

๐Ÿ› ๏ธ Tech Stack

| Component | Technology |
| --- | --- |
| Language | Python 3.10 |
| Package Manager | uv |
| Agent Framework | LangChain + LangGraph |
| Reasoning LLM | Qwen3:14b (via Ollama) |
| Vision LLM | LLaVA:7b (via Ollama) |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (HuggingFace) |
| Vector Store | ChromaDB (persisted locally in ./rag_db) |
| PDF Loading | PyPDF (via LangChain PyPDFDirectoryLoader) |
| Text Splitting | RecursiveCharacterTextSplitter (1000 chars, 200 overlap) |
| Web Search | Tavily Search API |
| TTS | Kokoro TTS + SoundDevice |
| Vision | OpenCV + base64 encoding → LLaVA |
| External APIs | RapidAPI (JSearch, Real-Time News), OpenWeatherMap, Nominatim, Semantic Scholar |
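
The splitter settings above (1000 characters, 200 overlap) mean that each chunk repeats the last 200 characters of the previous one, so a sentence cut at a boundary still appears whole in one of the two chunks. The sketch below shows only that windowing arithmetic. It is a simplified fixed-window splitter, not RecursiveCharacterTextSplitter, which additionally prefers to break on separators such as paragraph and line breaks; `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each new chunk starts 800 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

With the default settings, a 2500-character document yields three chunks starting at offsets 0, 800, and 1600.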

๐Ÿ“ Project Structure

rag_web_search/
├── main.ipynb           # Main notebook: agent setup, tools, and chat loop
├── files/               # Drop PDF documents here for RAG ingestion
├── rag_db/              # ChromaDB persistent vector store
├── tracked_files.json   # MD5 hashes to detect new/changed PDFs
├── pyproject.toml       # Project metadata & dependencies
├── .env                 # API keys (not committed)
├── .gitignore
├── .python-version      # Python 3.10
└── README.md

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • uv: fast Python package manager
  • Ollama: local LLM runtime
  • A webcam (optional, for vision tools)

1. Clone & Install Dependencies

git clone "https://github.com/SatyamPrakash09/Axion.git"
cd Axion  # git clone names the directory after the repository
uv sync

2. Pull Required Ollama Models

ollama pull qwen3:14b
ollama pull llava:7b

3. Configure Environment Variables

Create a .env file in the project root:

GEMINI_API_KEY=your_gemini_api_key
TAVILY_API_KEY=your_tavily_api_key
rapid_api_key=your_rapidapi_key
WEATHER_API_KEY=your_openweathermap_api_key
camera_index=0

| Variable | Source | Used For |
| --- | --- | --- |
| GEMINI_API_KEY | Google AI Studio | (Optional) Gemini model fallback |
| TAVILY_API_KEY | Tavily | Web search tool |
| rapid_api_key | RapidAPI | Job search & News search |
| WEATHER_API_KEY | OpenWeatherMap | Weather tool |
| camera_index | Local device | Webcam device index (default 0) |
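
As an illustration of how these variables might be read at startup, here is a minimal sketch using only the standard library. It assumes the `.env` file has already been loaded into the process environment (for example by a dotenv loader). The `Settings` dataclass and `load_settings` function are hypothetical names, and treating the three non-optional keys as required is an assumption, not something the notebook is known to enforce.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    tavily_api_key: str
    rapid_api_key: str
    weather_api_key: str
    gemini_api_key: Optional[str]  # optional Gemini fallback
    camera_index: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing early on missing keys."""
    # Assumption: every key not marked optional in the table is required.
    required = ["TAVILY_API_KEY", "rapid_api_key", "WEATHER_API_KEY"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        tavily_api_key=env["TAVILY_API_KEY"],
        rapid_api_key=env["rapid_api_key"],
        weather_api_key=env["WEATHER_API_KEY"],
        gemini_api_key=env.get("GEMINI_API_KEY") or None,
        camera_index=int(env.get("camera_index", "0")),  # default 0, as in the table
    )
```

Failing at startup with the list of missing names is easier to debug than a tool call failing mid-conversation with an authentication error.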

4. Add PDFs for RAG

Drop PDF files into the files/ directory. On the next run, the agent will automatically:

  1. Detect new or modified PDFs via MD5 hash comparison
  2. Load and split them into 1000-character chunks (200 overlap)
  3. Embed with all-MiniLM-L6-v2
  4. Store in the local ChromaDB vector store (rag_db/)
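
Step 1 of the list above, MD5-based change detection, can be sketched with the standard library alone. This is an illustration under assumptions rather than the notebook's code: the layout of `tracked_files.json` (a flat mapping of file name to hex digest) is assumed, and `md5_of` and `find_changed_pdfs` are hypothetical names.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hash a file in blocks so large PDFs are not read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def find_changed_pdfs(pdf_dir: Path, tracker: Path) -> list[Path]:
    """Return PDFs that are new or modified since the last run, then update the tracker."""
    # Assumed tracker layout: {"paper.pdf": "<md5 hex digest>", ...}
    known = json.loads(tracker.read_text()) if tracker.exists() else {}
    changed = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        digest = md5_of(pdf)
        if known.get(pdf.name) != digest:
            changed.append(pdf)
            known[pdf.name] = digest
    tracker.write_text(json.dumps(known, indent=2))
    return changed
```

A call such as `find_changed_pdfs(Path("files"), Path("tracked_files.json"))` would return only the files that still need chunking and embedding. In a real pipeline it may be safer to write the tracker only after embedding succeeds, so that a failed run is retried.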

5. Run the Agent

Open main.ipynb in Jupyter / VS Code and run all cells. The final cell starts an interactive chat loop:

Axion is ready.
Vision model : llava:7b
Reasoning    : qwen3:14b
Type 'quit' to exit.

You: What does the Attention Is All You Need paper propose?
Axion: ...

🧰 Available Tools (15 Total)

Document & Knowledge

| Tool | Description |
| --- | --- |
| ask_rag | Query the local PDF knowledge base (always called first) |
| reason_deeply | Multi-step reasoning over collected context |

Web & Search

| Tool | Description |
| --- | --- |
| search_web | Tavily advanced internet search (20 results) |
| wikipedia_search | Wikipedia article summaries |
| hackernews_search | Search top 50 Hacker News stories by topic |
| search_papers | Semantic Scholar academic paper search |
| news_search | Real-time news articles (RapidAPI) |
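
To make "search top 50 Hacker News stories by topic" concrete, here is one possible filtering step. It assumes the stories have already been fetched as dicts with a `title` field, and it assumes "by topic" means a case-insensitive substring match on the title; the README does not say how the real tool matches, so treat `filter_stories` as a hypothetical stand-in.

```python
def filter_stories(stories: list[dict], topic: str, limit: int = 50) -> list[dict]:
    """Keep stories among the first `limit` whose title mentions `topic`."""
    needle = topic.lower()
    return [
        story
        for story in stories[:limit]          # only the top `limit` stories
        if needle in story.get("title", "").lower()
    ]
```

For example, `filter_stories(top_stories, "rust")` would keep a story titled "Rust 2.0 announced" and drop one titled "Show HN: my weekend project".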

Vision

| Tool | Description |
| --- | --- |
| analyze_image | Describe/analyze an image file |
| capture_webcam | Capture and analyze a live webcam frame |
| detect_objects | Structured object inventory from an image |
| compare_images | Side-by-side comparison of two images |
| read_text_in_image | OCR: extract text from images |
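
All of the vision tools hand an image to LLaVA, and the tech stack notes that images are base64-encoded first. The sketch below builds a JSON request body in the shape accepted by Ollama's `/api/generate` endpoint, which takes base64 strings in an `images` list. Whether Axion calls the REST endpoint directly or goes through a LangChain wrapper is not stated, so `build_llava_request` is a hypothetical helper, and it only builds the body without sending it.

```python
import base64
import json


def build_llava_request(image_bytes: bytes, prompt: str, model: str = "llava:7b") -> str:
    """Return a JSON body carrying the prompt and the base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [encoded],  # one base64 string per image
        "stream": False,      # request a single response object
    }
    return json.dumps(payload)
```

A tool like compare_images could pass two entries in `images`, while read_text_in_image would differ only in the prompt it sends.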

Utilities

| Tool | Description |
| --- | --- |
| get_job | Search internship/job listings (JSearch API) |
| weather_search | Current weather for any city (OpenWeatherMap) |
| location_search | Geocoding and place details (Nominatim) |
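
For orientation, the weather and location tools reduce to simple HTTP GET requests. The sketch below only builds the request URLs for OpenWeatherMap's current-weather endpoint and Nominatim's search endpoint. The helper names, the `metric` default, and the `limit=1` choice are assumptions, not details taken from the notebook.

```python
from urllib.parse import urlencode


def weather_url(city: str, api_key: str, units: str = "metric") -> str:
    """URL for OpenWeatherMap's current-weather endpoint."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"https://api.openweathermap.org/data/2.5/weather?{query}"


def geocode_url(place: str, limit: int = 1) -> str:
    """URL for a Nominatim search; its usage policy expects an identifying User-Agent header."""
    query = urlencode({"q": place, "format": "json", "limit": limit})
    return f"https://nominatim.openstreetmap.org/search?{query}"
```

Using `urlencode` rather than string concatenation keeps city names with spaces or non-ASCII characters valid in the query string.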

🔄 Agent Reasoning Flow

The agent follows a structured decision process for every query:

  1. Check Internal Knowledge: always calls ask_rag first to search local documents
  2. Enrich Externally: uses web/news/papers tools to verify or expand findings
  3. Match Intent to Tools: routes to specialized tools based on query type
  4. Cross-Verify: for high-stakes claims, runs multiple independent tools
  5. Synthesize & Respond: combines all findings; uses reason_deeply for 3+ sources
  6. Voice Output: speaks the answer aloud using Kokoro TTS
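
The ordering in the steps above can be sketched as a plain function. In the real agent the LLM decides which tools to call, so this is only an approximation of the control flow: `run_flow`, its parameters, and the fallback message are hypothetical, the caller is assumed to have already selected the external tools, and voice output (step 6) is left out.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def run_flow(
    query: str,
    ask_rag: Tool,
    selected_tools: Dict[str, Tool],
    reason_deeply: Callable[[str, List[str]], str],
) -> str:
    """Local documents first, then external tools, then synthesis."""
    findings: List[str] = []

    # Step 1: always consult the local PDF knowledge base first.
    local = ask_rag(query)
    if local:
        findings.append(local)

    # Steps 2-4: enrich and cross-check with the tools chosen for this query.
    for name, tool in selected_tools.items():
        result = tool(query)
        if result:
            findings.append(f"{name}: {result}")

    # Step 5: hand off to deep reasoning only when three or more sources exist.
    if len(findings) >= 3:
        return reason_deeply(query, findings)
    return "\n".join(findings) if findings else "No information found."
```

Keeping the three-source threshold in one place makes it easy to adjust when the reasoning step proves too slow or too eager.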

💬 Example Queries

You: Search for machine learning internships in Bangalore
You: What's the weather in Tokyo?
You: Summarize the Attention Is All You Need paper
You: Analyze this image: C:/photos/chart.png
You: What can you see through the camera?
You: Find recent news about AI regulation
You: Search research papers on transformer architectures

๐Ÿ“ Notes

  • Incremental Ingestion: only new/modified PDFs are re-embedded (tracked via tracked_files.json)
  • Chat Memory: rolling window of the last 10 conversation turns (20 messages)
  • Fully Local LLMs: both reasoning and vision models run on-device via Ollama
  • Voice: Kokoro TTS plays audio at a 24 kHz sample rate
  • Camera: set camera_index in .env to match your webcam device index
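
The rolling chat memory described in the notes (10 turns, 20 messages) maps naturally onto a bounded deque. This is a minimal sketch of that idea, not the notebook's code; the `ChatMemory` class and its method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class ChatMemory:
    """Keeps only the most recent `max_turns` user/assistant exchanges."""

    def __init__(self, max_turns: int = 10) -> None:
        # Two messages per turn, so 10 turns means 20 stored messages.
        self._messages: Deque[Tuple[str, str]] = deque(maxlen=max_turns * 2)

    def add_turn(self, user_msg: str, assistant_msg: str) -> None:
        # Appending beyond maxlen silently drops the oldest messages.
        self._messages.append(("user", user_msg))
        self._messages.append(("assistant", assistant_msg))

    def as_list(self) -> List[Tuple[str, str]]:
        return list(self._messages)
```

After twelve turns, the memory holds turns 3 through 12, which keeps the prompt size bounded however long the session runs.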

About

Axion is a local, privacy-first AI research agent orchestrated via LangGraph. It combines RAG over private PDFs with real-time web search, computer vision, and voice synthesis, with its reasoning and vision models running locally through Ollama. External APIs supply web search, news, job, paper, weather, and location data.