| Category | Technologies |
|---|---|
| AI Framework | LangChain 1.1.0, LangGraph, LangSmith |
| LLM Backend | Ollama (Gemma 3 27B), ChatOllama |
| Agent Pattern | ReAct (Reasoning + Acting) |
| Web Interface | Gradio 6.0.1, FastAPI |
| Multi-Modal | OpenAI Whisper, Pillow, OpenCV |
| Search Tools | DuckDuckGo, ArXiv, Wikipedia |
| Media Processing | yt-dlp, FFmpeg, PyDub |
| Data Processing | Pandas, NumPy, PyTorch 2.9.1 |
| Authentication | Hugging Face OAuth, Google Auth |
```mermaid
graph LR
    subgraph UI["🖥️ User Interface"]
        Web[Gradio Web UI]
    end
    subgraph Core["🤖 Agent Core"]
        Agent[ReAct Agent]
        LLM[Ollama<br/>Gemma 3 27B]
    end
    subgraph Tools["🛠️ Tools"]
        Search[Web & Academic<br/>Search]
        Media[Audio/Video<br/>Processing]
        Vision[Image<br/>Analysis]
        Code[Python<br/>Executor]
        API[Weather &<br/>File APIs]
    end
    subgraph External["☁️ Services"]
        OllamaServer[Ollama Server]
        APIs[External APIs]
    end
    Web -->|Query| Agent
    Agent <-->|Reasoning| LLM
    Agent -->|Select Tool| Search
    Agent -->|Select Tool| Media
    Agent -->|Select Tool| Vision
    Agent -->|Select Tool| Code
    Agent -->|Select Tool| API
    LLM <-->|API Call| OllamaServer
    Vision -.->|Multimodal| OllamaServer
    API -->|Request| APIs
    style Agent fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
    style LLM fill:#2196F3,stroke:#1565C0,stroke-width:3px,color:#fff
    style Web fill:#FF9800,stroke:#E65100,stroke-width:3px,color:#fff
    style OllamaServer fill:#9C27B0,stroke:#6A1B9A,stroke-width:3px,color:#fff
    style Search fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Media fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Vision fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Code fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style API fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style APIs fill:#795548,stroke:#5D4037,stroke-width:2px,color:#fff
```
This repository showcases a production-ready multi-tool AI agent built on the LangChain framework, designed to run entirely on self-hosted infrastructure using Ollama. The agent leverages the Gemma 3 27B model and implements the ReAct (Reasoning and Acting) pattern for intelligent tool selection and execution.
- ✅ Multimodal Understanding: Analyzes images using vision-language capabilities
- ✅ Media Processing: Transcribes audio and video content with Whisper
- ✅ Information Retrieval: Searches web, academic papers, and encyclopedic knowledge
- ✅ Code Execution: Runs Python code dynamically for complex calculations
- ✅ API Integration: Fetches real-time weather and external data
- ✅ Reliable Parsing: Handles complex multi-argument tool calls with custom input parsing
Analyzes local image files using Ollama's multimodal API.
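Under the hood, a call like this can target Ollama's `/api/generate` endpoint, which accepts base64-encoded images. The sketch below only builds the request body; the model ID is a placeholder and the repository's actual implementation may differ:

```python
import base64

def build_vision_payload(image_path: str, prompt: str, model: str = "gemma3:27b") -> dict:
    """Encode a local image and build an Ollama /api/generate request body."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # Ollama expects base64-encoded image data
        "stream": False,
    }

# The payload would then be POSTed to f"{OLLAMA_URL}/api/generate",
# e.g. requests.post(url, json=payload).
```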
Example Input:
"image_path='/path/to/chess.png', prompt='What is the best next move for black?'"

Downloads and transcribes YouTube videos using yt-dlp and Whisper.
Example Usage:
"Extract transcript from https://www.youtube.com/watch?v=VIDEO_ID"

Transcribes local audio files (MP3, WAV) using OpenAI Whisper.
Real-time web search for current news and general information.
Searches academic papers and scientific research.
Queries Wikipedia for encyclopedic knowledge.
Downloads files by task ID and provides content previews for Excel, JSON, images, and audio.
Supported Formats:
- Excel (.xlsx) → DataFrame preview
- JSON → Parsed content
- Images (.png, .jpg) → File path for vision analysis
- Audio (.mp3) → File path for transcription
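The dispatch logic might look like the following sketch (the function name and extension mapping are illustrative, not the repository's actual code):

```python
import json
from pathlib import Path

def preview_file(path: str) -> str:
    """Return a short, tool-friendly preview based on file extension."""
    ext = Path(path).suffix.lower()
    if ext == ".xlsx":
        import pandas as pd  # heavy import, deferred until needed
        return pd.read_excel(path).head().to_string()
    if ext == ".json":
        with open(path) as f:
            return json.dumps(json.load(f), indent=2)[:500]
    if ext in (".png", ".jpg", ".jpeg"):
        return f"Image saved at {path}; pass to the vision tool."
    if ext == ".mp3":
        return f"Audio saved at {path}; pass to the transcription tool."
    return f"Unsupported format: {ext}"
```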
Executes arbitrary Python code for mathematical calculations and data processing.
Example:
"Run this Python code: print(sum([i**2 for i in range(1, 101)]))"

Fetches current weather data from the WeatherStack API.
Example:
"What's the current weather in Tokyo?"

1. Python Environment
- Python 3.11 or newer
- pip package manager
2. Ollama Server
- Install Ollama from ollama.ai
- Pull the Gemma 3 model: `ollama pull gemma3:27b`
- Run the Ollama server: `ollama serve`
- Optional: Expose via Cloudflare Tunnel for remote access
3. System Dependencies
```shell
# Ubuntu/Debian
sudo apt install yt-dlp ffmpeg

# macOS
brew install yt-dlp ffmpeg

# Windows (using Chocolatey)
choco install yt-dlp ffmpeg
```

1. Clone the Repository
```shell
git clone https://github.com/yourusername/langchain-agent.git
cd langchain-agent
```

2. Install Python Dependencies

```shell
pip install -r requirements.txt
```

3. Configure Environment Variables
Create a `.env` file in the project root:

```
CLOUDFLARE_TUNNEL_URL=https://your-ollama-tunnel-url.com
WEATHER_API=your_weatherstack_api_key
OLLAMA_MODEL_ID=gemma3:27b
```

4. Set Up API Keys
- WeatherStack: Get a free API key from weatherstack.com
- Hugging Face (optional): For Gradio OAuth features
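At startup these variables need to be read into the process environment. A minimal sketch is below; the real project may instead use python-dotenv's `load_dotenv`, which handles quoting and edge cases more robustly:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines, skipping blanks and comments."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env()
OLLAMA_MODEL_ID = os.environ.get("OLLAMA_MODEL_ID", "gemma3:27b")
```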
```shell
python app.py
```

Access the web UI at http://localhost:7860.
Features:
- Login with Hugging Face account
- Run evaluation on multiple questions
- Submit answers and get scored results
- View detailed agent logs
```python
from agent import build_agent

# Initialize the agent
agent = build_agent()

# Ask a question
response = agent.invoke({
    "input": "What's the weather in New York?",
    "chat_history": []
})
print(response["output"])
```

```shell
python agent_tester.py
```

This runs a comprehensive test suite covering all tools with performance metrics.
```
langchain_agent/
├── agent.py           # Core agent logic and tool definitions
├── app.py             # Gradio web interface
├── agent_tester.py    # Tool testing framework
├── requirements.txt   # Python dependencies
├── system_prompt.txt  # ReAct agent system prompt
├── README.md          # This file
└── .env               # Environment variables (create this)
```
Edit `system_prompt.txt` to customize the agent's behavior, output format, and reasoning style.
Modify the `tools` list in `agent.py` (lines 383-392) to add or remove tools:
```python
tools = [
    WeatherInfoTool,
    transcribe_audio_whisper,
    youtube_transcript_func,
    caption_image_func,
    file_download_func,
    search_tool,
    wiki_search_tool,
    archive_search_tool,
    python_repl_tool,
]
```

Update these constants in `agent.py` (lines 27-29):
```python
CLOUDFLARE_TUNNEL_URL = "http://localhost:11434"  # or your tunnel URL
OLLAMA_MODEL_ID = "gemma3:27b"                    # or another model
WEATHER_API = "your_api_key_here"
```

The agent uses custom regex-based parsing to handle complex multi-argument tool calls:
```python
# Example: handles inputs like
# "image_path='/path/file.png', prompt='Analyze this'"
matches = re.findall(r"(\w+)\s*=\s*([^,]+)", raw_input)
```

Implements the ReAct (Reasoning + Acting) pattern:
- Thought: Agent reasons about the task
- Action: Selects and executes appropriate tool
- Observation: Processes tool output
- Repeat: Continues until final answer is reached
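In toy form the loop looks like the sketch below. The `fake_llm` stand-in and the bracketed `tool[input]` action syntax are illustrative only; in the real project, LangChain's agent executor drives this loop against the Ollama-hosted model:

```python
def react_loop(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Toy ReAct loop: think, act, observe, repeat until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)  # e.g. "Thought: ... Action: tool[input]" or "Final Answer: ..."
        scratchpad += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        name, _, arg = step.partition("[")          # crude "tool[input]" parsing
        observation = tools[name.split()[-1]](arg.rstrip("]"))
        scratchpad += f"Observation: {observation}\n"
    return "No answer found."
```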
Built-in error recovery with `handle_parsing_errors=True` allows the agent to self-correct when tool calls fail.
Integrated timing metrics for monitoring tool execution speeds:
```python
weather_time = weather_stop - weather_start
print(f"⏱️ Weather tool response: {weather_time:.2f} seconds")
```

Run the comprehensive test suite:

```shell
python agent_tester.py
```

Test Coverage:
- Image analysis
- Web search
- Academic search
- Wikipedia queries
- Weather API
- Python code execution
- YouTube transcription
- File downloads
User: "Analyze the image at /path/chess.png and suggest the best move for black"
→ Agent selects caption_image_func
→ Sends image to Ollama multimodal API
→ Returns chess move analysis
User: "Find the latest paper on quantum computing and calculate 2^1024"
→ Agent selects academic_search (ArXiv)
→ Agent selects python_repl_tool
→ Combines research results with calculation
User: "Transcribe https://youtube.com/watch?v=XYZ and summarize"
→ Agent uses youtube_transcript_func
→ Downloads audio with yt-dlp
→ Transcribes with Whisper
→ Summarizes content using LLM
- API keys are stored in environment variables, not hardcoded
- Create a `.gitignore` to exclude `.env` files
- Use secure tunnels (Cloudflare) for remote Ollama access
- Validate user inputs before executing Python code
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new tools
- Submit a pull request
This project is licensed under the MIT License.
- LangChain for the agent framework
- Ollama for local LLM hosting
- OpenAI Whisper for audio transcription
- Gradio for the web interface
- Hugging Face for model hosting and evaluation infrastructure