| Category | Technologies |
|---|---|
| AI Framework | LangChain 1.1.0, LangGraph, LangSmith |
| LLM Backend | Ollama (Gemma 3 27B), ChatOllama |
| Agent Pattern | ReAct (Reasoning + Acting) |
| Web Interface | Gradio 6.0.1, FastAPI |
| Multi-Modal | OpenAI Whisper, Pillow, OpenCV |
| Search Tools | DuckDuckGo, ArXiv, Wikipedia |
| Media Processing | yt-dlp, FFmpeg, PyDub |
| Data Processing | Pandas, NumPy, PyTorch 2.9.1 |
| Authentication | Hugging Face OAuth, Google Auth |
```mermaid
graph LR
    subgraph UI["🖥️ User Interface"]
        Web[Gradio Web UI]
    end
    subgraph Core["🤖 Agent Core"]
        Agent[ReAct Agent]
        LLM[Ollama<br/>Gemma 3 27B]
    end
    subgraph Tools["🛠️ Tools"]
        Search[Web & Academic<br/>Search]
        Media[Audio/Video<br/>Processing]
        Vision[Image<br/>Analysis]
        Code[Python<br/>Executor]
        API[Weather &<br/>File APIs]
    end
    subgraph External["☁️ Services"]
        OllamaServer[Ollama Server]
        APIs[External APIs]
    end
    Web -->|Query| Agent
    Agent <-->|Reasoning| LLM
    Agent -->|Select Tool| Search
    Agent -->|Select Tool| Media
    Agent -->|Select Tool| Vision
    Agent -->|Select Tool| Code
    Agent -->|Select Tool| API
    LLM <-->|API Call| OllamaServer
    Vision -.->|Multimodal| OllamaServer
    API -->|Request| APIs
    style Agent fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
    style LLM fill:#2196F3,stroke:#1565C0,stroke-width:3px,color:#fff
    style Web fill:#FF9800,stroke:#E65100,stroke-width:3px,color:#fff
    style OllamaServer fill:#9C27B0,stroke:#6A1B9A,stroke-width:3px,color:#fff
    style Search fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Media fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Vision fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style Code fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style API fill:#607D8B,stroke:#37474F,stroke-width:2px,color:#fff
    style APIs fill:#795548,stroke:#5D4037,stroke-width:2px,color:#fff
```
This repository showcases a production-ready multi-tool AI agent built on the LangChain framework, designed to run entirely on self-hosted infrastructure using Ollama. The agent leverages the Gemma 3 27B model and implements the ReAct (Reasoning and Acting) pattern for intelligent tool selection and execution.
- ✅ Multimodal Understanding: Analyzes images using vision-language capabilities
- ✅ Media Processing: Transcribes audio and video content with Whisper
- ✅ Information Retrieval: Searches web, academic papers, and encyclopedic knowledge
- ✅ Code Execution: Runs Python code dynamically for complex calculations
- ✅ API Integration: Fetches real-time weather and external data
- ✅ Reliable Parsing: Handles complex multi-argument tool calls with custom input parsing
Analyzes local image files using Ollama's multimodal API.
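Under the hood, a call like this can target Ollama's `/api/generate` endpoint, which accepts base64-encoded images. The sketch below only builds the request body; the model ID is a placeholder and the repository's actual implementation may differ:

```python
import base64

def build_vision_payload(image_path: str, prompt: str, model: str = "gemma3:27b") -> dict:
    """Encode a local image and build an Ollama /api/generate request body."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "prompt": prompt,
        "images": [image_b64],  # Ollama expects base64-encoded image data
        "stream": False,
    }

# The payload would then be POSTed to f"{OLLAMA_URL}/api/generate",
# e.g. requests.post(url, json=payload).
```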
Example Input:
"image_path='/path/to/chess.png', prompt='What is the best next move for black?'"

Downloads and transcribes YouTube videos using yt-dlp and Whisper.
Example Usage:
"Extract transcript from https://www.youtube.com/watch?v=VIDEO_ID"

Transcribes local audio files (MP3, WAV) using OpenAI Whisper.
Real-time web search for current news and general information.
Searches academic papers and scientific research.
Queries Wikipedia for encyclopedic knowledge.
Downloads files by task ID and provides content previews for Excel, JSON, images, and audio.
Supported Formats:
- Excel (.xlsx) → DataFrame preview
- JSON → Parsed content
- Images (.png, .jpg) → File path for vision analysis
- Audio (.mp3) → File path for transcription
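The dispatch logic might look like the following sketch (the function name and extension mapping are illustrative, not the repository's actual code):

```python
import json
from pathlib import Path

def preview_file(path: str) -> str:
    """Return a short, tool-friendly preview based on file extension."""
    ext = Path(path).suffix.lower()
    if ext == ".xlsx":
        import pandas as pd  # heavy import, deferred until needed
        return pd.read_excel(path).head().to_string()
    if ext == ".json":
        with open(path) as f:
            return json.dumps(json.load(f), indent=2)[:500]
    if ext in (".png", ".jpg", ".jpeg"):
        return f"Image saved at {path}; pass to the vision tool."
    if ext == ".mp3":
        return f"Audio saved at {path}; pass to the transcription tool."
    return f"Unsupported format: {ext}"
```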
Executes arbitrary Python code for mathematical calculations and data processing.
Example:
"Run this Python code: print(sum([i**2 for i in range(1, 101)]))"

Fetches current weather data from the WeatherStack API.
Example:
"What's the current weather in Tokyo?"

1. Python Environment
- Python 3.11 or newer
- pip package manager
2. Ollama Server
- Install Ollama from ollama.ai
- Pull the Gemma 3 model: `ollama pull gemma3:27b`
- Run the Ollama server: `ollama serve`
- Optional: Expose via Cloudflare Tunnel for remote access
3. System Dependencies
```shell
# Ubuntu/Debian
sudo apt install yt-dlp ffmpeg

# macOS
brew install yt-dlp ffmpeg

# Windows (using Chocolatey)
choco install yt-dlp ffmpeg
```

1. Clone the Repository
```shell
git clone https://github.com/yourusername/langchain-agent.git
cd langchain-agent
```

2. Install Python Dependencies

```shell
pip install -r requirements.txt
```

3. Configure Environment Variables
Create a `.env` file in the project root:

```
CLOUDFLARE_TUNNEL_URL=https://your-ollama-tunnel-url.com
WEATHER_API=your_weatherstack_api_key
OLLAMA_MODEL_ID=gemma3:27b
```

4. Set Up API Keys
- WeatherStack: Get a free API key from weatherstack.com
- Hugging Face (optional): For Gradio OAuth features
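At startup these variables need to be read into the process environment. A minimal sketch is below; the real project may instead use python-dotenv's `load_dotenv`, which handles quoting and edge cases more robustly:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines, skipping blanks and comments."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

load_env()
OLLAMA_MODEL_ID = os.environ.get("OLLAMA_MODEL_ID", "gemma3:27b")
```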
```shell
python app.py
```

Access the web UI at http://localhost:7860.
Features:
- Login with Hugging Face account
- Run evaluation on multiple questions
- Submit answers and get scored results
- View detailed agent logs
```python
from agent import build_agent

# Initialize the agent
agent = build_agent()

# Ask a question
response = agent.invoke({
    "input": "What's the weather in New York?",
    "chat_history": []
})
print(response["output"])
```

```shell
python agent_tester.py
```

This runs a comprehensive test suite covering all tools with performance metrics.
```
langchain_agent/
├── agent.py           # Core agent logic and tool definitions
├── app.py             # Gradio web interface
├── agent_tester.py    # Tool testing framework
├── requirements.txt   # Python dependencies
├── system_prompt.txt  # ReAct agent system prompt
├── README.md          # This file
└── .env               # Environment variables (create this)
```
Edit `system_prompt.txt` to customize the agent's behavior, output format, and reasoning style.
Modify the `tools` list in `agent.py` (lines 383-392) to add or remove tools:
```python
tools = [
    WeatherInfoTool,
    transcribe_audio_whisper,
    youtube_transcript_func,
    caption_image_func,
    file_download_func,
    search_tool,
    wiki_search_tool,
    archive_search_tool,
    python_repl_tool,
]
```

Update these constants in `agent.py` (lines 27-29):
```python
CLOUDFLARE_TUNNEL_URL = "http://localhost:11434"  # or your tunnel URL
OLLAMA_MODEL_ID = "gemma3:27b"                    # or another model
WEATHER_API = "your_api_key_here"
```

The agent uses custom regex-based parsing to handle complex multi-argument tool calls:
```python
# Example: handles inputs like
# "image_path='/path/file.png', prompt='Analyze this'"
matches = re.findall(r"(\w+)\s*=\s*([^,]+)", raw_input)
```

Implements the ReAct (Reasoning + Acting) pattern:
- Thought: Agent reasons about the task
- Action: Selects and executes appropriate tool
- Observation: Processes tool output
- Repeat: Continues until final answer is reached
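In toy form the loop looks like the sketch below. The `fake_llm` stand-in and the bracketed `tool[input]` action syntax are illustrative only; in the real project, LangChain's agent executor drives this loop against the Ollama-hosted model:

```python
def react_loop(llm, tools: dict, question: str, max_steps: int = 5) -> str:
    """Toy ReAct loop: think, act, observe, repeat until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)  # e.g. "Thought: ... Action: tool[input]" or "Final Answer: ..."
        scratchpad += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        name, _, arg = step.partition("[")          # crude "tool[input]" parsing
        observation = tools[name.split()[-1]](arg.rstrip("]"))
        scratchpad += f"Observation: {observation}\n"
    return "No answer found."
```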
Built-in error recovery with `handle_parsing_errors=True` allows the agent to self-correct when tool calls fail.
Integrated timing metrics for monitoring tool execution speeds:
```python
weather_time = weather_stop - weather_start
print(f"⏱️ Weather tool response: {weather_time:.2f} seconds")
```

Run the comprehensive test suite:

```shell
python agent_tester.py
```

Test Coverage:
- Image analysis
- Web search
- Academic search
- Wikipedia queries
- Weather API
- Python code execution
- YouTube transcription
- File downloads
User: "Analyze the image at /path/chess.png and suggest the best move for black"
→ Agent selects caption_image_func
→ Sends image to Ollama multimodal API
→ Returns chess move analysis
User: "Find the latest paper on quantum computing and calculate 2^1024"
→ Agent selects academic_search (ArXiv)
→ Agent selects python_repl_tool
→ Combines research results with calculation
User: "Transcribe https://youtube.com/watch?v=XYZ and summarize"
→ Agent uses youtube_transcript_func
→ Downloads audio with yt-dlp
→ Transcribes with Whisper
→ Summarizes content using LLM
- API keys are stored in environment variables, not hardcoded
- Create a `.gitignore` to exclude `.env` files
- Use secure tunnels (Cloudflare) for remote Ollama access
- Validate user inputs before executing Python code
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new tools
- Submit a pull request
This project is licensed under the MIT License.
- LangChain for the agent framework
- Ollama for local LLM hosting
- OpenAI Whisper for audio transcription
- Gradio for the web interface
- Hugging Face for model hosting and evaluation infrastructure