Compare AI browser automation agents side-by-side. Submit a task, watch agents run in real-time, and compare speed, cost, and success.
⭐ Star this repo if you find it useful! It helps others discover the project.
- Browser-Use - Multi-step browser automation with LLM reasoning
- Smooth - AI-powered web automation
- Stagehand - Local and cloud browser automation
- Notte - Cloud browser sessions with live preview
- Parallel execution - Watch multiple agents work simultaneously
- Live browser views - See exactly what each agent is doing
- Built-in metrics - Time, steps, cost, and success rate tracking
- Session history - All runs are automatically saved
- Multi-LLM support - OpenAI, Google Gemini, Anthropic Claude
- Cost tracking - Monitor API usage and expenses
- Node.js 18+ and npm
- Python 3.12+
- A Convex account (sign up free)
- At least one LLM API key (OpenAI, Google, or Anthropic)
- Clone the repository
git clone https://github.com/kalil0321/the-browser-arena.git
cd the-browser-arena
- Install dependencies
# Install web app dependencies
npm install
# Install agent server dependencies
cd agents
uv sync # or: pip install -r requirements.txt
cd ..
- Set up Convex
npx convex auth
npx convex dev
- Configure environment variables
Create .env.local in the project root:
# Convex (required)
NEXT_PUBLIC_CONVEX_URL=https://your-deployment.convex.cloud
CONVEX_DEPLOYMENT=your-deployment
# LLMs (add at least one)
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
ANTHROPIC_API_KEY=...
# Browser automation services
BROWSER_USE_API_KEY=... # Optional
SMOOTH_API_KEY=... # Optional
BROWSERBASE_API_KEY=... # Optional
BROWSERBASE_PROJECT_ID=... # Optional
# Agent servers
AGENT_SERVER_URL=http://localhost:8080
STAGEHAND_SERVER_URL=http://localhost:3001
- Start the services
Terminal 1 - Web app:
npm run dev
Terminal 2 - Agent server:
cd agents
source .venv/bin/activate
python server.py
Terminal 3 - Stagehand server:
cd stagehand
npm i
npx tsx src/index.ts # if you have tsx
# OR vercel dev
- Open your browser
Navigate to http://localhost:3000
- Create a task - Enter a browser automation instruction (e.g., "Find the cheapest flight from NYC to London")
- Select agents - Choose which agents to compare
- Configure settings - Set LLM models and other parameters
- Watch them run - Observe agents execute in real-time with live browser views
- Compare results - Review each agent's metrics and actions, then analyze performance
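To illustrate the comparison step, here is a rough sketch of how per-agent metrics (time, steps, cost, and success rate) could be aggregated across several runs. The `RunResult` record and its field names are hypothetical and chosen only for illustration; the shape of the data the app actually stores may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RunResult:
    # Hypothetical record shape; the project's real schema may differ.
    agent: str
    seconds: float
    steps: int
    cost_usd: float
    success: bool


def summarize(runs: list[RunResult]) -> dict[str, dict[str, float]]:
    """Group runs by agent and compute success rate and mean time, steps, cost."""
    by_agent: dict[str, list[RunResult]] = defaultdict(list)
    for run in runs:
        by_agent[run.agent].append(run)
    return {
        agent: {
            "success_rate": sum(r.success for r in group) / len(group),
            "mean_seconds": mean(r.seconds for r in group),
            "mean_steps": mean(r.steps for r in group),
            "mean_cost_usd": mean(r.cost_usd for r in group),
        }
        for agent, group in by_agent.items()
    }
```

Averaging over several runs of the same task matters here because a single run of an LLM-driven agent can vary in both step count and outcome.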
Contributions are welcome! Here's how you can help:
- 🐛 Report bugs - Open an issue with detailed information
- 💡 Suggest features - Share your ideas for improvements
- 🔀 Submit PRs - Fix bugs or add new features
We want to give credit to BrowserArena by Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, and Osbert Bastani, who independently introduced a similar idea earlier in their paper "BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks" (October 2025).
This project was developed independently and without knowledge of their work at the time. We learned of it after the fact and want to acknowledge their prior contribution to the same research direction.
- 📄 Paper: arXiv:2510.02418
- 💻 Repository: sagnikanupam/browserarena
If you use this project in academic work, please also consider citing their paper:
@misc{anupam2025browserarenaevaluatingllmagents,
title={BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks},
author={Sagnik Anupam and Davis Brown and Shuo Li and Eric Wong and Hamed Hassani and Osbert Bastani},
year={2025},
eprint={2510.02418},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2510.02418}
}
This project is licensed under the MIT License - see the LICENSE file for details.

