GitHub - kalil0321/the-browser-arena: A simple way to test different browser agents with multiple models

Compare AI browser automation agents side-by-side. Submit a task, watch agents run in real-time, and compare speed, cost, and success.

⭐ Star this repo if you find it useful! It helps others discover the project.

🚀 Live Demo • 📖 Documentation • 🐛 Report Bug • 💡 Request Feature

✨ Features

🤖 Multi-Agent Support

Browser-Use - Multi-step browser automation with LLM reasoning
Smooth - AI-powered web automation
Stagehand - Local and cloud browser automation
Notte - Cloud browser sessions with live preview

📊 Real-Time Comparison

Parallel execution - Watch multiple agents work simultaneously
Live browser views - See exactly what each agent is doing
Built-in metrics - Time, steps, cost, and success rate tracking
Session history - All runs are automatically saved
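
The parallel execution and metrics described above can be pictured with a small sketch. This is not the project's implementation: `RunResult`, `run_agent`, and `compare` are hypothetical names, the agents are simulated with `asyncio.sleep`, and the step counts and costs are made-up values.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical per-agent record; field names are illustrative only."""
    agent: str
    seconds: float
    steps: int
    cost_usd: float
    success: bool


async def run_agent(agent: str, task: str, steps: int, cost_usd: float) -> RunResult:
    """Stand-in for a real agent call: sleeps instead of driving a browser."""
    started = time.perf_counter()
    for _ in range(steps):
        await asyncio.sleep(0.01)  # pretend each step takes some time
    elapsed = time.perf_counter() - started
    return RunResult(agent, elapsed, steps, cost_usd, success=True)


async def compare(task: str) -> list[RunResult]:
    # gather() runs the agents concurrently, so total wall time is close to
    # the slowest agent rather than the sum of all agents.
    return await asyncio.gather(
        run_agent("browser-use", task, steps=5, cost_usd=0.02),  # made-up numbers
        run_agent("stagehand", task, steps=3, cost_usd=0.01),    # made-up numbers
    )


if __name__ == "__main__":
    for r in asyncio.run(compare("Find the cheapest flight from NYC to London")):
        print(f"{r.agent}: {r.seconds:.2f}s, {r.steps} steps, ${r.cost_usd:.2f}, ok={r.success}")
```

Results come back in the order the agents were passed to `gather()`, which keeps each result aligned with its agent.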

🔧 Flexible Configuration

Multi-LLM support - OpenAI, Google Gemini, Anthropic Claude
Cost tracking - Monitor API usage and expenses
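
As a rough illustration of cost tracking, the sketch below estimates spend from token counts. It assumes costs are derived from per-token prices, which the README does not confirm. The `Usage` type, the `example-model` name, and the prices are placeholders, not real provider pricing.

```python
from dataclasses import dataclass

# Placeholder prices in USD per 1M tokens. These are NOT real provider prices.
PRICES_PER_MILLION = {
    "example-model": {"input": 1.00, "output": 2.00},
}


@dataclass
class Usage:
    """Hypothetical token usage for one LLM call."""
    model: str
    input_tokens: int
    output_tokens: int


def estimate_cost(usage: Usage) -> float:
    """Return the estimated USD cost of a single call."""
    price = PRICES_PER_MILLION[usage.model]
    total = usage.input_tokens * price["input"] + usage.output_tokens * price["output"]
    return total / 1_000_000


def total_cost(usages: list[Usage]) -> float:
    """Sum the estimated cost over every call in a run."""
    return sum(estimate_cost(u) for u in usages)


if __name__ == "__main__":
    run = [Usage("example-model", 12_000, 800), Usage("example-model", 9_500, 400)]
    print(f"estimated run cost: ${total_cost(run):.4f}")
```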

🚀 Quick Start

Prerequisites

Node.js 18+ and npm
Python 3.12+
A Convex account (sign up free)
At least one LLM API key (OpenAI, Google, or Anthropic)

Installation

Clone the repository

git clone https://github.com/kalil0321/the-browser-arena.git
cd the-browser-arena

Install dependencies

# Install web app dependencies
npm install

# Install agent server dependencies
cd agents
uv sync  # or: pip install -r requirements.txt
cd ..

Set up Convex

npx convex auth
npx convex dev

Configure environment variables

Create .env.local in the project root:

# Convex (required)
NEXT_PUBLIC_CONVEX_URL=https://your-deployment.convex.cloud
CONVEX_DEPLOYMENT=your-deployment

# LLMs (add at least one)
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
ANTHROPIC_API_KEY=...

# Browser automation services
BROWSER_USE_API_KEY=...          # Optional
SMOOTH_API_KEY=...               # Optional
BROWSERBASE_API_KEY=...          # Optional
BROWSERBASE_PROJECT_ID=...       # Optional

# Agent servers
AGENT_SERVER_URL=http://localhost:8080
STAGEHAND_SERVER_URL=http://localhost:3001
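
Before starting the services, it can help to sanity-check the file. The sketch below is a hypothetical helper, not part of the repository. It assumes the two Convex variables are required and that at least one LLM key must be set, as the comments above state, and it treats values ending in `...` as unfilled placeholders. The parser is deliberately simple and would mishandle values that contain `#`.

```python
from pathlib import Path

REQUIRED = ("NEXT_PUBLIC_CONVEX_URL", "CONVEX_DEPLOYMENT")
LLM_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: drops anything after '#'
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means the file looks usable."""

    def is_set(key: str) -> bool:
        value = values.get(key, "")
        # Assumption: a value ending in "..." is a template placeholder.
        return bool(value) and not value.endswith("...")

    problems = [f"missing required variable: {key}" for key in REQUIRED if not is_set(key)]
    if not any(is_set(key) for key in LLM_KEYS):
        problems.append("set at least one of: " + ", ".join(LLM_KEYS))
    return problems


if __name__ == "__main__":
    path = Path(".env.local")
    text = path.read_text() if path.exists() else ""
    for problem in check_env(parse_env(text)):
        print(problem)
```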

Start the services

Terminal 1 - Web app:

npm run dev

Terminal 2 - Agent server:

cd agents
source .venv/bin/activate
python server.py

Terminal 3 - Stagehand server:

cd stagehand
npm i
npx tsx src/index.ts  # requires tsx
# or: vercel dev

Open your browser

Navigate to http://localhost:3000
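
If the page does not load, a quick check that all three services are listening can narrow things down. The sketch below is a hypothetical helper that assumes the default ports used above (3000 for the web app, 8080 for the agent server, 3001 for the Stagehand server). Adjust `SERVICES` if your setup differs.

```python
import socket

# Assumed defaults, taken from the URLs used in this guide.
SERVICES = {
    "web app": ("localhost", 3000),
    "agent server": ("localhost", 8080),
    "stagehand server": ("localhost", 3001),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name} ({host}:{port}): {state}")
```

This only confirms that something accepts TCP connections on each port. It does not verify that the service behind it is healthy.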

📖 Usage

Create a task - Enter a browser automation instruction (e.g., "Find the cheapest flight from NYC to London")
Select agents - Choose which agents to compare
Configure settings - Set LLM models and other parameters
Watch them run - Observe agents execute in real-time with live browser views
Compare results - Review each agent's metrics and actions, then analyze performance
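
One way to think about the final comparison step is as a ranking over the recorded metrics. The sketch below uses one possible policy (successful runs first, then faster, then cheaper). The policy, the `RunResult` type, and the sample numbers are all assumptions for illustration and do not describe how the app orders results.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one agent run."""
    agent: str
    success: bool
    seconds: float
    cost_usd: float


def rank(results: list[RunResult]) -> list[RunResult]:
    """Order runs: successful first, then by time, then by cost."""
    # `not r.success` is False for successes, and False sorts before True.
    return sorted(results, key=lambda r: (not r.success, r.seconds, r.cost_usd))


if __name__ == "__main__":
    runs = [  # made-up numbers
        RunResult("agent-a", success=False, seconds=12.0, cost_usd=0.01),
        RunResult("agent-b", success=True, seconds=40.0, cost_usd=0.05),
        RunResult("agent-c", success=True, seconds=25.0, cost_usd=0.08),
    ]
    for place, r in enumerate(rank(runs), start=1):
        print(f"{place}. {r.agent} ok={r.success} {r.seconds:.0f}s ${r.cost_usd:.2f}")
```

A different weighting, such as cost before speed, only requires reordering the fields in the sort key.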

🤝 Contributing

Contributions are welcome! Here's how you can help:

🐛 Report bugs - Open an issue with detailed information
💡 Suggest features - Share your ideas for improvements
🔀 Submit PRs - Fix bugs or add new features

🙏 Acknowledgments

We want to give credit to BrowserArena by Sagnik Anupam, Davis Brown, Shuo Li, Eric Wong, Hamed Hassani, and Osbert Bastani, who independently introduced a similar idea earlier in their paper "BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks" (October 2025).

This project was developed independently and without knowledge of their work at the time. We learned of it after the fact and want to acknowledge their prior contribution to the same research direction.

📄 Paper: arXiv:2510.02418
💻 Repository: sagnikanupam/browserarena

If you use this project in academic work, please also consider citing their paper:

@misc{anupam2025browserarenaevaluatingllmagents,
  title={BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks},
  author={Sagnik Anupam and Davis Brown and Shuo Li and Eric Wong and Hamed Hassani and Osbert Bastani},
  year={2025},
  eprint={2510.02418},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2510.02418}
}

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.
