kg912/agent-browser-mcp

Agent-Browser MCP Server

A highly secure, zero-dependency, local-only Model Context Protocol (MCP) server wrapping Vercel's agent-browser.

This server provides AI agents (like Claude via Claude Desktop) with a fast, token-efficient, and secure interface for browser automation.


⚡ Why Agent-Browser?

Traditional Playwright-based MCP implementations often flood LLM context windows with massive raw DOM states and complex tool schemas.

agent-browser addresses this with a fast CLI backed by a persistent daemon that returns compact, ref-based accessibility snapshots (e.g., @e1, @e42). This reduces token usage, speeds up agent reasoning, and makes interactions more reliable.

This MCP server encapsulates agent-browser's speed while providing a strictly governed, type-safe JSON-RPC interface designed specifically for AI agents.

🏗 Architecture

The system has three layers, all running locally:

  1. Layer 1: The MCP Server (This Project)
    A pure ESM Node.js server implementing the JSON-RPC 2.0 protocol over stdio. It handles schema governance, strict argument validation, session routing, and security boundaries.
  2. Layer 2: The Command Layer
    The agent-browser native CLI, which provides sub-millisecond command routing.
  3. Layer 3: The Engine
    A persistent background Node.js daemon that manages Chromium instances via the Chrome DevTools Protocol (CDP).
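
As a rough illustration of Layer 1, the sketch below shows how a JSON-RPC 2.0 message arriving as one line on stdin could be dispatched to a handler. MCP's stdio transport uses newline-delimited JSON messages; everything else here (the handlers table, handleLine, and the ping and echo methods) is hypothetical and is not taken from src/server.js.

```javascript
// Hypothetical handler table; the real server's methods are defined in src/server.js.
const handlers = {
  ping: () => ({}),
  echo: (params) => ({ echoed: params }),
};

// Standard JSON-RPC 2.0 error codes.
const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;

function errorResponse(id, code, message) {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

// Turn one line of input into a response object, or null for notifications.
function handleLine(line, table = handlers) {
  let msg;
  try {
    msg = JSON.parse(line);
  } catch {
    return errorResponse(null, PARSE_ERROR, "Parse error");
  }
  const wellFormed =
    msg !== null &&
    typeof msg === "object" &&
    msg.jsonrpc === "2.0" &&
    typeof msg.method === "string";
  if (!wellFormed) {
    return errorResponse(msg?.id ?? null, INVALID_REQUEST, "Invalid Request");
  }
  // Requests without an id are notifications and never get a response.
  const isNotification = !("id" in msg);
  const handler = Object.hasOwn(table, msg.method) ? table[msg.method] : undefined;
  if (!handler) {
    return isNotification ? null : errorResponse(msg.id, METHOD_NOT_FOUND, "Method not found");
  }
  const result = handler(msg.params);
  return isNotification ? null : { jsonrpc: "2.0", id: msg.id, result };
}
```

In a real server, each line read from process.stdin would pass through a function like this, and every non-null response would be written to process.stdout as a single line of JSON, with diagnostics kept on stderr.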

🛡️ Security & Safety Posture

Security is paramount when granting AI agents local browser execution access. This server implements defense-in-depth:

  • Zero External Dependencies: The MCP server is built using only built-in Node.js modules (fs, crypto, child_process, etc.). No zod, no external SDKs. This eliminates supply-chain risks.
  • Strict Command Sanitization: Arguments are validated against strict allowlists (e.g., action enums, regex-enforced ref patterns like ^@e\d+$).
  • No Shell Execution: The agent-browser CLI is invoked using child_process.execFile with shell: false. Arguments are passed as a literal array and are never interpreted by a shell, which rules out shell injection through argument values.
  • Network Boundaries: The server communicates exclusively via standard I/O (stdio). It binds to no network ports, eliminating external network ingress vectors.
  • Encrypted State: Session state (cookies, local storage) is stored locally and encrypted at rest using a 256-bit AES key (AGENT_BROWSER_ENCRYPTION_KEY).
  • Resource Guardrails: Hard caps on concurrent sessions (MCP_MAX_SESSIONS), per-command execution timeouts (MCP_COMMAND_TIMEOUT_MS), and truncation of oversized output prevent runaway CDP processes and context-window flooding.
  • Sanitized Logging: All diagnostic logs are written strictly to stderr with automated redaction of absolute home paths and secrets.
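
The allowlist and literal-argument points above can be sketched as follows. This is a minimal illustration, not the code in src/validate.js: the function name buildInteractArgs, the exact action set (copied from the browser_interact description), and the argv layout of action, ref, then optional value are all assumptions.

```javascript
// Assumed action allowlist, based on the browser_interact description; the real enum may differ.
const ALLOWED_ACTIONS = new Set(["click", "fill", "type", "hover", "check"]);

// Ref pattern quoted in the security notes.
const REF_PATTERN = /^@e\d+$/;

// Validate the input and return a literal argv array for the CLI.
// The argv layout (action, ref, optional value) is an assumption.
function buildInteractArgs({ action, ref, value } = {}) {
  if (typeof action !== "string" || !ALLOWED_ACTIONS.has(action)) {
    throw new TypeError(`action not allowed: ${String(action)}`);
  }
  if (typeof ref !== "string" || !REF_PATTERN.test(ref)) {
    throw new TypeError("ref must match ^@e\\d+$");
  }
  const args = [action, ref];
  if (value !== undefined) {
    if (typeof value !== "string") {
      throw new TypeError("value must be a string");
    }
    // The value is kept as one array element; no quoting or escaping is attempted.
    args.push(value);
  }
  return args;
}
```

An array like this would then be handed to child_process.execFile together with shell: false and a timeout option. Because no shell parses the arguments, a value such as "; rm -rf ~" reaches the CLI as a single literal string.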

🚀 Installation & Setup

Prerequisites

  • Node.js >= 18.0.0
  • macOS, Linux, or Windows (WSL recommended)

1. Clone & Install

git clone https://github.com/kg912/agent-browser-mcp.git
cd agent-browser-mcp
npm install

2. Auto-Generate Configuration

Run the included setup script to automatically generate your Claude Desktop configuration. This script securely generates an encryption key and resolves the absolute paths required for the server.

node setup.js

This will create a generated_mcp_config.json file in your repository folder, with the following format:

{
  "mcpServers": {
    "agent-browser": {
      "command": "node",
      "args": ["/absolute/path/to/agent-browser-mcp/index.js"],
      "env": {
        "AGENT_BROWSER_ENCRYPTION_KEY": "<paste-your-64-char-hex-key-here>",
        "MCP_SESSION_DIR": "/absolute/path/to/agent-browser-mcp/.sessions",
        "MCP_LOG_LEVEL": "info",
        "MCP_MAX_SESSIONS": "5",
        "MCP_COMMAND_TIMEOUT_MS": "30000",
        "AGENT_BROWSER_HEADED": "true"
      }
    }
  }
}
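
The 64-character hex value for AGENT_BROWSER_ENCRYPTION_KEY corresponds to the 256-bit key mentioned in the security notes: 64 hex characters encode 32 bytes. The sketch below checks that shape and decodes it. Both helper names are hypothetical, and whether setup.js or the server performs such a check is not stated here. A string of this shape is what Node's crypto.randomBytes(32).toString("hex") produces.

```javascript
// Return true when the value is exactly 64 hexadecimal characters (32 bytes, 256 bits).
function isValidEncryptionKey(value) {
  return typeof value === "string" && /^[0-9a-fA-F]{64}$/.test(value);
}

// Decode a validated hex key into its 32 raw bytes.
function keyToBytes(hex) {
  if (!isValidEncryptionKey(hex)) {
    throw new TypeError("expected exactly 64 hex characters");
  }
  const bytes = new Uint8Array(32);
  for (let i = 0; i < 32; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}
```

With these helpers, the unedited placeholder string from the template above would be rejected, since it is neither 64 characters long nor purely hexadecimal.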

⚙️ Claude Desktop Configuration

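Copy the contents of the newly created generated_mcp_config.json file into your claude_desktop_config.json file. If that file already defines other servers, add the agent-browser entry under the existing mcpServers key instead of replacing the whole file.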

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
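
If claude_desktop_config.json already lists other servers, the merge can be done by hand or with a few lines of script. The function below is a hypothetical helper, not part of setup.js; it assumes both files have already been parsed into objects and relies only on the mcpServers key shown above.

```javascript
// Merge generated server entries into an existing config without dropping
// other top-level settings or other servers.
function mergeMcpConfig(existing, generated) {
  return {
    ...existing,
    mcpServers: {
      ...(existing.mcpServers ?? {}),
      ...(generated.mcpServers ?? {}),
    },
  };
}

// Example with an existing config that already has one unrelated server.
const existingConfig = {
  theme: "dark",
  mcpServers: { "other-server": { command: "node", args: ["/path/to/other.js"] } },
};
const generatedConfig = {
  mcpServers: { "agent-browser": { command: "node", args: ["/absolute/path/to/agent-browser-mcp/index.js"] } },
};
const mergedConfig = mergeMcpConfig(existingConfig, generatedConfig);
```

Here mergedConfig keeps other-server and the unrelated theme setting and gains the agent-browser entry. If both files define a server with the same name, the generated entry wins.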

🧰 Available Tools

The server exposes 15 tools for agent interaction:

Tool                      Description
browser_navigate          Navigate to a URL. Returns page title and current URL.
browser_snapshot          Returns a compact accessibility-tree snapshot with @eN refs.
browser_interact          Interact with an element via its ref (click, fill, type, hover, check, etc.).
browser_get_text          Extract visible text content from a specific ref or the full page.
browser_press             Press a keyboard key (e.g., Enter, Tab, Control+a).
browser_scroll            Scroll the page (up, down, left, right).
browser_screenshot        Take a screenshot of the current viewport or full page.
browser_get_url           Return the current active URL.
browser_select            Select options in a <select> dropdown by ref.
browser_navigate_back     Go back in history.
browser_navigate_forward  Go forward in history.
browser_reload            Reload the current page.
browser_wait              Wait for an element to appear (by ref) or for a specific duration (ms).
session_list              List active, isolated browser sessions and their idle metrics.
session_teardown          Securely destroy a named session and wipe its saved state.

💡 Typical Agent Workflow

An AI agent using this server will typically follow a loop like this:

  1. Navigate: browser_navigate({ url: "https://example.com" })
  2. Observe: browser_snapshot({ interactive_only: true })
  3. Act: browser_interact({ action: "click", ref: "@e5" })
  4. Verify: browser_wait({ ref: "@e12" }) followed by another browser_snapshot.

Because the agent interacts with @eN identifiers rather than raw DOM nodes or complex CSS selectors, the context window remains clean, and interactions are significantly less prone to breakage from minor UI changes.
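
On the wire, each step of the loop above is an MCP tools/call request carrying the tool name and its arguments. The sketch below builds that sequence as plain objects. The helper names toolCall and planClick are hypothetical, and the tool arguments are the ones used in the workflow steps.

```javascript
let nextId = 1;

// Wrap a tool invocation in an MCP "tools/call" JSON-RPC 2.0 request.
function toolCall(name, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Build the navigate -> observe -> act -> verify sequence for a single click.
function planClick(url, clickRef, waitRef) {
  return [
    toolCall("browser_navigate", { url }),
    toolCall("browser_snapshot", { interactive_only: true }),
    toolCall("browser_interact", { action: "click", ref: clickRef }),
    toolCall("browser_wait", { ref: waitRef }),
    toolCall("browser_snapshot", { interactive_only: true }),
  ];
}

const plannedRequests = planClick("https://example.com", "@e5", "@e12");
```

In practice the agent picks the ref to click only after reading the snapshot result, so the refs would not be known up front. They are fixed here to keep the example short.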

🛠️ Development & Testing

The project includes a lightweight, zero-dependency test runner that tests validation logic and session state behavior.

# Run the test suite (42 tests)
npm test
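
For readers unfamiliar with the idea, a zero-dependency test runner can be very small. The sketch below is an illustration only, not the project's actual runner: defineTest, assertEqual, and runAll are hypothetical names, and the two sample cases just exercise the ref pattern from the security notes.

```javascript
const registry = [];

// Register a named test case.
function defineTest(name, fn) {
  registry.push({ name, fn });
}

// Minimal strict-equality assertion for primitive values.
function assertEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Run every registered test synchronously and collect the outcomes.
function runAll() {
  const results = registry.map(({ name, fn }) => {
    try {
      fn();
      return { name, ok: true };
    } catch (err) {
      return { name, ok: false, message: err.message };
    }
  });
  const failed = results.filter((r) => !r.ok).length;
  return { total: results.length, failed, results };
}

defineTest("accepts a well-formed ref", () => assertEqual(/^@e\d+$/.test("@e42"), true));
defineTest("rejects a CSS selector", () => assertEqual(/^@e\d+$/.test("#login"), false));
```

A runner like this would typically print one line per result to stderr and exit with a non-zero status when the failed count is above zero.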

Project Structure

  • index.js: Main entry point.
  • src/server.js: JSON-RPC 2.0 protocol handler and request router.
  • src/tools.js: MCP tool schemas and dispatch logic.
  • src/validate.js: Strict input validators.
  • src/runner.js: Secure child process execution.
  • src/session.js: Session tracking and teardown.
  • src/logger.js: Sanitized stderr logging.

License

MIT
