If it's on the screen, it's an API.
Control any desktop app through JSON. REST API + MCP server. Single binary. Zero dependencies.
Quick Start • How It Works • API • MCP Setup • Dashboard • Agent Prompt • Contributing
OculOS is a lightweight daemon that reads the OS accessibility tree and exposes every button, text field, checkbox, and menu item as a JSON endpoint. It works as a REST API for scripts and as an MCP server for AI agents — Claude, GPT, Gemini, or your own model.
No screenshots. No pixel coordinates. No browser extensions. No code injection. Just structured JSON.
Claude Code uses OculOS MCP tools to find Spotify, focus it, search for a song, and play it — fully autonomous.
Built-in dashboard with window list, interactive element tree, inspector, recorder, and live WebSocket events.
git clone https://github.com/huseyinstif/oculos.git
cd oculos
cargo build --release

OculOS reads the OS accessibility tree, so macOS requires you to grant permission:
- Open System Settings → Privacy & Security → Accessibility
- Click the lock icon and enter your password
- Click + and add your terminal app (Terminal, iTerm2, Windsurf, etc.) or the `oculos` binary itself
- Make sure the toggle is enabled
Without this permission, OculOS can list windows but cannot read UI elements or interact with them.
./target/release/oculos
# API → http://127.0.0.1:7878
# Dashboard → http://127.0.0.1:7878

./target/release/oculos --mcp

OculOS reads the OS accessibility tree and assigns each UI element a session-scoped UUID (oculos_id). You use that ID to interact.
# 1. List open windows
curl http://localhost:7878/windows
# 2. Get the UI tree for a window
curl http://localhost:7878/windows/{pid}/tree
# 3. Find a specific element
curl "http://localhost:7878/windows/{pid}/find?q=Submit&type=Button"
# 4. Click it
curl -X POST http://localhost:7878/interact/{id}/click
# 5. Type into a text field
curl -X POST http://localhost:7878/interact/{id}/set-text \
  -H "Content-Type: application/json" \
  -d '{"text":"hello world"}'

Every element includes an actions array, so the API tells you exactly what you can do:
{
  "oculos_id": "a3f8c2d1-...",
  "type": "Button",
  "label": "Submit",
  "enabled": true,
  "actions": ["click", "focus"],
  "rect": { "x": 120, "y": 340, "width": 80, "height": 32 }
}

| Endpoint | Description |
|---|---|
| `GET /windows` | List all visible windows |
| `GET /windows/{pid}/tree` | Full UI element tree |
| `GET /windows/{pid}/find?q=&type=&interactive=` | Search elements |
| `GET /hwnd/{hwnd}/tree` | Tree by window handle |
| `GET /hwnd/{hwnd}/find` | Search by window handle |
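
The discovery endpoints above can be driven from a script as well as from curl. The following is a minimal Python sketch using only the standard library. It assumes that `GET /windows/{pid}/find` returns a JSON array of element objects shaped like the example element shown earlier; the response envelope is not documented here, so adjust `find_clickable` if the real shape differs. The helper names (`find_url`, `fetch_json`, `with_action`, `find_clickable`) are illustrative and not part of OculOS.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:7878"  # default bind address


def find_url(pid, query, element_type=None, base_url=BASE_URL):
    """Build the GET /windows/{pid}/find URL with URL-encoded parameters."""
    params = {"q": query}
    if element_type is not None:
        params["type"] = element_type
    return f"{base_url}/windows/{pid}/find?{urllib.parse.urlencode(params)}"


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (needs a running OculOS daemon)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def with_action(elements, action):
    """Keep enabled elements whose actions array advertises the action."""
    return [
        element
        for element in elements
        if element.get("enabled", False) and action in element.get("actions", [])
    ]


def find_clickable(pid, query, element_type="Button"):
    """Search one window and return only elements that can be clicked."""
    # Assumption: the find endpoint returns a JSON array of element objects.
    return with_action(fetch_json(find_url(pid, query, element_type)), "click")
```

With the daemon running, `find_clickable(pid, "Submit")` would return the candidate buttons for a window's process ID, and each result's `oculos_id` can then be passed to the interaction endpoints.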

| Endpoint | Description |
|---|---|
| `POST /windows/{pid}/focus` | Bring to foreground |
| `POST /windows/{pid}/close` | Close gracefully |

| Endpoint | Body | Description |
|---|---|---|
| `POST /interact/{id}/click` | none | Click |
| `POST /interact/{id}/set-text` | `{"text":"…"}` | Replace text content |
| `POST /interact/{id}/send-keys` | `{"keys":"…"}` | Keyboard input |
| `POST /interact/{id}/focus` | none | Move focus |
| `POST /interact/{id}/toggle` | none | Toggle checkbox |
| `POST /interact/{id}/expand` | none | Expand dropdown / tree |
| `POST /interact/{id}/collapse` | none | Collapse |
| `POST /interact/{id}/select` | none | Select list item |
| `POST /interact/{id}/set-range` | `{"value":N}` | Set slider value |
| `POST /interact/{id}/scroll` | `{"direction":"…"}` | Scroll container |
| `POST /interact/{id}/scroll-into-view` | none | Scroll into viewport |
| `POST /interact/{id}/highlight` | `{"duration_ms":N}` | Highlight on screen |
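
Because every element advertises its own `actions`, a client can refuse to send an interaction the element does not offer. The sketch below builds (but does not send) a `POST /interact/{id}/{action}` request in Python. It assumes that the strings in the `actions` array match the URL path segments in the table above (the source only shows `click` and `focus` in an `actions` array, so names such as `set-text` are an assumption). `interaction_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7878"  # default bind address


def interaction_request(element, action, body=None, base_url=BASE_URL):
    """Build a POST /interact/{id}/{action} request after checking the element."""
    if not element.get("enabled", False):
        raise ValueError("element is disabled")
    available = element.get("actions", [])
    if action not in available:
        # Assumption: action names equal the endpoint path segments.
        raise ValueError(f"{action!r} is not offered; available: {available}")
    url = f"{base_url}/interact/{element['oculos_id']}/{action}"
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send(request, timeout=5.0):
    """Send a prepared request (needs a running OculOS daemon)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Separating request construction from sending keeps the validation step testable without a running daemon; `send(interaction_request(element, "click"))` performs the actual call.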

| Endpoint | Description |
|---|---|
| `GET /health` | Status, version, uptime |
| `GET /ws` | WebSocket (live action events) |
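
A script that starts the daemon itself usually needs to wait until the API is reachable. The following Python sketch polls `GET /health` and derives the WebSocket address for `GET /ws` from the HTTP base URL. It assumes the health endpoint answers with a JSON body; the exact field names are not specified here, so the decoded object is returned untouched. `health_url`, `ws_url`, and `wait_until_healthy` are illustrative names.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:7878"  # default bind address


def health_url(base_url=BASE_URL):
    """URL of the health endpoint."""
    return f"{base_url}/health"


def ws_url(base_url=BASE_URL):
    """Derive the WebSocket URL for GET /ws from the HTTP base URL."""
    parts = urllib.parse.urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urllib.parse.urlunsplit((scheme, parts.netloc, "/ws", "", ""))


def wait_until_healthy(base_url=BASE_URL, attempts=10, delay=0.5):
    """Poll GET /health until it answers; return the decoded JSON, or None."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(health_url(base_url), timeout=2.0) as response:
                # Assumption: the health endpoint returns a JSON document.
                return json.load(response)
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
    return None
```

The Python standard library has no WebSocket client, so consuming the live events at `ws_url()` requires a third-party library of your choice.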
Works with any MCP-compatible client. Add to your config:
{
  "mcpServers": {
    "oculos": {
      "command": "/path/to/oculos",
      "args": ["--mcp"]
    }
  }
}

Tested with: Claude Code, Claude Desktop, Cursor, Windsurf
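
If your client already has an MCP config with other servers, the `oculos` entry has to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. The config file location differs per client and is not covered here, so the function works on an already-loaded dictionary; `add_oculos_server` is a hypothetical helper, not something OculOS ships.

```python
import json


def add_oculos_server(config, binary_path):
    """Return a copy of an MCP client config with an "oculos" server entry."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the config fragment above: run the binary with --mcp.
    servers["oculos"] = {"command": binary_path, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {}}
    print(json.dumps(add_oculos_server(existing, "/path/to/oculos"), indent=2))
```

The function copies the top-level dictionary and the `mcpServers` mapping, so the input config is left unmodified and any other server entries are preserved.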
For non-MCP agents (OpenAI, Gemini, custom), paste AGENTS.md into the system prompt and give the agent HTTP access.
Built-in web UI at http://127.0.0.1:7878:
- Window list — all open windows with focus/close buttons
- Element tree — full interactive UI tree with search and filter
- Inspector — element details, properties, and all available actions
- Recorder — record a sequence of interactions, export as Python, JavaScript, or curl
- JSON viewer — raw element data with copy
- WebSocket — live event indicator, real-time action feed
- Shortcuts: `R` refresh · `/` search · `E` expand · `C` collapse · `H` highlight · `J` JSON
| Platform | Backend | Status |
|---|---|---|
| Windows | UI Automation (`windows-rs`) | ✅ Full: Win32, WPF, Electron, Qt |
| Linux | AT-SPI2 (`atspi` + `zbus`) | ✅ Working: GTK, Qt, Electron |
| macOS | Accessibility API (`AXUIElement` + CoreGraphics) | ✅ Working: Cocoa, Electron, Qt |

oculos [OPTIONS]
-b, --bind <ADDR> Bind address [default: 127.0.0.1:7878]
--static-dir <DIR> Static files directory [default: static]
--log <LEVEL> Log level: trace/debug/info/warn/error [default: info]
--mcp Run as MCP server over stdin/stdout
-h, --help Print help
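
When the daemon is launched from a test harness or another program, it can help to assemble the command line from the options above instead of hard-coding a string. The Python sketch below does that using only the flags listed in the help text. The default binary path matches the build output used in Quick Start; `oculos_command` and `start_daemon` are illustrative names.

```python
import subprocess

LOG_LEVELS = {"trace", "debug", "info", "warn", "error"}


def oculos_command(binary="./target/release/oculos", bind=None,
                   static_dir=None, log_level=None, mcp=False):
    """Assemble an argv list from the documented CLI options."""
    command = [binary]
    if bind is not None:
        command += ["--bind", bind]
    if static_dir is not None:
        command += ["--static-dir", static_dir]
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError(f"unknown log level: {log_level!r}")
        command += ["--log", log_level]
    if mcp:
        command.append("--mcp")
    return command


def start_daemon(**options):
    """Start OculOS as a child process (requires a built binary)."""
    return subprocess.Popen(oculos_command(**options))
```

Omitted options are simply left off the command line, so the daemon falls back to its own defaults.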

| | OculOS | Vision agents | Screen coordinate tools | Browser-only tools |
|---|---|---|---|---|
| Approach | OS accessibility tree | Screenshots + LLM | Pixel positions | DOM / a11y tree |
| Scope | Any desktop app | Any (with latency) | Any (fragile) | Browser only |
| Speed | Instant | Seconds | Instant | Instant |
| Deterministic | ✅ | ❌ | ✅ | ✅ |
| GPU required | ❌ | ✅ | ❌ | ❌ |
| Cloud required | ❌ | Usually | ❌ | ❌ |
| Semantic | ✅ Labels + types | Varies | ❌ Coordinates | ✅ |
- Windows UIA backend (full — Win32, WPF, Electron, Qt)
- Linux AT-SPI2 backend
- macOS Accessibility backend (`AXUIElement`, CoreGraphics window enumeration, CGEvent keyboard simulation)
- REST API server (Axum)
- MCP server (JSON-RPC 2.0 over stdio)
- Session-scoped element registry with UUIDs
- Full keyboard simulation engine
- Window list with focus/close
- Interactive element tree with search/filter
- Element inspector with all actions
- API request log
- JSON viewer with copy
- Keyboard shortcuts
- Element highlighting (native GDI overlay)
- Automation recorder (record + export Python/JS/curl)
- WebSocket live events
- Health endpoint (uptime, version, platform)
- macOS element highlighting (native overlay)
- Python & TypeScript client SDKs
- Batch operations (multiple interactions per request)
- Conditional waits (wait for element to appear)
- Element caching & diffing (change detection)
- Docker image for CI/CD
We welcome contributions! See CONTRIBUTING.md for details.
Top areas:
- macOS backend: `AXUIElement` implementation
- Client SDKs: Python, TypeScript wrappers
- Tests: cross-app integration tests

