huseyinstif/oculos

OculOS


If it's on the screen, it's an API.
Control any desktop app through JSON. REST API + MCP server. Single binary. Zero dependencies.

Quick Start · How It Works · API · MCP Setup · Dashboard · Agent Prompt · Contributing



OculOS is a lightweight daemon that reads the OS accessibility tree and exposes every button, text field, checkbox, and menu item as a JSON endpoint. It works as a REST API for scripts and as an MCP server for AI agents — Claude, GPT, Gemini, or your own model.

No screenshots. No pixel coordinates. No browser extensions. No code injection. Just structured JSON.


Claude Code + OculOS → Spotify


Claude Code uses OculOS MCP tools to find Spotify, focus it, search for a song, and play it — fully autonomous.

Web Dashboard


Built-in dashboard with window list, interactive element tree, inspector, recorder, and live WebSocket events.


Quick Start

git clone https://github.com/huseyinstif/oculos.git
cd oculos
cargo build --release

macOS: Grant Accessibility Permission

OculOS reads the OS accessibility tree, so macOS requires you to grant permission:

  1. Open System Settings → Privacy & Security → Accessibility
  2. Click the lock icon and enter your password
  3. Click + and add your terminal app (Terminal, iTerm2, Windsurf, etc.) or the oculos binary itself
  4. Make sure the toggle is enabled

Without this permission, OculOS can list windows but cannot read UI elements or interact with them.

HTTP mode (API + Dashboard)

./target/release/oculos
# API       → http://127.0.0.1:7878
# Dashboard → http://127.0.0.1:7878

MCP mode (for AI agents)

./target/release/oculos --mcp

How It Works

OculOS reads the OS accessibility tree and assigns each UI element a session-scoped UUID (oculos_id). You use that ID to interact.

# 1. List open windows
curl http://localhost:7878/windows

# 2. Get the UI tree for a window
curl http://localhost:7878/windows/{pid}/tree

# 3. Find a specific element
curl "http://localhost:7878/windows/{pid}/find?q=Submit&type=Button"

# 4. Click it
curl -X POST http://localhost:7878/interact/{id}/click

# 5. Type into a text field
curl -X POST http://localhost:7878/interact/{id}/set-text \
  -H "Content-Type: application/json" \
  -d '{"text":"hello world"}'
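
The same five steps can be scripted. The sketch below is not an official client; it uses only the endpoints shown above and Python's standard library. The helper names (`build_request`, `list_windows`, `window_tree`, `click`, `set_text`, `send`) are illustrative, and `send` assumes every response body is JSON, which the examples suggest but do not state for each endpoint.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:7878"  # default bind address of the daemon


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (without sending) a request for the OculOS HTTP API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def list_windows() -> urllib.request.Request:
    # Step 1: list open windows
    return build_request("GET", "/windows")


def window_tree(pid: int) -> urllib.request.Request:
    # Step 2: UI tree for one window
    return build_request("GET", f"/windows/{pid}/tree")


def click(element_id: str) -> urllib.request.Request:
    # Step 4: click an element by its oculos_id
    return build_request("POST", f"/interact/{element_id}/click")


def set_text(element_id: str, text: str) -> urllib.request.Request:
    # Step 5: replace the text content of a field
    return build_request("POST", f"/interact/{element_id}/set-text", {"text": text})


def send(request: urllib.request.Request) -> Any:
    """Send a request. Assumes the daemon is running and replies with JSON."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the daemon running, `send(list_windows())` would return the decoded window list, and `send(set_text(element_id, "hello world"))` mirrors step 5, where `element_id` is an `oculos_id` taken from a tree or find response.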

Every element includes an actions array — the API tells you exactly what you can do:

{
  "oculos_id": "a3f8c2d1-...",
  "type": "Button",
  "label": "Submit",
  "enabled": true,
  "actions": ["click", "focus"],
  "rect": { "x": 120, "y": 340, "width": 80, "height": 32 }
}
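
A client can consult that array before sending anything. The sketch below works on the sample element above; `can_perform` is a hypothetical helper, and treating a disabled element as non-actionable is an assumption rather than documented server behavior.

```python
import json

# The sample element from above, parsed as a client would receive it.
SAMPLE_ELEMENT = json.loads("""
{
  "oculos_id": "a3f8c2d1-...",
  "type": "Button",
  "label": "Submit",
  "enabled": true,
  "actions": ["click", "focus"],
  "rect": { "x": 120, "y": 340, "width": 80, "height": 32 }
}
""")


def can_perform(element: dict, action: str) -> bool:
    """Return True if the element is enabled and advertises the action.

    Assumption: a disabled element should not be acted on, even if the
    action is listed. The server may or may not enforce this itself.
    """
    return bool(element.get("enabled", False)) and action in element.get("actions", [])
```

For the sample button, `can_perform(SAMPLE_ELEMENT, "click")` is true while `can_perform(SAMPLE_ELEMENT, "set-text")` is false, so a script can skip a request that the element does not advertise.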

API

Discovery

Endpoint                                         Description
GET /windows                                     List all visible windows
GET /windows/{pid}/tree                          Full UI element tree
GET /windows/{pid}/find?q=&type=&interactive=    Search elements
GET /hwnd/{hwnd}/tree                            Tree by window handle
GET /hwnd/{hwnd}/find                            Search by window handle
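
Search queries are easier to get right when the query string is built rather than typed by hand. A minimal sketch follows, assuming each filter is optional and that `interactive` is passed as the strings `true` or `false`; neither detail is stated above, and `find_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode


def find_url(
    pid: int,
    q: Optional[str] = None,
    element_type: Optional[str] = None,
    interactive: Optional[bool] = None,
    base_url: str = "http://127.0.0.1:7878",
) -> str:
    """Build the URL for GET /windows/{pid}/find with URL-encoded filters."""
    params = {}
    if q is not None:
        params["q"] = q
    if element_type is not None:
        params["type"] = element_type
    if interactive is not None:
        # Assumption: the server expects lowercase "true" / "false".
        params["interactive"] = "true" if interactive else "false"
    url = f"{base_url}/windows/{pid}/find"
    query = urlencode(params)
    return f"{url}?{query}" if query else url
```

Encoding matters once a label contains spaces or punctuation: `find_url(1234, q="Save As")` yields a query of `q=Save+As` instead of a broken URL.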

Window operations

Endpoint                     Description
POST /windows/{pid}/focus    Bring to foreground
POST /windows/{pid}/close    Close gracefully

Element interactions

Endpoint                               Body                  Description
POST /interact/{id}/click              (none)                Click
POST /interact/{id}/set-text           {"text":"…"}          Replace text content
POST /interact/{id}/send-keys          {"keys":"…"}          Keyboard input
POST /interact/{id}/focus              (none)                Move focus
POST /interact/{id}/toggle             (none)                Toggle checkbox
POST /interact/{id}/expand             (none)                Expand dropdown / tree
POST /interact/{id}/collapse           (none)                Collapse
POST /interact/{id}/select             (none)                Select list item
POST /interact/{id}/set-range          {"value":N}           Set slider value
POST /interact/{id}/scroll             {"direction":"…"}     Scroll container
POST /interact/{id}/scroll-into-view   (none)                Scroll into viewport
POST /interact/{id}/highlight          {"duration_ms":N}     Highlight on screen
System

Endpoint       Description
GET /health    Status, version, uptime
GET /ws        WebSocket (live action events)

MCP Setup

Works with any MCP-compatible client. Add to your config:

{
  "mcpServers": {
    "oculos": {
      "command": "/path/to/oculos",
      "args": ["--mcp"]
    }
  }
}
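
If a client already has other MCP servers configured, the entry has to be merged rather than pasted over the file. A minimal sketch, assuming the client stores its settings as a JSON object with an `mcpServers` key as shown above; `add_oculos_server` is a hypothetical helper, and where the config file lives depends on the client.

```python
def add_oculos_server(config: dict, binary_path: str) -> dict:
    """Return a copy of an MCP client config with the OculOS entry added.

    Existing servers and unrelated settings are preserved; the input
    dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["oculos"] = {"command": binary_path, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged
```

A script would typically load the client's config with `json.load`, pass it through this function with the absolute path to the built binary, and write the result back with `json.dump`.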

Tested with: Claude Code, Claude Desktop, Cursor, Windsurf

For non-MCP agents (OpenAI, Gemini, custom), paste the contents of AGENTS.md into the system prompt and give the agent HTTP access to the OculOS API.


Dashboard

Built-in web UI at http://127.0.0.1:7878:

  • Window list — all open windows with focus/close buttons
  • Element tree — full interactive UI tree with search and filter
  • Inspector — element details, properties, and all available actions
  • Recorder — record a sequence of interactions, export as Python, JavaScript, or curl
  • JSON viewer — raw element data with copy
  • WebSocket — live event indicator, real-time action feed
  • Shortcuts: R refresh · / search · E expand · C collapse · H highlight · J JSON

Platform Support

Platform   Backend                                          Status
Windows    UI Automation (windows-rs)                       ✅ Full: Win32, WPF, Electron, Qt
Linux      AT-SPI2 (atspi + zbus)                           ✅ Working: GTK, Qt, Electron
macOS      Accessibility API (AXUIElement + CoreGraphics)   ✅ Working: Cocoa, Electron, Qt

CLI

oculos [OPTIONS]

  -b, --bind <ADDR>       Bind address [default: 127.0.0.1:7878]
      --static-dir <DIR>  Static files directory [default: static]
      --log <LEVEL>       Log level: trace/debug/info/warn/error [default: info]
      --mcp               Run as MCP server over stdin/stdout
  -h, --help              Print help
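
When the daemon is started from a test harness or another program, the options above can be assembled into an argument list. The sketch below uses only the flags from the help text; `oculos_command` is a hypothetical helper, the default binary path assumes a local release build, and whether the HTTP options have any effect together with `--mcp` is not stated here.

```python
from typing import List, Optional

LOG_LEVELS = ("trace", "debug", "info", "warn", "error")


def oculos_command(
    binary: str = "./target/release/oculos",
    bind: Optional[str] = None,
    static_dir: Optional[str] = None,
    log: Optional[str] = None,
    mcp: bool = False,
) -> List[str]:
    """Build an argv list for launching OculOS; omitted options keep their defaults."""
    if log is not None and log not in LOG_LEVELS:
        raise ValueError(f"log level must be one of {LOG_LEVELS}")
    command = [binary]
    if bind is not None:
        command += ["--bind", bind]
    if static_dir is not None:
        command += ["--static-dir", static_dir]
    if log is not None:
        command += ["--log", log]
    if mcp:
        command.append("--mcp")
    return command
```

The result can be passed straight to `subprocess.Popen`, for example `subprocess.Popen(oculos_command(bind="127.0.0.1:9000", log="debug"))`, where the port is an arbitrary example value.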

How OculOS Differs

              OculOS                  Vision agents        Screen coordinate tools   Browser-only tools
Approach      OS accessibility tree   Screenshots + LLM    Pixel positions           DOM / a11y tree
Scope         Any desktop app         Any (with latency)   Any (fragile)             Browser only
Speed         Instant                 Seconds              Instant                   Instant
Deterministic
GPU required
Cloud required Usually
Semantic ✅ Labels + types Varies ❌ Coordinates

Everything Built So Far

Core

  • Windows UIA backend (full — Win32, WPF, Electron, Qt)
  • Linux AT-SPI2 backend
  • macOS Accessibility backend (AXUIElement, CoreGraphics window enumeration, CGEvent keyboard simulation)
  • REST API server (Axum)
  • MCP server (JSON-RPC 2.0 over stdio)
  • Session-scoped element registry with UUIDs
  • Full keyboard simulation engine

Dashboard

  • Window list with focus/close
  • Interactive element tree with search/filter
  • Element inspector with all actions
  • API request log
  • JSON viewer with copy
  • Keyboard shortcuts

Advanced

  • Element highlighting (native GDI overlay)
  • Automation recorder (record + export Python/JS/curl)
  • WebSocket live events
  • Health endpoint (uptime, version, platform)

Planned

  • macOS element highlighting (native overlay)
  • Python & TypeScript client SDKs
  • Batch operations (multiple interactions per request)
  • Conditional waits (wait for element to appear)
  • Element caching & diffing (change detection)
  • Docker image for CI/CD

Contributing

We welcome contributions! See CONTRIBUTING.md for details.

Top areas:

  • macOS backend: AXUIElement implementation
  • Client SDKs — Python, TypeScript wrappers
  • Tests — cross-app integration tests

License

MIT
