guiport

Playwright for desktop apps, built for coding agents.

macOS shipped · Windows beta · Linux beta.


A fast CLI/MCP control layer that lets agents like Claude, Codex, opencode, and Gemini inspect and operate desktop apps through structured UI data, then save successful flows as replayable tests.

Status

MVP. macOS is the primary target — Accessibility tree first, screenshots as fallback.

Windows is in beta with a day-1 input/screenshot/apps surface (Win32 SendInput, GDI BitBlt/PrintWindow, EnumWindows). UIA-backed tree/observe/find/click-by-selector and WinRT OCR are pending — they throw a clear uia_pending / ocr_pending error today. Track progress under the windows label.

Linux is in beta with the same shape: shell-out to xdotool/wmctrl/scrot on X11 and ydotool/grim on Wayland for input + screenshot, /proc + wmctrl for app enumeration. AT-SPI2-backed tree/observe/find and tesseract OCR throw atspi_pending / ocr_pending until those bindings land. Track under linux.

Why

Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data:

  • app/window list
  • focused app/window
  • accessibility tree
  • element role/name/value/state/bounds/actions
  • screenshots only when needed
  • deterministic replay scripts after exploration

Install

macOS 13+. See INSTALL.md for full options + platform status.

# Homebrew (once tap is published)
brew tap edihasaj/guiport && brew install guiport

# Or install script
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh

# Or from source
swift build -c release && sudo cp .build/release/guiport /usr/local/bin/guiport

Windows (beta — input/screenshot/apps; UIA tree pending):

iwr -useb https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.ps1 | iex

Linux (beta — same shape; AT-SPI2 tree pending). Install xdotool+wmctrl+scrot (X11) or ydotool+grim (Wayland), then:

curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh
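
Before running the install script on Linux, it can help to confirm that the helper tools for the current display server are on PATH. The following is a minimal sketch of such a preflight check, not part of guiport. The function names are hypothetical, and detecting the session through XDG_SESSION_TYPE and WAYLAND_DISPLAY is an assumption about the environment rather than a description of how guiport itself decides.

```python
import os
import shutil

# Tool sets taken from the install notes above.
REQUIRED_TOOLS = {
    "x11": ("xdotool", "wmctrl", "scrot"),
    "wayland": ("ydotool", "grim"),
}


def detect_session(env=None):
    """Guess the display server from environment variables (assumed heuristic)."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in REQUIRED_TOOLS:
        return session
    # Fall back to WAYLAND_DISPLAY; otherwise assume X11.
    return "wayland" if env.get("WAYLAND_DISPLAY") else "x11"


def missing_tools(session, which=shutil.which):
    """Return the required tools for this session that are not on PATH."""
    return [tool for tool in REQUIRED_TOOLS[session] if which(tool) is None]


if __name__ == "__main__":
    session = detect_session()
    missing = missing_tools(session)
    if missing:
        print(f"{session}: install {', '.join(missing)} first")
    else:
        print(f"{session}: all helper tools found")
```

The `which` parameter is injectable so the check can be exercised without the tools installed.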

See INSTALL.md for full per-platform notes.

Quick start

guiport doctor                                       # check permissions
guiport apps --json                                  # list running apps with windows
guiport observe --app "Safari"                       # focused window summary
guiport tree --app "Safari" --json                   # full accessibility tree
guiport find --app "Safari" 'button[name="Save"]'    # match selector
guiport click --app "Safari" 'button[name="Save"]'
guiport type "hello"
guiport screenshot --app "Safari" -o safari.png

# Vision fallback for canvas / sparse-AX apps:
guiport find-text --app "Figma" "Save"               # OCR via Apple Vision
guiport click-text --app "Figma" "Save"              # OCR + click center
guiport click-at 420 180                             # raw coordinates

# Record, replay, and serve:
guiport record smoke.yaml                            # WIP
guiport run smoke.yaml
guiport serve --mcp                                  # MCP server over stdio

Selector syntax

role[attr=value][attr~=substring][index=N]

Examples:

button[name="Save"]
textfield[identifier="search"]
AXButton[name~="Open"][index=0]

Supported attributes: role, name (title), value, identifier, description, text (matches name or value), index.
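
To make the grammar concrete, here is a minimal sketch of a parser for selectors of this shape. It is an illustration only, not guiport's selector engine: `Selector` and `parse_selector` are hypothetical names, the parser assumes `=` means exact match and `~=` means substring match as the pattern above suggests, and it accepts only the `[index=N]` form used in the examples.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# One bracketed clause: [name="quoted value"] or [name~=bare]
_ATTR = re.compile(r'\[\s*(\w+)\s*(~=|=)\s*(?:"([^"]*)"|([^\]\s]+))\s*\]')


@dataclass
class Selector:
    role: str
    exact: Dict[str, str] = field(default_factory=dict)     # from [attr=value]
    contains: Dict[str, str] = field(default_factory=dict)  # from [attr~=value]
    index: Optional[int] = None                              # from [index=N]


def parse_selector(text: str) -> Selector:
    """Parse role[attr=value][attr~=substring][index=N] into a Selector."""
    text = text.strip()
    role = re.match(r"\w+", text)
    if role is None:
        raise ValueError(f"selector has no role: {text!r}")
    sel = Selector(role=role.group(0))
    pos = role.end()
    while pos < len(text):
        clause = _ATTR.match(text, pos)
        if clause is None:
            raise ValueError(f"cannot parse selector at offset {pos}: {text!r}")
        name, op, quoted, bare = clause.groups()
        value = quoted if quoted is not None else bare
        if name == "index":
            if op != "=":
                raise ValueError("index only supports '='")
            sel.index = int(value)
        elif op == "=":
            sel.exact[name] = value
        else:
            sel.contains[name] = value
        pos = clause.end()
    return sel


if __name__ == "__main__":
    print(parse_selector('AXButton[name~="Open"][index=0]'))
```

Keeping exact and substring matches in separate maps makes it easy for a matcher to apply each rule without re-inspecting the operator.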

Vision fallback (canvas / Electron apps)

For apps with sparse or absent accessibility (Figma, custom-rendered editors, hardened Electron), guiport falls back through three layers:

  1. click-at X Y — raw screen coordinates. The agent reads coords off a screenshot.
  2. find-text "Save" / click-text "Save" — Apple Vision (VNRecognizeTextRequest) OCRs the window and returns bounds + center for matched text. On-device, free, no extra deps.
  3. LLM vision — out of scope for MVP; agents can call screenshot + their own model to get coords, then click-at.

OCR-found bounds drift across font/scale changes, so prefer AX selectors for replay and OCR for exploration.
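
An agent-side wrapper can encode that preference by trying the most structured method first. The sketch below is hypothetical glue code, not part of guiport: it tries an AX selector, then OCR text, then raw coordinates, using only the `click`, `click-text`, and `click-at` commands shown in the quick start. It assumes the CLI exits non-zero when a click fails, which this README does not state, and the ordering itself is a design choice of the sketch.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def click_with_fallback(
    app: str,
    selector: str,
    text: Optional[str] = None,
    coords: Optional[Tuple[int, int]] = None,
    run: Runner = run_cli,
) -> str:
    """Try selector, then OCR text, then coordinates; return the method that worked."""
    attempts = [("selector", ["guiport", "click", "--app", app, selector])]
    if text is not None:
        attempts.append(("ocr", ["guiport", "click-text", "--app", app, text]))
    if coords is not None:
        x, y = coords
        attempts.append(("coords", ["guiport", "click-at", str(x), str(y)]))
    for label, argv in attempts:
        # Assumption: exit code 0 means the click succeeded.
        if run(argv) == 0:
            return label
    raise RuntimeError(f"all click methods failed for {selector!r} in {app!r}")
```

Returning the method that succeeded lets the caller record whether a step is safe to replay (selector) or was found only through OCR or coordinates.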

Permissions

guiport needs:

  • Accessibility — required for AX tree + input events.
  • Screen Recording — required for screenshot and screenshot-on-failure artifacts.

Run guiport doctor to check status and get System Settings deep links.

Architecture

  • Pure Swift, single binary.
  • GuiportCore library: AX bridge, selector engine, input, screenshots, replay runner, MCP server.
  • guiport CLI: thin wrapper using swift-argument-parser.

Non-goals (MVP)

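  • No full Windows/Linux parity yet: both are beta (input, screenshots, app listing), with UIA and AT-SPI2 tree support pending.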
  • No vision-first automation.
  • No autonomous Manus clone.
  • No background/session-0 automation.

License

MIT — see LICENSE.

Author

Edi Hasaj
