Playwright for desktop apps, built for coding agents.
macOS shipped · Windows beta · Linux beta.
A fast CLI/MCP control layer that lets agents like Claude, Codex, opencode, and Gemini inspect and operate desktop apps through structured UI data, then save successful flows as replayable tests.
MVP. macOS is the primary target — Accessibility tree first, screenshots as fallback.
Windows is in beta with a day-1 input/screenshot/apps surface (Win32 SendInput, GDI BitBlt/PrintWindow, EnumWindows). UIA-backed tree/observe/find/click-by-selector and WinRT OCR are pending — they throw a clear uia_pending / ocr_pending error today. Track progress under the windows label.
Linux is in beta with the same shape: shell-out to xdotool/wmctrl/scrot on X11 and ydotool/grim on Wayland for input + screenshot, /proc + wmctrl for app enumeration. AT-SPI2-backed tree/observe/find and tesseract OCR throw atspi_pending / ocr_pending until those bindings land. Track under linux.
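An agent script can check which commands are still pending on the current platform before it shells out. The sketch below encodes the status described above as a lookup table. The table layout and the helper names `current_platform` and `pending_error` are hypothetical, not part of guiport, and mapping `find-text` / `click-text` to `ocr_pending` is an assumption based on those being the OCR commands.

```python
import sys

# Pending-error codes per platform, taken from the status notes above.
# The structure is illustrative only; guiport does not ship this table.
PENDING = {
    "macos": {},
    "windows": {
        "tree": "uia_pending",
        "observe": "uia_pending",
        "find": "uia_pending",
        "click": "uia_pending",        # click-by-selector only
        "find-text": "ocr_pending",    # assumption: OCR-backed command
        "click-text": "ocr_pending",   # assumption: OCR-backed command
    },
    "linux": {
        "tree": "atspi_pending",
        "observe": "atspi_pending",
        "find": "atspi_pending",
        "find-text": "ocr_pending",    # assumption: OCR-backed command
        "click-text": "ocr_pending",   # assumption: OCR-backed command
    },
}


def current_platform(sys_platform=sys.platform):
    """Map Python's sys.platform string onto the platform names used above."""
    if sys_platform == "darwin":
        return "macos"
    if sys_platform == "win32":
        return "windows"
    if sys_platform.startswith("linux"):
        return "linux"
    return None


def pending_error(command, platform):
    """Return the pending-error code for a command, or None if none is listed."""
    return PENDING.get(platform, {}).get(command)
```

For example, `pending_error("tree", "windows")` returns `"uia_pending"`, which tells the caller to go straight to the screenshot and raw-input commands instead.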
Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data:
- app/window list
- focused app/window
- accessibility tree
- element role/name/value/state/bounds/actions
- screenshots only when needed
- deterministic replay scripts after exploration
Requires macOS 13+. See INSTALL.md for full options and platform status.
# Homebrew (once tap is published)
brew tap edihasaj/guiport && brew install guiport
# Or install script
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh
# Or from source
swift build -c release && sudo cp .build/release/guiport /usr/local/bin/guiport
Windows (beta: input/screenshot/apps; UIA tree pending):
iwr -useb https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.ps1 | iex
Linux (beta: same shape; AT-SPI2 tree pending). Install xdotool + wmctrl + scrot (X11) or ydotool + grim (Wayland), then:
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh
See INSTALL.md for full per-platform notes.
guiport doctor # check permissions
guiport apps --json # list running apps with windows
guiport observe --app "Safari" # focused window summary
guiport tree --app "Safari" --json # full accessibility tree
guiport find --app "Safari" 'button[name="Save"]' # match selector
guiport click --app "Safari" 'button[name="Save"]'
guiport type "hello"
guiport screenshot --app "Safari" -o safari.png
# Vision fallback for canvas / sparse-AX apps:
guiport find-text --app "Figma" "Save" # OCR via Apple Vision
guiport click-text --app "Figma" "Save" # OCR + click center
guiport click-at 420 180 # raw coordinates
guiport record smoke.yaml # WIP
guiport run smoke.yaml
guiport serve --mcp # MCP server over stdio
Selector syntax:
role[attr=value][attr~=substring][index]
Examples:
button[name="Save"]
textfield[identifier="search"]
AXButton[name~="Open"][index=0]
Supported attributes: role, name (title), value, identifier, description, text (matches name or value), index.
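To make the selector grammar concrete, here is a minimal Python sketch of a parser and matcher for it. This is not guiport's selector engine (that lives in the Swift core). The names `Selector`, `parse_selector`, `matches`, and `select` are hypothetical, and elements are modeled as plain dicts with assumed keys. Two further assumptions are labeled in the code: `AXButton` and `button` are treated as the same role, and a bare `[0]` is accepted as shorthand for `[index=0]`.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

ATTRS = {"role", "name", "value", "identifier", "description", "text", "index"}

_ROLE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# One bracketed predicate: [attr=value], [attr~=value], or a bare [N].
_PRED = re.compile(
    r'\[\s*(?:(\w+)\s*(~=|=)\s*(?:"([^"]*)"|([^\]\s]+))|(\d+))\s*\]'
)


@dataclass
class Selector:
    role: str
    exact: dict = field(default_factory=dict)     # attr -> exact value
    contains: dict = field(default_factory=dict)  # attr -> substring
    index: Optional[int] = None


def parse_selector(text):
    """Parse 'role[attr=value][attr~=substring][index]' into a Selector."""
    text = text.strip()
    m = _ROLE.match(text)
    if not m:
        raise ValueError(f"selector must start with a role: {text!r}")
    sel, pos = Selector(role=m.group(0)), m.end()
    while pos < len(text):
        p = _PRED.match(text, pos)
        if not p:
            raise ValueError(f"bad predicate at offset {pos}: {text[pos:]!r}")
        attr, op, quoted, bare, bare_index = p.groups()
        if bare_index is not None:          # assumption: [0] means [index=0]
            sel.index = int(bare_index)
        else:
            value = quoted if quoted is not None else bare
            if attr not in ATTRS:
                raise ValueError(f"unsupported attribute: {attr!r}")
            if attr == "index":
                if op != "=" or int(value) < 0:
                    raise ValueError("index must be a non-negative '=' match")
                sel.index = int(value)
            elif op == "=":
                sel.exact[attr] = value
            else:
                sel.contains[attr] = value
        pos = p.end()
    return sel


def _norm_role(role):
    # Assumption: "AXButton" and "button" name the same role.
    return (role[2:] if role.startswith("AX") else role).lower()


def _candidates(element, attr):
    # "text" matches name or value, as listed under supported attributes.
    keys = ("name", "value") if attr == "text" else (attr,)
    return [element.get(k) for k in keys if isinstance(element.get(k), str)]


def matches(sel, element):
    """True if a dict-shaped element satisfies every predicate except index."""
    if _norm_role(element.get("role", "")) != _norm_role(sel.role):
        return False
    for attr, want in sel.exact.items():
        if want not in _candidates(element, attr):
            return False
    for attr, want in sel.contains.items():
        if not any(want in v for v in _candidates(element, attr)):
            return False
    return True


def select(sel, elements):
    """Filter elements, then apply the index (if any) to the matches."""
    hits = [e for e in elements if matches(sel, e)]
    return hits if sel.index is None else hits[sel.index:sel.index + 1]
```

Applying the index after filtering means `AXButton[name~="Open"][index=0]` picks the first button whose name contains "Open", not the first element in the tree.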
For apps with sparse or absent accessibility (Figma, custom-rendered editors, hardened Electron), guiport falls back through three layers:
- click-at X Y: raw screen coordinates. The agent reads coords off a screenshot.
- find-text "Save" / click-text "Save": Apple Vision (VNRecognizeTextRequest) OCRs the window and returns bounds + center for matched text. On-device, free, no extra deps.
- LLM vision: out of scope for MVP; agents can call screenshot + their own model to get coords, then click-at.
OCR-found bounds drift across font/scale changes, so prefer AX selectors for replay and OCR for exploration.
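One way an agent-side script might order these options is AX selector first, then OCR text, then raw coordinates. The sketch below only builds and tries command lines that appear in the usage section. `plan_click`, `first_success`, and `run_subprocess` are hypothetical helper names, and treating a non-zero exit code as "try the next layer" is an assumption about guiport's exit behavior, not something this README confirms.

```python
import subprocess


def plan_click(app, selector=None, text=None, point=None):
    """Return guiport command lines to try, most replay-stable first."""
    attempts = []
    if selector:   # accessibility selector: preferred for replay
        attempts.append(["guiport", "click", "--app", app, selector])
    if text:       # OCR match: useful for canvas / sparse-AX apps
        attempts.append(["guiport", "click-text", "--app", app, text])
    if point:      # raw screen coordinates: last resort
        x, y = point
        attempts.append(["guiport", "click-at", str(x), str(y)])
    return attempts


def run_subprocess(argv):
    """Run one command and return its exit code."""
    return subprocess.run(argv).returncode


def first_success(attempts, run=run_subprocess):
    """Try each command in order; return the first whose exit code is 0.

    Assumption: guiport exits non-zero when a click fails to find its target.
    """
    for argv in attempts:
        if run(argv) == 0:
            return argv
    return None
```

Passing `run` as a parameter lets the ordering be exercised with a stub, without guiport installed or any permissions granted.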
guiport needs:
- Accessibility: required for AX tree + input events.
- Screen Recording: required for screenshot and screenshot-on-failure artifacts.
Run guiport doctor to check status and get System Settings deep links.
- Pure Swift, single binary.
- GuiportCore library: AX bridge, selector engine, input, screenshots, replay runner, MCP server.
- guiport CLI: thin wrapper using swift-argument-parser.
- No full Windows/Linux support yet: both are beta, with tree/observe/find and OCR still pending.
- No vision-first automation.
- No autonomous Manus clone.
- No background/session-0 automation.
MIT — see LICENSE.