edihasaj/guiport

Playwright for desktop apps, built for coding agents.

macOS shipped · Windows beta · Linux beta.

A fast CLI/MCP control layer that lets agents like Claude, Codex, opencode, and Gemini inspect and operate desktop apps through structured UI data, then save successful flows as replayable tests.

Status

MVP. macOS is the primary target — Accessibility tree first, screenshots as fallback.

Windows is in beta with a day-1 input/screenshot/apps surface (Win32 SendInput, GDI BitBlt/PrintWindow, EnumWindows). UIA-backed tree/observe/find/click-by-selector and WinRT OCR are pending — they throw a clear uia_pending / ocr_pending error today. Track progress under the windows label.

Linux is in beta with the same shape: it shells out to xdotool/wmctrl/scrot on X11 and ydotool/grim on Wayland for input and screenshots, and uses /proc plus wmctrl for app enumeration. AT-SPI2-backed tree/observe/find and tesseract OCR throw atspi_pending / ocr_pending until those bindings land. Track progress under the linux label.

Why

Agents shouldn't drive desktop apps by guessing pixels. guiport exposes the desktop as structured data: app/window list, focused app/window, accessibility tree, element role/name/value/state/bounds/actions, screenshots only when needed, deterministic replay scripts after exploration.

Install

macOS 13+. See INSTALL.md for full options + platform status.

# Homebrew (once tap is published)
brew tap edihasaj/guiport && brew install guiport

# Or install script
curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh

# Or from source
swift build -c release && sudo cp .build/release/guiport /usr/local/bin/guiport

Windows (beta — input/screenshot/apps; UIA tree pending):

iwr -useb https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.ps1 | iex

Linux (beta — same shape; AT-SPI2 tree pending). Install xdotool+wmctrl+scrot (X11) or ydotool+grim (Wayland), then:

curl -fsSL https://raw.githubusercontent.com/edihasaj/guiport/main/scripts/install.sh | sh

See INSTALL.md for full per-platform notes.

Quick start

guiport doctor                                       # check permissions
guiport apps --json                                  # list running apps with windows
guiport observe --app "Safari"                       # focused window summary
guiport tree --app "Safari" --json                   # full accessibility tree
guiport find --app "Safari" 'button[name="Save"]'    # match selector
guiport click --app "Safari" 'button[name="Save"]'
guiport type "hello"
guiport screenshot --app "Safari" -o safari.png

# Vision fallback for canvas / sparse-AX apps:
guiport find-text --app "Figma" "Save"               # OCR via Apple Vision
guiport click-text --app "Figma" "Save"              # OCR + click center
guiport click-at 420 180                             # raw coordinates

# Record and replay flows; run as an MCP server:
guiport record smoke.yaml                            # WIP
guiport run smoke.yaml
guiport serve --mcp                                  # MCP server over stdio

Selector syntax

role[attr=value][attr~=substring][index=N]

Examples:

button[name="Save"]
textfield[identifier="search"]
AXButton[name~="Open"][index=0]

Supported attributes: role, name (title), value, identifier, description, text (matches name or value), index.
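
To make the grammar above concrete, here is a minimal parsing sketch in Python. It is an illustration only and not guiport's actual parser (which lives in the Swift GuiportCore library); the `Selector` class and `parse_selector` function are hypothetical names, and the handling of unquoted values is an assumption based on the `[index=0]` example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# One bracketed clause: [attr="value"], [attr~="value"], or an unquoted value.
_CLAUSE = re.compile(r'\[(\w+)(~=|=)(?:"([^"]*)"|([^\]]*))\]')


@dataclass
class Selector:
    role: str
    exact: Dict[str, str] = field(default_factory=dict)     # attr -> exact value
    contains: Dict[str, str] = field(default_factory=dict)  # attr -> substring
    index: Optional[int] = None                             # position among matches


def parse_selector(text: str) -> Selector:
    """Split 'role[attr=value][attr~=substring][index=N]' into its parts."""
    head = re.match(r"\w+", text)
    if head is None:
        raise ValueError(f"selector must start with a role: {text!r}")
    sel = Selector(role=head.group(0))
    pos = head.end()
    while pos < len(text):
        clause = _CLAUSE.match(text, pos)
        if clause is None:
            raise ValueError(f"bad clause at offset {pos}: {text!r}")
        attr, op, quoted, bare = clause.groups()
        value = quoted if quoted is not None else bare
        if attr == "index":
            sel.index = int(value)
        elif op == "=":
            sel.exact[attr] = value
        else:
            sel.contains[attr] = value
        pos = clause.end()
    return sel


sel = parse_selector('AXButton[name~="Open"][index=0]')
print(sel.role, sel.contains, sel.index)  # AXButton {'name': 'Open'} 0
```

Keeping exact and substring constraints in separate maps mirrors the two operators in the grammar; how roles such as button and AXButton relate to each other is left out because the README does not specify it.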

Vision fallback (canvas / Electron apps)

For apps with sparse or absent accessibility (Figma, custom-rendered editors, hardened Electron), guiport falls back through three layers:

click-at X Y — raw screen coordinates. The agent reads coords off a screenshot.
find-text "Save" / click-text "Save" — Apple Vision (VNRecognizeTextRequest) OCRs the window and returns bounds + center for matched text. On-device, free, no extra deps.
LLM vision — out of scope for MVP; agents can call screenshot + their own model to get coords, then click-at.

OCR-found bounds drift across font/scale changes, so prefer AX selectors for replay and OCR for exploration.
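
An agent-side wrapper might apply that preference by trying the AX selector first and only then the OCR text match. The sketch below is a hedged illustration: it uses the `click` and `click-text` commands shown in Quick start, but `click_with_fallback` and `run_cli` are hypothetical helper names, and it assumes guiport exits with status 0 on success and non-zero on failure, which the README does not state.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def run_cli(argv: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def click_with_fallback(app: str, selector: str, label: str,
                        run: Runner = run_cli) -> Optional[str]:
    """Try the AX selector, then OCR text; return the strategy that worked."""
    attempts = [
        ("ax", ["guiport", "click", "--app", app, selector]),
        ("ocr", ["guiport", "click-text", "--app", app, label]),
    ]
    for name, argv in attempts:
        # Assumption: exit code 0 means the click succeeded.
        if run(argv) == 0:
            return name
    return None
```

Calling `click_with_fallback("Figma", 'button[name="Save"]', "Save")` would run at most two guiport commands. The injectable `run` parameter lets the ordering be checked without a desktop session, and the returned strategy name lets a recorder note when a step relied on OCR and may need an AX selector before replay.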

Permissions

guiport needs:

Accessibility — required for AX tree + input events.
Screen Recording — required for screenshot and screenshot-on-failure artifacts.

Run guiport doctor to check status and get System Settings deep links.

Architecture

Pure Swift, single binary.
GuiportCore library: AX bridge, selector engine, input, screenshots, replay runner, MCP server.
guiport CLI: thin wrapper using swift-argument-parser.

Non-goals (MVP)

No full Windows/Linux support yet (both are in beta; see Status).
No vision-first automation.
No autonomous Manus clone.
No background/session-0 automation.

License

MIT — see LICENSE.

Author

Edi Hasaj

