Open Interface
- Runs a screenshot-driven desktop agent loop powered by GPT-5, GPT-4o, GPT-4V, Gemini, Claude, Qwen, and compatible OpenAI-style endpoints.
- Asks the model for at most one next UI action at a time, then executes it with local keyboard and mouse control.
- Re-observes the screen after each step, optionally verifies visual change locally, and course-corrects until the task is done or safely stopped.
"Solve Today's Wordle"

clipped, 2x
MacOS
- Download the MacOS binary from the latest release.
- Unzip the file and move Open Interface to the Applications Folder.
Apple Silicon M-Series Macs
Intel Macs
-
Launch the app from the Applications folder.
You might face the standard Mac "Open Interface cannot be opened" error.

In that case, press "Cancel".
Then go to System Preferences -> Security and Privacy -> Open Anyway.
ย
ย
-
Open Interface will also need Accessibility access to operate your keyboard and mouse for you, and Screen Recording access to take screenshots to assess its progress.

- Lastly, checkout the Setup section to connect Open Interface to your preferred LLM provider.
Run as a Script
- Clone the repo
git clone https://github.com/AmberSahdev/Open-Interface.git - Enter the directory
cd Open-Interface - Optionally use a Python virtual environment
- Note: pyenv handles tkinter installation weirdly so you may have to debug for your own system yourself.
pyenv local 3.12.2python -m venv .venvsource .venv/bin/activate
- Install dependencies
pip install -r requirements.txt - Run the app using
python app/app.py
Set up the OpenAI API key
-
Get your OpenAI API key
- Open Interface can use OpenAI-compatible models including GPT-4o, GPT-4V, GPT-5, and
computer-use-previewdepending on your configuration. OpenAI keys can be downloaded from your OpenAI account at platform.openai.com/settings/organization/api-keys. - Follow the steps here to add balance to your OpenAI account. Some higher-tier models may require prepaid billing or additional account access.
- More info
- Open Interface can use OpenAI-compatible models including GPT-4o, GPT-4V, GPT-5, and
-
Save the API key in Open Interface settings
- In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so:
- In Open Interface, go to the Settings menu on the top right and enter the key you received from OpenAI into the text field like so:
-
After setting the API key for the first time you'll need to restart the app.
Set up the Google Gemini API key
- Go to Settings -> Advanced Settings and select the Gemini model you wish to use.
- Get your Google Gemini API key from https://aistudio.google.com/app/apikey.
- Save the API key in Open Interface settings.
- Save the settings and restart the app.
Optional: Setup a Custom LLM
- Open Interface supports using other OpenAI API style LLMs (such as Llava) as a backend and can be configured easily in the Advanced Settings window.
- Enter the custom base url and model name in the Advanced Settings window and the API key in the Settings window as needed.
- NOTE - If you're using Llama:
- You may need to enter a random string like "xxx" in the API key input box.
- You may need to append /v1/ to the base URL.
- If your LLM does not support an OpenAI style API, you can use a library like this to convert it to one.
- You will need to restart the app after these changes.
Open Interface now uses a structured request pipeline and Prompt System v1. The current app is still a strict single-step visual agent loop, but the prompt, history, and verification layers are more explicit and consistent than the earlier context.txt + request_data JSON approach.
- The runtime is a single-step closed loop, not a multi-step batch planner.
- Each model round may return multiple steps, but the runtime executes only the first executable one.
Corecreates a structuredrequest_contextfor every request and persists messages plus execution logs throughSessionStore.- Most providers now share one prompt semantics source; provider adapters differ mainly in API message formatting.
computer-use-previewis still a separate tool-driven path rather than the standard JSON step-output flow.
- The UI sends the user's natural-language goal into a queue.
Appforwards it toCore.execute_user_request(...)on a worker thread.Corestops any previous request, snapshots the active session history, stores the new user message, and buildsrequest_context.LLMand the selected provider capture the latest screenshot and build a unified prompt package.- The model returns JSON with
stepsanddone; the runtime keeps at most one step. Interpreterexecutes the step locally and writes an execution log.- If local verification is enabled,
StepVerifiercompares before/after screenshots and feeds the result back into the next round. - The loop repeats until the model returns
done, the request is interrupted, or the runtime stops after repeated failure.
Prompt assembly now lives under app/prompting/ and is built through app/prompting/builder.py.
- Stable prompt layers:
PromptSystemContextand registry-generatedPromptToolSchema - Dynamic prompt layers:
PromptTaskContext,PromptExecutionTimeline,PromptRecentDetails,PromptVisualContext, andPromptOutputContract context.txtnow stores stable rules only; dynamic runtime state is assembled fromrequest_context- Tool definitions come from
ToolRegistry, so models see an explicit allowlist of tool names, parameters, and usage rules - Coordinate actions use the same
0-100ruler values shown on the screenshot grid; the runtime converts them locally to pixels
session_history_snapshotcaptures narrative session history at request startstep_historyrecords authoritative per-request execution progress and verification resultsagent_memorykeeps compact loop memory such as recent failures, recent actions, and unreliable anchors- Local step verification can be toggled with
runtime.disable_local_step_verification - When local verification is disabled, successful steps are still re-observed and recorded as
verification_status = skipped - Prompt text dumps can be enabled with
advanced.save_prompt_text_dumps, which writes final prompt text topromptdump/
- Accurate spatial-reasoning and hence clicking buttons.
- Keeping track of itself in tabular contexts, like Excel and Google Sheets, for similar reasons as stated above.
- Navigating complex GUI-rich applications like Counter-Strike, Spotify, Garage Band, etc due to heavy reliance on cursor actions.
(with better models trained on video walkthroughs like Youtube tutorials)
- "Create a couple of bass samples for me in Garage Band for my latest project."
- "Read this design document for a new feature, edit the code on Github, and submit it for review."
- "Find my friends' music taste from Spotify and create a party playlist for tonight's event."
- "Take the pictures from my Tahoe trip and make a White Lotus type montage in iMovie."
- Cost Estimation: $0.0005 - $0.002 per LLM request depending on the model used.
(User requests can require between two to a few dozen LLM backend calls depending on the request's complexity.) - You can interrupt the app anytime by pressing the Stop button, or by dragging your cursor to any of the screen corners.
- Open Interface can only see your primary display when using multiple monitors. Therefore, if the cursor/focus is on a secondary screen, it might keep retrying the same actions as it is unable to see its progress.
- Most providers now share one prompt contract, but
computer-use-previewstill follows a separate real-tool execution path. - Prompt text dumps are available for debugging through
advanced.save_prompt_text_dumps; they exclude API credentials and image binaries.
+-------------------------------------------------------------------+
| App / UI |
| |
| user goal -> Core -> SessionStore |
| | |
| v |
| request_context |
| | |
| v |
| LLM / Provider Adapter |
| | |
| v |
| PromptBuilder + ToolRegistry + Screenshot |
| | |
| v |
| Model returns JSON { steps, done } |
| | |
| v |
| Interpreter executes one step |
| | |
| v |
| StepVerifier observes before/after screen change |
| | |
| +-----> step_history / agent_memory ----+ |
| | |
| <---------------- repeat until done / stop / failure ---+ |
+-------------------------------------------------------------------+
- Check out more of my projects at AmberSah.dev.
- Other demos and press kit can be found at MEDIA.md.



