Remote desktop control HTTP API for AI agents - mouse, keyboard, screenshot, Chinese input, and human-in-the-loop
Features
| Feature | Description |
| --- | --- |
| Mouse Control | Move, click, drag, scroll with smooth interpolation |
| Keyboard Control | Type, press keys, hotkeys, key hold/release |
| Chinese Input | Native Chinese text input via clipboard paste |
| Screenshot | Full screen and region capture in PNG/base64 |
| Window Management | List windows, activate by title (Windows) |
| Human-in-the-Loop | Pause/resume task execution for CAPTCHA handling |
| Cross-Platform | Windows, macOS, Linux support |
| SSH Tunnel Ready | Secure remote access via SSH port forwarding |
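Comparison with Similar Projects
| Feature | desktop-control-server | Pan-Agent | Auto-Use-MCP | mcp-auto-control |
| --- | --- | --- | --- | --- |
| Human-in-the-Loop | Yes | ? | ? | ? |
| Chinese Input | Yes | ? | ? | ? |
| One-Click Install | Yes | ? | ? | ? |
| Coze Native Integration | Yes | ? | ? | ? |
| HTTP API | Yes | ? | ? | ? |
| Screenshot | Yes | ? | ? | ? |
| Cross-Platform | Yes | ? | ? | ? |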
Our Differentiators
Human-in-the-Loop (CAPTCHA Handling): Pause the task when a CAPTCHA is detected, then resume after a human has solved it
Native Chinese Input: No additional configuration needed - just call /clipboard/paste with Chinese text
One-Click Install: Zero-configuration installation with install.bat for Windows
Coze Native Integration: Specially designed for Coze Agent workflow with Coze Skill
Quick Start
One-Click Install (Windows)
# 1. Download and run install.bat (as Administrator)
# 2. Double-click start-server.bat to start the service
# Default endpoint: http://127.0.0.1:9100
# Default API Key: zaizai2026
Manual Install
# Clone the repository
git clone https://github.com/155143783/desktop-control-server.git
cd desktop-control-server
# Install dependencies (requires Node.js 18+)
npm install
# Start server
npm start
API Documentation
Authentication
All protected endpoints require either the X-API-Key header or the ?api_key=your-key query parameter.
Default API Key: zaizai2026
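The two authentication options can be sketched from a Python client as follows. This is a minimal illustration, not part of the project: `build_request` is a hypothetical helper name, and it only constructs the request so you can inspect it before sending it with `urllib.request.urlopen`.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9100"  # default endpoint from Quick Start
API_KEY = "zaizai2026"              # default key from this README


def build_request(path, payload=None, *, key_in_query=False):
    """Build (but do not send) an authenticated request.

    Hypothetical helper: GET when payload is None, otherwise POST with JSON.
    """
    url = BASE_URL + path
    headers = {}
    if key_in_query:
        # Option 2: pass the key as the ?api_key=... query parameter.
        separator = "&" if "?" in path else "?"
        url += separator + urllib.parse.urlencode({"api_key": API_KEY})
    else:
        # Option 1: pass the key in the X-API-Key header.
        headers["X-API-Key"] = API_KEY
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

To send a request, pass the result to `urllib.request.urlopen(...)`. The header form keeps the key out of URLs, which tend to end up in logs and shell history.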
Public Endpoints (No Auth Required)
| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check - returns server status |
| /task/pause | POST | Pause task execution |
| /task/resume | POST | Resume task execution |
| /task/status | GET | Get pause/resume status |
Screenshot Endpoints
| Endpoint | Method | Parameters | Description |
| --- | --- | --- | --- |
| /screenshot | GET | format=png\|base64, region=x,y,w,h | Capture screen |
| /screenshot/region | POST | {x, y, width, height} | Capture region |
| /screen/size | GET | - | Get screen resolution |
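As a rough sketch of how the `/screenshot` query parameters fit together, the hypothetical helper below assembles the URL from a format and an optional region. It assumes the region is sent as four comma-separated integers in the order x, y, w, h, as the parameter column suggests; the shape of the response is not documented here, so the sketch stops at building the URL.

```python
import urllib.parse

BASE_URL = "http://127.0.0.1:9100"  # default endpoint from Quick Start


def screenshot_url(fmt="png", region=None):
    """Return a /screenshot URL (hypothetical helper, not part of the server).

    fmt    -- "png" or "base64", the two documented formats
    region -- optional (x, y, w, h) tuple of integers
    """
    if fmt not in ("png", "base64"):
        raise ValueError("format must be 'png' or 'base64'")
    params = {"format": fmt}
    if region is not None:
        x, y, w, h = region
        params["region"] = f"{x},{y},{w},{h}"
    # safe="," leaves the commas in the region value unescaped.
    return f"{BASE_URL}/screenshot?" + urllib.parse.urlencode(params, safe=",")
```

The request still needs the API key, for example via the X-API-Key header.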
Mouse Control
| Endpoint | Method | Parameters | Description |
| --- | --- | --- | --- |
| /mouse/position | GET | - | Get current position |
| /mouse/move | POST | {x, y, duration?} | Move mouse |
| /mouse/click | POST | {x?, y?, button?, clicks?} | Click |
| /mouse/drag | POST | {startX, startY, endX, endY, duration?, button?} | Drag |
| /mouse/scroll | POST | {x?, y?, scrollAmount} | Scroll |
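Fields marked with `?` are optional, so a client can leave them out of the JSON body. The sketch below shows one way to build such bodies in Python; the function names are hypothetical, and accepted values for `button` and the unit of `duration` are not stated in this README, so they are passed through unchanged.

```python
import json


def click_payload(x=None, y=None, button=None, clicks=None):
    """Body for /mouse/click containing only the fields that were set."""
    fields = {"x": x, "y": y, "button": button, "clicks": clicks}
    return {name: value for name, value in fields.items() if value is not None}


def drag_payload(start, end, duration=None, button=None):
    """Body for /mouse/drag; start and end are (x, y) tuples."""
    payload = {
        "startX": start[0],
        "startY": start[1],
        "endX": end[0],
        "endY": end[1],
    }
    if duration is not None:
        payload["duration"] = duration  # unit not documented; assumption
    if button is not None:
        payload["button"] = button      # accepted values not documented
    return payload


def to_json_body(payload):
    """Encode a payload as the UTF-8 JSON bytes an HTTP client would send."""
    return json.dumps(payload).encode("utf-8")
```

For example, `to_json_body(click_payload(x=100, y=200, clicks=2))` produces the body for a double click at (100, 200).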
Keyboard Control
| Endpoint | Method | Parameters | Description |
| --- | --- | --- | --- |
| /keyboard/type | POST | {text} | Type English text |
| /keyboard/press | POST | {key} | Press single key |
| /keyboard/hotkey | POST | {keys: ["ctrl", "c"]} | Press hotkey |
| /keyboard/keyDown | POST | {key} | Hold key down |
| /keyboard/keyUp | POST | {key} | Release key |
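The keyDown/keyUp pair is useful when a key must stay held while something else happens. A minimal sketch of planning such a sequence is shown below; `hold_and_press` is a hypothetical helper, and key names such as "shift" are assumptions, since this README does not list the accepted key names.

```python
def hold_and_press(held_keys, key):
    """Plan the calls for pressing `key` while `held_keys` are held down.

    Returns a list of (endpoint, body) tuples in the order they should be
    sent. Held keys are released in reverse order so that nothing stays
    pressed after the sequence completes.
    """
    steps = [("/keyboard/keyDown", {"key": k}) for k in held_keys]
    steps.append(("/keyboard/press", {"key": key}))
    steps.extend(("/keyboard/keyUp", {"key": k}) for k in reversed(held_keys))
    return steps
```

A client would POST each body to its endpoint in order. If a request fails midway, it is worth still sending the remaining keyUp calls so no modifier is left stuck.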
Clipboard (Chinese Input)
| Endpoint | Method | Parameters | Description |
| --- | --- | --- | --- |
| /clipboard/paste | POST | {text} | Paste text (supports Chinese) |
| /clipboard/copy | POST | {text} | Copy to clipboard |
| /clipboard/get | GET | - | Get clipboard content |
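Since /keyboard/type is described as typing English text and /clipboard/paste as supporting Chinese, a client may want to pick the endpoint based on the text itself. The sketch below uses a simple ASCII check for that; the rule and the helper names are assumptions for illustration, not the server's own logic.

```python
import json


def text_input_request(text):
    """Choose an endpoint for entering `text`.

    Pure-ASCII text goes to /keyboard/type; anything else (for example
    Chinese) goes to /clipboard/paste. Returns (endpoint, body).
    """
    if text.isascii():
        return "/keyboard/type", {"text": text}
    return "/clipboard/paste", {"text": text}


def encode_body(body):
    """Encode the body as UTF-8 JSON, keeping non-ASCII characters literal."""
    return json.dumps(body, ensure_ascii=False).encode("utf-8")
```

Routing through the clipboard presumably replaces whatever the clipboard held, so a cautious client could read /clipboard/get first and restore the value with /clipboard/copy afterwards.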
Window Management (Windows Only)
Endpoint
Method
Parameters
Description
/window/list
GET
-
List all windows
/window/activate
POST
{title}
Activate window by title
Shell Execution
| Endpoint | Method | Parameters | Description |
| --- | --- | --- | --- |
| /exec | POST | {command, timeout?} | Execute shell command |
Usage Examples
Health Check
curl http://127.0.0.1:9100/health
Capture Screenshot
# Save as PNG file
curl -H "X-API-Key: zaizai2026" http://127.0.0.1:9100/screenshot -o screen.png
# Get as base64
curl -H "X-API-Key: zaizai2026" "http://127.0.0.1:9100/screenshot?format=base64"
# Type Chinese text (via clipboard paste)
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
-d '{"text": "你好世界"}' \
http://127.0.0.1:9100/clipboard/paste
# Press hotkey
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
-d '{"keys": ["control", "c"]}' \
http://127.0.0.1:9100/keyboard/hotkey
# Type English
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
-d '{"text": "hello"}' \
http://127.0.0.1:9100/keyboard/type
Human-in-the-Loop (CAPTCHA Handling)
# When AI detects CAPTCHA, pause execution
curl -X POST http://127.0.0.1:9100/task/pause
# After user completes CAPTCHA, resume
curl -X POST http://127.0.0.1:9100/task/resume
# Check status
curl http://127.0.0.1:9100/task/status
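On the agent side, the pause/resume flow usually ends in a polling loop against /task/status. The sketch below shows one way to write that loop in Python. The response format of /task/status is not documented in this README, so the `paused` field is an assumption, as are the helper name and the default intervals; the status fetcher is passed in as a function so the loop can be exercised without a running server.

```python
import time


def wait_until_resumed(fetch_status, poll_seconds=2.0, timeout_seconds=300.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until the task is no longer paused.

    fetch_status -- callable returning the parsed /task/status response as a
                    dict; the "paused" key is an ASSUMED field name
    Returns True once the task is resumed, False if the timeout expires.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if not status.get("paused", False):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

In practice `fetch_status` would issue `GET /task/status` and decode the JSON response. Returning False on timeout lets the agent decide whether to keep waiting or abandon the task.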