
Desktop Control Server


Remote desktop control HTTP API for AI agents: mouse, keyboard, screenshots, Chinese input, and human-in-the-loop

🌟 Features

| Feature | Description |
|---|---|
| Mouse Control | Move, click, drag, and scroll with smooth interpolation |
| Keyboard Control | Type text, press keys and hotkeys, hold/release keys |
| Chinese Input | Native Chinese text input via clipboard paste |
| Screenshot | Full-screen and region capture as PNG or base64 |
| Window Management | List windows and activate one by title (Windows only) |
| Human-in-the-Loop | Pause/resume task execution for CAPTCHA handling |
| Cross-Platform | Windows, macOS, and Linux support |
| SSH Tunnel Ready | Secure remote access via SSH port forwarding |
SSH Tunnel Ready Secure remote access via SSH port forwarding

📊 Comparison with Similar Projects

| Feature | desktop-control-server | Pan-Agent | Auto-Use-MCP | mcp-auto-control |
|---|---|---|---|---|
| Human-in-the-Loop | ✅ | ❌ | ❌ | ❌ |
| Chinese Input | ✅ | ❌ | ❌ | ❌ |
| One-Click Install | ✅ | ❌ | ❌ | ❌ |
| Coze Native Integration | ✅ | ❌ | ❌ | ❌ |
| HTTP API | ✅ | ✅ | ✅ | ✅ |
| Screenshot | ✅ | ✅ | ✅ | ✅ |
| Cross-Platform | ✅ | ❌ | ✅ | ✅ |

Our Differentiators

  1. Human-in-the-Loop (CAPTCHA Handling): When the agent detects a CAPTCHA, it pauses execution via /task/pause and resumes via /task/resume after a human has solved it
  2. Native Chinese Input: No additional configuration needed; just call /clipboard/paste with Chinese text
  3. One-Click Install: Zero-configuration installation on Windows with install.bat
  4. Coze Native Integration: Designed specifically for Coze Agent workflows, with a dedicated Coze Skill

🚀 Quick Start

One-Click Install (Windows)

# 1. Download and run install.bat (as Administrator)
# 2. Double-click start-server.bat to start the service

# Default endpoint: http://127.0.0.1:9100
# Default API Key: zaizai2026

Manual Install

# Clone the repository
git clone https://github.com/155143783/desktop-control-server.git
cd desktop-control-server

# Install dependencies (requires Node.js 18+)
npm install

# Start server
npm start

📚 API Documentation

Authentication

All protected endpoints require either an X-API-Key header or an api_key query parameter (?api_key=your-key).

Default API Key: zaizai2026
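A minimal Python client sketch for the authentication scheme described above, using only the standard library. `build_request`, `call`, `BASE_URL`, and `API_KEY` are illustrative names, not part of this project. Building the request separately from sending it lets you inspect headers and URLs without a running server.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:9100"  # default endpoint from the Quick Start
API_KEY = "zaizai2026"              # default key; substitute your own if changed


def build_request(path, body=None, method=None, api_key=API_KEY,
                  use_query=False, base_url=BASE_URL):
    """Build (without sending) an authenticated request.

    The key goes in the X-API-Key header, or in ?api_key= when use_query is True.
    A dict body is sent as JSON; the method defaults to POST with a body, else GET.
    """
    url = base_url + path
    headers = {}
    if use_query:
        sep = "&" if "?" in url else "?"
        url += sep + urllib.parse.urlencode({"api_key": api_key})
    else:
        headers["X-API-Key"] = api_key
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if method is None:
        method = "POST" if data is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(path, body=None, **kwargs):
    """Send the request and return the raw response body as bytes."""
    with urllib.request.urlopen(build_request(path, body, **kwargs), timeout=30) as resp:
        return resp.read()
```

For example, `call("/mouse/move", {"x": 500, "y": 300})` moves the mouse, and `call("/task/pause", method="POST")` sends a body-less POST.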

Public Endpoints (No Auth Required)

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check; returns server status |
| /task/pause | POST | Pause task execution |
| /task/resume | POST | Resume task execution |
| /task/status | GET | Get the current pause/resume status |
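The split between public and protected endpoints can be pictured as a single authorization check. This Python version only illustrates the rule as this README states it; the server itself is Node.js/Express and its actual code may differ. `PUBLIC_PATHS` and `is_authorized` are hypothetical names. `hmac.compare_digest` is used so the comparison time does not depend on how many leading characters of the key match.

```python
import hmac

# Endpoints documented as public (no API key required).
PUBLIC_PATHS = {"/health", "/task/pause", "/task/resume", "/task/status"}


def is_authorized(path, headers, query, expected_key):
    """Return True if a request to `path` may proceed.

    headers: dict of request headers (any letter case).
    query: dict of query-string parameters.
    The key may arrive as X-API-Key or ?api_key=; the header wins if both are set.
    """
    if path in PUBLIC_PATHS:
        return True
    lowered = {k.lower(): v for k, v in headers.items()}
    supplied = lowered.get("x-api-key") or query.get("api_key")
    if not supplied:
        return False
    # Compare UTF-8 bytes in constant time (str inputs would have to be ASCII).
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```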

Screenshot Endpoints

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /screenshot | GET | format (png or base64), region=x,y,w,h | Capture the screen |
| /screenshot/region | POST | {x, y, width, height} | Capture a region |
| /screen/size | GET | - | Get the screen resolution |
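A sketch for building /screenshot paths and turning a `format=base64` response into PNG bytes. The exact shape of the base64 response is not documented here; this assumes the body is bare base64 text, optionally prefixed with a `data:image/png;base64,` header. If the server wraps it in JSON, extract the relevant field first. `screenshot_path` and `decode_base64_png` are hypothetical helper names.

```python
import base64
import urllib.parse

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def screenshot_path(fmt="png", region=None):
    """Build a /screenshot path; region is an (x, y, w, h) tuple sent as region=x,y,w,h."""
    params = {"format": fmt}
    if region is not None:
        params["region"] = ",".join(str(int(v)) for v in region)
    # safe="," keeps the commas in the region value unescaped
    return "/screenshot?" + urllib.parse.urlencode(params, safe=",")


def decode_base64_png(body):
    """Decode a base64 screenshot body (str or bytes) into PNG bytes.

    Assumes bare base64, optionally prefixed with a data-URL header.
    Raises ValueError if the input is not valid base64 or not a PNG.
    """
    text = body.decode("ascii") if isinstance(body, bytes) else body
    text = text.strip()
    if text.startswith("data:"):
        text = text.split(",", 1)[1]
    png = base64.b64decode(text, validate=True)  # binascii.Error subclasses ValueError
    if not png.startswith(PNG_MAGIC):
        raise ValueError("decoded data is not a PNG image")
    return png
```

Writing the result to disk is then a single call such as `pathlib.Path("screen.png").write_bytes(decode_base64_png(body))`.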

Mouse Control

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /mouse/position | GET | - | Get the current position |
| /mouse/move | POST | {x, y, duration?} | Move the mouse |
| /mouse/click | POST | {x?, y?, button?, clicks?} | Click |
| /mouse/drag | POST | {startX, startY, endX, endY, duration?, button?} | Drag |
| /mouse/scroll | POST | {x?, y?, scrollAmount} | Scroll |
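The "smooth interpolation" behind duration-based moves and drags is not specified in this README. One common approach, sketched here as an assumption rather than the server's actual algorithm, samples intermediate points along the straight line from start to end using an ease-in-out (smoothstep) curve, so the pointer speeds up and slows down the way a hand does. `ease_in_out` and `interpolate_path` are hypothetical helpers; a client could also use them to issue its own sequence of /mouse/move calls.

```python
def ease_in_out(t):
    """Smoothstep easing on [0, 1]: slow start, faster middle, slow finish."""
    return t * t * (3 - 2 * t)


def interpolate_path(start, end, steps):
    """Return steps + 1 integer (x, y) points from start to end, both inclusive."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(steps + 1):
        t = ease_in_out(i / steps)  # eased progress along the line
        points.append((round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t)))
    return points
```

To spread a movement over a total time `duration`, a client would wait `duration / steps` between successive points.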

Keyboard Control

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /keyboard/type | POST | {text} | Type English text |
| /keyboard/press | POST | {key} | Press a single key |
| /keyboard/hotkey | POST | {keys: ["control", "c"]} | Press a hotkey |
| /keyboard/keyDown | POST | {key} | Hold a key down |
| /keyboard/keyUp | POST | {key} | Release a key |
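/keyboard/keyDown and /keyboard/keyUp change state on the remote machine, so a key that is pressed and never released (for example because a later request failed) stays stuck. A sketch of a Python context manager that always sends the matching keyUp. `hold_key` is a hypothetical name, and `post(path, body)` stands for whatever function you use to send authenticated POST requests; it is passed in so the sketch can be tested without a server.

```python
from contextlib import contextmanager


@contextmanager
def hold_key(post, key):
    """Hold `key` down for the duration of the with-block, then release it.

    The keyUp request is sent in a finally clause, so it also runs
    when the body of the with-block raises.
    """
    post("/keyboard/keyDown", {"key": key})
    try:
        yield
    finally:
        post("/keyboard/keyUp", {"key": key})
```

A shift-click then reads `with hold_key(post, "shift"): post("/mouse/click", {"x": 500, "y": 300})`.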

Clipboard (Chinese Input)

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /clipboard/paste | POST | {text} | Paste text (supports Chinese) |
| /clipboard/copy | POST | {text} | Copy text to the clipboard |
| /clipboard/get | GET | - | Get the clipboard contents |
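A sketch for routing text input: plain ASCII goes through /keyboard/type, and anything else (such as Chinese) goes through /clipboard/paste. The ASCII test is a conservative heuristic chosen for this example, not a rule stated by the server. `text_input_request` and `encode_body` are hypothetical names.

```python
import json


def text_input_request(text):
    """Return (endpoint, body) for entering `text` on the remote desktop."""
    endpoint = "/keyboard/type" if text.isascii() else "/clipboard/paste"
    return endpoint, {"text": text}


def encode_body(body):
    """Serialize a request body as UTF-8 JSON bytes."""
    # ensure_ascii=False keeps Chinese characters as raw UTF-8; the default
    # (\uXXXX escapes) is also valid JSON and decodes to the same string.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")
```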

Window Management (Windows Only)

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /window/list | GET | - | List all windows |
| /window/activate | POST | {title} | Activate a window by title |
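/window/activate takes a title, but the full title a window reports can be long or change over time (for example when a document name is prefixed to the application name). A sketch that picks a title from /window/list output by case-insensitive substring match before activating it. The response shape of /window/list is not documented here; this assumes a list of either title strings or objects with a `title` field. `pick_window_title` is a hypothetical helper.

```python
def pick_window_title(windows, needle):
    """Return the first window title containing `needle` (case-insensitive), or None.

    windows: parsed /window/list output, assumed to be a list of strings
    or of dicts with a "title" key.
    """
    target = needle.casefold()
    for win in windows:
        title = (win.get("title") or "") if isinstance(win, dict) else str(win)
        if target in title.casefold():
            return title
    return None
```

The returned title can then be sent as `{"title": title}` to /window/activate.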

Shell Execution

| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| /exec | POST | {command, timeout?} | Execute a shell command |
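When a command sent to /exec includes arguments that come from elsewhere (file names, model output), quoting them prevents word-splitting and accidental command injection. A sketch using Python's `shlex.join`, which quotes for POSIX shells only; it does not produce correct quoting for Windows `cmd.exe`. `exec_body` is a hypothetical helper, and because the unit of `timeout` is not documented here, the value is passed through unchanged.

```python
import shlex


def exec_body(argv, timeout=None):
    """Build an /exec request body from an argument list (POSIX shells only)."""
    body = {"command": shlex.join(argv)}  # each argument quoted as needed
    if timeout is not None:
        body["timeout"] = timeout  # unit not documented; use what the server expects
    return body
```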

💡 Usage Examples

Health Check

curl http://127.0.0.1:9100/health
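When scripting against a freshly started server (for example right after `npm start`), a short readiness loop on /health avoids racing the startup. A Python sketch using only the standard library; it treats any HTTP 200 from /health as "ready" and does not parse the body, since the response format is not documented here. `is_healthy` and `wait_for_server` are hypothetical names.

```python
import time
import urllib.request


def is_healthy(url="http://127.0.0.1:9100/health", timeout=2.0):
    """Return True if /health answers with HTTP 200, False on connection errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, refused connections and timeouts subclass OSError
        return False


def wait_for_server(check=is_healthy, attempts=10, delay=1.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            sleep(delay)
    return False
```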

Capture Screenshot

# Save as PNG file
curl -H "X-API-Key: zaizai2026" http://127.0.0.1:9100/screenshot -o screen.png

# Get as base64
curl -H "X-API-Key: zaizai2026" "http://127.0.0.1:9100/screenshot?format=base64"

Mouse Operations

# Move mouse
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 300}' \
  http://127.0.0.1:9100/mouse/move

# Smooth move with duration
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 300, "duration": 500}' \
  http://127.0.0.1:9100/mouse/move

# Click
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"x": 500, "y": 300, "button": "left", "clicks": 2}' \
  http://127.0.0.1:9100/mouse/click

# Drag (for CAPTCHA slider)
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"startX": 300, "startY": 400, "endX": 600, "endY": 400, "duration": 0.5}' \
  http://127.0.0.1:9100/mouse/drag

Keyboard & Chinese Input

# Type Chinese text (via clipboard paste)
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"text": "你好世界"}' \
  http://127.0.0.1:9100/clipboard/paste

# Press hotkey
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"keys": ["control", "c"]}' \
  http://127.0.0.1:9100/keyboard/hotkey

# Type English
curl -X POST -H "X-API-Key: zaizai2026" -H "Content-Type: application/json" \
  -d '{"text": "hello"}' \
  http://127.0.0.1:9100/keyboard/type

Human-in-the-Loop (CAPTCHA Handling)

# When the agent detects a CAPTCHA, pause execution
curl -X POST http://127.0.0.1:9100/task/pause

# After the user completes the CAPTCHA, resume
curl -X POST http://127.0.0.1:9100/task/resume

# Check status
curl http://127.0.0.1:9100/task/status
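On the agent side, the pause/resume endpoints are typically used as a wait loop: after calling /task/pause, poll /task/status until a human resumes the task. This sketch assumes the parsed /task/status JSON carries a boolean `paused` field, which is a hypothetical name (check the real response). `get_status` is whatever function you use to fetch and parse that endpoint, and `wait_until_resumed` is an illustrative helper.

```python
import time


def wait_until_resumed(get_status, poll_interval=2.0, timeout=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block while the task is paused.

    Returns True once /task/status no longer reports a pause, or False
    if `timeout` seconds pass first. The "paused" key is an assumption.
    """
    deadline = clock() + timeout
    while get_status().get("paused", False):
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

A typical sequence is: POST /task/pause, notify the human (for example with a screenshot), call `wait_until_resumed`, and abort or escalate if it returns False.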

🏗️ Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      AI Agent (Coze Agent)                       │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 │ HTTP/REST
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                            SSH Tunnel                            │
│          ssh -L 9100:127.0.0.1:9100 user@remote-server           │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                      Desktop Control Server                      │
│                  (Node.js + Express + robotjs)                   │
│                                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │ Screenshot │  │   Mouse    │  │  Keyboard  │  │   Window   │  │
│  │   Module   │  │   Module   │  │   Module   │  │   Module   │  │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │              Task Control (Human-in-the-Loop)              │  │
│  │              Pause/Resume + Screenshot Notify              │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                         Operating System                         │
│                     Windows / macOS / Linux                      │
└──────────────────────────────────────────────────────────────────┘

🔒 Security

  • The server listens only on 127.0.0.1 (localhost), so it is not directly reachable from other machines
  • All sensitive operations require API key authentication
  • Use an SSH tunnel for remote access (see examples/ssh-tunnel.md)
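A sketch of opening the SSH tunnel from a Python agent process. The `-L` forwarding spec matches the `ssh -L` command shown in the architecture diagram; `-N` (a standard OpenSSH flag meaning "do not execute a remote command") is added here so the process only forwards the port. `tunnel_command` and `open_tunnel` are hypothetical helpers, and `user@remote-server` is a placeholder destination.

```python
import subprocess


def tunnel_command(destination, local_port=9100, remote_port=9100):
    """Build the ssh argv that forwards a local port to the server's loopback port."""
    return [
        "ssh",
        "-N",  # forward only; do not start a remote shell
        "-L", f"{local_port}:127.0.0.1:{remote_port}",
        destination,
    ]


def open_tunnel(destination, **ports):
    """Start the tunnel in the background; call .terminate() on the result to close it."""
    return subprocess.Popen(tunnel_command(destination, **ports))
```

While the tunnel is running, the agent talks to http://127.0.0.1:9100 on its own machine as if the server were local.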

📦 Dependencies

  • express: HTTP server framework
  • @jitsi/robotjs: Cross-platform desktop automation
  • sharp: Image processing for screenshots

🀝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details.

📄 License

MIT License - see LICENSE for details.
