
Replace Playwright with Kernel native API in OpenAI CUA templates #124

Open
rgarcia wants to merge 1 commit into main from rgarcia/cua-native-kernel-api

@rgarcia rgarcia commented Feb 25, 2026

Summary

  • Replace Playwright (over CDP) with Kernel's native computer control API in both TypeScript and Python OpenAI CUA templates
  • Add batch_computer_actions function tool that executes multiple browser actions in a single API call, reducing latency
  • Add local test scripts (test.local.ts / test_local.py) that create remote Kernel browsers for testing without deploying a Kernel app

Details

New KernelComputer class (TS + Python) wraps the Kernel SDK for all computer actions:

  • captureScreenshot, clickMouse, typeText, pressKey, scroll, moveMouse, dragMouse
  • batch endpoint for executing multiple actions in a single API call
  • playwright.execute for navigation (goto, back, forward, getCurrentUrl)
  • CUA key name to X11 keysym translation map (ported from Go reference implementation)
  • Button normalization (CUA model sends numeric button values 1/2/3 in batch calls)
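
The key translation and button normalization described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the map entries and the names CUA_TO_X11_KEYSYM and translate_key are assumptions, since the real table ported from the Go reference is not shown here. The keysym names themselves (Return, BackSpace, Control_L, and so on) are standard X11 names, and normalize_button follows the excerpt quoted in the review below.

```python
from typing import Union

# Hypothetical subset of a CUA key name -> X11 keysym map. The PR's actual
# table is not reproduced here; only the keysym names are standard X11.
CUA_TO_X11_KEYSYM = {
    "ENTER": "Return",
    "ESC": "Escape",
    "BACKSPACE": "BackSpace",
    "TAB": "Tab",
    "SPACE": "space",
    "CTRL": "Control_L",
    "ALT": "Alt_L",
    "SHIFT": "Shift_L",
    "ARROWLEFT": "Left",
    "ARROWRIGHT": "Right",
}


def translate_key(key: str) -> str:
    """Map a CUA key name to an X11 keysym; unknown keys pass through unchanged."""
    return CUA_TO_X11_KEYSYM.get(key.upper(), key)


def normalize_button(button: Union[str, int, None]) -> str:
    """Normalize a CUA button value (None, numeric 1/2/3, or a string)."""
    if button is None:
        return "left"
    if isinstance(button, int):
        # Numeric values arrive in batch calls; anything unexpected falls back to "left".
        return {1: "left", 2: "middle", 3: "right"}.get(button, "left")
    return str(button)
```

Passing unknown keys through keeps single characters such as "a" usable without listing every key in the map.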

Batch tool: System instructions guide the model to prefer batch_computer_actions for predictable sequences (e.g., click + type + enter).
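
As a rough sketch of what a batched click + type + enter sequence might look like once translated, assuming the payload shapes below: the click_mouse and press_key shapes follow the diff preview further down, while the type_text shape, the "type" and "keypress" action names, and the helper names translate_action and build_batch are assumptions made for illustration.

```python
from typing import Any, Dict, List


def translate_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one CUA-style action dict into a batch entry (illustrative only)."""
    kind = action.get("type", "")
    if kind == "click":
        return {
            "type": "click_mouse",
            "click_mouse": {
                "x": action.get("x", 0),
                "y": action.get("y", 0),
                "button": action.get("button", "left"),
            },
        }
    if kind == "type":
        # Assumed shape: the type_text payload is not shown in this PR page.
        return {"type": "type_text", "type_text": {"text": action.get("text", "")}}
    if kind == "keypress":
        return {"type": "press_key", "press_key": {"keys": list(action.get("keys", []))}}
    raise ValueError(f"Unsupported action type: {kind!r}")


def build_batch(actions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Translate a predictable action sequence into one list for a single batch call."""
    return [translate_action(a) for a in actions]


# Example: focus a search box, type a query, press Enter.
batch = build_batch([
    {"type": "click", "x": 320, "y": 48},
    {"type": "type", "text": "mechanical keyboard"},
    {"type": "keypress", "keys": ["Return"]},
])
```

The resulting list would be sent in one request, with a single screenshot taken after the batch rather than after each action.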

Removed dependencies: playwright-core, sharp (TS), playwright (Python). Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0.

Test plan

  • TypeScript test.local.ts E2E: created remote Kernel browser, ran CUA agent (eBay search task), batch tool used successfully, browser cleaned up
  • Python test_local.py E2E: same test, batch tool used on first action (type + enter), agent completed successfully
  • TypeScript compiles cleanly (tsc --noEmit)

Made with Cursor


Note

Medium Risk
Replaces the core browser-automation layer and action execution path in both templates, which can change interaction timing/behavior and screenshot handling. Risk is mitigated by being template/sample code and by keeping URL blocklist/safety-check handling in place.

Overview
Updates both the Python and TypeScript OpenAI CUA templates to stop using Playwright-over-CDP and instead drive browsers via Kernel’s native computer control endpoints (screenshot/click/type/scroll/batch), implemented through new KernelComputer wrappers.

Introduces a new batch_computer_actions function tool plus model instructions to prefer batching predictable action sequences, reducing round trips; agents are updated to execute batched actions and return a single post-batch screenshot.

Cleans up template dependencies and tooling (removes Playwright/Sharp/Pillow usage, bumps Kernel SDK versions, adds KERNEL_API_KEY env var), and adds local test scripts (test_local.py, test.local.ts) for running against a remote Kernel browser without deploying.

Written by Cursor Bugbot for commit 415546a. This will update automatically on new commits.

Both TypeScript and Python OpenAI CUA templates now use Kernel's native
computer control API (screenshot, click, type, scroll, batch, etc.)
instead of Playwright over CDP. This enables the batch_computer_actions
tool which executes multiple actions in a single API call for lower
latency.

Key changes:
- New KernelComputer class wrapping Kernel SDK for all computer actions
- Added batch_computer_actions function tool with system instructions
- Navigation (goto/back/forward) via Kernel's playwright.execute endpoint
- Local test scripts create remote Kernel browsers without app deployment
- Removed playwright-core, sharp (TS) and playwright (Python) dependencies
- Bumped @onkernel/sdk to ^0.38.0 and kernel to >=0.38.0

Made-with: Cursor

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

}

const currentUrl = await this.computer.getCurrentUrl();
utils.checkBlocklistedUrl(currentUrl);

TypeScript URL blocklist check return value silently ignored

Medium Severity

checkBlocklistedUrl returns a boolean, but agent.ts discards the return value, making the URL blocklist entirely non-functional. The Python counterpart correctly raises a ValueError to halt execution. Previously, Playwright's route-level route.abort() handler provided actual network-level blocking, but that was removed in this PR, leaving no working URL blocking in the TypeScript template.
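
For comparison, a raise-on-match check in the style attributed to the Python template might look like the sketch below. The BLOCKED_DOMAINS entries and the function name are placeholders, not the template's actual list or code. The matching rule (exact host, or any subdomain of a blocked domain) mirrors the TypeScript helper shown in the diff preview.

```python
from urllib.parse import urlparse

# Placeholder domains for illustration; the template's real list is not shown here.
BLOCKED_DOMAINS = ["blocked.example", "malware.example"]


def check_blocklisted_url(url: str) -> None:
    """Raise ValueError if the URL's host is a blocked domain or one of its subdomains."""
    try:
        host = urlparse(url).hostname or ""
    except ValueError:
        # Unparseable URLs are treated as not blocked, as in the TypeScript helper.
        return
    if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
        raise ValueError(f"Blocked URL: {url}")
```

Because the function raises instead of returning a boolean, a caller that ignores the result still halts on a blocked URL.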


from .kernel_computer import KernelComputer

computers_config = {
    "local-playwright": LocalPlaywrightBrowser,

Updated config.py is now dead code

Low Severity

config.py was updated in this PR to reference KernelComputer, but computers/__init__.py no longer imports or exports computers_config. No other file references it either, making this entire file dead code.


        return "left"
    if isinstance(button, int):
        return {1: "left", 2: "middle", 3: "right"}.get(button, "left")
    return str(button)

Missing handling for special click button values

Medium Severity

The CUA model can send click actions with button set to "back", "forward", or "wheel". The deleted Playwright code explicitly handled these by routing to self.back(), self.forward(), or mouse.wheel(). The new _normalize_button/normalizeButton functions pass these strings through unchanged to the Kernel click_mouse API, which only accepts "left", "right", or "middle" — causing an API error when the model uses these button types.



cursor bot commented Feb 25, 2026

Bugbot Autofix prepared fixes for 3 of the 3 bugs found in the latest run.

  • ✅ Fixed: TypeScript URL blocklist check return value silently ignored
    • Changed checkBlocklistedUrl from returning a boolean to throwing an Error when a blocked URL is detected, matching the Python counterpart's ValueError behavior.
  • ✅ Fixed: Updated config.py is now dead code
    • Deleted the dead config.py file since computers_config is not imported or used anywhere in the codebase.
  • ✅ Fixed: Missing handling for special click button values
    • Added handling for 'back', 'forward', and 'wheel' button values in both Python and TypeScript KernelComputer.click() methods (routing to back/forward/scroll) and in batch translation functions (using Alt+Left/Right keypresses and scroll actions).

Push these changes by commenting:

@cursor push a9e2870223
Preview (a9e2870223)
diff --git a/pkg/templates/python/openai-computer-use/computers/config.py b/pkg/templates/python/openai-computer-use/computers/config.py
deleted file mode 100644
--- a/pkg/templates/python/openai-computer-use/computers/config.py
+++ /dev/null
@@ -1,5 +1,0 @@
-from .kernel_computer import KernelComputer
-
-computers_config = {
-    "kernel": KernelComputer,
-}
\ No newline at end of file

diff --git a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
--- a/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
+++ b/pkg/templates/python/openai-computer-use/computers/kernel_computer.py
@@ -74,12 +74,22 @@
 def _translate_cua_action(action: Dict[str, Any]) -> Dict[str, Any]:
     action_type = action.get("type", "")
     if action_type == "click":
+        button = action.get("button")
+        if button == "back":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Left"]}}
+        if button == "forward":
+            return {"type": "press_key", "press_key": {"keys": ["Alt_L", "Right"]}}
+        if button == "wheel":
+            return {
+                "type": "scroll",
+                "scroll": {"x": action.get("x", 0), "y": action.get("y", 0), "delta_x": 0, "delta_y": 0},
+            }
         return {
             "type": "click_mouse",
             "click_mouse": {
                 "x": action.get("x", 0),
                 "y": action.get("y", 0),
-                "button": _normalize_button(action.get("button")),
+                "button": _normalize_button(button),
             },
         }
     elif action_type == "double_click":
@@ -134,6 +144,15 @@
         return base64.b64encode(resp.read()).decode("utf-8")
 
     def click(self, x: int, y: int, button="left") -> None:
+        if button == "back":
+            self.back()
+            return
+        if button == "forward":
+            self.forward()
+            return
+        if button == "wheel":
+            self.scroll(x, y, 0, 0)
+            return
         self.client.browsers.computer.click_mouse(self.session_id, x=x, y=y, button=_normalize_button(button))
 
     def double_click(self, x: int, y: int) -> None:

diff --git a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/kernel-computer.ts
@@ -105,11 +105,18 @@
 
 function translateCuaAction(action: CuaAction): BatchAction {
   switch (action.type) {
-    case 'click':
+    case 'click': {
+      if (action.button === 'back')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Left'] } };
+      if (action.button === 'forward')
+        return { type: 'press_key', press_key: { keys: ['Alt_L', 'Right'] } };
+      if (action.button === 'wheel')
+        return { type: 'scroll', scroll: { x: action.x ?? 0, y: action.y ?? 0, delta_x: 0, delta_y: 0 } };
       return {
         type: 'click_mouse',
         click_mouse: { x: action.x ?? 0, y: action.y ?? 0, button: normalizeButton(action.button) },
       };
+    }
     case 'double_click':
       return {
         type: 'click_mouse',
@@ -168,6 +175,9 @@
   }
 
   async click(x: number, y: number, button: string | number = 'left'): Promise<void> {
+    if (button === 'back') { await this.back(); return; }
+    if (button === 'forward') { await this.forward(); return; }
+    if (button === 'wheel') { await this.scroll(x, y, 0, 0); return; }
     await this.client.browsers.computer.clickMouse(this.sessionId, {
       x,
       y,

diff --git a/pkg/templates/typescript/openai-computer-use/lib/utils.ts b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
--- a/pkg/templates/typescript/openai-computer-use/lib/utils.ts
+++ b/pkg/templates/typescript/openai-computer-use/lib/utils.ts
@@ -40,12 +40,14 @@
   }
 }
 
-export function checkBlocklistedUrl(url: string): boolean {
+export function checkBlocklistedUrl(url: string): void {
   try {
     const host = new URL(url).hostname;
-    return BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`));
-  } catch {
-    return false;
+    if (BLOCKED_DOMAINS.some((d) => host === d || host.endsWith(`.${d}`))) {
+      throw new Error(`Blocked URL: ${url}`);
+    }
+  } catch (e) {
+    if (e instanceof Error && e.message.startsWith('Blocked URL:')) throw e;
   }
 }
