Commit cd8bdfe

feat(sparsekernel): expand brokered browser actions
1 parent 5346c2f commit cd8bdfe

5 files changed

Lines changed: 592 additions & 34 deletions

docs/architecture/browser-broker.md

Lines changed: 3 additions & 1 deletion
@@ -15,7 +15,7 @@ V0 also has a concrete local-browser attachment point: callers may register and

 The `@openclaw/sparsekernel-browser-broker` adapter materializes the ledger lease into a real CDP browser context with `Target.createBrowserContext`, creates a target for the task/session, captures screenshots through `Page.captureScreenshot`, captures PDFs through `Page.printToPDF`, records console events, and writes screenshots/downloads/PDFs through the SparseKernel artifact API. Callers can construct it directly with a kernel-shaped client or use `createSparseKernelCdpBrowserBroker` to wire it to `sparsekerneld`. Download capture uses CDP download events and the content-addressed artifact store rather than exposing arbitrary download paths to agents.

-When `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` are set, the embedded OpenClaw browser tool receives an internal SparseKernel proxy instead of raw CDP access. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to let SparseKernel ask the existing OpenClaw browser control service to start the managed browser and return its loopback CDP endpoint; `OPENCLAW_SPARSEKERNEL_BROWSER_CONTROL_URL` overrides the default `http://127.0.0.1:18791` control URL. Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and basic `act`) operate against the leased CDP context. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, basic actions resolve refs from the latest brokered snapshot, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.
+When `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` are set, the embedded OpenClaw browser tool receives an internal SparseKernel proxy instead of raw CDP access. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to let SparseKernel ask the existing OpenClaw browser control service to start the managed browser and return its loopback CDP endpoint; `OPENCLAW_SPARSEKERNEL_BROWSER_CONTROL_URL` overrides the default `http://127.0.0.1:18791` control URL. Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and brokered `act`) operate against the leased CDP context. Brokered `act` covers the OpenClaw action contract for click, coordinate click, type, press, hover, scroll, drag, select, fill, resize, wait, evaluate, close, and batch using CDP input events plus bounded DOM evaluation. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, actions resolve refs from the latest brokered snapshot where needed, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.

 The current daemon can ask the existing OpenClaw browser control service to own the browser process, but it does not yet launch or supervise SparseKernel-native process pools by trust zone. The next backend should:

@@ -26,3 +26,5 @@ The current daemon can ask the existing OpenClaw browser control service to own
 - enforce max context counts through leases.

 BrowserContext isolation is session isolation, not host isolation. Playwright route blocking is useful request control, not a hard security boundary.
+
+The brokered CDP action engine is intentionally implemented inside the broker boundary, but it is not a byte-for-byte Playwright clone. Some browser behaviors that depend on Playwright's locator retry model or network-idle accounting may still differ and should be covered by targeted tests before relying on them for critical automations.
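
Reviewer note: the doc change above says brokered actions "resolve refs from the latest brokered snapshot where needed". A minimal sketch of what that implies for callers, assuming action shapes inferred from the test added in this commit. The `BrokeredAction` type and the `collectRefs` and `unresolvedRefs` helpers are hypothetical names for illustration, not exports of `@openclaw/sparsekernel-browser-broker`.

```typescript
// Hypothetical action shapes, inferred from the test in this commit.
// The real OpenClaw action contract may carry more kinds and fields.
type BrokeredAction =
  | { kind: "clickCoords"; x: number; y: number; doubleClick?: boolean }
  | { kind: "scrollIntoView"; ref: string }
  | { kind: "drag"; startRef: string; endRef: string }
  | { kind: "select"; ref: string; values: string[] }
  | { kind: "fill"; fields: Array<{ ref: string; type: string; value: string }> }
  | { kind: "evaluate"; fn: string }
  | { kind: "press"; key: string }
  | { kind: "wait"; selector?: string; url?: string; loadState?: string }
  | { kind: "batch"; actions: BrokeredAction[] };

// Collect every snapshot ref an action depends on, recursing into batches.
function collectRefs(action: BrokeredAction): string[] {
  switch (action.kind) {
    case "scrollIntoView":
    case "select":
      return [action.ref];
    case "drag":
      return [action.startRef, action.endRef];
    case "fill":
      return action.fields.map((field) => field.ref);
    case "batch":
      return action.actions.reduce<string[]>(
        (refs, child) => refs.concat(collectRefs(child)),
        [],
      );
    default:
      // clickCoords, evaluate, press, and wait carry no snapshot refs here.
      return [];
  }
}

// Refs that the latest snapshot does not know about; a caller could
// re-snapshot before acting when this list is non-empty.
function unresolvedRefs(action: BrokeredAction, snapshotRefs: string[]): string[] {
  return collectRefs(action).filter((ref) => snapshotRefs.indexOf(ref) === -1);
}
```

Under these assumptions, a batch containing a `drag` from `e1` to `e2` and a `fill` on `e3` depends on all three refs, while `clickCoords`, `press`, and `evaluate` need no snapshot at all.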

docs/architecture/local-agent-kernel.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -96,7 +96,7 @@ The browser broker model is:

 Important boundary: BrowserContext isolation is session isolation, not host isolation. Playwright route blocking and SSRF guards are useful controls, but they are not hard security boundaries.

-The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and basic action routes instead of exposing raw CDP to the agent. Screenshot and PDF outputs go through the artifact store.
+The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and action routes instead of exposing raw CDP to the agent. Brokered actions cover the OpenClaw action contract with CDP input events and bounded DOM evaluation; screenshot and PDF outputs go through the artifact store.

 ## Sandbox broker
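
Reviewer note: both doc files describe the same two broker modes selected through environment variables. A minimal sketch of that selection logic, assuming a hypothetical `resolveBrowserBroker` helper and `BrokerConfig` type (neither appears in this commit). The variable names and the default control URL are taken from the doc text; falling back to `"none"` when the CDP endpoint is missing is an assumption, since the docs do not say what happens in that case.

```typescript
// Hypothetical config shape; only the env variable names and the default
// control URL come from the docs in this commit.
type BrokerConfig =
  | { mode: "cdp"; cdpEndpoint: string }
  | { mode: "managed"; controlUrl: string }
  | { mode: "none" };

const DEFAULT_CONTROL_URL = "http://127.0.0.1:18791";

function resolveBrowserBroker(env: Record<string, string | undefined>): BrokerConfig {
  const mode = env.OPENCLAW_RUNTIME_BROWSER_BROKER;
  if (mode === "cdp") {
    const endpoint = env.OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT;
    // Assumption: cdp mode needs both variables, so a missing endpoint
    // disables the broker instead of guessing an address.
    return endpoint ? { mode: "cdp", cdpEndpoint: endpoint } : { mode: "none" };
  }
  if (mode === "managed") {
    // The control service starts the browser and reports its CDP endpoint,
    // so only the control URL is needed up front.
    return {
      mode: "managed",
      controlUrl: env.OPENCLAW_SPARSEKERNEL_BROWSER_CONTROL_URL ?? DEFAULT_CONTROL_URL,
    };
  }
  return { mode: "none" };
}
```

Passing the environment in as a plain record, rather than reading `process.env` directly, keeps the sketch testable without touching the real process environment.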

packages/browser-broker/src/index.test.ts

Lines changed: 142 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -108,33 +108,9 @@ class FakeCdpTransport implements CdpTransport {
         this.handleNavigate(String(message.params?.url ?? ""));
         break;
       case "Runtime.evaluate":
-        this.respond(message.id, {
-          result: {
-            value: JSON.stringify({
-              title: this.currentUrl.includes("example.com") ? "Example" : "",
-              url: this.currentUrl,
-              text: "Example page body",
-              links: [
-                {
-                  text: "Example link",
-                  href: "https://example.com/link",
-                  selector: "body > a:nth-of-type(1)",
-                },
-              ],
-              buttons: [{ text: "Submit", selector: "body > button:nth-of-type(1)" }],
-              inputs: [
-                {
-                  label: "Search",
-                  name: "q",
-                  type: "text",
-                  selector: "body > input:nth-of-type(1)",
-                },
-              ],
-              truncated: false,
-            }),
-          },
-        });
+        this.respondRuntimeEvaluate(message);
         break;
+      case "Input.dispatchMouseEvent":
       case "Input.dispatchKeyEvent":
       case "Emulation.setDeviceMetricsOverride":
       case "Page.handleJavaScriptDialog":
@@ -196,6 +172,59 @@ class FakeCdpTransport implements CdpTransport {
     });
   }

+  private respondRuntimeEvaluate(message: { id: number; params?: Record<string, unknown> }): void {
+    const expression = String(message.params?.expression ?? "");
+    if (expression.includes("const links =")) {
+      this.respond(message.id, {
+        result: {
+          value: JSON.stringify({
+            title: this.currentUrl.includes("example.com") ? "Example" : "",
+            url: this.currentUrl,
+            text: "Example page body",
+            links: [
+              {
+                text: "Example link",
+                href: "https://example.com/link",
+                selector: "body > a:nth-of-type(1)",
+              },
+            ],
+            buttons: [{ text: "Submit", selector: "body > button:nth-of-type(1)" }],
+            inputs: [
+              {
+                label: "Search",
+                name: "q",
+                type: "text",
+                selector: "body > input:nth-of-type(1)",
+              },
+            ],
+            truncated: false,
+          }),
+        },
+      });
+      return;
+    }
+    if (expression.includes("title: document.title")) {
+      this.respond(message.id, {
+        result: {
+          value: JSON.stringify({
+            title: this.currentUrl.includes("example.com") ? "Example" : "",
+            url: this.currentUrl,
+          }),
+        },
+      });
+      return;
+    }
+    if (expression.includes("() => 42")) {
+      this.respond(message.id, {
+        result: { value: 42 },
+      });
+      return;
+    }
+    this.respond(message.id, {
+      result: { value: { ok: true } },
+    });
+  }
+
   private respond(id: number, result: Record<string, unknown>): void {
     this.emit({ id, result });
   }
@@ -577,6 +606,93 @@ describe("@openclaw/sparsekernel-browser-broker", () => {
     );
   });

+  it("performs the broader brokered action contract against the leased context", async () => {
+    const kernel = new FakeKernel();
+    const transport = new FakeCdpTransport();
+    const broker = new SparseKernelCdpBrowserBroker({
+      kernel,
+      fetchImpl: async () =>
+        Response.json({
+          webSocketDebuggerUrl: "ws://127.0.0.1/devtools/browser/test",
+        }),
+      transportFactory: async () => transport,
+    });
+
+    const context = await broker.acquireContext({
+      trust_zone_id: "public_web",
+      cdp_endpoint: "http://127.0.0.1:9222",
+      initial_url: "https://example.com/",
+    });
+    await broker.snapshotContext(context.ledger_context.id);
+
+    await broker.actContext(context.ledger_context.id, {
+      kind: "clickCoords",
+      x: 10,
+      y: 20,
+      doubleClick: true,
+    });
+    await broker.actContext(context.ledger_context.id, {
+      kind: "scrollIntoView",
+      ref: "e1",
+    });
+    await broker.actContext(context.ledger_context.id, {
+      kind: "drag",
+      startRef: "e1",
+      endRef: "e2",
+    });
+    await broker.actContext(context.ledger_context.id, {
+      kind: "select",
+      ref: "e3",
+      values: ["choice"],
+    });
+    await broker.actContext(context.ledger_context.id, {
+      kind: "fill",
+      fields: [{ ref: "e3", type: "text", value: "filled" }],
+    });
+    const evaluated = await broker.actContext(context.ledger_context.id, {
+      kind: "evaluate",
+      fn: "() => 42",
+    });
+    const batch = await broker.actContext(context.ledger_context.id, {
+      kind: "batch",
+      actions: [
+        { kind: "press", key: "Enter" },
+        { kind: "wait", selector: "body", url: "example.com", loadState: "load" },
+      ],
+    });
+
+    expect(evaluated.value).toBe(42);
+    expect(batch.value).toMatchObject({
+      results: [
+        { ok: true, kind: "press" },
+        { ok: true, kind: "wait" },
+      ],
+    });
+    expect(transport.sent).toEqual(
+      expect.arrayContaining([
+        expect.objectContaining({ method: "Input.dispatchMouseEvent" }),
+        expect.objectContaining({
+          method: "Runtime.evaluate",
+          params: expect.objectContaining({
+            expression: expect.stringContaining("scrollIntoView"),
+          }),
+        }),
+        expect.objectContaining({
+          method: "Runtime.evaluate",
+          params: expect.objectContaining({
+            expression: expect.stringContaining("DataTransfer"),
+          }),
+        }),
+        expect.objectContaining({
+          method: "Runtime.evaluate",
+          params: expect.objectContaining({
+            expression: expect.stringContaining("filled"),
+          }),
+        }),
+      ]),
+    );
+  });
+
   it("constructs from the SparseKernel daemon client", async () => {
     const calls: Array<{ url: string; body?: unknown }> = [];
     const transport = new FakeCdpTransport();