Commit 6431fb4

feat(sparsekernel): guard brokered browser navigation
1 parent 1db23e6 commit 6431fb4

8 files changed

Lines changed: 633 additions & 149 deletions

docs/architecture/browser-broker.md

Lines changed: 2 additions & 2 deletions
@@ -19,8 +19,8 @@ When `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CD

 Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool by trust zone and profile. The native pool uses a loopback-only remote debugging endpoint, a runtime-owned browser profile directory, and pooled process refcounts; the leased CDP context is released first, then the browser process is stopped after the pool idle timeout. Use `OPENCLAW_SPARSEKERNEL_BROWSER_EXECUTABLE` when Chrome/Chromium is not discoverable on `PATH` or a common platform path. Headless mode is on by default. `OPENCLAW_SPARSEKERNEL_BROWSER_NO_SANDBOX=1` is an explicit opt-out and should only be used when the host environment cannot run Chromium's sandbox.

-Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and brokered `act`) operate against the leased CDP context. Brokered `act` covers the OpenClaw action contract for click, coordinate click, type, press, hover, scroll, drag, select, fill, resize, wait, evaluate, close, and batch using CDP input events plus bounded DOM evaluation. Selector-backed actions retry inside the leased page until their action timeout, and `wait --load networkidle` uses CDP Network events plus a quiet window rather than only checking `document.readyState`. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, actions resolve refs from the latest brokered snapshot where needed, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.
+Supported v0 actions (`status`, `doctor`, `profiles`, `tabs`, `open`, `navigate`, `focus`, `close`, `snapshot`, `console`, `screenshot`, `pdf`, direct file-input `upload`, `dialog`, and brokered `act`) operate against the leased CDP context. Brokered `act` covers the OpenClaw action contract for click, coordinate click, type, press, hover, scroll, drag, select, fill, resize, wait, evaluate, close, and batch using CDP input events plus bounded DOM evaluation. Selector-backed actions retry inside the leased page until their action timeout, and `wait --load networkidle` uses CDP Network events plus a quiet window rather than only checking `document.readyState`. Actions that can change page state are followed by a broker-side navigation check: same-target navigations are accepted only when the resulting URL stays inside the context's allowed-origin policy, while new tabs/windows are closed and rejected in v0 because they are unleased targets. Snapshots use a bounded CDP `Runtime.evaluate` DOM read, actions resolve refs from the latest brokered snapshot where needed, console output is captured from CDP runtime/log events, and screenshot/PDF output is captured as SparseKernel artifacts, read back through artifact access, and converted to existing tool result formats for compatibility. The context is retained for the active embedded run and released during broker cleanup, not opened and closed for every browser tool call.

 BrowserContext isolation is session isolation, not host isolation. Playwright route blocking is useful request control, not a hard security boundary.

-The brokered CDP action engine is intentionally implemented inside the broker boundary, but it is not a byte-for-byte Playwright clone. Behaviors that depend on Playwright's full actionability checks, locator retry model, navigation guard, or network-idle semantics should still be covered by targeted tests before relying on them for critical automations. Set `OPENCLAW_SPARSEKERNEL_BROWSER_LIVE=1` when running `src/local-kernel/browser-cdp-live.test.ts` to exercise the native pool and brokered actions against a real local Chromium-compatible browser.
+The brokered CDP action engine is intentionally implemented inside the broker boundary, but it is not a byte-for-byte Playwright clone. Behaviors that depend on Playwright's full actionability checks, locator retry model, popup ownership, or network-idle semantics should still be covered by targeted tests before relying on them for critical automations. Set `OPENCLAW_SPARSEKERNEL_BROWSER_LIVE=1` when running `src/local-kernel/browser-cdp-live.test.ts` to exercise the native pool and brokered actions against a real local Chromium-compatible browser.
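
Reviewer note: the added documentation says same-target navigations are accepted only when the resulting URL stays inside the context's allowed-origin policy. The broker's actual policy code is not part of this diff, so the following is only a minimal sketch of what such a check could look like. It assumes exact origin matching with no wildcard support, and the helper name `isUrlWithinAllowedOrigins` is hypothetical.

```typescript
// Hypothetical helper, not taken from the commit: exact-origin matching only.
function isUrlWithinAllowedOrigins(url: string, allowedOrigins: readonly string[]): boolean {
  let candidate: URL;
  try {
    candidate = new URL(url);
  } catch {
    // Unparseable URLs are treated as outside the policy.
    return false;
  }
  // Opaque origins (about:blank, data:) serialize to the string "null";
  // this sketch never treats them as allowed.
  if (candidate.origin === "null") {
    return false;
  }
  return allowedOrigins.some((entry) => {
    try {
      // Normalize each policy entry through URL so that a trailing slash
      // or default port does not cause a spurious mismatch.
      return new URL(entry).origin === candidate.origin;
    } catch {
      return false;
    }
  });
}
```

With `allowed_origins: ["https://example.com"]`, as in the tests added by this commit, the sketch accepts `https://example.com/next` and rejects `https://blocked.example/next`. It also rejects `http://example.com/`, because the scheme is part of the origin.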

docs/architecture/local-agent-kernel.md

Lines changed: 2 additions & 2 deletions
@@ -96,7 +96,7 @@ The browser broker model is:

 Important boundary: BrowserContext isolation is session isolation, not host isolation. Playwright route blocking and SSRF guards are useful controls, but they are not hard security boundaries.

-The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool keyed by trust zone/profile, with process lifetime tied to brokered context leases and idle timeout. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and action routes instead of exposing raw CDP to the agent. Brokered actions cover the OpenClaw action contract with CDP input events, bounded DOM evaluation, selector retry, and CDP-backed network-idle waiting; screenshot and PDF outputs go through the artifact store.
+The broker applies configured trust-zone network policy to explicit allowed origins before allocating a context. This is an egress guard for brokered contexts, not a kernel or VM boundary. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=cdp` and `OPENCLAW_SPARSEKERNEL_BROWSER_CDP_ENDPOINT=<loopback endpoint>` to make the OpenClaw browser tool acquire a real SparseKernel CDP context for the active run. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=managed` to use the existing OpenClaw browser control service as the managed process owner and let SparseKernel lease CDP contexts from its reported endpoint. Set `OPENCLAW_RUNTIME_BROWSER_BROKER=native` to let SparseKernel launch and supervise a local Chromium-compatible process pool keyed by trust zone/profile, with process lifetime tied to brokered context leases and idle timeout. The runtime injects an internal browser proxy for supported navigation, tab, snapshot, console, screenshot, PDF, direct file-input upload, dialog, and action routes instead of exposing raw CDP to the agent. Brokered actions cover the OpenClaw action contract with CDP input events, bounded DOM evaluation, selector retry, CDP-backed network-idle waiting, and post-action navigation checks. Same-target action navigations must stay inside the context's allowed origins when a policy is configured; new tabs/windows are closed and rejected until the broker owns a multi-tab lease model. Screenshot and PDF outputs go through the artifact store.

 ## Sandbox broker

@@ -138,7 +138,7 @@ Session metadata writes are mirrored into the runtime ledger by default. Set `OP
 ## Current limitations

 - Full transcript ownership is still staged: session metadata can run with SQLite as primary, assistant transcript mirrors and import/export are ledger-backed, and the pi session manager still appends JSONL.
-- Native browser process pooling exists for loopback CDP contexts, but it is still a small supervisor around Chromium. Brokered actions now cover selector retry and CDP-backed network-idle waits, but the engine is still not a full Playwright-equivalent process manager and does not provide host isolation.
+- Native browser process pooling exists for loopback CDP contexts, but it is still a small supervisor around Chromium. Brokered actions now cover selector retry, CDP-backed network-idle waits, and post-action navigation checks, but the engine is still not a full Playwright-equivalent process manager and does not provide host isolation.
 - Sandbox allocation records requested bwrap/minijail/Docker backends and checks availability, but local/no-isolation still does not harden execution.
 - Native plugins still execute in process; tool invocation is brokered and audited but untrusted plugin isolation is a later process-boundary change.
 - Network policy enforcement currently exists where brokers call it; host-level egress proxying is still future work.
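
Reviewer note: the post-action navigation check described in these docs has two outcomes that the new tests distinguish. An out-of-policy same-target navigation rejects the action and releases the context, while a new tab/window is closed and rejected but the original tab stays usable. The sketch below models only that decision. All type and function names are hypothetical, the real broker implementation is not shown in this diff, and the precedence between the two failure cases is an assumption.

```typescript
// Hypothetical event and verdict shapes, loosely modeled on the CDP events
// the fake transport emits (Page.frameNavigated, Target.targetCreated).
type ObservedEvent =
  | { kind: "frameNavigated"; url: string }
  | { kind: "targetCreated"; targetId: string; url: string };

type NavigationVerdict =
  | { ok: true }
  | { ok: false; reason: string; closeTargetIds: string[]; releaseContext: boolean };

// Assumes allowedOrigins already holds normalized origin strings.
function originAllowed(url: string, allowedOrigins: readonly string[]): boolean {
  try {
    const origin = new URL(url).origin;
    return origin !== "null" && allowedOrigins.includes(origin);
  } catch {
    return false;
  }
}

function judgePostActionEvents(
  events: readonly ObservedEvent[],
  allowedOrigins: readonly string[],
): NavigationVerdict {
  const closeTargetIds: string[] = [];
  let blockedUrl: string | undefined;
  for (const event of events) {
    if (event.kind === "targetCreated") {
      // New tabs/windows are unleased targets: always schedule them for closing.
      closeTargetIds.push(event.targetId);
    } else if (allowedOrigins.length > 0 && !originAllowed(event.url, allowedOrigins)) {
      // An empty list is read as "no policy configured", so nothing is blocked.
      blockedUrl = event.url;
    }
  }
  if (blockedUrl !== undefined) {
    return {
      ok: false,
      reason: "post-action navigation blocked by allowed origins",
      closeTargetIds,
      releaseContext: true,
    };
  }
  if (closeTargetIds.length > 0) {
    return {
      ok: false,
      reason: "action opened a new tab/window",
      closeTargetIds,
      releaseContext: false,
    };
  }
  return { ok: true };
}
```

Returning a plain verdict keeps the policy decision separate from the side effects (`Target.closeTarget`, context release), which makes it testable without a CDP transport.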

packages/browser-broker/src/index.test.ts

Lines changed: 166 additions & 1 deletion
@@ -65,6 +65,8 @@ class FakeCdpTransport implements CdpTransport {
   private readonly errorListeners: Array<(error: Error) => void> = [];
   private downloadPath: string | undefined;
   private currentUrl = "about:blank";
+  private nextActionNavigationUrl: string | undefined;
+  private nextActionNewTarget: { targetId: string; url: string } | undefined;

   send(data: string): void {
     const message = JSON.parse(data) as {
@@ -89,6 +91,7 @@ class FakeCdpTransport implements CdpTransport {
       case "Runtime.enable":
       case "Network.enable":
       case "Log.enable":
+      case "Target.setDiscoverTargets":
         this.respond(message.id, {});
         break;
       case "Browser.setDownloadBehavior":
@@ -189,6 +192,14 @@ class FakeCdpTransport implements CdpTransport {
     });
   }

+  queueActionNavigation(url: string): void {
+    this.nextActionNavigationUrl = url;
+  }
+
+  queueActionNewTarget(url: string, targetId = "target-popup"): void {
+    this.nextActionNewTarget = { targetId, url };
+  }
+
   private respondRuntimeEvaluate(message: { id: number; params?: Record<string, unknown> }): void {
     const expression = String(message.params?.expression ?? "");
     if (expression.includes("const links =")) {
@@ -237,6 +248,22 @@ class FakeCdpTransport implements CdpTransport {
       });
       return;
     }
+    if (expression.includes("waitForActionTarget")) {
+      const navigationUrl = this.nextActionNavigationUrl;
+      const newTarget = this.nextActionNewTarget;
+      this.nextActionNavigationUrl = undefined;
+      this.nextActionNewTarget = undefined;
+      this.respond(message.id, {
+        result: { value: { ok: true } },
+      });
+      if (navigationUrl) {
+        setTimeout(() => this.emitFrameNavigation(navigationUrl), 0);
+      }
+      if (newTarget) {
+        setTimeout(() => this.emitNewTarget(newTarget.targetId, newTarget.url), 0);
+      }
+      return;
+    }
     this.respond(message.id, {
       result: { value: { ok: true } },
     });
@@ -295,6 +322,41 @@ class FakeCdpTransport implements CdpTransport {
       });
     }, 0);
   }
+
+  private emitFrameNavigation(url: string): void {
+    this.currentUrl = url;
+    this.emit({
+      method: "Page.frameNavigated",
+      params: {
+        frame: {
+          id: "frame-1",
+          url,
+        },
+      },
+      sessionId: "session-1",
+    });
+    setTimeout(() => {
+      this.emit({
+        method: "Page.loadEventFired",
+        params: {},
+        sessionId: "session-1",
+      });
+    }, 0);
+  }
+
+  private emitNewTarget(targetId: string, url: string): void {
+    this.emit({
+      method: "Target.targetCreated",
+      params: {
+        targetInfo: {
+          targetId,
+          type: "page",
+          browserContextId: "cdp-context-1",
+          url,
+        },
+      },
+    });
+  }
 }

 describe("@openclaw/sparsekernel-browser-broker", () => {
@@ -623,6 +685,106 @@ describe("@openclaw/sparsekernel-browser-broker", () => {
     );
   });

+  it("accepts same-origin post-action navigation within the context policy", async () => {
+    const kernel = new FakeKernel();
+    const transport = new FakeCdpTransport();
+    const broker = new SparseKernelCdpBrowserBroker({
+      kernel,
+      fetchImpl: async () =>
+        Response.json({
+          webSocketDebuggerUrl: "ws://127.0.0.1/devtools/browser/test",
+        }),
+      transportFactory: async () => transport,
+    });
+
+    const context = await broker.acquireContext({
+      trust_zone_id: "public_web",
+      cdp_endpoint: "http://127.0.0.1:9222",
+      initial_url: "https://example.com/start",
+      allowed_origins: ["https://example.com"],
+    });
+    transport.queueActionNavigation("https://example.com/next");
+
+    await expect(
+      broker.actContext(context.ledger_context.id, {
+        kind: "click",
+        selector: "#continue",
+      }),
+    ).resolves.toMatchObject({ ok: true, kind: "click" });
+    await expect(broker.listTabs(context.ledger_context.id)).resolves.toEqual([
+      expect.objectContaining({ url: "https://example.com/next" }),
+    ]);
+  });
+
+  it("releases a context when post-action navigation leaves the allowed origins", async () => {
+    const kernel = new FakeKernel();
+    const transport = new FakeCdpTransport();
+    const broker = new SparseKernelCdpBrowserBroker({
+      kernel,
+      fetchImpl: async () =>
+        Response.json({
+          webSocketDebuggerUrl: "ws://127.0.0.1/devtools/browser/test",
+        }),
+      transportFactory: async () => transport,
+    });
+
+    const context = await broker.acquireContext({
+      trust_zone_id: "public_web",
+      cdp_endpoint: "http://127.0.0.1:9222",
+      initial_url: "https://example.com/start",
+      allowed_origins: ["https://example.com"],
+    });
+    transport.queueActionNavigation("https://blocked.example/next");
+
+    await expect(
+      broker.actContext(context.ledger_context.id, {
+        kind: "click",
+        selector: "#escape",
+      }),
+    ).rejects.toThrow(/post-action navigation blocked by allowed origins/);
+    await expect(broker.listTabs(context.ledger_context.id)).rejects.toThrow(/not materialized/);
+    expect(kernel.releasedContextIds).toEqual(["browser_ctx_1"]);
+  });
+
+  it("closes new tabs opened by brokered actions instead of accepting unleased targets", async () => {
+    const kernel = new FakeKernel();
+    const transport = new FakeCdpTransport();
+    const broker = new SparseKernelCdpBrowserBroker({
+      kernel,
+      fetchImpl: async () =>
+        Response.json({
+          webSocketDebuggerUrl: "ws://127.0.0.1/devtools/browser/test",
+        }),
+      transportFactory: async () => transport,
+    });
+
+    const context = await broker.acquireContext({
+      trust_zone_id: "public_web",
+      cdp_endpoint: "http://127.0.0.1:9222",
+      initial_url: "https://example.com/start",
+      allowed_origins: ["https://example.com"],
+    });
+    transport.queueActionNewTarget("https://example.com/popup", "target-popup");
+
+    await expect(
+      broker.actContext(context.ledger_context.id, {
+        kind: "click",
+        selector: "#popup",
+      }),
+    ).rejects.toThrow(/opened a new tab\/window/);
+    expect(transport.sent).toEqual(
+      expect.arrayContaining([
+        expect.objectContaining({
+          method: "Target.closeTarget",
+          params: expect.objectContaining({ targetId: "target-popup" }),
+        }),
+      ]),
+    );
+    await expect(broker.listTabs(context.ledger_context.id)).resolves.toEqual([
+      expect.objectContaining({ targetId: "target-1" }),
+    ]);
+  });
+
   it("performs the broader brokered action contract against the leased context", async () => {
     const kernel = new FakeKernel();
     const transport = new FakeCdpTransport();
@@ -740,7 +902,10 @@ describe("@openclaw/sparsekernel-browser-broker", () => {
         (message as { method?: unknown }).method === "Runtime.evaluate" &&
         typeof (message as { params?: { expression?: unknown } }).params?.expression === "string",
     );
-    const actionExpression = runtimeEvaluations.at(-1)?.params.expression ?? "";
+    const actionExpression =
+      runtimeEvaluations.find((message) =>
+        message.params.expression.includes("waitForActionTarget"),
+      )?.params.expression ?? "";
     expect(actionExpression).toContain("waitForActionTarget");
     expect(actionExpression).toContain("const timeoutMs = 1234");
     expect(actionExpression).toContain("document.querySelector(selector)");
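
Reviewer note: the last hunk replaces `runtimeEvaluations.at(-1)` with a `find` on the `waitForActionTarget` marker. A plausible reason, which is an assumption because the broker change itself is not shown here, is that the post-action navigation check can issue further `Runtime.evaluate` calls after the action expression, so the last recorded evaluation is no longer the action. The sketch below illustrates the difference with made-up recorded messages; the follow-up `location.href` expression is hypothetical.

```typescript
// Hypothetical shape of a recorded CDP message in the test transport.
interface RecordedEvaluate {
  method: "Runtime.evaluate";
  params: { expression: string };
}

// Select the evaluation by content marker rather than by position.
function findActionExpression(messages: readonly RecordedEvaluate[], marker: string): string {
  return messages.find((message) => message.params.expression.includes(marker))?.params.expression ?? "";
}

const recorded: RecordedEvaluate[] = [
  {
    method: "Runtime.evaluate",
    params: { expression: "const timeoutMs = 1234; waitForActionTarget(selector);" },
  },
  // Assumed follow-up evaluation issued by a post-action check.
  { method: "Runtime.evaluate", params: { expression: "location.href" } },
];

// Positional lookup now lands on the follow-up call, not the action.
const lastExpression = recorded[recorded.length - 1]?.params.expression ?? "";

// Marker lookup still finds the action expression regardless of what follows.
const actionExpression = findActionExpression(recorded, "waitForActionTarget");
```

Matching on a marker keeps the assertion stable if the broker later adds or reorders auxiliary evaluations, at the cost of coupling the test to the `waitForActionTarget` name.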
