feat(agent): let the Cate agent control panels (cate-control) by architawr · Pull Request #211 · 0-AI-UG/cate

architawr · 2026-05-30T19:53:15Z

Summary

Adds cate-control: the in-app Cate agent (pi) can now drive the workspace itself — open, arrange, focus/move/resize/close panels and interact with their contents (run a terminal command, navigate a browser, reveal a file at a line, toggle markdown preview) on the canvas it lives in. The agent operates the workspace (what's open, where it sits, what's running, where the camera looks); file edits stay on its normal Edit tool.

12 new cate_* tools, a per-chat Guarded/Auto toggle, and a global on/off setting.

How it works

Transport. pi exposes no generic host RPC — only ctx.ui.select/confirm/input. So a bundled cate-control pi extension piggybacks a structured request on ctx.ui.input() with a sentinel-prefixed JSON payload (@@cate-control@@{...}). The renderer intercepts the sentinel in agentStore.handleEvent before the dialog queue, dispatches it, and replies over the existing agent:uiResponse channel. Pure-visual actions (pan/zoom) are fire-and-forget.
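As a rough illustration of the sentinel transport described above, the sketch below shows how a renderer-side interceptor might separate a cate-control request from an ordinary ctx.ui.input() prompt. Only the `@@cate-control@@` prefix is taken from this PR; the `CateRequest` shape, its field names, and `parseCateRequest` are hypothetical and stand in for whatever the shared protocol actually defines.

```typescript
// Hypothetical wire shape; the real types live in the PR's shared protocol.
interface CateRequest {
  id: string;
  action: string;
  params: Record<string, unknown>;
}

const SENTINEL = "@@cate-control@@"; // prefix named in the PR description

// Returns the decoded request, or null when the prompt is an ordinary
// input dialog that should fall through to the normal dialog queue.
function parseCateRequest(prompt: string): CateRequest | null {
  if (!prompt.startsWith(SENTINEL)) return null;
  try {
    const raw: unknown = JSON.parse(prompt.slice(SENTINEL.length));
    if (typeof raw !== "object" || raw === null) return null;
    const { id, action, params } = raw as Record<string, unknown>;
    if (typeof id !== "string" || typeof action !== "string") return null;
    const safeParams =
      typeof params === "object" && params !== null && !Array.isArray(params)
        ? (params as Record<string, unknown>)
        : {};
    return { id, action, params: safeParams };
  } catch {
    return null; // malformed JSON: treat it as a normal prompt
  }
}
```

Returning null for anything that does not parse keeps the interceptor conservative: a prompt that merely happens to start with the sentinel still reaches the user as a dialog.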

Dispatcher. src/agent/renderer/cateControl.ts resolves the calling chat's workspace/canvas via a context registry that AgentPanel populates (useCanvasStoreApi()), classifies the action (safe vs side-effect), applies the chat's mode policy, runs an executor (cateExecutors.ts), and returns a structured response.

Modes (guarded / auto). A per-chat toggle in the agent chat footer (next to plan-mode), mirroring plan/auto in Claude Code:

Guarded (default) — reads + safe panel ops run immediately; side-effects (close, run command, open URL) prompt an inline Allow/Deny card (reuses the existing approval UI).
Auto — everything runs without prompting.

A global cateControlEnabled setting gates the whole feature.
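The gating rules above can be sketched as a single pure decision function. This is an assumption-laden illustration, not the PR's dispatcher: the `decide` name, the `Decision` values, and the exact action strings in `SIDE_EFFECT_ACTIONS` are hypothetical, chosen to mirror the side-effects the description lists (close, run command, open URL).

```typescript
type CateMode = "guarded" | "auto";
type Decision = "run" | "ask" | "reject";

// Assumed action names for the side-effects listed in the description.
const SIDE_EFFECT_ACTIONS = new Set<string>([
  "close_panel",
  "run_in_terminal",
  "open_url",
]);

function decide(action: string, mode: CateMode, enabled: boolean): Decision {
  if (!enabled) return "reject"; // global cateControlEnabled is off
  if (mode === "auto") return "run"; // Auto: everything runs without prompting
  // Guarded: only side-effects need an Allow/Deny answer.
  return SIDE_EFFECT_ACTIONS.has(action) ? "ask" : "run";
}
```

Keeping the decision free of UI and store access is what makes "dispatcher gating" easy to cover in unit tests.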

Semantic placement. The agent never sends pixels — it states intent (right of X, tile, relativeTo: 'self'); cateControlLayout.ts (pure geometry, unit-tested) computes canvas-space rects.
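To make the intent-to-rect idea concrete, here is a minimal sketch of two placement helpers of the kind such a geometry module could contain. The `Rect` type, the 24-unit gap, and the `placeRightOf` and `tile` functions are all assumptions for illustration; the actual cateControlLayout.ts may use different names, spacing, and tiling rules.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

const GAP = 24; // assumed spacing between panels, in canvas units

// "right of X": same top edge as the anchor, shifted past its right edge.
function placeRightOf(
  anchor: Rect,
  size: { width: number; height: number },
  gap: number = GAP,
): Rect {
  return {
    x: anchor.x + anchor.width + gap,
    y: anchor.y,
    width: size.width,
    height: size.height,
  };
}

// "tile": split an area into a near-square grid of equally sized cells.
function tile(count: number, area: Rect, gap: number = GAP): Rect[] {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const w = (area.width - gap * (cols - 1)) / cols;
  const h = (area.height - gap * (rows - 1)) / rows;
  return Array.from({ length: count }, (_, i) => ({
    x: area.x + (i % cols) * (w + gap),
    y: area.y + Math.floor(i / cols) * (h + gap),
    width: w,
    height: h,
  }));
}
```

Because both helpers only take and return plain rects, they can be tested without a canvas or a store.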

Self-protection. cate_get_layout marks the agent's own host panel isSelf: true; executors refuse to close/move it and exclude it from arrange unless targeted explicitly by id. The feature is placement-agnostic so a future "global/operator agent" can land without rework.

Tool surface (12 `cate_*` tools)

Query: cate_get_layout
Lifecycle: cate_open_panel (editor/terminal/browser/git/fileExplorer/document), cate_close_panel
Management: cate_focus_panel, cate_move_panel, cate_resize_panel, cate_arrange (tile/grid/cascade/focus-one)
Content: cate_run_in_terminal, cate_open_url, cate_reveal_in_editor, cate_set_markdown_preview
Viewport: cate_pan_to, cate_zoom

Two fixes from live testing

Terminal commands now actually run. createTerminal's initialInput is intentionally not persisted (it would re-run on session restore), so open_panel(terminal, command) silently dropped the command; run_in_terminal's single 250 ms retry was too short for a fresh node-pty to register. Both now route through writeToTerminalWhenReady, which polls the terminal registry until the PTY is live, then writes the command.
Markdown preview is reachable. The app already supported preview (EditorPanel → setPanelMarkdownPreview) but the agent had no tool. Added cate_set_markdown_preview + a preview option on cate_reveal_in_editor.
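The condition-based wait behind the terminal fix can be sketched roughly as follows. This is not the PR's writeToTerminalWhenReady: the `LiveTerminal` interface, the `lookup` callback standing in for the terminal registry, the default interval and timeout, and the trailing carriage return used to submit the line are all assumptions.

```typescript
// Hypothetical minimal view of a live PTY handle.
interface LiveTerminal {
  write(data: string): void;
}

interface WaitOptions {
  intervalMs?: number;
  timeoutMs?: number;
}

// Polls `lookup` until it yields a live terminal, then writes the command.
// Resolves to false if the terminal never appears before the timeout.
async function writeWhenReady(
  lookup: () => LiveTerminal | undefined, // e.g. a registry lookup by panel id
  command: string,
  { intervalMs = 50, timeoutMs = 5000 }: WaitOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const term = lookup();
    if (term) {
      term.write(command + "\r"); // assumed: carriage return submits the line
      return true;
    }
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a single fixed retry, the loop succeeds as soon as the PTY registers and fails explicitly when it never does, so the caller can report the error instead of silently dropping the command.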

Implementation notes

Bundled extension installs per-workspace on agent-session start (mirrors cate-plan-mode); shipped in prod via the existing extraResources glob.
UI change: one small Guarded/Auto toggle button in the agent chat footer (both sidebar + dock render sites).
Registered cateControlEnabled in SETTINGS_SCHEMA (src/main/store.ts) so tsc stays clean.

Testing

Unit (Vitest): classifier, pure placement geometry, dispatcher gating, per-chat mode state, and all executors (incl. condition-based terminal send + preview). Full suite 453 passing; tsc --noEmit clean; npm run build green.
E2E (Playwright — e2e/cate-control.spec.ts): drives the real renderer dispatcher in a live Electron window — run_in_terminal and open_panel(terminal, command) actually execute in a spawned PTY (asserted via command output in the xterm buffer), and set_markdown_preview flips the editor into preview.
Manual: verified live (npm run dev) — toggle renders + flips, panels open/arrange, terminal commands run, preview toggles.

Future ideas (brief)

Drag a panel into the chat to add it to context (file / terminal output / browser) — unique to the spatial model.
Visual session tree — render pi's branch tree as canvas nodes.
Render ctx.ui.custom extensions as native panels — unblocks much of the pi package catalog (currently flagged "requires terminal").
Global / operator agent — lift the agent out of the canvas as an app-level driver (this work is deliberately placement-agnostic to allow it).
Per-task model routing, multi-workspace control (v1 is active-workspace + main-window only), and richer permission rules for unattended Auto mode.

Test plan

npm run build passes
npm run test passes (453)
In-app: toggle Guarded/Auto; ask the agent to open a file, run a terminal command, tile panels, toggle markdown preview
Guarded mode prompts Allow/Deny for side-effects; Auto runs without prompts

…ed transport)

Replaces the spike sentinel interception in agentStore.handleEvent with a real dispatchCateRequest round-trip, adds the side-effect cateExecutors import, registers each chat's CateControlContext from AgentPanel (under the top-level CanvasStoreProvider, whose store is the active workspace canvas), and adds requestCateApproval/resolveCateApproval plus the cate:-prefix branch in handleApproval. Also makes the cateControl executor holder hoisted-function-based so the import-cycle (cateControl -> agentStore -> cateExecutors -> cateControl) registration is TDZ-safe regardless of module entry order, and polyfills `self` in the node test env so .test.ts suites that transitively import terminalRegistry (xterm) can load.

…typecheck)

- open_panel/run_in_terminal: send the command to the PTY via condition-based waiting (poll until node-pty registers) instead of a 250ms guess; stop relying on createTerminal's initialInput, which the store never forwards.
- add cate_set_markdown_preview tool + preview option on cate_reveal_in_editor, wired to appStore.setPanelMarkdownPreview (the app already supports preview; the agent just had no way to trigger it).

…review

Drives the real renderer dispatcher via window.__cateE2E.cateControl (as an agent tool would) and observes the live app: run_in_terminal and open_panel(terminal,command) actually execute in a spawned PTY (asserted via command output in the xterm buffer), and set_markdown_preview flips the editor into preview. Adds terminalText + cateControl e2e harness hooks.

Anton-Horn · 2026-05-30T23:17:20Z

We already have similar behaviour implicitly, since any agent can write to the workspace.json file. It's not quite the same, but that approach fits better into the overall application than placing the cateControl system on top. I do see the advantages, and I like it being a pi extension. What I don't like so much: I don't think the agent should control zoom/camera position. Additionally, I noticed that when creating a new panel, instead of focusing that panel it just pans to a random location.

Two things I would like to see here:

Make it a bit more basic: remove user actions like panning and zooming. Add a terminal read (so this is not just a canvas-edit feature but also supports agent orchestration). Make sure the tools are all optimised for agents (keep them lean and focused, maybe reduce the count; 12 feels very heavy).
Include custom tool renderings. This is quite a "flashy" feature, so we should display it as such (custom tool renderings and approval workflows, not just raw JSON). Make sure it fits the general UI of the agent panel.

Anton-Horn · 2026-05-30T23:45:35Z

I removed the skill.md for workspaces in #214, so this PR is meant to be the new way for agents to control the workspace and do orchestration. It will be merged once the changes are in and the feature is polished. Thanks!

…en-focus

Addresses review on 0-AI-UG#211 (keep the agent toolset lean + focused).

- Drop camera-control tools: remove `pan_to` (was identical to `focus_panel`; both just focusAndCenter) and `zoom` (the agent shouldn't drive zoom/viewport).
- Fold `reveal_in_editor` into `open_panel`: open_panel now focuses+centers what it opens and accepts target.preview for markdown, so the dedicated reveal tool was redundant.
- Add `read_terminal`: read a terminal panel's recent buffer (visible screen + scrollback) as text, so an agent can inspect output it ran via run_in_terminal, the other half of terminal orchestration.

Net 13 → 11 tools.

Fix the "new panel pans to a random location" bug: execOpenPanel never focused the panel it created and estimated the viewport center as the centroid of all nodes (could be far off-screen). Now it centers on the real viewport (via viewToCanvas + containerSize) and focusAndCenters the opened panel so it lands in view.

Addresses review on 0-AI-UG#211 (render tool calls as custom UI, not raw JSON; make the approval workflow fit the agent panel).

- cate-control calls are now surfaced in the chat thread as compact, accent-tinted CateToolCards (icon + verb + summary, expandable to params/result) instead of being silent round-trips. Status tracks running → success / denied / error.
- The guarded-mode ApprovalCard renders cate actions with the same icon + a human-readable request ("Let Cate run `npm test`?") rather than a raw `cate:<action>` name + JSON dump.
- New cateToolDisplay maps (action, params) → { icon, verb, summary } and is shared by both the thread card and the approval card so they stay consistent.

architawr · 2026-05-31T12:22:43Z

Addressed both points:

1. Leaner, more agent-focused toolset (13 → 11).

Removed pan_to (it was literally identical to focus_panel — both just focusAndCenter) and zoom (the agent shouldn't drive zoom/camera).
Folded reveal_in_editor into open_panel: open_panel now focuses + centers what it opens and accepts target.preview for markdown, so the dedicated reveal tool was redundant.
Added read_terminal — reads a terminal panel's recent buffer (screen + scrollback) as text, so the agent can inspect output it ran via run_in_terminal (the other half of orchestration).
Also fixed the "new panel pans to a random location" bug: open_panel no longer estimates the viewport as the centroid of all nodes — it centers on the real viewport (containerSize + viewToCanvas) and focuses the panel it opened.

2. Custom tool renderings + approval workflow.

cate-control calls now render in the thread as compact, accent-tinted cards (icon + verb + summary, expandable to params/result) with running → success/denied/error status — instead of being silent round-trips.
The guarded-mode approval card renders the action with an icon + human-readable request ("Let Cate run npm test?") rather than cate:<action> + a JSON dump.
A shared cateToolDisplay maps (action, params) → { icon, verb, summary } so the thread card and approval card stay consistent.
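A shared display mapping of this kind might look roughly like the sketch below. The `displayFor` and `approvalPrompt` names, the icon strings, the handled action names, and the parameter keys are hypothetical; only the `{ icon, verb, summary }` shape and the "Let Cate run ...?" wording come from this thread.

```typescript
interface ToolDisplay {
  icon: string;
  verb: string;
  summary: string;
}

// Maps (action, params) to what both the thread card and approval card show.
function displayFor(action: string, params: Record<string, unknown>): ToolDisplay {
  const str = (key: string): string => {
    const value = params[key];
    return typeof value === "string" ? value : "";
  };
  switch (action) {
    case "run_in_terminal":
      return { icon: "terminal", verb: "run", summary: str("command") };
    case "open_url":
      return { icon: "globe", verb: "open", summary: str("url") };
    case "close_panel":
      return { icon: "x", verb: "close", summary: str("panelId") };
    default:
      return { icon: "square", verb: action, summary: "" };
  }
}

// Human-readable approval text built from the same mapping.
function approvalPrompt(action: string, params: Record<string, unknown>): string {
  const { verb, summary } = displayFor(action, params);
  return summary ? `Let Cate ${verb} \`${summary}\`?` : `Let Cate ${verb}?`;
}
```

Deriving the approval text from the same function that feeds the thread card is one simple way to keep the two views from drifting apart.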

Merged latest main; typecheck / test / build all green.

architawr · 2026-05-31T12:32:23Z

@PaulHorn — would appreciate your eyes on this one too when you have a moment 🙏

Anton-Horn · 2026-05-31T12:37:04Z

I'll check it later. 11 tools still seem like a lot of complexity / token usage. Will give you more detailed feedback then. Thanks for the update.

Follow-up to review feedback on 0-AI-UG#211 (Anton-Horn: "11 tools still seem like a lot of complexity / token usage"). Collapses the surface the agent sees from 11 tools to 4, grouped by concept rather than per-verb:

- cate_layout {op: get|arrange}: read the canvas / rearrange panels
- cate_panel {op: open|focus|move|resize|close|preview}: single-panel lifecycle
- cate_browser {panelId?, url}: navigate a browser panel (room to grow)
- cate_terminal {op: run|read}: run a command / read output

Implementation: thin op-routers (execLayout/execPanel/execBrowser/execTerminal) delegate to the same focused executors as before, so per-op behavior and the self-protection guards (won't close/move the host agent panel) are unchanged. classifyCateAction still escalates only destructive (close) and outbound (run a command, navigate/open a remote url) ops to guarded-mode approval.

`arrange` moved out of panel into `layout` (it's a canvas-wide op, not per-panel); `navigate` moved into the new `browser` tool; `editor` preview stays a panel op (thin; can be split out symmetrically with browser later if it grows). cateToolDisplay + the thread/approval cards updated for the new actions. Shared protocol, extension tool defs, executors, e2e spec, and all unit tests migrated.

The extension is copied into each workspace's pi-agent extensions dir, where pi loads it at agent start. installCateControl used copyIfMissing (skip-if-exists), so once installed the copy never refreshed — after the toolset was consolidated the agent kept loading the OLD extension and emitted action names (open_panel, close_panel, …) the renderer dispatcher no longer handles, so every cate tool call failed with "Unknown or unimplemented action". This is also a latent prod bug: shipping a new extension version would never reach users who already had an older copy installed. Fix: copyIfChanged overwrites the installed copy whenever its bytes differ from the bundled source. The extension's action protocol is coupled to the renderer, so the bundled copy is authoritative — there's no user-customization to preserve. Adds a unit test (missing → write, differing → overwrite, identical → skip). Note: the e2e suite drives the renderer dispatcher directly (window.__cateE2E), bypassing the installed extension, which is why it didn't catch the skew.
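The three cases the unit test covers (missing → write, differing → overwrite, identical → skip) reduce to a small pure decision, sketched below. This is an illustration under assumptions: `copyDecision`, `bytesEqual`, and the `CopyDecision` type are hypothetical names, and the real copyIfChanged presumably also performs the file reads and writes around this comparison.

```typescript
type CopyDecision = "write" | "overwrite" | "skip";

// Byte-for-byte comparison of two buffers.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// `installed` is undefined when no copy exists in the workspace yet.
function copyDecision(
  bundled: Uint8Array,
  installed: Uint8Array | undefined,
): CopyDecision {
  if (installed === undefined) return "write";
  return bytesEqual(bundled, installed) ? "skip" : "overwrite";
}
```

Treating the bundled bytes as authoritative means any difference triggers an overwrite, which matches the reasoning that there is no user customization to preserve.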

architawr · 2026-05-31T13:57:19Z

@Anton-Horn — update on the complexity / token-usage point.

Consolidated 11 → 4 tools. The agent now sees only:

cate_layout {op: get|arrange} — read the canvas / rearrange panels
cate_panel {op: open|focus|move|resize|close|preview} — single-panel lifecycle
cate_browser {panelId?, url} — navigate a browser panel (room to grow: reload/back/JS)
cate_terminal {op: run|read} — run a command / read its output

Thin op-routers delegate to the same focused executors, so per-op behavior and the self-protection guards (won't close/move the agent's own host panel) are unchanged. classifyCateAction still escalates only destructive (close) and outbound (run a command, navigate to a remote url) ops to guarded-mode approval; reads/focus/layout stay safe.
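One way to picture a thin op-router is a discriminated union on `op` that forwards to injected executors, as in the sketch below for the terminal tool. The `TerminalCall` shape, the `panelId` and `command` field names, the `TerminalExecutors` interface, and `classifyTerminal` are assumptions for illustration, not the PR's execTerminal or classifyCateAction.

```typescript
// Hypothetical parameter shapes for the consolidated terminal tool.
type TerminalCall =
  | { op: "run"; panelId: string; command: string }
  | { op: "read"; panelId: string };

type Risk = "safe" | "side-effect";

// Only running a command is outbound; reading output stays safe.
function classifyTerminal(call: TerminalCall): Risk {
  return call.op === "run" ? "side-effect" : "safe";
}

// The focused executors the router delegates to (injected for testability).
interface TerminalExecutors {
  run(panelId: string, command: string): string;
  read(panelId: string): string;
}

// The router adds no behavior of its own; it only picks the executor.
function routeTerminal(call: TerminalCall, exec: TerminalExecutors): string {
  switch (call.op) {
    case "run":
      return exec.run(call.panelId, call.command);
    case "read":
      return exec.read(call.panelId);
  }
}
```

Because classification still happens per op rather than per tool, collapsing the tool count does not widen what runs without approval.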

Custom rendering carries over from the earlier round: cate actions render as compact accent cards in the thread (icon + verb + summary, expandable to params/result), and guarded side-effects show a human-readable “Let Cate run …?” prompt instead of raw JSON.

Also fixed a bug found while testing this: the extension is copied into each workspace's pi-agent dir, and the installer used skip-if-exists — so after the toolset changed, the agent kept loading the stale copy and emitted action names the renderer no longer handled (“Unknown or unimplemented action”). Switched to refresh-on-change (copyIfChanged); this also fixes the latent case where a shipped extension update would never reach users who already had a copy.

typecheck / test (541) / build all green. Out of draft now — ready for your detailed look.

…t/agent-controls-cate

PR 0-AI-UG#226 removed git, fileExplorer, projectList from PanelType and deleted createGit/createFileExplorer from AppStore. Remove them from cateExecutors OPENABLE list and execOpenPanel switch to fix typecheck.

…-control

PR 0-AI-UG#226 dropped git/fileExplorer/projectList panels. Beyond the executor switch, clean up the rest of cate-control's references to them:

- tool schema description no longer advertises git|fileExplorer as openable
- cateToolDisplay drops their icon entries (+ unused GitBranch/TreeStructure imports)
- tests drop the dead createGit/createFileExplorer mocks and the git example

Anton-Horn · 2026-05-31T18:21:21Z

I'm on it now. Would like to push some code and steer this pr myself. If that's fine with you.

architawr · 2026-05-31T18:26:38Z

Ok, let's go

Artur Karapetyan added 15 commits May 30, 2026 23:31

feat(agent): scaffold cate-control extension installer (spike validat…

3ec651c

…ed transport)

feat(agent): cate-control wire types + action classifier

6208ea1

feat(settings): add cateControlEnabled flag (default on)

605e58c

feat(agent): pure placement + arrange geometry for cate-control

d2dec33

feat(agent): per-chat cateControlMode state in agentStore

c858f1a

feat(agent): cate-control dispatcher core (registry + gating)

8dc7d16

feat(agent): cate-control lifecycle executors (get_layout/open/close)

8399711

feat(agent): cate-control management/content/viewport executors + map

692afb5

feat(agent): full cate-control tool surface in the pi extension

86bcbca

feat(agent): guarded/auto toggle for cate-control

5fa6c89

fix(settings): register cateControlEnabled in SETTINGS_SCHEMA (fixes …

0387093

…typecheck)

Merge remote-tracking branch 'origin/main' into feat/agent-controls-cate

109d8a9

Anton-Horn mentioned this pull request May 30, 2026

chore(workspace): drop the "agent edits the workspace" skill #214

Merged

Artur Karapetyan added 4 commits May 31, 2026 15:08

Merge remote-tracking branch 'origin/main' into feat/agent-controls-cate

ae3b18a

Merge remote-tracking branch 'origin/main' into feat/agent-controls-cate

78297ab

architawr marked this pull request as draft May 31, 2026 13:30

Artur Karapetyan added 2 commits May 31, 2026 20:42

architawr marked this pull request as ready for review May 31, 2026 13:57

Anton-Horn mentioned this pull request May 31, 2026

[Question] Any plan to allow the Agent to communicate with the Browser window? #222

Open

architawr and others added 5 commits June 1, 2026 00:18

Merge branch 'main' into feat/agent-controls-cate

cdaebdb

Merge remote-tracking branch 'origin/main' into feat/agent-controls-cate

6967e64

Merge remote-tracking branch 'fork/feat/agent-controls-cate' into fea…

5ab8465

…t/agent-controls-cate

fix(agent): drop removed git/fileExplorer panel types from cate-control

ba7bbcb

architawr wants to merge 26 commits into 0-AI-UG:main from architawr:feat/agent-controls-cate

2 participants
