
feat(agent): let the Cate agent control panels (cate-control)#211

Open
architawr wants to merge 26 commits into 0-AI-UG:main from architawr:feat/agent-controls-cate

Conversation

@architawr
Contributor

@architawr architawr commented May 30, 2026

Summary

Adds cate-control: the in-app Cate agent (pi) can now drive the workspace itself — open, arrange, focus/move/resize/close panels and interact with their contents (run a terminal command, navigate a browser, reveal a file at a line, toggle markdown preview) on the canvas it lives in. The agent operates the workspace (what's open, where it sits, what's running, where the camera looks); file edits stay on its normal Edit tool.

12 new cate_* tools, a per-chat Guarded/Auto toggle, and a global on/off setting.

How it works

Transport. pi exposes no generic host RPC — only ctx.ui.select/confirm/input. So a bundled cate-control pi extension piggybacks a structured request on ctx.ui.input() with a sentinel-prefixed JSON payload (@@cate-control@@{...}). The renderer intercepts the sentinel in agentStore.handleEvent before the dialog queue, dispatches it, and replies over the existing agent:uiResponse channel. Pure-visual actions (pan/zoom) are fire-and-forget.
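The sentinel framing described above could be sketched roughly as follows. This is only an illustration under assumptions: the `CateRequest` shape, the function names, and the `get_layout` action string are hypothetical; the PR states only that the payload is JSON prefixed with `@@cate-control@@`.

```typescript
// Sentinel prefix quoted in the PR description.
const CATE_SENTINEL = "@@cate-control@@";

// Hypothetical request shape; the real protocol fields are not shown in the PR.
interface CateRequest {
  action: string;
  params?: Record<string, unknown>;
}

// Extension side: wrap a request so it can travel through ctx.ui.input().
function encodeCateRequest(req: CateRequest): string {
  return CATE_SENTINEL + JSON.stringify(req);
}

// Renderer side: return the request if the prompt carries the sentinel,
// or null so the caller can fall through to the normal dialog queue.
function decodeCateRequest(prompt: string): CateRequest | null {
  if (!prompt.startsWith(CATE_SENTINEL)) return null;
  try {
    const parsed: unknown = JSON.parse(prompt.slice(CATE_SENTINEL.length));
    if (typeof parsed !== "object" || parsed === null) return null;
    const action = (parsed as { action?: unknown }).action;
    if (typeof action !== "string") return null;
    return parsed as CateRequest;
  } catch {
    return null; // malformed JSON: treat it as an ordinary input prompt
  }
}
```

Returning null for anything that does not parse cleanly keeps ordinary `ctx.ui.input()` prompts on their usual path, which is presumably why the interception sits before the dialog queue.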

Dispatcher. src/agent/renderer/cateControl.ts resolves the calling chat's workspace/canvas via a context registry that AgentPanel populates (useCanvasStoreApi()), classifies the action (safe vs side-effect), applies the chat's mode policy, runs an executor (cateExecutors.ts), and returns a structured response.

Modes (guarded / auto). A per-chat toggle in the agent chat footer (next to plan-mode), mirroring plan/auto in Claude Code:

  • Guarded (default) — reads + safe panel ops run immediately; side-effects (close, run command, open URL) prompt an inline Allow/Deny card (reuses the existing approval UI).
  • Auto — everything runs without prompting.

A global cateControlEnabled setting gates the whole feature.
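A minimal sketch of how the global setting, the per-chat mode, and the safe/side-effect classification might combine. The action strings and the `decide` helper are assumptions made for illustration; the PR names the concepts but does not show the dispatcher's code.

```typescript
type CateMode = "guarded" | "auto";
type ActionClass = "safe" | "side-effect";
type Decision = "run" | "ask" | "disabled";

// Assumed action names; the PR lists close, run command, and open URL as side-effects.
const SIDE_EFFECT_ACTIONS = new Set(["close_panel", "run_in_terminal", "open_url"]);

function classifyAction(action: string): ActionClass {
  return SIDE_EFFECT_ACTIONS.has(action) ? "side-effect" : "safe";
}

// The global setting gates everything; Auto never prompts;
// Guarded prompts only for side-effects.
function decide(action: string, mode: CateMode, enabled: boolean): Decision {
  if (!enabled) return "disabled";
  if (mode === "auto") return "run";
  return classifyAction(action) === "side-effect" ? "ask" : "run";
}
```

For example, `decide("run_in_terminal", "guarded", true)` yields `"ask"`, while the same call in Auto mode yields `"run"`.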

Semantic placement. The agent never sends pixels — it states intent (right of X, tile, relativeTo: 'self'); cateControlLayout.ts (pure geometry, unit-tested) computes canvas-space rects.
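To illustrate what "pure geometry" means here, the sketch below turns two intents (right of X, tile in a row) into canvas-space rects. The `GAP` value, the function names, and the `Rect` shape are assumptions for this example, not the contents of cateControlLayout.ts.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Assumed spacing between panels, in canvas units.
const GAP = 24;

// Intent "right of X": same top edge as the anchor, offset past its right edge.
function placeRightOf(anchor: Rect, size: { width: number; height: number }): Rect {
  return {
    x: anchor.x + anchor.width + GAP,
    y: anchor.y,
    width: size.width,
    height: size.height,
  };
}

// Intent "tile": split the bounds into `count` equal columns separated by GAP.
function tileRow(bounds: Rect, count: number): Rect[] {
  if (count <= 0) return [];
  const width = (bounds.width - GAP * (count - 1)) / count;
  return Array.from({ length: count }, (_, i) => ({
    x: bounds.x + i * (width + GAP),
    y: bounds.y,
    width,
    height: bounds.height,
  }));
}
```

Because both functions take and return plain data, they can be unit-tested without a canvas or store, which matches the "pure geometry, unit-tested" claim.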

Self-protection. cate_get_layout marks the agent's own host panel isSelf: true; executors refuse to close/move it and exclude it from arrange unless targeted explicitly by id. The feature is placement-agnostic so a future "global/operator agent" can land without rework.
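One way the self-protection rules could look, as a hedged sketch: `PanelInfo`, `arrangeTargets`, and `guardClose` are hypothetical names, and the error text is invented. Only the `isSelf` flag and the overall behavior come from the description above.

```typescript
interface PanelInfo { id: string; isSelf: boolean; }

// Arrange: by default skip the agent's own host panel; when ids are given
// explicitly, honor them as-is (including the host panel).
function arrangeTargets(panels: PanelInfo[], explicitIds?: string[]): string[] {
  if (explicitIds && explicitIds.length > 0) {
    const known = new Set(panels.map((p) => p.id));
    return explicitIds.filter((id) => known.has(id));
  }
  return panels.filter((p) => !p.isSelf).map((p) => p.id);
}

// Close: refuse when the target is the panel hosting the agent.
function guardClose(panel: PanelInfo): { ok: true } | { ok: false; error: string } {
  if (panel.isSelf) {
    return { ok: false, error: "Refusing to close the agent's own host panel" };
  }
  return { ok: true };
}
```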

Tool surface (12 cate_* tools)

  • Query: cate_get_layout
  • Lifecycle: cate_open_panel (editor/terminal/browser/git/fileExplorer/document), cate_close_panel
  • Management: cate_focus_panel, cate_move_panel, cate_resize_panel, cate_arrange (tile/grid/cascade/focus-one)
  • Content: cate_run_in_terminal, cate_open_url, cate_reveal_in_editor, cate_set_markdown_preview
  • Viewport: cate_pan_to, cate_zoom

Two fixes from live testing

  • Terminal commands now actually run. createTerminal's initialInput is intentionally not persisted (it would re-run on session restore), so open_panel(terminal, command) silently dropped the command; run_in_terminal's single 250 ms retry was too short for a fresh node-pty to register. Both now route through writeToTerminalWhenReady, which polls the terminal registry until the PTY is live, then writes the command.
  • Markdown preview is reachable. The app already supported preview (EditorPanel, setPanelMarkdownPreview) but the agent had no tool. Added cate_set_markdown_preview + a preview option on cate_reveal_in_editor.
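The condition-based wait from the first fix can be sketched as a small polling loop. This is not the PR's writeToTerminalWhenReady: the `TerminalLookup` type, the option names, the default timings, and the trailing `"\r"` (carriage return as Enter) are all assumptions for illustration.

```typescript
// Hypothetical registry lookup: returns a writer once the PTY is live, else undefined.
type TerminalLookup = (panelId: string) => { write(data: string): void } | undefined;

// Poll the lookup until the terminal exists, then write the command.
// Resolves to false if the terminal never appears before the timeout.
async function writeWhenReady(
  lookup: TerminalLookup,
  panelId: string,
  command: string,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<boolean> {
  const intervalMs = opts.intervalMs ?? 50;
  const timeoutMs = opts.timeoutMs ?? 5000;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const term = lookup(panelId);
    if (term) {
      term.write(command + "\r"); // assumed: carriage return submits the line
      return true;
    }
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a single fixed-delay retry, the loop succeeds as soon as the PTY registers and fails explicitly on timeout instead of silently dropping the command.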

Implementation notes

  • Bundled extension installs per-workspace on agent-session start (mirrors cate-plan-mode); shipped in prod via the existing extraResources glob.
  • UI change: one small Guarded/Auto toggle button in the agent chat footer (both sidebar + dock render sites).
  • Registered cateControlEnabled in SETTINGS_SCHEMA (src/main/store.ts) so tsc stays clean.

Testing

  • Unit (Vitest): classifier, pure placement geometry, dispatcher gating, per-chat mode state, and all executors (incl. condition-based terminal send + preview). Full suite 453 passing; tsc --noEmit clean; npm run build green.
  • E2E (Playwright — e2e/cate-control.spec.ts): drives the real renderer dispatcher in a live Electron window — run_in_terminal and open_panel(terminal, command) actually execute in a spawned PTY (asserted via command output in the xterm buffer), and set_markdown_preview flips the editor into preview.
  • Manual: verified live (npm run dev) — toggle renders + flips, panels open/arrange, terminal commands run, preview toggles.

Future ideas (brief)

  • Drag a panel into the chat to add it to context (file / terminal output / browser) — unique to the spatial model.
  • Visual session tree — render pi's branch tree as canvas nodes.
  • Render ctx.ui.custom extensions as native panels — unblocks much of the pi package catalog (currently flagged "requires terminal").
  • Global / operator agent — lift the agent out of the canvas as an app-level driver (this work is deliberately placement-agnostic to allow it).
  • Per-task model routing, multi-workspace control (v1 is active-workspace + main-window only), and richer permission rules for unattended Auto mode.

Test plan

  • npm run build passes
  • npm run test passes (453)
  • In-app: toggle Guarded/Auto; ask the agent to open a file, run a terminal command, tile panels, toggle markdown preview
  • Guarded mode prompts Allow/Deny for side-effects; Auto runs without prompts

Artur Karapetyan added 15 commits May 30, 2026 23:31
Replaces the spike sentinel interception in agentStore.handleEvent with a
real dispatchCateRequest round-trip, adds the side-effect cateExecutors
import, registers each chat's CateControlContext from AgentPanel (under the
top-level CanvasStoreProvider, whose store is the active workspace canvas),
and adds requestCateApproval/resolveCateApproval plus the cate:-prefix branch
in handleApproval.

Also makes the cateControl executor holder hoisted-function-based so the
import-cycle (cateControl -> agentStore -> cateExecutors -> cateControl)
registration is TDZ-safe regardless of module entry order, and polyfills
`self` in the node test env so .test.ts suites that transitively import
terminalRegistry (xterm) can load.

- open_panel/run_in_terminal: send the command to the PTY via condition-based
  waiting (poll until node-pty registers) instead of a 250ms guess; stop relying
  on createTerminal's initialInput, which the store never forwards.
- add cate_set_markdown_preview tool + preview option on cate_reveal_in_editor,
  wired to appStore.setPanelMarkdownPreview (the app already supports preview;
  the agent just had no way to trigger it).
…review

Drives the real renderer dispatcher via window.__cateE2E.cateControl (as an
agent tool would) and observes the live app: run_in_terminal and
open_panel(terminal,command) actually execute in a spawned PTY (asserted via
command output in the xterm buffer), and set_markdown_preview flips the editor
into preview. Adds terminalText + cateControl e2e harness hooks.
@Anton-Horn
Contributor

We already get similar behaviour implicitly by letting any agent write to the workspace.json file. It's not quite the same, but it fits into the overall application better than layering the cateControl system on top. I do see the advantages, and I like that it's a pi extension. What I don't like so much: I don't think the agent should control zoom/camera position. I also noticed that when it creates a new panel, instead of focusing that panel it just pans to a random location.

Two things I would like to see here:

  1. Make it a bit more basic: remove user actions like panning and zooming. Add a terminal read (so this is not just a canvas-editing feature but also supports agent orchestration). Make sure the tools are all optimised for agents (keep them lean and focused, and maybe reduce the count; 12 feels very heavy).

  2. Include custom tool renderings. This is quite a "flashy" feature, so we should display it as such (custom tool renderings rather than raw JSON, plus approval workflows). Make sure it fits the general UI of the agent panel.

@Anton-Horn
Contributor

Removed the skill.md for workspaces in #214, so this one is supposed to be the new way for agents to control the workspace and do orchestration. It will be merged once the changes are in and the feature is polished. Thanks!

Artur Karapetyan added 4 commits May 31, 2026 15:08
…en-focus

Addresses review on 0-AI-UG#211 (keep the agent toolset lean + focused).

- Drop camera-control tools: remove `pan_to` (was identical to `focus_panel` —
  both just focusAndCenter) and `zoom` (the agent shouldn't drive zoom/viewport).
- Fold `reveal_in_editor` into `open_panel`: open_panel now focuses+centers what
  it opens and accepts target.preview for markdown, so the dedicated reveal tool
  was redundant.
- Add `read_terminal`: read a terminal panel's recent buffer (visible screen +
  scrollback) as text, so an agent can inspect output it ran via run_in_terminal
  — the other half of terminal orchestration.

Net 13 → 11 tools.

Fix the "new panel pans to a random location" bug: execOpenPanel never focused
the panel it created and estimated the viewport center as the centroid of all
nodes (could be far off-screen). Now it centers on the real viewport (via
viewToCanvas + containerSize) and focusAndCenters the opened panel so it lands
in view.
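A rough sketch of why the centroid estimate could land off-screen while the viewport-based center cannot. The transform convention (view = canvas × zoom + offset) and the `Viewport` shape are assumptions; the commit names viewToCanvas and containerSize but does not show their signatures.

```typescript
interface Point { x: number; y: number; }
interface Viewport {
  offset: Point; // assumed: view-space position of the canvas origin
  zoom: number;
  containerSize: { width: number; height: number };
}

// Assumed transform: view = canvas * zoom + offset, inverted here.
function viewToCanvas(p: Point, vp: Viewport): Point {
  return { x: (p.x - vp.offset.x) / vp.zoom, y: (p.y - vp.offset.y) / vp.zoom };
}

// New behavior: the canvas point under the middle of the visible container.
function viewportCenter(vp: Viewport): Point {
  const mid = { x: vp.containerSize.width / 2, y: vp.containerSize.height / 2 };
  return viewToCanvas(mid, vp);
}

// Old behavior: the average node position, which ignores where the camera is.
function centroid(points: Point[]): Point {
  const sum = points.reduce((acc, p) => ({ x: acc.x + p.x, y: acc.y + p.y }), { x: 0, y: 0 });
  return { x: sum.x / points.length, y: sum.y / points.length };
}
```

With the camera panned far from a cluster of nodes, `centroid` still points at the cluster, whereas `viewportCenter` follows what the user is actually looking at.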
Addresses review on 0-AI-UG#211 (render tool calls as custom UI, not raw JSON; make the
approval workflow fit the agent panel).

- cate-control calls are now surfaced in the chat thread as compact, accent-tinted
  CateToolCards (icon + verb + summary, expandable to params/result) instead of
  being silent round-trips. Status tracks running → success / denied / error.
- The guarded-mode ApprovalCard renders cate actions with the same icon + a
  human-readable request ("Let Cate run `npm test`?") rather than a raw
  `cate:<action>` name + JSON dump.
- New cateToolDisplay maps (action, params) → { icon, verb, summary } and is
  shared by both the thread card and the approval card so they stay consistent.
@architawr
Contributor Author

Addressed both points:

1. Leaner, more agent-focused toolset (13 → 11).

  • Removed pan_to (it was literally identical to focus_panel — both just focusAndCenter) and zoom (the agent shouldn't drive zoom/camera).
  • Folded reveal_in_editor into open_panel: open_panel now focuses + centers what it opens and accepts target.preview for markdown, so the dedicated reveal tool was redundant.
  • Added read_terminal — reads a terminal panel's recent buffer (screen + scrollback) as text, so the agent can inspect output it ran via run_in_terminal (the other half of orchestration).
  • Also fixed the "new panel pans to a random location" bug: open_panel no longer estimates the viewport as the centroid of all nodes — it centers on the real viewport (containerSize + viewToCanvas) and focuses the panel it opened.

2. Custom tool renderings + approval workflow.

  • cate-control calls now render in the thread as compact, accent-tinted cards (icon + verb + summary, expandable to params/result) with running → success/denied/error status — instead of being silent round-trips.
  • The guarded-mode approval card renders the action with an icon + human-readable request ("Let Cate run npm test?") rather than cate:<action> + a JSON dump.
  • A shared cateToolDisplay maps (action, params) → { icon, verb, summary } so the thread card and approval card stay consistent.
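A minimal sketch of a shared display mapping in the spirit of cateToolDisplay. The icon names, the action strings, and the `approvalPrompt` helper are assumptions for illustration; only the `{ icon, verb, summary }` shape and the "Let Cate run `npm test`?" wording come from this PR.

```typescript
interface ToolDisplay { icon: string; verb: string; summary: string; }

// Map (action, params) to a compact description shared by both cards.
function describeCateAction(action: string, params: Record<string, unknown>): ToolDisplay {
  const str = (key: string): string =>
    typeof params[key] === "string" ? (params[key] as string) : "";
  switch (action) {
    case "run_in_terminal":
      return { icon: "terminal", verb: "Run", summary: str("command") };
    case "open_url":
      return { icon: "globe", verb: "Open", summary: str("url") };
    case "close_panel":
      return { icon: "x", verb: "Close", summary: str("panelId") };
    default:
      return { icon: "cube", verb: action, summary: "" };
  }
}

// Human-readable request for the guarded-mode approval card.
function approvalPrompt(d: ToolDisplay): string {
  const verb = d.verb.toLowerCase();
  return d.summary ? `Let Cate ${verb} \`${d.summary}\`?` : `Let Cate ${verb}?`;
}
```

Deriving both the thread card and the approval prompt from one function is what keeps the two views from drifting apart.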

Merged latest main; typecheck / test / build all green.

@architawr
Contributor Author

@PaulHorn — would appreciate your eyes on this one too when you have a moment 🙏

@Anton-Horn
Contributor

I'll check it later. 11 tools still seem like a lot of complexity / token usage. Will give you more detailed feedback then. Thanks for the update.

@architawr architawr marked this pull request as draft May 31, 2026 13:30
Artur Karapetyan added 2 commits May 31, 2026 20:42
Follow-up to review feedback on 0-AI-UG#211 (Anton-Horn: "11 tools still seem like a
lot of complexity / token usage"). Collapses the surface the agent sees from 11
tools to 4, grouped by concept rather than per-verb:

- cate_layout   {op: get|arrange}            — read the canvas / rearrange panels
- cate_panel    {op: open|focus|move|resize|close|preview} — single-panel lifecycle
- cate_browser  {panelId?, url}              — navigate a browser panel (room to grow)
- cate_terminal {op: run|read}               — run a command / read output

Implementation: thin op-routers (execLayout/execPanel/execBrowser/execTerminal)
delegate to the same focused executors as before, so per-op behavior and the
self-protection guards (won't close/move the host agent panel) are unchanged.
classifyCateAction still escalates only destructive (close) and outbound (run a
command, navigate/open a remote url) ops to guarded-mode approval.

`arrange` moved out of panel into `layout` (it's a canvas-wide op, not per-panel);
`navigate` moved into the new `browser` tool; `editor` preview stays a panel op
(thin — can be split out symmetrically with browser later if it grows).

cateToolDisplay + the thread/approval cards updated for the new actions. Shared
protocol, extension tool defs, executors, e2e spec, and all unit tests migrated.

The extension is copied into each workspace's pi-agent extensions dir, where pi
loads it at agent start. installCateControl used copyIfMissing (skip-if-exists),
so once installed the copy never refreshed — after the toolset was consolidated
the agent kept loading the OLD extension and emitted action names (open_panel,
close_panel, …) the renderer dispatcher no longer handles, so every cate tool
call failed with "Unknown or unimplemented action".

This is also a latent prod bug: shipping a new extension version would never
reach users who already had an older copy installed.

Fix: copyIfChanged overwrites the installed copy whenever its bytes differ from
the bundled source. The extension's action protocol is coupled to the renderer,
so the bundled copy is authoritative — there's no user-customization to preserve.
Adds a unit test (missing → write, differing → overwrite, identical → skip).

Note: the e2e suite drives the renderer dispatcher directly (window.__cateE2E),
bypassing the installed extension, which is why it didn't catch the skew.
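The refresh-on-change decision described in this commit can be sketched as a pure function over file bytes, covering the three cases the unit test names (missing → write, differing → overwrite, identical → skip). `decideCopy` and `bytesEqual` are hypothetical names; the real copyIfChanged presumably also performs the file I/O, which is left out here.

```typescript
type CopyDecision = "write" | "overwrite" | "skip";

// Byte-for-byte comparison of two buffers.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// `installed` is undefined when no copy exists yet in the workspace.
// The bundled copy is treated as authoritative whenever the bytes differ.
function decideCopy(installed: Uint8Array | undefined, bundled: Uint8Array): CopyDecision {
  if (installed === undefined) return "write";
  return bytesEqual(installed, bundled) ? "skip" : "overwrite";
}
```

A skip-if-exists installer would return "skip" in the differing case too, which is the stale-extension behavior this commit removes.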
@architawr architawr marked this pull request as ready for review May 31, 2026 13:57
@architawr
Contributor Author

@Anton-Horn — update on the complexity / token-usage point.

Consolidated 11 → 4 tools. The agent now sees only:

  • cate_layout {op: get|arrange} — read the canvas / rearrange panels
  • cate_panel {op: open|focus|move|resize|close|preview} — single-panel lifecycle
  • cate_browser {panelId?, url} — navigate a browser panel (room to grow: reload/back/JS)
  • cate_terminal {op: run|read} — run a command / read its output

Thin op-routers delegate to the same focused executors, so per-op behavior and the self-protection guards (won't close/move the agent's own host panel) are unchanged. classifyCateAction still escalates only destructive (close) and outbound (run a command, navigate to a remote url) ops to guarded-mode approval; reads/focus/layout stay safe.
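As a sketch of the thin op-router idea, the example below routes `cate_terminal {op: run|read}` to focused executors and classifies which (tool, op) pairs need approval. The parameter shapes, the `TerminalExecutors` interface, and `needsApproval` are assumptions; the PR's execTerminal and classifyCateAction may differ (for instance, the PR escalates only remote URLs for the browser tool, which this sketch does not model).

```typescript
// Hypothetical request shapes for the consolidated terminal tool.
type TerminalRequest =
  | { op: "run"; panelId: string; command: string }
  | { op: "read"; panelId: string };

// The focused executors the router delegates to (injected for testability).
interface TerminalExecutors {
  run(panelId: string, command: string): string;
  read(panelId: string): string;
}

// Thin router: no logic of its own beyond picking the executor for the op.
function execTerminal(req: TerminalRequest, exec: TerminalExecutors): string {
  switch (req.op) {
    case "run":
      return exec.run(req.panelId, req.command);
    case "read":
      return exec.read(req.panelId);
  }
}

// Only destructive (close) and outbound (run, navigate) ops escalate.
function needsApproval(tool: string, op?: string): boolean {
  if (tool === "cate_panel") return op === "close";
  if (tool === "cate_terminal") return op === "run";
  if (tool === "cate_browser") return true; // simplification: treats every navigation as outbound
  return false;
}
```

Grouping by concept shrinks the tool list the model sees while the discriminated `op` field keeps each operation's parameters type-checked.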

Custom rendering carries over from the earlier round: cate actions render as compact accent cards in the thread (icon + verb + summary, expandable to params/result), and guarded side-effects show a human-readable “Let Cate run <command>?” prompt instead of raw JSON.

Also fixed a bug found while testing this: the extension is copied into each workspace's pi-agent dir, and the installer used skip-if-exists — so after the toolset changed, the agent kept loading the stale copy and emitted action names the renderer no longer handled (“Unknown or unimplemented action”). Switched to refresh-on-change (copyIfChanged); this also fixes the latent case where a shipped extension update would never reach users who already had a copy.

typecheck / test (541) / build all green. Out of draft now — ready for your detailed look.

architawr and others added 5 commits June 1, 2026 00:18
PR 0-AI-UG#226 removed git, fileExplorer, projectList from PanelType and deleted
createGit/createFileExplorer from AppStore. Remove them from cateExecutors
OPENABLE list and execOpenPanel switch to fix typecheck.
…-control

PR 0-AI-UG#226 dropped git/fileExplorer/projectList panels. Beyond the executor
switch, clean up the rest of cate-control's references to them:
- tool schema description no longer advertises git|fileExplorer as openable
- cateToolDisplay drops their icon entries (+ unused GitBranch/TreeStructure imports)
- tests drop the dead createGit/createFileExplorer mocks and the git example
@Anton-Horn
Contributor

I'm on it now. I'd like to push some code and steer this PR myself, if that's fine with you.

@architawr
Contributor Author

Ok, let's go


2 participants