`.agents/skills/cg-perf/SKILL.md` (73 additions, 0 deletions)

[…]

…reports `min/p50/p95/p99/MAX` plus per-stage breakdown and settle cost.

| Scenario | Config | Notes |
| ----------- | ----------------------------------- | ----- |
| `frameloop` | 16/50/80/120/200/300/500ms interval | **Real FrameLoop path** — the only bench that captures stable-frame jank during panning (see below) |
| `resize` | alternating viewport sizes | `--resize` flag. Measures `resize()` + `redraw()` cost per cycle (layout rebuild + cache invalidation + repaint) |

**SurfaceUI overlay measurement (`--overlay`):**

By default, benchmarks measure content rendering only — the SurfaceUI
overlay (frame titles, node badges, hit regions) is **not** included.
Pass `--overlay` to include overlay drawing after each content flush,
matching the real `Application::frame()` pipeline where
`draw_and_flush_devtools_overlay()` runs after `Renderer::flush()`.

```sh
# A/B test: content only vs content + overlay (MUST be sequential, never parallel)
cargo run -p grida-dev --release -- bench ./fixtures/test-grida/L0.grida --scene 22 --frames 200 && \
cargo run -p grida-dev --release -- bench ./fixtures/test-grida/L0.grida --scene 22 --frames 200 --overlay

# Bulk report with overlay
cargo run -p grida-dev --release -- bench-report ./fixtures/ --frames 100 --overlay --output overlay.json
```

The overlay cost is opt-in because it is a devtools feature, not user
content. Overlay cost scales with visible labeled nodes — viewport
culling skips off-screen labels, so zoomed-in views are nearly free.
At fit-zoom on large scenes (yrr-main, 437 labels visible), overlay
adds ~1.8ms per frame (paragraph layout dominates). At typical editing
zoom, the cost drops to ~190µs or less.

| Question | Flag |
| ---------------------------------------- | -------------- |
| Is content rendering fast enough? | (no flag) |
| Is the overlay adding visible latency? | `--overlay` |
| Compare content-only vs content+overlay? | Run both, diff |
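
For the last row, one way to "run both, diff", a sketch reusing only the flags shown above (output filenames are illustrative):

```sh
# sequential A/B, then compare the two reports
cargo run -p grida-dev --release -- bench-report ./fixtures/ --frames 100 --output content.json && \
cargo run -p grida-dev --release -- bench-report ./fixtures/ --frames 100 --overlay --output overlay.json
diff content.json overlay.json
```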

The `realtime` scenarios use actual `thread::sleep()` between frames
and simulate the native viewer's 240Hz tick thread + settle countdown.
These produce frame timings that match what users actually see, …

[…]

…of scenes, configs, and operations. The naming convention is …

| Question | How to check |
| -------------------------------------------- | ---------------------------------------- |
| Are there frame drops during gestures? | Check `p99` and `MAX` in scenario stats |
| Is slow panning janky (stable frame spikes)? | `frameloop` scenarios (real FrameLoop path) |
| Is resize janky? | Single-scene GPU bench with `--resize` |
| Is the SurfaceUI overlay causing slowdowns? | A/B with `--overlay` flag on GPU bench |

---

## The Verification Workflow

**Every performance change follows this sequence. No exceptions.**

**Critical: all GPU benchmarks must run sequentially.** Never run two
bench processes at the same time — GPU contention, CPU cache thrashing,
and memory bandwidth competition produce unreliable numbers. Chain A/B
runs with `&&`.

### Step 1: Baseline

Run the bulk benchmark report BEFORE any changes, and save the JSON
output for later comparison.
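
A minimal baseline sketch, reusing the flags from the examples above (`baseline.json` is an illustrative filename):

```sh
cargo run -p grida-dev --release -- bench-report ./fixtures/ --frames 100 --output baseline.json
```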
Expand Down Expand Up @@ -418,6 +454,29 @@ loading a scene.
These are failure modes learned from experience. Each one has caused
real bugs or wasted time.

### Never run GPU benchmarks in parallel

GPU benchmarks must run **sequentially**, one at a time. Running two
bench processes simultaneously on the same machine causes GPU pipeline
contention, CPU cache thrashing, and memory bandwidth competition —
all of which distort timing data. When doing A/B comparisons (e.g.
with/without `--overlay`, before/after an optimization), always chain
the runs with `&&`:

```sh
# CORRECT — sequential
cargo run -p grida-dev --release -- bench file.grida --scene 0 --frames 100 && \
cargo run -p grida-dev --release -- bench file.grida --scene 0 --frames 100 --overlay

# WRONG — parallel (results are unreliable)
cargo run ... --frames 100 &
cargo run ... --frames 100 --overlay &
```

This applies to all GPU benchmark invocations: `bench`, `bench-report`,
and any combination thereof. Criterion (CPU raster) is less sensitive
but should still be run alone for best accuracy.
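
A sketch of running the Criterion benches on their own; the package name is an assumption based on the repo layout:

```sh
# CPU raster benches, with no GPU bench process running at the same time
cargo bench -p grida-canvas
```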

### GPU and raster backends behave differently

An optimization that helps on GPU may hurt on raster, and vice versa.
Expand Down Expand Up @@ -482,6 +541,20 @@ frame gets a cache hit. Without recapture, every frame after settle
is also a full draw, producing 7fps instead of 100+fps. The capture
guard should be `if self.backend.is_gpu()` — NOT `if !plan.stable`.

### SurfaceUI overlay is not free

The SurfaceUI overlay (frame titles, node badges, hit regions) runs
after content `flush()` and requires a second GPU flush. The overlay
cost is dominated by Skia paragraph creation (one per visible label) —
viewport culling skips off-screen labels, and style objects are hoisted
out of the per-label loop. On scenes with many labeled nodes at
fit-zoom (e.g. yrr-main with 437 labels), the overlay adds ~1.8ms per
frame. At typical editing zoom, most labels are culled and cost drops
to ~190µs. Standard benchmarks exclude overlay by default — use
`--overlay` to include it. If the app feels slower after adding new
overlay features (badges, labels, decorations), use the A/B overlay
benchmark to quantify the regression before optimizing.

### Layout is the cold-start bottleneck, not rendering

For large documents (100K+ nodes), `load_scene` dominates cold start …

[…]

`.agents/skills/vision/SKILL.md` (171 additions, 0 deletions, new file)

---
name: vision
description: >
Query images with a local Ollama vision model without loading the image
into the main agent context. Use when you need to describe a screenshot,
check whether rendered content is present, detect overlapping elements, or
ask any visual question about a PNG/JPEG/WebP file. Requires Ollama running
locally with a vision-capable model (qwen3.5, gemma3, llava, etc.).
Script: .agents/skills/vision/scripts/ask.py.
Trigger phrases: "describe image", "what does this screenshot show",
"does the canvas contain content", "check screenshot visually",
"look at this image", "any overlapping elements", "vision query".
---

# Vision — Local Image Querying via Ollama

Ask natural-language questions about images without passing them to the main
agent as visual input. Useful for verifying screenshots, annotating assets,
or building automated checks around visual output.

## When to Use This Skill

- Describing a screenshot for a PR description or user-facing document
- Checking whether an automated browser run produced visible canvas content
- Asking "do any elements overlap?" on a rendered output
- Any question where the answer is in the pixels but you don't want to use
vision tokens in the main context

---

## Quick Reference

All commands use `uv run` — dependencies are installed automatically.

```sh
SCRIPT=.agents/skills/vision/scripts/ask.py

# health check (fast, no image, confirms Ollama + model respond)
uv run $SCRIPT --ping

# system info — memory, storage, installed models
uv run $SCRIPT --info
uv run $SCRIPT --memory
uv run $SCRIPT --storage

# describe an image (default prompt)
uv run $SCRIPT path/to/image.png

# explicit shortcut
uv run $SCRIPT path/to/image.png describe

# custom question
uv run $SCRIPT path/to/image.png \
--prompt "Do you see any overlapping UI elements?"

uv run $SCRIPT canvas.png \
--prompt "Does this canvas contain any designed content, or is it empty?"

# use a specific model
uv run $SCRIPT image.png --model gemma3

# list available vision-capable models
uv run $SCRIPT --list-models
```

---

## Prerequisites

Ollama must be running locally. The script connects to `http://localhost:11434`
and fails immediately if it cannot reach it.

```sh
# start Ollama (if not already running)
ollama serve

# install a vision model (first time, pick one)
ollama pull qwen3.5 # recommended — best low-cost vision as of 2026-03
ollama pull gemma3 # alternative
ollama pull llava # widely available fallback
```

The script **does not install models**. If no vision model is available it
prints the list of installed models and a `pull` suggestion, then exits.
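
Because failures exit non-zero, a simple guard works in automation (the fallback message is illustrative):

```sh
uv run $SCRIPT --ping || echo "vision unavailable; skipping visual checks"
```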

**`uv` is required** to run the script (handles dependency installation
automatically). No `requirements.txt` or manual `pip install` needed.

---

## Model Selection

When `--model` is not specified the script picks the first installed model
from this preference list (updated periodically):

| Priority | Model | Notes |
| -------- | ----------------- | ---------------------------------------- |
| 1 | `qwen3.5` | Best low-cost vision model as of 2026-03 |
| 2 | `qwen2.5vl` | Previous generation, still strong |
| 3 | `gemma3` | Strong alternative, multimodal |
| 4 | `llama3.2-vision` | Meta vision variant |
| 5 | `llava` | Widely installed fallback |

A model is considered vision-capable when its name contains: `qwen`, `gemma3`,
`vl`, `vision`, `llava`, `moondream`, `minicpm-v`, or similar fragments.
Non-vision text models are filtered out automatically.
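
A rough shell approximation of that filter, assuming standard `ollama list` output (the script itself does this in Python):

```sh
ollama list | grep -iE 'qwen|gemma3|vl|vision|llava|moondream|minicpm-v'
```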

---

## System Info

Before running a heavy query, check whether the machine has enough resources.
This is optional — the script does not enforce limits — but useful context
for deciding whether to proceed or skip.

```sh
uv run $SCRIPT --info # memory + storage + model list
uv run $SCRIPT --memory # just memory
uv run $SCRIPT --storage # just storage
```

Tip: on machines with ≤8 GB RAM, large vision models may cause swapping or
OOM. Consider using a smaller model variant or skipping the query.
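
For example, forcing a small variant explicitly (tag taken from the troubleshooting table below):

```sh
uv run $SCRIPT image.png --model qwen3.5:3b
```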

---

## Behavior

- **Fails fast** if Ollama is unreachable or no vision model is installed.
Exit code is non-zero; the error message includes a `hint` or `pull` command.
- **Sequential only** — Ollama is a single-worker process. Never call `ask.py`
  in parallel (e.g. two concurrent tool calls). Queue calls one at a time
  (see the sketch after this list).
- **No side effects** beyond the local Ollama process.
- **Auto-installs deps** via `uv` inline script metadata (PEP 723). Only
dependency is the `ollama` Python package.
- Supported formats: `.png`, `.jpg`, `.jpeg`, `.webp`, `.gif`, `.bmp`.
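
A minimal sketch of queued calls, chained with `&&` so they never overlap (file names and prompt are illustrative):

```sh
uv run $SCRIPT before.png --prompt "Answer YES or NO: is the canvas empty?" && \
uv run $SCRIPT after.png --prompt "Answer YES or NO: is the canvas empty?"
```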

---

## Typical Agent Workflow

1. A tool (browser automation, screenshot capture, golden renderer) writes
an image to disk.
2. Call `ask.py` with a targeted prompt suited to the task.
3. Parse the text response to decide the next action.

```sh
# Quick sanity check first
uv run $SCRIPT --ping

# Verify a browser screenshot has content before including it in a doc
uv run $SCRIPT /tmp/preview.png \
--prompt "Answer with YES or NO: does this screenshot show any visible UI content, shapes, or text?"

# Describe a golden render for a PR description
uv run $SCRIPT crates/grida-canvas/goldens/progressive_blur.png \
--prompt "Describe what visual effect is shown. Be specific about blur, colors, and shapes."
```

---

## Troubleshooting

| Symptom | Cause | Fix |
| -------------------------------- | -------------------------------- | ----------------------------------------- |
| `cannot reach Ollama` | Ollama not running | `ollama serve` |
| `no vision-capable models found` | Only text models installed | `ollama pull qwen3.5` |
| `model 'X' is not available` | Model name typo or not installed | `--list-models` to see what's installed |
| Slow response | Large model on CPU | Use a smaller variant (e.g. `qwen3.5:3b`) |
| Vague or wrong answer | Generic prompt | Write a more specific `--prompt` |
| `'ollama' package not found` | Not using `uv run` | Run with `uv run ask.py` instead |