diff --git a/.agents/prompts/cg-htmlcss-feature.md b/.agents/prompts/cg-htmlcss-feature.md
new file mode 100644
index 000000000..b67bd08e6
--- /dev/null
+++ b/.agents/prompts/cg-htmlcss-feature.md
@@ -0,0 +1,316 @@
+# cg-htmlcss — feature loop prompt
+
+**What this is.** A pastable prompt template for driving a single CSS
+feature forward in the cg htmlcss renderer. Paste the template at the
+bottom into a new task; the reference above it is context an agent
+can read to follow the loop honestly.
+
+**Why this is a prompt and not a skill.** The 5-phase loop is
+deliberately heavy — audit + ground + fixture + implement + verify.
+It's overkill for small fixes, and it's already a conductor over
+`/research`, `/fixtures`, `/cg-reftest`, which auto-trigger correctly
+on their own. Opt-in invocation is right: paste it when you want the
+full cycle; skip it for paper-cuts.
+
+**Lifecycle.** Expect this file to grow as new divergence patterns
+surface. It will likely go stale in parts once htmlcss hits
+Chromium-parity on L0/L1; treat the _phase structure_ as durable and
+the _property-specific callouts_ as advisory.
+
+---
+
+## The five phases
+
+```text
+┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
+│ 1. AUDIT │→ │2. GROUND │→ │3. FIXTURE│→ │ 4. IMPL │→ │5. VERIFY │
+└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
+ │ │
+ └───── ← ─── loop ← ─── score < floor ← ─── diff ← ───────┘
+```
+
+Each phase has a **question it answers**, a **deliverable**, and an
+**exit criterion**. Don't skip forward; don't linger past the exit
+criterion. The loop closes at verify — if the score is below the
+gate, return to phase 3 or 4 with a specific hypothesis, not a vibe.
+
+### 1. Audit — "what's the actual state of this feature?"
+
+**Question.** Where is the feature on the cg side today? What
+renders wrong, what doesn't render at all, what renders
+coincidentally-correctly but by the wrong path?
+
+**Actions.**
+
+- Scan `crates/grida-canvas/src/htmlcss/` for the property name in
+ stylo enum mapping, paint emit, layout feed. A property can be
+ parsed-but-dropped, emitted-but-wrong, or unhandled — each has a
+ different fix shape.
+- Enumerate existing fixtures that touch the feature
+ (`fixtures/test-html/L0/`). Run them under `L0.coverage` and
+ record current similarity per fixture. This is the before-number.
+- Check `docs/wg/feat-2d/htmlcss.md` and any related design notes
+ for a prior decision or deliberate gap.
+- List sibling properties likely to break the same way (e.g. a
+ gap in `border-radius` %-values implies the same gap in
+ `border-image-slice` %-values).
+
+**Deliverable.** A short audit note inside the task prompt or the
+PR draft:
+
+- _Current support level_: not-parsed / parsed-but-dropped /
+ partial / Chromium-parity-except-X.
+- _Fixtures touching it_: list with current similarity scores.
+- _Priority bucket_: easy-and-important / easy-low-value /
+ hard-important / hard-low-value. Pick easy-and-important by
+ default; only go hard-important when called out.
+
+**Exit when.** You can state the feature's current renderer state
+in one paragraph with file references. If you can't, you don't
+know enough yet — read more, don't guess.
+
+### 2. Ground — "how do real engines solve this?"
+
+**Question.** What's the canonical implementation strategy for
+this feature in a mature engine? We are not inventing; we are
+adapting.
+
+**Actions.** Invoke `/research`. Three engines are the usual
+references:
+
+- **Servo + stylo** — Rust, most readable. Especially useful for
+ parsing, cascade, inheritance, computed-value rules.
+- **Chromium / Blink** — C++. Authoritative for layout and paint
+ divergence calls. The renderer we diff against.
+- **WebKit** — C++. Third voice; useful when Blink has
+ controversial behavior (Safari-only bugs / features).
+
+For a new property, read the **spec first** (CSS Backgrounds,
+CSS Display, CSS Values 4, etc.). Then look up:
+
+- How stylo represents the property's computed value.
+- How Blink paints or lays out against that representation.
+- What WPT section exercises it (for free fixtures later).
+
+**Deliverable.** A research note — either inline in the PR
+description or under `docs/wg/feat-2d/` if substantial — with:
+
+- The spec section(s) that govern behavior.
+- The 3–6 line summary of how stylo/Blink structure the solution.
+- The explicit deviation, if any, and why.
+
+**Exit when.** You can defend the implementation shape by pointing
+at prior art, not just "it compiles and the fixture passes." If
+the only justification is the fixture, you've over-fit.
+
+### 3. Fixture — "what's the smallest test that proves it?"
+
+**Question.** What HTML/CSS input demonstrates the feature
+unambiguously, and what does the ideal rendered output look like?
+
+**Actions.** Invoke `/fixtures` for authoring rules; `/cg-reftest`
+for the suite manifest. In short:
+
+- One concept per file. `paint--.html` naming.
+- Probe-friendly palette (≤3 colors, round coordinates) when the
+ feature is pixel-precision rather than paint-rich.
+- **Paint vs. layout decision.** Paint fixtures fix body size to
+ the preset (via `min-height`); layout fixtures let content size
+ itself and carry an explicit `viewport` in the suite entry.
+ See `fixtures/test-html/README.md`.
+- Inject `hide-text.css` via `extra_css` when text is incidental
+ (labels for humans, not the subject under test). This is the
+ single biggest lever against noise.
+- WPT fixtures are fair game — prefer pulling an established WPT
+ test into the suite over authoring one from scratch when the
+ section is mature.
+
+**Deliverable.**
+
+- One or more fixtures under `fixtures/test-html/L0/`.
+- Entries in `fixtures/test-html/suites/L0.coverage.json`. Add an
+ entry to `L0.exact.json` only after the verify phase confirms
+ 100.00%.
+- For layout fixtures: the measured `viewport.height` from the cg
+ natural cull.
+
+**Exit when.** The fixture runs through both producers and
+produces PNGs of identical dimensions. Dimension mismatch → stop;
+the suite config is wrong and the score will be zero.
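+
+A dimension pre-check along these lines can catch the mismatch
+before any diff runs. It is a minimal sketch, not code from the
+repo: `pngSize` and `sameDimensions` are hypothetical helper names.
+The only format fact it relies on is that a PNG stores width and
+height as big-endian 32-bit integers at byte offsets 16 and 20,
+inside the IHDR chunk.
+
+```typescript
+// Sketch only: `pngSize` and `sameDimensions` are hypothetical helpers.
+type Size = { width: number; height: number };
+
+// A PNG starts with an 8-byte signature followed by the IHDR chunk:
+// 4-byte length, 4-byte type, then width and height (big-endian u32).
+function pngSize(bytes: Uint8Array): Size {
+  if (bytes.length < 24) throw new Error("not a PNG: too short");
+  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
+  if (view.getUint32(0) !== 0x89504e47 || view.getUint32(4) !== 0x0d0a1a0a) {
+    throw new Error("not a PNG: bad signature");
+  }
+  return { width: view.getUint32(16), height: view.getUint32(20) };
+}
+
+// True when the expected (Chromium) and actual (cg) PNGs can be diffed.
+function sameDimensions(expected: Uint8Array, actual: Uint8Array): boolean {
+  const e = pngSize(expected);
+  const a = pngSize(actual);
+  return e.width === a.width && e.height === a.height;
+}
+```
+
+Feed it the raw bytes of one expected and one actual PNG; a `false`
+result means the suite entry's `viewport` needs recalibrating.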
+
+### 4. Implement — "what code change realizes the behavior?"
+
+**Question.** What is the minimum set of edits in
+`crates/grida-canvas/src/htmlcss/` to make the fixture match?
+
+**Actions.**
+
+- Touch the smallest surface that can possibly work. Avoid
+ "refactor + feature" in one commit; the reftest cannot tell you
+ which change caused which delta.
+- Trace the pipeline end-to-end for the property:
+ parse → compute → layout feed → paint. A feature can fail at
+ any stage; diagnose before editing.
+- Add unit tests where behavior is data-assertable (computed
+ value, resolved length, layout position). Data tests are free
+ and catch regressions the reftest can't (e.g. "this resolves
+ to `12px` in _both_ Chromium and us, for the right reason").
+- When in doubt, mirror the Blink / stylo structure. Deviations
+ cost reviewer attention; prior-art parity is free.
+
+**Deliverable.**
+
+- Code change scoped to the feature.
+- Any new data tests for the computed-value surface.
+- A one-line entry in the PR description for each user-facing
+ behavior change, written in spec terms, not implementation
+ terms.
+
+**Exit when.** `cargo check -p cg` is clean, existing tests pass,
+and the fixture renders through `golden_htmlcss --suite` without
+error. Similarity score is measured in phase 5 — do not gate on
+it here.
+
+### 5. Verify — "does it actually match Chromium?"
+
+**Question.** Is the rendered output Chromium-parity at the
+fixture's tolerance gate?
+
+**Actions.** This is `/cg-reftest`'s core loop. For each fixture
+in the change:
+
+1. Render expecteds (Playwright Chromium) into
+ `target/refbrowser//expected`.
+2. Render actuals
+ (`cargo run -p cg --example golden_htmlcss -- --suite …`).
+3. Diff with `@grida/reftest`, threshold 0 (the strict default).
+4. Read similarity against the suite's `gate.floor`.
+
+**Don't trust the score naively** — see "Reading the score" in the
+cg-reftest skill. A 96% score on a sparse fixture can mask a
+completely broken subject. Eyeball the diff PNG every time. A
+single round of verification without visual inspection is not
+verification.
+
+**Close the loop:**
+
+- Score ≥ `gate.floor`? Promote the fixture to `L0.exact.json`
+ if it reached 100.00%; otherwise leave in coverage and document
+ the residual delta in the PR description.
+- Score < floor? Return to phase 3 (fixture too noisy / wrong
+ subject) or phase 4 (renderer bug) with a specific hypothesis.
+ Do _not_ lower the gate to fit the result; the gate exists so
+ regressions are loud.
+
+**Deliverable.** The PR description, written honestly:
+
+- Before/after similarity numbers for every affected fixture.
+- Diff PNGs attached or linked for any score < 1.0.
+- The specific divergence surface (rounding, AA, layout math,
+ etc.) if below 100.00%. "Renderer choice differs from Blink at
+ " beats "close enough."
+
+**Exit when.** The PR description can be read by someone who has
+never seen the code and they know exactly what's now supported,
+what's still broken, and what the score proves.
+
+---
+
+## Handoffs and artifacts
+
+The phases are designed so an agent can stop, a second agent can
+pick up, and no context is lost. The durable artifacts:
+
+| Phase | Artifact | Location |
+| --------- | -------------------------------------------------------- | ------------------------------------------------------ |
+| Audit | Current-state note, priority bucket | PR description / task prompt |
+| Ground | Research note (spec + engine cross-ref) | PR description or `docs/wg/feat-2d/` |
+| Fixture | `.html` fixture(s), suite entries, viewport measurement | `fixtures/test-html/L0/`, `fixtures/test-html/suites/` |
+| Implement | Code change, data tests, behavior summary | `crates/grida-canvas/src/htmlcss/` |
+| Verify | Before/after scores, diff PNG review, divergence surface | PR description |
+
+If a phase's artifact is missing, the phase isn't done — even if
+the code "works."
+
+---
+
+## Gate policy — the part that makes automation safe
+
+The only reason this loop can be automated is that phase 5 has a
+**numeric, unambiguous, byte-exact** pass condition. Everything
+upstream is advisory; verify is the truth.
+
+- `L0.exact.json`: `gate.floor = 1.0`, `threshold = 0`, `aa = off`.
+ Any drop means our renderer now behaves differently from Blink
+ on that fixture. No tolerance inflation, ever.
+- `L0.coverage.json`: informational scores, no gate. Landing a
+ fixture here is "we know about this case and intend to fix it."
+ Promoting to exact is "we now match Blink."
+
+Automation rules downstream of this prompt (CI gating, auto-merge,
+etc.) must assert on the `report.json` emitted by `@grida/reftest`
+and **not** on free-text agent assertions. The agent's job is to
+drive the loop; the report is the contract.
+
+### What "destructive" means here
+
+A change is destructive if it:
+
+- Lowers `gate.floor` in `L0.exact.json`.
+- Removes an entry from `L0.exact.json` without a corresponding
+ `coverage` entry (or documented reason).
+- Increases `--threshold` or enables `--aa` to absorb real
+ divergence.
+- Suppresses a fixture to dodge a failing score.
+
+None of these are acceptable without explicit human approval. The
+loop fails loudly instead.
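+
+A check like the following could back this policy in CI. It is a
+sketch under assumptions: `destructiveChanges` is a hypothetical
+helper, the `Suite` type covers only the gate fields named above
+plus `fixtures[].path`, and paths are compared as plain strings
+(both suites are assumed to live in the same directory). It cannot
+detect the fourth case, a suppressed fixture, which still needs
+human review.
+
+```typescript
+// Sketch only: `destructiveChanges` is a hypothetical helper.
+type Gate = { threshold: number; aa: boolean; floor: number };
+type Suite = { gate: Gate; fixtures: { path: string }[] };
+
+// Compare the exact suite before and after a change, given the
+// current coverage suite. Returns one reason per destructive edit.
+function destructiveChanges(before: Suite, after: Suite, coverage: Suite): string[] {
+  const reasons: string[] = [];
+  if (after.gate.floor < before.gate.floor) reasons.push("gate.floor lowered");
+  if (after.gate.threshold > before.gate.threshold) reasons.push("threshold increased");
+  if (after.gate.aa && !before.gate.aa) reasons.push("aa enabled");
+
+  const kept = new Set(after.fixtures.map((f) => f.path));
+  const covered = new Set(coverage.fixtures.map((f) => f.path));
+  for (const f of before.fixtures) {
+    // A removed exact entry is acceptable only if coverage still tracks it.
+    if (!kept.has(f.path) && !covered.has(f.path)) {
+      reasons.push(`removed without coverage entry: ${f.path}`);
+    }
+  }
+  return reasons;
+}
+```
+
+An empty result does not prove a change is safe; it only rules out
+the mechanical cases.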
+
+---
+
+## Anti-patterns
+
+| Anti-pattern | Why it fails | Instead |
+| ----------------------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
+| Skipping audit, starting with "fix this bug" | The bug is a symptom; the broken pipeline stage may be a different property. | Trace parse→compute→layout→paint first. Name the stage. |
+| Skipping ground, implementing from intuition | CSS is full of non-obvious spec requirements. "Looks right" to a human ≠ spec-correct. | Read the spec. Cross-ref one real engine. |
+| Combining refactor + feature in one PR | Reftest deltas can't be attributed. | Land the refactor alone first (score must not drop). |
+| Raising threshold to "just pass" | Hides real bugs. Turns the harness into a rubber stamp. | Fix the divergence. If out of scope, document + leave in coverage. |
+| Using text-heavy fixtures for non-text features | Font shaping noise dominates the score; you're measuring the wrong thing. | Inject `hide-text.css`. Or use probe-friendly fixtures. |
+| Promoting to `exact` at 99.xx% | The exact suite is a byte-exact contract. Near-passes belong in coverage with a delta note. | Wait for 100.00%. Or fix the residual. |
+| Claiming "verified" without reading the diff | A similarity score is a coarse index; the diff image is the truth. | Eyeball every sub-100 diff. Record the specific divergence. |
+| Inventing new fixtures when WPT covers it | Duplicates work; WPT has reviewed spec-intent pass criteria. | Import the WPT fixture; cite it in the suite entry. |
+
+---
+
+## The template — paste this to kick off a cycle
+
+Fill in the brackets. The agent you hand it to should produce all
+five artifacts before declaring done. Expect to run the loop in
+passes (audit+ground+fixture → implement → verify), with a
+checkpoint at each pass that future-you or a reviewer can read
+without the conversation.
+
+```text
+Drive the htmlcss feature loop for: .
+Follow .agents/prompts/cg-htmlcss-feature.md.
+
+Scope:
+- Feature:
+- Hypothesis:
+- Expected:
+
+Produce, in order:
+
+1. Audit note: current support level, file references, before-scores.
+2. Ground note: spec section(s), stylo/Blink strategy summary.
+3. Fixture(s): `.html` + suite entries. Paint or layout? Declare it.
+4. Implementation: minimal diff. Data tests where assertable.
+5. Verify report: before/after similarity per fixture, diff PNG
+ review for any sub-1.0 score, promoted fixtures listed.
+
+Gate: L0.exact must stay at floor 1.0, threshold 0, aa off. Do not
+relax the gate. If the feature doesn't reach 100.00%, leave it in
+coverage with a specific divergence-surface note.
+
+Use /research for phase 2, /fixtures for phase 3, /cg-reftest for
+phases 3 and 5.
+```
diff --git a/.agents/skills/cg-reftest/SKILL.md b/.agents/skills/cg-reftest/SKILL.md
index 998683440..5ecd4b19d 100644
--- a/.agents/skills/cg-reftest/SKILL.md
+++ b/.agents/skills/cg-reftest/SKILL.md
@@ -28,25 +28,26 @@ How to design, name, and review visual rendering tests in this repo.
Use these terms precisely. Misusing them erodes trust in test results.
-| Term | Definition |
-| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| **Reftest** | A test that compares renderer output against an **independent reference** (oracle) whose correctness is established outside this project — e.g. a W3C-provided PNG for an SVG test case. The oracle is the source of truth; a mismatch means our renderer is wrong. |
-| **Independent reference / oracle** | A rendering produced by a separate, trusted implementation or defined by a specification. We do not control its content. |
-| **Golden test** | A test that compares renderer output against a **previously accepted snapshot** produced by our own renderer. There is no external truth — the golden file _is_ the expected output because a human reviewed and approved it. Also called a snapshot test. |
-| **Snapshot test** | Synonym for golden test. The snapshot is a frozen output that we assert has not changed. |
-| **Render regression test** | Any test whose purpose is to detect _unintended changes_ in rendering output. Golden tests are regression tests. Reftests are correctness tests. |
-| **Pixel diff** | Byte-level comparison of two raster images. A single differing channel value is a failure (at zero tolerance). |
-| **Perceptual diff** | Comparison in a perceptual color space (e.g. YIQ via the `dify` crate). Weights differences by human visual sensitivity. More forgiving than raw pixel diff but still quantifiable. |
-| **rendiff** | Rust crate (`rendiff` v0.2) for histogram-based pixel diffing. Computes a per-channel difference histogram; thresholds are expressed as `[(max_diff, max_count), ...]` pairs. Used in `flatten_rendiff.rs` for equivalence tests. Dep in `crates/grida-canvas/`. |
-| **dify** | Rust crate for perceptual image comparison in YIQ color space. Used by `grida-dev reftest` for SVG reftests. Supports `--threshold` and `--aa` (anti-aliasing detection) flags. |
-| **pixelmatch** | Pure-JS perceptual image comparison library. YIQ-based, AA-aware. Used by `@grida/reftest`. Zero native deps; same conceptual model as dify, slightly different threshold semantics — see parity notes below. |
-| **`@grida/reftest`** | General-purpose, language-agnostic TS reftest CLI + library at `packages/grida-reftest/`. Takes two directories of PNGs, diffs, scores, writes the same bucket layout and JSON report as the Rust `grida-dev reftest`. Does NOT render anything — producers upstream. |
-| **`grida-dev reftest`** | Rust reftest runner at `crates/grida-dev/src/reftest/`. SVG-specific: renders SVG via our own cg pipeline, then diffs against a reference PNG. Canonical for SVG. For non-SVG formats, use `@grida/reftest` with an upstream renderer. |
-| **refig** | Short for "Figma reftest." Fixture suites under `fixtures/local/refig/` containing `.fig` + `document.json` + `images/` + `exports/` (oracle PNGs from Figma's Images API). Consumed by a TS render step + `@grida/reftest`. See `fixtures/local/refig/README.md`. |
-| **Tolerance / fuzz** | A configured threshold below which pixel differences are ignored. Expressed as a histogram threshold (rendiff) or a YIQ distance (dify / pixelmatch). Required when rasterization is non-deterministic across platforms. |
-| **Data test** | A test that asserts on the scene graph or computed values directly — no rendering needed. E.g. bounding box, resolved transform matrix, computed style. The cheapest possible assertion. |
-| **Probe test** | A test that asserts correctness by reading pixel values at specific coordinates in the rendered output. Requires a purpose-built fixture with a minimal color palette and documented probe points. No full-image comparison needed. |
-| **Probe-friendly fixture** | A fixture explicitly designed for probe testing: minimal colors, no decorative elements, shapes at known coordinates. Often accompanied by a `.probe.json` file declaring expected pixel values at specific points. |
+| Term | Definition |
+| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Reftest** | A test that compares renderer output against an **independent reference** (oracle) whose correctness is established outside this project — e.g. a W3C-provided PNG for an SVG test case. The oracle is the source of truth; a mismatch means our renderer is wrong. |
+| **Independent reference / oracle** | A rendering produced by a separate, trusted implementation or defined by a specification. We do not control its content. |
+| **Golden test** | A test that compares renderer output against a **previously accepted snapshot** produced by our own renderer. There is no external truth — the golden file _is_ the expected output because a human reviewed and approved it. Also called a snapshot test. |
+| **Snapshot test** | Synonym for golden test. The snapshot is a frozen output that we assert has not changed. |
+| **Render regression test** | Any test whose purpose is to detect _unintended changes_ in rendering output. Golden tests are regression tests. Reftests are correctness tests. |
+| **Pixel diff** | Byte-level comparison of two raster images. A single differing channel value is a failure (at zero tolerance). |
+| **Perceptual diff** | Comparison in a perceptual color space (e.g. YIQ via the `dify` crate). Weights differences by human visual sensitivity. More forgiving than raw pixel diff but still quantifiable. |
+| **rendiff** | Rust crate (`rendiff` v0.2) for histogram-based pixel diffing. Computes a per-channel difference histogram; thresholds are expressed as `[(max_diff, max_count), ...]` pairs. Used in `flatten_rendiff.rs` for equivalence tests. Dep in `crates/grida-canvas/`. |
+| **dify** | Rust crate for perceptual image comparison in YIQ color space. Used by `grida-dev reftest` for SVG reftests. Supports `--threshold` and `--aa` (anti-aliasing detection) flags. |
+| **pixelmatch** | Pure-JS perceptual image comparison library. YIQ-based, AA-aware. Used by `@grida/reftest`. Zero native deps; same conceptual model as dify, slightly different threshold semantics — see parity notes below. |
+| **`@grida/reftest`** | General-purpose, language-agnostic TS reftest CLI + library at `packages/grida-reftest/`. Takes two directories of PNGs, diffs, scores, writes the same bucket layout and JSON report as the Rust `grida-dev reftest`. Does NOT render anything — producers upstream. |
+| **`grida-dev reftest`** | Rust reftest runner at `crates/grida-dev/src/reftest/`. SVG-specific: renders SVG via our own cg pipeline, then diffs against a reference PNG. Canonical for SVG. For non-SVG formats, use `@grida/reftest` with an upstream renderer. |
+| **refig** | Short for "Figma reftest." Fixture suites under `fixtures/local/refig/` containing `.fig` + `document.json` + `images/` + `exports/` (oracle PNGs from Figma's Images API). Consumed by a TS render step + `@grida/reftest`. See `fixtures/local/refig/README.md`. |
+| **refbrowser** | Short for "headless-browser reftest." HTML/CSS fixtures under `fixtures/test-html/L0/` rendered by Playwright Chromium as the oracle vs. our `cg` htmlcss renderer. Producer script: `.agents/skills/cg-reftest/scripts/refbrowser_render.ts`; diff via `@grida/reftest`. |
+| **Tolerance / fuzz** | A configured threshold below which pixel differences are ignored. Expressed as a histogram threshold (rendiff) or a YIQ distance (dify / pixelmatch). Required when rasterization is non-deterministic across platforms. |
+| **Data test** | A test that asserts on the scene graph or computed values directly — no rendering needed. E.g. bounding box, resolved transform matrix, computed style. The cheapest possible assertion. |
+| **Probe test** | A test that asserts correctness by reading pixel values at specific coordinates in the rendered output. Requires a purpose-built fixture with a minimal color palette and documented probe points. No full-image comparison needed. |
+| **Probe-friendly fixture** | A fixture explicitly designed for probe testing: minimal colors, no decorative elements, shapes at known coordinates. Often accompanied by a `.probe.json` file declaring expected pixel values at specific points. |
---
@@ -266,6 +267,301 @@ is the threshold where the renderer needs attention.
`document.json` — not from a suite-wide viewport. The render step
must honor each node's preset.
+### HTML/CSS — the refbrowser reftest pipeline
+
+HTML/CSS fixtures have **no pre-baked oracle** — the oracle is a real
+browser engine. Like refig, refbrowser renders the same fixture in two
+places and diffs the PNGs; unlike refig, both renders are reproducible
+locally (no cloud round-trip).
+
+```text
+fixtures/test-html/
+├── L0/.html ── fixtures
+├── _reftest/hide-text.css ── shared helper stylesheets
+└── suites/
+ ├── L0.exact.json ── must pass 100.00%; CI gate
+ └── L0.coverage.json ── aspirational scope; tracks progress
+
+ │
+ ├── cargo run -p cg --example golden_htmlcss -- --suite
+ │ └─► $TMPDIR/grida-htmlcss-goldens/.png (cg actual)
+ │
+ └── refbrowser_render.ts --suite
+ └─► target/refbrowser//expected/.png (Chromium oracle)
+
+ ▼
+ reftest --actual-dir … --expected-dir … --threshold 0
+ └─► target/reftests//report.json + buckets
+```
+
+**Oracle**: headless Chromium via Playwright. Chromium's Blink is the
+reference implementation for most CSS features; divergence from Blink
+is a gap in our `cg` htmlcss pipeline (or a known difference documented
+in `docs/wg/feat-2d/htmlcss.md`).
+
+> **See also: web-platform-tests (WPT).** The W3C's
+> [wpt.live](https://wpt.live) suite is the standards-body reftest
+> harness — same concept as refbrowser, but cross-engine (Blink,
+> WebKit, Gecko) and backed by spec-author-written fixtures with
+> explicit pass criteria. Consider pulling WPT fixtures into
+> `fixtures/test-html/` when a CSS feature has a mature WPT section
+> and you want spec-conformance signal rather than just "matches
+> Chromium." Out of scope for this skill today; refbrowser is the
+> faster local loop.
+
+#### Suites: `L0.exact` vs `L0.coverage`
+
+Everything is driven by **suite JSON files** at
+`fixtures/test-html/suites/`. A suite enumerates fixtures, their
+per-fixture render config, and the gate policy.
+
+| Suite | What it contains | Gate |
+| ------------------ | --------------------------------------------------------------------------------------------- | -------------------------- |
+| `L0.exact.json` | Fixtures currently at 100.00% byte-exact parity with Chromium. Any drop is a real regression. | `floor: 1.0`, strict diff. |
+| `L0.coverage.json` | All aspirational L0 fixtures — the full backlog. Scores land wherever they land. | Informational only. |
+
+**Promoting a fixture to `exact`** — once a fixture reaches 100.00%
+against the current suite config, move its entry from `coverage` →
+`exact`. Do **not** lower the exact suite's floor to fit new entries;
+the bar exists so regressions are loud.
+
+Per-fixture `.reftest.json` sidecars **no longer exist**. All
+config lives in the suite file.
+
+#### Suite JSON shape
+
+```json
+{
+ "name": "L0.exact",
+ "description": "Byte-exact fixtures; any drop = regression.",
+ "gate": { "threshold": 0, "aa": false, "floor": 1.0 },
+ "defaults": {
+ "wait_for": ["fonts", "networkidle"],
+ "extra_css": ["../_reftest/hide-text.css"],
+ "full_page": true
+ },
+ "fixtures": [
+ {
+ "path": "../L0/box-dimensions.html",
+ "viewport": { "width": 600, "height": 522 }
+ }
+ ]
+}
+```
+
+- `defaults` — applied to every fixture. Each fixture entry can override any field.
+- `fixtures[].path` and every `extra_css[]` path resolve **relative to the suite file**.
+- `viewport.height` must match cg's cull height for the diff to succeed; render cg once and read `WxH` to calibrate.
+- `gate.threshold` / `gate.aa` are inputs to the pixelmatch diff; `gate.floor` is the aggregate pass bar on similarity.
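+
+The defaults-and-paths rules above can be sketched as follows. This
+is an illustration, not the producers' actual code: the helper names
+are hypothetical, and it assumes a field set on a fixture entry
+replaces the default value whole (no deep merge) and that paths use
+`/` separators.
+
+```typescript
+// Sketch only: helper names are hypothetical; field names follow the
+// suite JSON above. Assumes a fixture field replaces the default whole.
+type FixtureConfig = {
+  viewport?: { width: number; height: number };
+  wait_for?: string[];
+  extra_css?: string[];
+  full_page?: boolean;
+};
+type FixtureEntry = FixtureConfig & { path: string };
+type Suite = { defaults?: FixtureConfig; fixtures: FixtureEntry[] };
+type ResolvedFixture = FixtureEntry & { extra_css: string[] };
+
+// Resolve `rel` against the directory containing `suiteFile`.
+function resolveRelative(suiteFile: string, rel: string): string {
+  const parts = suiteFile.split("/").slice(0, -1);
+  for (const seg of rel.split("/")) {
+    if (seg === "" || seg === ".") continue;
+    if (seg === "..") parts.pop();
+    else parts.push(seg);
+  }
+  return parts.join("/");
+}
+
+// Apply `defaults`, let each entry override, then resolve every path.
+function resolveFixtures(suiteFile: string, suite: Suite): ResolvedFixture[] {
+  return suite.fixtures.map((entry) => {
+    const merged = { ...suite.defaults, ...entry };
+    return {
+      ...merged,
+      path: resolveRelative(suiteFile, entry.path),
+      extra_css: (merged.extra_css ?? []).map((p) => resolveRelative(suiteFile, p)),
+    };
+  });
+}
+```
+
+Under these assumptions, `../L0/box-dimensions.html` in
+`fixtures/test-html/suites/L0.exact.json` resolves to
+`fixtures/test-html/L0/box-dimensions.html`.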
+
+#### The three-step pipeline
+
+**1. Render expecteds (browser oracle)**
+
+```sh
+# one-time: install Chromium for Playwright
+pnpm --filter @grida/reftest exec playwright install chromium
+
+# render the whole suite
+pnpm --filter @grida/reftest exec tsx \
+ .agents/skills/cg-reftest/scripts/refbrowser_render.ts \
+ --suite fixtures/test-html/suites/L0.exact.json \
+ --out-dir target/refbrowser/L0.exact/expected
+```
+
+Ad-hoc single-file render (no suite, defaults only) — useful while authoring a fixture:
+
+```sh
+pnpm --filter @grida/reftest exec tsx \
+ .agents/skills/cg-reftest/scripts/refbrowser_render.ts \
+ --fixture fixtures/test-html/L0/paint-background-solid.html \
+ --out-dir /tmp/refbrowser-verify
+```
+
+**2. Render actuals (our pipeline)** — the `golden_htmlcss` example
+reads the same suite JSON, resolves `extra_css` relative to the suite
+file, and applies each stylesheet via
+`htmlcss::with_extra_stylesheets` before rendering, so the cascade is
+symmetric with Chromium.
+
+```sh
+cargo run -p cg --example golden_htmlcss -- \
+ --suite fixtures/test-html/suites/L0.exact.json
+
+mkdir -p target/refbrowser/L0.exact/actual
+cp "${TMPDIR:-/tmp}/grida-htmlcss-goldens/"*.png target/refbrowser/L0.exact/actual/
+```
+
+**3. Diff via `@grida/reftest`** — format-agnostic, same bucket layout
+and `report.json` schema as the Rust and refig runners.
+
+Default refbrowser diff: **`--threshold 0`** (pixelmatch strictest,
+AA off). Compare each fixture's similarity against the suite's
+`gate.floor` — for `L0.exact`, that's `1.0` (100.00% byte-exact).
+
+```sh
+pnpm --filter @grida/reftest exec reftest \
+ --actual-dir target/refbrowser/L0.exact/actual \
+ --expected-dir target/refbrowser/L0.exact/expected \
+ --output-dir target/reftests/L0.exact \
+ --bg white \
+ --threshold 0
+```
+
+> **Gate enforcement is not yet wired into the CLI.** Today, read
+> `report.json` and assert every
+> `tests[].similarity_score ≥ gate.floor` in a wrapper script or CI
+> step. A `--suite` flag on `@grida/reftest` that does this
+> automatically is a pending follow-up.
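+
+Until that lands, the wrapper's core check might look like the
+sketch below. Only `tests[].similarity_score` is taken from the
+report; the `name` field, the `failingTests` helper, and the sample
+scores are assumptions made for illustration.
+
+```typescript
+// Sketch only: `tests[].similarity_score` comes from report.json;
+// the `name` field and `failingTests` helper are assumptions.
+type TestResult = { name: string; similarity_score: number };
+type Report = { tests: TestResult[] };
+
+// Returns every test below the floor. NaN or missing scores also fail,
+// so a malformed report is loud rather than silently green.
+function failingTests(report: Report, floor: number): TestResult[] {
+  return report.tests.filter((t) => !(t.similarity_score >= floor));
+}
+
+// Made-up scores; a real wrapper would parse report.json instead.
+const sampleReport: Report = {
+  tests: [
+    { name: "box-dimensions", similarity_score: 1.0 },
+    { name: "paint-background-solid", similarity_score: 0.9987 },
+  ],
+};
+
+// With the L0.exact floor of 1.0, only the second test fails.
+const failed = failingTests(sampleReport, 1.0);
+console.log(failed.map((t) => t.name));
+```
+
+A CI step would exit non-zero whenever the returned list is
+non-empty.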
+
+Output: `S99/S95/S90/S75/err/` bucket directories + `report.json`.
+Pass bar: the suite's `gate.floor`. For `L0.exact`, anything below
+100.00% is a real divergence from Blink (rounding policy, layout
+math, AA emission, etc.) — not noise. See "Reading the score" below.
+
+### Reading the score — do not trust it naively
+
+The similarity score is `1 - diff_pixels / scoring_pixels`, where
+`scoring_pixels ≈ width × height` of the screenshot. **The denominator
+is the whole canvas, not the subject under test.**
+
+This has two consequences you must internalize before reading any
+report:
+
+1. **Background dominates the score.** A fixture that paints a
+ 200×200 subject on a 600×800 canvas is ~92% background. A renderer
+ that emits _nothing_ for the subject still scores ~92%. A
+ renderer that paints the subject at 50% accuracy scores ~96%.
+ Neither number means what it naively looks like.
+2. **Small fixtures inflate. Full-bleed fixtures are honest.** A
+ card-in-corner composition will always look "good" on the score
+ even when broken; a composition that fills the viewport gives
+ numeric feedback proportional to real error.
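+
+The arithmetic is easy to check by hand. The sketch below applies
+the formula above to a hypothetical 200×200 subject on a 600×800
+canvas, assuming every wrong subject pixel counts as one diff pixel
+and the background matches exactly.
+
+```typescript
+// Sketch only: `similarity` mirrors the formula above; sizes are made up.
+function similarity(canvasW: number, canvasH: number, diffPixels: number): number {
+  return 1 - diffPixels / (canvasW * canvasH);
+}
+
+const subjectPixels = 200 * 200; // 40,000 of 480,000 canvas pixels
+
+// Subject missing entirely: every subject pixel differs.
+const nothingRendered = similarity(600, 800, subjectPixels);
+// Subject half wrong: half of its pixels differ.
+const halfWrong = similarity(600, 800, subjectPixels / 2);
+
+console.log(nothingRendered.toFixed(4)); // 0.9167
+console.log(halfWrong.toFixed(4)); // 0.9583
+```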
+
+**Fixture-authoring rule:** size the fixture so the subject under
+test fills as much of the canvas as practical. Viewport height
+tuned to the subject's bounding box (via the suite entry's
+`viewport.height`) is the usual lever. Padding/margins around the
+subject are scoring dead weight — use them only when the test is
+_about_ spacing.
+
+**Reviewing rule:** never report a similarity number without
+eyeballing the diff PNG. A 96% score on a sparse fixture and a 96%
+score on a full-bleed fixture are _orders of magnitude_ apart in
+severity. The diff image is the source of truth; the score is a
+coarse index.
+
+For a true "fraction of the subject that matches," author a
+probe-friendly fixture (see the probe test section) and assert on
+specific pixels, or mask the background to transparent so
+`mask: alpha` counts only subject pixels. Plain refbrowser scores
+cannot give you that signal.
+
+**Per-fixture fields inside a suite entry**: all optional except
+`path`, defaults shown; any field set on an entry overrides
+`defaults`.
+
+```json
+{
+ "path": "../L0/.html",
+ "viewport": { "width": 600, "height": 800 },
+ "wait_for": ["fonts", "networkidle"],
+ "extra_css": [],
+ "full_page": true
+}
+```
+
+- `viewport` — Chromium viewport (px). Set height to match cg's cull
+ height; mismatched dims score 0.0 at diff time (`@grida/reftest`
+ requires identical dimensions).
+- `wait_for` — `"fonts"` awaits `document.fonts.ready`, `"networkidle"`
+ awaits 500ms of no-network-activity.
+- `extra_css` — CSS files to inject into **both** sides. Paths resolve
+ relative to the suite file. Playwright applies them via `addStyleTag`;
+ cg applies them via `htmlcss::with_extra_stylesheets` before rendering,
+ so the cascade is symmetric. Fields only meaningful to Chromium
+ (`viewport`, `wait_for`, `full_page`) are ignored by cg.
+- `full_page` — capture full scrollable area (default) vs. viewport.
+
+**Pre-built helper stylesheets** under `fixtures/test-html/_reftest/`:
+
+| File | Effect |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| `hide-text.css` | `color: transparent` + `line-height: 1`. Zeros glyph coverage and pins line-box height. Use when a fixture isn't testing text. |
+
+Add more helpers here as divergence patterns emerge. Keep each one
+scoped to a single concern (hide text, normalize scrollbars, force
+web fonts, etc.) so suites can compose them.
+
+**When to reach for `hide-text.css`** — any fixture whose subject is
+paint, layout, box model, flex, grid, or positioning. The text in
+those fixtures is typically decorative labels; its glyph rendering
+and `line-height: normal` metrics diverge between Blink and Skia and
+will dominate the diff otherwise.
+
+**When NOT to use it** — fixtures whose subject IS text:
+`text-decoration`, `text-shadow`, `text-align`, bidi, `writing-mode`,
+font features. For these, leave `extra_css` empty and accept a
+below-100 score; the reftest's value there is human review of the
+diff image, not the numeric score.
+
+**Authoring workflow** for a new fixture:
+
+1. Write the `.html` fixture under `fixtures/test-html/L0/`.
+2. Add an entry to `suites/L0.coverage.json` with at least
+ `{ "path": "../L0/.html" }`.
+3. Render it once via `--suite L0.coverage.json` on the cg side; note
+ the reported `WxH` in the log.
+4. Set `viewport.height = H` on the entry; add `extra_css` helpers if
+ relevant (e.g. `hide-text.css` for non-text fixtures). `defaults`
+ in the suite likely already cover the common case.
+5. Run the refbrowser producer + diff against the same suite. Review
+ the diff PNG — if the diff is dominated by a known divergence zone
+ (see below), record it in the PR description, don't suppress it.
+6. If the fixture reaches 100.00% byte-exact, move its entry from
+ `L0.coverage.json` to `L0.exact.json`.
+
+**Known divergence surfaces** — areas where cg is not yet Blink-exact.
+These are **backlog items, not tolerance excuses**. Do not tune
+thresholds to suppress them. Document the specific divergence in the
+PR description; let the score carry the truth.
+
+- **Alpha compositing rounding** — `rgba()` backgrounds, `opacity`.
+ cg and Blink choose different rounding rules (half-up vs banker's,
+ premul vs straight, operand order), producing 1-unit channel
+ deltas. Small-delta territory but still a real policy divergence.
+- **Layout math under non-uniform padding / intrinsic sizing** —
+ block widths resolving 1-3 px off when computed through flex
+ children, asymmetric padding, or `width: auto` on transparent
+ content. Shows up as diff brackets at box edges.
+- **Text** — glyph rasterization (shaper version, subpixel positioning,
+ hinting) and line-box metrics (`line-height: normal` ascent/descent)
+ diverge. For non-text fixtures inject `hide-text.css`. For
+ text-subject fixtures, accept a below-100 score and rely on the
+ diff image for review.
+- **Antialiasing on curves** — rounded corners, circles, ellipses,
+ stroke ends. cg's path flattener emits different coverage values
+ than Blink's for the same geometry.
+- **Percentage border-radius** — `border-radius: 50%` and the `H / V`
+ two-value form currently render as square in cg. Fixed-length radii
+ (`12px`, `9999px`) work.
+- **Gradients** — linear, radial, conic, repeating. Color-stop
+ interpolation and color-space handling differ; banding and
+ transition boundaries don't match.
+- **Filters and shadows** — `filter: blur`, `backdrop-filter`,
+ `box-shadow` with large blur radii. Kernel and sampling divergence
+ dominates scores.
+- **`<img>` fallbacks** — our `ImageProvider` renders a placeholder
+ rect; Chromium renders broken-image chrome. Prefer fixtures with
+ real image fills or none.
+- **System-font fallback** — bundle fonts with `@font-face` + local
+ paths when the fixture specifically tests font rendering.
+- **Scrollbar width** — default `full_page: true` captures document
+ height and sidesteps scrollbar chrome; flip only when testing
+ scrollbar geometry.
+- **Dimension drift** — changing a fixture's layout invalidates its
+ `viewport.height` in the suite entry. Re-run `golden_htmlcss` with
+ `--suite`, update the entry's `viewport.height`, re-run refbrowser.
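
To make the "1-unit channel delta" in the alpha-compositing item concrete, here is a small sketch comparing two plausible rounding rules for scaling an 8-bit channel by an 8-bit alpha. Both functions are illustrative assumptions; neither is claimed to be the rule cg or Blink actually applies.

```typescript
// Two plausible ways to scale an 8-bit channel by an 8-bit alpha.
// Neither is claimed to be the rule cg or Blink actually uses.
function mulRoundNearest(c: number, a: number): number {
  return Math.round((c * a) / 255);
}

function mulTruncate(c: number, a: number): number {
  return Math.floor((c * a) / 255);
}

// Count how many (channel, alpha) pairs disagree, and the worst disagreement.
function compareRounding(): { mismatches: number; maxDelta: number } {
  let mismatches = 0;
  let maxDelta = 0;
  for (let c = 0; c <= 255; c++) {
    for (let a = 0; a <= 255; a++) {
      const d = Math.abs(mulRoundNearest(c, a) - mulTruncate(c, a));
      if (d > 0) mismatches++;
      if (d > maxDelta) maxDelta = d;
    }
  }
  return { mismatches, maxDelta };
}
```

The two rules never differ by more than one unit per channel, yet they disagree on many inputs, which is why a byte-exact gate surfaces the policy difference even though it is invisible to the eye.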
+
**Oracle type summary:**
| Input format | Oracle source | Test type |
@@ -274,8 +570,135 @@ is the threshold where the renderer needs attention.
| SVG (arbitrary, no PNG) | resvg-rendered PNG | Reftest |
| SVG (Grida extensions) | Our own prior output | Golden test |
| Figma REST / .fig | Figma-exported PNG | Reftest |
+| HTML / CSS (embed) | Playwright Chromium PNG | Reftest |
| `.grida` native | Our own prior output | Golden test |
+---
+
+## Heuristic techniques (future work)
+
+Two techniques that scale reftesting beyond "fixture in, score out."
+Both are format-agnostic — they apply anywhere we control the oracle
+pipeline (refbrowser, refsvg-via-resvg), and both are **unimplemented
+today**. They're documented here so the design is shared before
+anyone starts building.
+
+### Subtree bisection — diff attribution
+
+> Aliases: _diff attribution_, _culprit isolation_. Delta debugging
+> applied to rendering.
+>
+> **TODO — tooling not ready.** Manual application only today.
+
+A reftest gives you a single similarity score and a diff PNG. For a
+minimal fixture that's enough — you eyeball the diff and the culprit
+is obvious. As fixtures scale (multi-element compositions, nested
+layout, overlapping subtrees), you know _that_ there's a divergence
+but not _which_ element owns it.
+
+**The technique** narrows "something in this fixture diverges" to
+"this specific element's rendering is wrong," in two modes:
+
+1. **Region → element (fast path).** Extract the bbox of high-delta
+ regions from the diff PNG (connected-components or simple
+ threshold pass). Match each bbox against element bounds in the
+ fixture — confidently possible when elements are absolutely
+ positioned or when the layout tree has dumped bounds available.
+ One-shot lookup; names the culprit directly.
+
+2. **Isolation bisection (slow path).** When region→element is
+ ambiguous (overlapping elements, pure flow layout), generate
+ temporary scoped-down fixtures by injecting override CSS that
+ hides all siblings / cousins of a candidate subtree
+ (`display: none` on the rest, or `visibility: hidden` if
+ layout must be preserved). Re-run the reftest on each isolated
+ view. Iterate through the element tree to produce per-subtree
+ scores and converge on the offending node.
+
+The two-path split matters because mode (1) is O(1) in reftest runs
+and mode (2) is O(log n) at best — prefer (1) whenever bbox→element
+is unambiguous.
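
A minimal sketch of the fast path, under stated assumptions: the diff is available as a single-channel per-pixel delta buffer, element bounds come from some layout dump, and one bounding box stands in for a proper connected-components pass. `highDeltaBox`, `attribute`, and the `ElementBounds` shape are hypothetical names, not existing tooling.

```typescript
type Box = { x: number; y: number; w: number; h: number };
type ElementBounds = { selector: string; box: Box };

// Bounding box of all pixels whose delta exceeds `threshold`, or null if none do.
function highDeltaBox(
  delta: Uint8Array,
  width: number,
  height: number,
  threshold: number
): Box | null {
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (delta[y * width + x] > threshold) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  if (maxX < 0) return null;
  return { x: minX, y: minY, w: maxX - minX + 1, h: maxY - minY + 1 };
}

function contains(outer: Box, inner: Box): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.w <= outer.x + outer.w &&
    inner.y + inner.h <= outer.y + outer.h
  );
}

// Smallest element that fully contains the region. Returns null when no
// element contains it or when the two smallest candidates tie in area,
// which is the signal to fall back to isolation bisection.
function attribute(region: Box, elements: ElementBounds[]): string | null {
  const hits = elements
    .filter((e) => contains(e.box, region))
    .sort((a, b) => a.box.w * a.box.h - b.box.w * b.box.h);
  if (hits.length === 0) return null;
  if (hits.length > 1 && hits[0].box.w * hits[0].box.h === hits[1].box.w * hits[1].box.h) {
    return null;
  }
  return hits[0].selector;
}
```

Choosing the smallest containing element is a design guess: it names the most specific candidate, and any tie is treated as ambiguity instead of being resolved arbitrarily.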
+
+**Applicability.**
+
+| Reftest | Oracle controllable? | Subtree bisection viable? |
+| -------------- | -------------------- | ------------------------- |
+| refbrowser | Yes (Playwright) | ✅ Yes |
+| refsvg (resvg) | Yes (local CLI) | ✅ Yes |
+| W3C SVG suite | No (pre-baked PNG) | ❌ No |
+| refig (Figma) | No (manual export) | ❌ No |
+
+Figma is explicitly out: isolating a subtree would require
+re-exporting from the Figma app, which is an upstream human step.
+
+**Tooling shape (when built).** A script that:
+
+1. Reads a reftest's diff PNG.
+2. Extracts high-delta bounding boxes.
+3. Attempts region→element match against a parsed fixture tree.
+4. On ambiguity, writes override CSS for each candidate subtree,
+ re-runs the producer + diff, accumulates per-subtree scores.
+5. Outputs a JSON report keyed by element selector, with a score
+ and a small preview diff per subtree.
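
Step 4 above (writing override CSS per candidate subtree) could look roughly like this. `isolationCss` is a hypothetical helper; it assumes the caller passes only sibling or cousin selectors, since hiding an ancestor of the candidate would hide the candidate too.

```typescript
// Build override CSS that hides every listed selector except the candidate.
// `preserveLayout` picks `visibility: hidden` (boxes keep their space)
// over `display: none` (boxes are removed from layout).
// Assumption: `allSelectors` contains no ancestors of `candidate`.
function isolationCss(
  candidate: string,
  allSelectors: string[],
  preserveLayout: boolean
): string {
  const others = allSelectors.filter((s) => s !== candidate);
  if (others.length === 0) return "";
  const decl = preserveLayout ? "visibility: hidden" : "display: none";
  return `${others.join(", ")} { ${decl} !important; }`;
}
```

Because `extra_css` is injected on both the Chromium and cg sides, a generated stylesheet like this could plausibly be fed through the same mechanism, keeping the isolated views symmetric.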
+
+Not unique to htmlcss — the same pattern works for any tree-structured
+oracle with controllable input (SVG `<g>` subtrees, scene graph nodes
+in .grida, etc.).
+
+### Viewport sweep — width-matrix for layout fixtures
+
+> Aliases: _width sweep_, _responsive sweep_, _width matrix_.
+>
+> **TODO — tooling not ready.** Single-width runs only today.
+
+A single-viewport reftest catches a layout bug at that one width. It
+misses bugs that only manifest at a different width — which for CSS
+layout is most bugs (flex basis resolution, wrap points, grid
+`auto-fill`, `min-content` / `max-content` interaction,
+percentage-sized children against unusual parent widths).
+
+**The technique.** Render the same layout fixture at a list of
+viewport widths and diff each independently. A typical sweep:
+
+```
+widths: [320, 600, 768, 1024, 1280] // mobile → desktop span
+```
+
+Produces N PNG pairs per fixture and N similarity scores. A fixture
+passes only if _every_ width passes.
+
+**Why width, not height.** CSS content flows vertically as a function
+of the containing block's width; height is mostly an output, not an
+input. Width variance exercises most layout regimes. Height variance
+is relevant only for `min-height`/`max-height`/vh-based cases, which
+are narrower and better covered by dedicated single-width fixtures.
+
+**Applicability.** Layout-category fixtures only. Paint fixtures
+(color, opacity, shadow, gradient, border-radius) render a fixed-size
+subject inside a fixed canvas — sweeping widths adds no signal and
+just multiplies work.
+
+**Tooling shape (when built).** Suite schema grows a `widths` array
+on layout entries:
+
+```json
+{
+ "path": "../L0/box-dimensions.html",
+ "widths": [320, 600, 1024]
+}
+```
+
+Producers loop over `widths`, emitting PNGs named
+`@.png`. `@grida/reftest` treats each as a separate
+test. No per-width `viewport.height` — let each width produce its
+natural cull height (the measurement _is_ the output).
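
A rough sketch of how a producer might expand a `widths` entry into per-width jobs and apply the "every width must pass" rule. Since the tooling does not exist yet, everything here is an assumption: the `SweepEntry` and `SweepJob` shapes, the function names, and in particular the output file naming, which is only a placeholder scheme.

```typescript
type SweepEntry = {
  path: string;
  widths?: number[];
  viewport?: { width?: number };
};
type SweepJob = { path: string; width: number; outName: string };

// Expand one suite entry into one render job per width. Entries without
// `widths` fall back to a single job at their viewport width or the default.
function expandWidths(entry: SweepEntry, defaultWidth: number): SweepJob[] {
  const file = entry.path.split("/").pop() ?? entry.path;
  const stem = file.replace(/\.html?$/i, "");
  const widths =
    entry.widths && entry.widths.length > 0
      ? entry.widths
      : [entry.viewport?.width ?? defaultWidth];
  return widths.map((width) => ({
    path: entry.path,
    width,
    // Placeholder naming scheme, assumed for illustration only.
    outName: `${stem}@${width}.png`,
  }));
}

// A swept fixture passes only if every per-width score meets the floor.
function sweepPasses(scores: number[], floor: number): boolean {
  return scores.length > 0 && scores.every((s) => s >= floor);
}
```

No height appears in a job on purpose, mirroring the note above that each width should produce its own natural cull height.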
+
+This technique is also format-agnostic — responsive SVG, responsive
+refbrowser, and responsive .grida scenes all benefit from the same
+width-sweep harness.
+
+---
+
### Golden tests — native/proprietary/internal formats
Use when **no external truth exists**:
@@ -582,6 +1005,38 @@ pnpm --filter @grida/reftest exec reftest \
In a PR: _"refig(refig-standard): auto-layout row spacing fix, average
similarity 0.81 → 0.94, 612 tests S75→S95."_
+### True reftest — HTML/CSS refbrowser against Playwright Chromium
+
+```bash
+# Pre-requisite: Chromium installed for Playwright
+pnpm --filter @grida/reftest exec playwright install chromium
+
+# 1. Render expecteds via Playwright Chromium
+pnpm --filter @grida/reftest exec tsx .agents/skills/cg-reftest/scripts/refbrowser_render.ts \
+ --suite fixtures/test-html/suites/L0.exact.json \
+ --out-dir target/refbrowser/expected
+
+# 2. Render actuals via our cg pipeline
+cargo run -p cg --example golden_htmlcss -- \
+ --suite fixtures/test-html/suites/L0.exact.json
+mkdir -p target/refbrowser/actual
+cp "${TMPDIR:-/tmp}/grida-htmlcss-goldens/"*.png target/refbrowser/actual/
+
+# 3. Diff actuals against Chromium oracle, write bucketed report
+pnpm --filter @grida/reftest exec reftest \
+ --actual-dir target/refbrowser/actual \
+ --expected-dir target/refbrowser/expected \
+ --output-dir target/reftests/htmlcss \
+ --bg white
+
+# Result: target/reftests/htmlcss/report.json + S99/S95/S90/S75/err/ buckets.
+# A score < 1.0 means our htmlcss renderer diverges from Chromium.
+# This is a genuine reftest — Playwright Chromium is the oracle.
+```
+
+In a PR: _"refbrowser(htmlcss): background-repeat space/round landed,
+average similarity 0.72 → 0.91 across 6 repeat fixtures."_
+
### Golden/regression test — custom effect
```bash
diff --git a/.agents/skills/cg-reftest/scripts/refbrowser_render.ts b/.agents/skills/cg-reftest/scripts/refbrowser_render.ts
new file mode 100644
index 000000000..46b88178e
--- /dev/null
+++ b/.agents/skills/cg-reftest/scripts/refbrowser_render.ts
@@ -0,0 +1,314 @@
+#!/usr/bin/env -S pnpm dlx tsx
+/**
+ * refbrowser_render.ts — headless Chromium oracle for HTML/CSS reftests.
+ *
+ * Renders each fixture in a suite through Playwright's Chromium and
+ * writes a PNG per fixture to `--out-dir`. The output is the reference
+ * oracle for cg's htmlcss renderer — Chromium is the ground truth.
+ *
+ * ┌────────────────┐ ┌─────────────────────┐ ┌──────────────────┐
+ * │ .json │ → │ Playwright Chromium │ → │ expected/.png │
+ * │ + helper CSS │ │ (full-page screen) │ │ │
+ * └────────────────┘ └─────────────────────┘ └──────────────────┘
+ *
+ * Pair with `cargo run -p cg --example golden_htmlcss -- --suite` on the
+ * actual side, then diff via `@grida/reftest`.
+ *
+ * ## Usage
+ *
+ * ```sh
+ * pnpm --filter @grida/reftest exec tsx \
+ * .agents/skills/cg-reftest/scripts/refbrowser_render.ts \
+ * --suite fixtures/test-html/suites/L0.exact.json \
+ * --out-dir target/refbrowser/L0.exact/expected
+ * ```
+ *
+ * Ad-hoc single-file render (no suite, defaults only):
+ *
+ * ```sh
+ * pnpm --filter @grida/reftest exec tsx \
+ * .agents/skills/cg-reftest/scripts/refbrowser_render.ts \
+ * --fixture fixtures/test-html/L0/paint-background-solid.html \
+ * --out-dir /tmp/refbrowser-verify
+ * ```
+ *
+ * ## Dependencies
+ *
+ * - Node 20+.
+ * - `@playwright/test` (devDependency of `@grida/reftest`).
+ * - Chromium binary (one-time):
+ * `pnpm --filter @grida/reftest exec playwright install chromium`
+ *
+ * ## Suite JSON shape
+ *
+ * ```json
+ * {
+ * "name": "L0.exact",
+ * "gate": { "threshold": 0, "aa": false, "floor": 1.0 },
+ * "defaults": {
+ * "viewport": { "width": 600, "height": 800 },
+ * "wait_for": ["fonts", "networkidle"],
+ * "extra_css": ["../_reftest/hide-text.css"],
+ * "full_page": true
+ * },
+ * "fixtures": [
+ * { "path": "../L0/box-dimensions.html",
+ * "viewport": { "width": 600, "height": 522 } }
+ * ]
+ * }
+ * ```
+ *
+ * Per-fixture entries inherit and override `defaults` field-by-field.
+ * All paths (`fixtures[].path`, `extra_css[]`) resolve **relative to
+ * the suite file**. `gate` is consumed by the diff step, not here.
+ *
+ * ## Caveats
+ *
+ * - `document.fonts.ready` waits for `<link>`/inline `@font-face` loads;
+ * system-font fallbacks still differ from Skia. Inject
+ * `_reftest/hide-text.css` for non-text fixtures.
+ * - Each fixture is rendered in a fresh incognito context.
+ */
+import { promises as fs } from "node:fs";
+import * as path from "node:path";
+import { fileURLToPath, pathToFileURL } from "node:url";
+// `@playwright/test` re-exports `chromium` from `playwright-core` and is
+// the package actually installed in this repo (via `editor`). Using it
+// here avoids depending on a separate `playwright` package.
+import { chromium, type Browser, type BrowserContext } from "@playwright/test";
+
+type FixtureConfig = {
+ viewport?: { width?: number; height?: number };
+ wait_for?: Array<"fonts" | "networkidle">;
+ extra_css?: string[];
+ full_page?: boolean;
+};
+
+type FixtureEntry = FixtureConfig & { path: string };
+
+type SuiteFile = {
+ name?: string;
+ description?: string;
+ gate?: unknown; // consumed by the diff step, not here
+ defaults?: FixtureConfig;
+ fixtures: FixtureEntry[];
+};
+
+type ResolvedConfig = {
+ viewport: { width: number; height: number };
+ wait_for: Array<"fonts" | "networkidle">;
+ extra_css: string[];
+ full_page: boolean;
+};
+
+const DEFAULTS: ResolvedConfig = {
+ viewport: { width: 600, height: 800 },
+ wait_for: ["fonts", "networkidle"],
+ extra_css: [],
+ full_page: true,
+};
+
+function mergeConfig(
+ defaults: FixtureConfig | undefined,
+ entry: FixtureConfig
+): ResolvedConfig {
+ const pick = <K extends keyof ResolvedConfig>(key: K): ResolvedConfig[K] => {
+ const a = entry[key] as ResolvedConfig[K] | undefined;
+ const b = defaults?.[key] as ResolvedConfig[K] | undefined;
+ return (a ?? b ?? DEFAULTS[key]) as ResolvedConfig[K];
+ };
+ const vp = entry.viewport ?? defaults?.viewport ?? DEFAULTS.viewport;
+ return {
+ viewport: {
+ width: vp?.width ?? DEFAULTS.viewport.width,
+ height: vp?.height ?? DEFAULTS.viewport.height,
+ },
+ wait_for: pick("wait_for"),
+ extra_css: pick("extra_css"),
+ full_page: pick("full_page"),
+ };
+}
+
+type Resolved = {
+ htmlPath: string;
+ stem: string;
+ config: ResolvedConfig;
+};
+
+async function resolveSuite(suitePath: string): Promise<Resolved[]> {
+ const raw = await fs.readFile(suitePath, "utf8");
+ const suite = JSON.parse(raw) as SuiteFile;
+ if (!Array.isArray(suite.fixtures)) {
+ throw new Error(`suite ${suitePath}: missing fixtures[]`);
+ }
+ const suiteDir = path.dirname(path.resolve(suitePath));
+ return suite.fixtures.map((entry) => {
+ const htmlPath = path.resolve(suiteDir, entry.path);
+ const merged = mergeConfig(suite.defaults, entry);
+ // Resolve extra_css paths relative to the suite file.
+ const extra_css = merged.extra_css.map((rel) =>
+ path.resolve(suiteDir, rel)
+ );
+ const stem = path.basename(entry.path).replace(/\.html?$/i, "");
+ return { htmlPath, stem, config: { ...merged, extra_css } };
+ });
+}
+
+async function loadCssCached(
+ cache: Map<string, string>,
+ abs: string
+): Promise<string | null> {
+ const hit = cache.get(abs);
+ if (hit !== undefined) return hit;
+ try {
+ const content = await fs.readFile(abs, "utf8");
+ cache.set(abs, content);
+ return content;
+ } catch (e) {
+ console.error(` warn: failed to read ${abs}: ${(e as Error).message}`);
+ return null;
+ }
+}
+
+async function renderOne(
+ ctx: BrowserContext,
+ r: Resolved,
+ outDir: string,
+ cssCache: Map<string, string>
+): Promise<{ file: string; cssCount: number }> {
+ const { htmlPath, stem, config } = r;
+
+ const page = await ctx.newPage();
+ await page.setViewportSize(config.viewport);
+
+ // `file://` URL so relative resources resolve from the fixture's dir.
+ // `pathToFileURL` handles Windows drive letters and percent-encodes
+ // spaces/non-ASCII, which plain string concatenation does not.
+ const fileUrl = pathToFileURL(path.resolve(htmlPath)).href;
+ await page.goto(fileUrl, { waitUntil: "load" });
+
+ if (config.wait_for.includes("networkidle")) {
+ await page.waitForLoadState("networkidle");
+ }
+ if (config.wait_for.includes("fonts")) {
+ // Await inside the page context; `document.fonts.ready` resolves
+ // to a `FontFaceSet` which Playwright cannot serialize across the
+ // boundary. Return void so the wait is effective and typed.
+ await page.evaluate(async () => {
+ await document.fonts.ready;
+ });
+ }
+
+ let cssCount = 0;
+ for (const abs of config.extra_css) {
+ const content = await loadCssCached(cssCache, abs);
+ if (content !== null) {
+ await page.addStyleTag({ content });
+ cssCount++;
+ }
+ }
+
+ const outPath = path.join(outDir, `${stem}.png`);
+ await fs.mkdir(outDir, { recursive: true });
+
+ await page.screenshot({
+ path: outPath,
+ fullPage: config.full_page,
+ animations: "disabled",
+ caret: "hide",
+ });
+
+ await page.close();
+ return { file: outPath, cssCount };
+}
+
+function parseArgs(argv: string[]): {
+ suite?: string;
+ fixture?: string;
+ outDir: string;
+} {
+ const args: Record<string, string> = {};
+ for (let i = 0; i < argv.length; i += 2) {
+ const key = argv[i]?.replace(/^--/, "");
+ const val = argv[i + 1];
+ if (key && val) args[key] = val;
+ }
+ if (!args["out-dir"]) {
+ throw new Error("--out-dir is required");
+ }
+ if (!args["suite"] && !args["fixture"]) {
+ throw new Error("must pass --suite or --fixture ");
+ }
+ return {
+ suite: args["suite"],
+ fixture: args["fixture"],
+ outDir: path.resolve(args["out-dir"]),
+ };
+}
+
+async function main() {
+ const args = parseArgs(process.argv.slice(2));
+ let resolved: Resolved[];
+ if (args.suite) {
+ resolved = await resolveSuite(args.suite);
+ console.log(
+ `refbrowser: rendering ${resolved.length} fixture(s) from ${args.suite}`
+ );
+ } else {
+ const htmlPath = path.resolve(args.fixture!);
+ const stem = path.basename(htmlPath).replace(/\.html?$/i, "");
+ resolved = [{ htmlPath, stem, config: DEFAULTS }];
+ console.log(`refbrowser: rendering 1 fixture (ad-hoc, defaults only)`);
+ }
+ console.log(` out-dir: ${args.outDir}`);
+
+ let browser: Browser | null = null;
+ const cssCache = new Map<string, string>();
+ try {
+ browser = await chromium.launch();
+
+ for (const r of resolved) {
+ const rel = path.relative(process.cwd(), r.htmlPath);
+ // Fresh incognito context per fixture — no cookie/storage/SW
+ // leakage between fixtures, so order can't mask real renderer
+ // changes.
+ let ctx: BrowserContext | null = null;
+ try {
+ ctx = await browser.newContext({
+ // Deterministic: force light color-scheme, standard locale/timezone.
+ colorScheme: "light",
+ locale: "en-US",
+ timezoneId: "UTC",
+ reducedMotion: "reduce",
+ });
+ const { file, cssCount } = await renderOne(
+ ctx,
+ r,
+ args.outDir,
+ cssCache
+ );
+ const hint = cssCount > 0 ? ` [+${cssCount} css]` : "";
+ console.log(` ${rel} → ${file}${hint}`);
+ } catch (e) {
+ console.error(` ${rel}: FAILED`);
+ console.error(e);
+ process.exitCode = 1;
+ } finally {
+ await ctx?.close();
+ }
+ }
+ } finally {
+ await browser?.close();
+ }
+}
+
+// Only run when invoked directly (not when imported).
+const invoked =
+ process.argv[1] &&
+ fileURLToPath(import.meta.url) === path.resolve(process.argv[1]);
+if (invoked) {
+ main().catch((e) => {
+ console.error(e);
+ process.exit(1);
+ });
+}
diff --git a/.agents/skills/fixtures/SKILL.md b/.agents/skills/fixtures/SKILL.md
index be1fcee83..953c090a5 100644
--- a/.agents/skills/fixtures/SKILL.md
+++ b/.agents/skills/fixtures/SKILL.md
@@ -43,6 +43,16 @@ edge case** that the codebase supports or intends to support. This includes:
filename alone should tell you what's being tested.
- **Labeled specimens.** Within a fixture, label each test case with the
value being exercised so both humans and heuristics can identify regions.
+- **Match the fixture's subject to the viewport policy.** For refbrowser
+ fixtures under `fixtures/test-html/`, **paint / visual-property**
+ fixtures should size their root to a preset viewport (via `min-height`)
+ so cg's cull and Chromium's screenshot have identical dimensions.
+ **Layout** fixtures (box-model, flex, grid, intrinsic sizing) must
+ NOT force a body size — the output dimensions _are_ what the test
+ measures; a `min-height` hack contaminates the result. See
+ [`fixtures/test-html/README.md`](../../../fixtures/test-html/README.md)
+ for the preset list, the paint-vs-layout rule, and the per-fixture
+ `viewport` workflow for layout tests.
- **Don't duplicate.** Before adding a fixture, check if an existing one
already covers the behavior. Extend or split rather than duplicate.
diff --git a/crates/grida-canvas/examples/golden_htmlcss.rs b/crates/grida-canvas/examples/golden_htmlcss.rs
index 38da7915b..81ada4db3 100644
--- a/crates/grida-canvas/examples/golden_htmlcss.rs
+++ b/crates/grida-canvas/examples/golden_htmlcss.rs
@@ -4,28 +4,127 @@
/// temporary directory (printed to stderr) so generated images don't
/// bloat the repository.
///
-/// Usage:
+/// ## Usage
+///
+/// cargo run -p cg --example golden_htmlcss -- \
+/// --suite fixtures/test-html/suites/L0.exact.json
+///
/// cargo run -p cg --example golden_htmlcss -- [FILE_OR_DIR...]
///
-/// If no arguments given, renders built-in test fixtures.
-/// If a directory is given, renders all .html/.htm files in it.
+/// If no arguments given, renders built-in L0 fixtures.
+/// If FILE_OR_DIR is given, renders ad-hoc (no sidecar config).
+///
+/// ## Suite JSON shape
+///
+/// {
+/// "defaults": {
+/// "viewport": { "width": 600, "height": 800 },
+/// "extra_css": ["../_reftest/hide-text.css"]
+/// },
+/// "fixtures": [
+/// { "path": "../L0/box-dimensions.html",
+/// "viewport": { "width": 600, "height": 522 } }
+/// ]
+/// }
+///
+/// Per-fixture entries inherit and override `defaults`. All paths
+/// (`fixtures[].path`, `extra_css[]`) resolve **relative to the suite
+/// file**. `gate` and other fields unknown to this tool are ignored.
use cg::htmlcss;
use cg::resources::ByteStore;
use cg::runtime::font_repository::FontRepository;
+use serde::Deserialize;
use skia_safe::{surfaces, Color};
+use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::{Arc, Mutex};
-fn fonts() -> FontRepository {
+fn build_fonts() -> FontRepository {
let mut repo = FontRepository::new(Arc::new(Mutex::new(ByteStore::new())));
repo.enable_system_fallback();
repo
}
-fn render_to_png(html: &str, width: f32, name: &str, out_dir: &Path) {
- let fonts = fonts();
+#[derive(Debug, Default, Clone, Copy, Deserialize)]
+#[serde(default)]
+struct Viewport {
+ width: Option<f32>,
+ height: Option<f32>,
+}
+
+#[derive(Debug, Default, Deserialize)]
+#[serde(default)]
+struct FixtureConfig {
+ extra_css: Vec<String>,
+ viewport: Viewport,
+}
+
+#[derive(Debug, Deserialize)]
+struct SuiteEntry {
+ path: String,
+ #[serde(default)]
+ extra_css: Option<Vec<String>>,
+ #[serde(default)]
+ viewport: Option<Viewport>,
+}
+
+#[derive(Debug, Default, Deserialize)]
+#[serde(default)]
+struct SuiteFile {
+ defaults: FixtureConfig,
+ fixtures: Vec<SuiteEntry>,
+}
+
+const DEFAULT_WIDTH: f32 = 600.0;
+const DEFAULT_HEIGHT: f32 = 600.0;
+
+/// Resolve a fixture entry against suite defaults. Suite-relative
+/// paths are anchored at `suite_dir`. Viewport width/height inherit
+/// from `defaults` and fall back to the built-in defaults.
+fn resolve_entry(
+ entry: &SuiteEntry,
+ defaults: &FixtureConfig,
+ suite_dir: &Path,
+) -> (PathBuf, Vec<PathBuf>, f32, f32) {
+ let html = suite_dir.join(&entry.path);
+ let css_rel: &[String] = entry.extra_css.as_deref().unwrap_or(&defaults.extra_css);
+ let css_abs: Vec<PathBuf> = css_rel.iter().map(|r| suite_dir.join(r)).collect();
+ let vp = entry.viewport.unwrap_or(defaults.viewport);
+ let width = vp
+ .width
+ .or(defaults.viewport.width)
+ .unwrap_or(DEFAULT_WIDTH);
+ let height = vp
+ .height
+ .or(defaults.viewport.height)
+ .unwrap_or(DEFAULT_HEIGHT);
+ (html, css_abs, width, height)
+}
+
+/// Populate `cache[abs]` if absent. Missing files warn; absent keys
+/// are treated as a no-op at injection time.
+fn ensure_css_cached(cache: &mut HashMap<PathBuf, String>, abs: &Path) {
+ if cache.contains_key(abs) {
+ return;
+ }
+ match std::fs::read_to_string(abs) {
+ Ok(s) => {
+ cache.insert(abs.to_path_buf(), s);
+ }
+ Err(e) => eprintln!(" warn: failed to read {}: {e}", abs.display()),
+ }
+}
+
+fn render_to_png(
+ html: &str,
+ width: f32,
+ height: f32,
+ name: &str,
+ out_dir: &Path,
+ fonts: &FontRepository,
+) {
let picture =
- htmlcss::render(html, width, 600.0, &fonts, &htmlcss::NoImages).expect("render failed");
+ htmlcss::render(html, width, height, fonts, &htmlcss::NoImages).expect("render failed");
let cull = picture.cull_rect();
let w = cull.width().max(1.0) as i32;
let h = cull.height().max(1.0) as i32;
@@ -44,42 +143,177 @@ fn render_to_png(html: &str, width: f32, name: &str, out_dir: &Path) {
eprintln!(" {name}: {w}x{h} → {}", path.display());
}
-fn render_html_file(path: &Path, out_dir: &Path) {
- let html = std::fs::read_to_string(path).expect("failed to read HTML file");
- let name = path
+fn render_with_extras(
+ html_path: &Path,
+ extras_abs: &[PathBuf],
+ width: f32,
+ height: f32,
+ out_dir: &Path,
+ fonts: &FontRepository,
+ css_cache: &mut HashMap<PathBuf, String>,
+) {
+ let html = match std::fs::read_to_string(html_path) {
+ Ok(s) => s,
+ Err(e) => {
+ eprintln!(" warn: failed to read {}: {e}", html_path.display());
+ return;
+ }
+ };
+ let name = html_path
.file_stem()
.map(|s| s.to_string_lossy().to_string())
.unwrap_or_else(|| "unknown".to_string());
- render_to_png(&html, 600.0, &name, out_dir);
+
+ for abs in extras_abs {
+ ensure_css_cached(css_cache, abs);
+ }
+ let extras: Vec<&str> = extras_abs
+ .iter()
+ .filter_map(|p| css_cache.get(p).map(String::as_str))
+ .collect();
+ let html = if extras.is_empty() {
+ html
+ } else {
+ htmlcss::with_extra_stylesheets(&html, &extras)
+ };
+
+ render_to_png(&html, width, height, &name, out_dir, fonts);
+}
+
+fn render_suite(suite_path: &Path, out_dir: &Path, fonts: &FontRepository) {
+ let raw = std::fs::read_to_string(suite_path)
+ .unwrap_or_else(|e| panic!("failed to read {}: {e}", suite_path.display()));
+ let suite: SuiteFile = serde_json::from_str(&raw)
+ .unwrap_or_else(|e| panic!("failed to parse {}: {e}", suite_path.display()));
+ let suite_dir = suite_path.parent().unwrap_or(Path::new("."));
+
+ eprintln!(
+ "Rendering {} fixture(s) from suite {}",
+ suite.fixtures.len(),
+ suite_path.display()
+ );
+ let mut css_cache: HashMap<PathBuf, String> = HashMap::new();
+ for entry in &suite.fixtures {
+ let (html_path, extras_abs, width, height) =
+ resolve_entry(entry, &suite.defaults, suite_dir);
+ render_with_extras(
+ &html_path,
+ &extras_abs,
+ width,
+ height,
+ out_dir,
+ fonts,
+ &mut css_cache,
+ );
+ }
+}
+
+fn render_directory(dir: &Path, out_dir: &Path, fonts: &FontRepository) {
+ let mut entries: Vec<PathBuf> = std::fs::read_dir(dir)
+ .expect("failed to read directory")
+ .filter_map(|e| e.ok().map(|e| e.path()))
+ .filter(|p| {
+ p.extension()
+ .map(|ext| ext == "html" || ext == "htm")
+ .unwrap_or(false)
+ })
+ .collect();
+ entries.sort();
+
+ eprintln!(
+ "Rendering {} HTML files from {}",
+ entries.len(),
+ dir.display()
+ );
+ let mut css_cache: HashMap<PathBuf, String> = HashMap::new();
+ for path in &entries {
+ render_with_extras(
+ path,
+ &[],
+ DEFAULT_WIDTH,
+ DEFAULT_HEIGHT,
+ out_dir,
+ fonts,
+ &mut css_cache,
+ );
+ }
+}
+
+/// Parse `argv` into (`suite_path`, positional args). If `--suite P`
+/// is present, those two tokens are removed from the positional list.
+fn parse_args(argv: &[String]) -> (Option<String>, Vec<String>) {
+ let mut suite: Option<String> = None;
+ let mut positional: Vec<String> = Vec::new();
+ let mut i = 0;
+ while i < argv.len() {
+ let a = &argv[i];
+ if a == "--suite" {
+ let v = argv
+ .get(i + 1)
+ .unwrap_or_else(|| panic!("--suite requires a path argument"));
+ suite = Some(v.clone());
+ i += 2;
+ } else if a.starts_with("--") {
+ // Unknown long flag. If the next token looks like a value
+ // (doesn't start with `-`) swallow it too, so `--foo bar`
+ // doesn't leak `bar` into the positional stream and get
+ // treated as a file path.
+ match argv.get(i + 1) {
+ Some(next) if !next.starts_with('-') => i += 2,
+ _ => i += 1,
+ }
+ } else {
+ positional.push(a.clone());
+ i += 1;
+ }
+ }
+ (suite, positional)
}
fn main() {
- let args: Vec<String> = std::env::args().skip(1).collect();
+ let argv: Vec<String> = std::env::args().skip(1).collect();
+ let (suite, positional) = parse_args(&argv);
// Output to system temp directory
let out_dir = std::env::temp_dir().join("grida-htmlcss-goldens");
std::fs::create_dir_all(&out_dir).expect("failed to create output directory");
eprintln!("Output: {}", out_dir.display());
- if args.is_empty() {
- // Render built-in test fixtures from fixtures/test-html/L0/
+ let fonts = build_fonts();
+
+ if let Some(suite_path) = suite {
+ render_suite(Path::new(&suite_path), &out_dir, &fonts);
+ eprintln!("Done. Files in: {}", out_dir.display());
+ return;
+ }
+
+ if positional.is_empty() {
let fixture_dir = PathBuf::from(concat!(
env!("CARGO_MANIFEST_DIR"),
"/../../fixtures/test-html/L0"
));
if fixture_dir.is_dir() {
- render_directory(&fixture_dir, &out_dir);
+ render_directory(&fixture_dir, &out_dir, &fonts);
} else {
eprintln!("No fixture directory found at {}", fixture_dir.display());
- eprintln!("Pass HTML files as arguments instead.");
+ eprintln!("Pass --suite or HTML files as arguments.");
}
} else {
- for arg in &args {
+ let mut css_cache: HashMap<PathBuf, String> = HashMap::new();
+ for arg in &positional {
let path = PathBuf::from(arg);
if path.is_dir() {
- render_directory(&path, &out_dir);
+ render_directory(&path, &out_dir, &fonts);
} else if path.is_file() {
- render_html_file(&path, &out_dir);
+ render_with_extras(
+ &path,
+ &[],
+ DEFAULT_WIDTH,
+ DEFAULT_HEIGHT,
+ &out_dir,
+ &fonts,
+ &mut css_cache,
+ );
} else {
eprintln!("Skipping {}: not a file or directory", path.display());
}
@@ -88,25 +322,3 @@ fn main() {
eprintln!("Done. Files in: {}", out_dir.display());
}
-
-fn render_directory(dir: &Path, out_dir: &Path) {
- let mut entries: Vec<PathBuf> = std::fs::read_dir(dir)
- .expect("failed to read directory")
- .filter_map(|e| e.ok().map(|e| e.path()))
- .filter(|p| {
- p.extension()
- .map(|ext| ext == "html" || ext == "htm")
- .unwrap_or(false)
- })
- .collect();
- entries.sort();
-
- eprintln!(
- "Rendering {} HTML files from {}",
- entries.len(),
- dir.display()
- );
- for path in &entries {
- render_html_file(path, out_dir);
- }
-}
diff --git a/crates/grida-canvas/src/htmlcss/mod.rs b/crates/grida-canvas/src/htmlcss/mod.rs
index 8ce3d9042..a69adc5b1 100644
--- a/crates/grida-canvas/src/htmlcss/mod.rs
+++ b/crates/grida-canvas/src/htmlcss/mod.rs
@@ -128,6 +128,45 @@ impl ImageProvider for PreloadedImages {
}
}
+/// Inject one or more author stylesheets into an HTML document string.
+///
+/// Concatenates `css_bodies` into a single `");
+
+ // HTML tag names are ASCII, so lowercasing the haystack is safe.
+ // One pass, correct for any casing of `</head>`.
+ if let Some(idx) = html.to_ascii_lowercase().find("</head>") {
+ let mut out = String::with_capacity(html.len() + combined.len());
+ out.push_str(&html[..idx]);
+ out.push_str(&combined);
+ out.push_str(&html[idx..]);
+ return out;
+ }
+ format!("{combined}{html}")
+}
+
/// Render HTML+CSS to a Skia Picture.
///
/// Images referenced by `` or `background-image: url()` are
diff --git a/fixtures/test-html/L0/paint-background-solid.html b/fixtures/test-html/L0/paint-background-solid.html
index 8cbc4d3ba..e63bd3f7c 100644
--- a/fixtures/test-html/L0/paint-background-solid.html
+++ b/fixtures/test-html/L0/paint-background-solid.html
@@ -4,6 +4,11 @@
Paint: Solid Background