diff --git a/.agents/prompts/cg-htmlcss-feature.md b/.agents/prompts/cg-htmlcss-feature.md new file mode 100644 index 000000000..b67bd08e6 --- /dev/null +++ b/.agents/prompts/cg-htmlcss-feature.md @@ -0,0 +1,316 @@ +# cg-htmlcss — feature loop prompt + +**What this is.** A pastable prompt template for driving a single CSS +feature forward in the cg htmlcss renderer. Paste the template at the +bottom into a new task; the reference above it is context an agent +can read to follow the loop honestly. + +**Why this is a prompt and not a skill.** The 5-phase loop is +deliberately heavy — audit + ground + fixture + implement + verify. +It's overkill for small fixes, and it's already a conductor over +`/research`, `/fixtures`, `/cg-reftest`, which auto-trigger correctly +on their own. Opt-in invocation is right: paste it when you want the +full cycle; skip it for paper-cuts. + +**Lifecycle.** Expect this file to grow as new divergence patterns +surface. It will likely go stale in parts once htmlcss hits +Chromium-parity on L0/L1; treat the _phase structure_ as durable and +the _property-specific callouts_ as advisory. + +--- + +## The five phases + +```text +┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ +│ 1. AUDIT │→ │2. GROUND │→ │3. FIXTURE│→ │ 4. IMPL │→ │5. VERIFY │ +└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ + │ │ + └───── ← ─── loop ← ─── score < floor ← ─── diff ← ───────┘ +``` + +Each phase has a **question it answers**, a **deliverable**, and an +**exit criterion**. Don't skip forward; don't linger past the exit +criterion. The loop closes at verify — if the score is below the +gate, return to phase 3 or 4 with a specific hypothesis, not a vibe. + +### 1. Audit — "what's the actual state of this feature?" + +**Question.** Where is the feature on the cg side today? What +renders wrong, what doesn't render at all, what renders +coincidentally-correctly but by the wrong path? 
+ +**Actions.** + +- Scan `crates/grida-canvas/src/htmlcss/` for the property name in + stylo enum mapping, paint emit, layout feed. A property can be + parsed-but-dropped, emitted-but-wrong, or unhandled — each has a + different fix shape. +- Enumerate existing fixtures that touch the feature + (`fixtures/test-html/L0/`). Run them under `L0.coverage` and + record current similarity per fixture. This is the before-number. +- Check `docs/wg/feat-2d/htmlcss.md` and any related design notes + for a prior decision or deliberate gap. +- List sibling properties likely to break the same way (e.g. + `border-radius` %-values implied `border-image-slice` %-values). + +**Deliverable.** A short audit note inside the task prompt or the +PR draft: + +- _Current support level_: not-parsed / parsed-but-dropped / + partial / Chromium-parity-except-X. +- _Fixtures touching it_: list with current similarity scores. +- _Priority bucket_: easy-and-important / easy-low-value / + hard-important / hard-low-value. Pick from the top-left by + default; only go hard-important when called out. + +**Exit when.** You can state the feature's current renderer state +in one paragraph with file references. If you can't, you don't +know enough yet — read more, don't guess. + +### 2. Ground — "how do real engines solve this?" + +**Question.** What's the canonical implementation strategy for +this feature in a mature engine? We are not inventing; we are +adapting. + +**Actions.** Invoke `/research`. Three engines are the usual +references: + +- **Servo + stylo** — Rust, most readable. Especially useful for + parsing, cascade, inheritance, computed-value rules. +- **Chromium / Blink** — C++. Authoritative for layout and paint + divergence calls. The renderer we diff against. +- **WebKit** — C++. Third voice; useful when Blink has + controversial behavior (Safari-only bugs / features). + +For a new property, read the **spec first** (CSS Backgrounds, +CSS Display, CSS Values 4, etc.). 
Then look up: + +- How stylo represents the property's computed value. +- How Blink paints or lays out against that representation. +- What WPT section exercises it (for free fixtures later). + +**Deliverable.** A research note — either inline in the PR +description or under `docs/wg/feat-2d/` if substantial — with: + +- The spec section(s) that govern behavior. +- The 3–6 line summary of how stylo/Blink structure the solution. +- The explicit deviation, if any, and why. + +**Exit when.** You can defend the implementation shape by pointing +at prior art, not just "it compiles and the fixture passes." If +the only justification is the fixture, you've over-fit. + +### 3. Fixture — "what's the smallest test that proves it?" + +**Question.** What HTML/CSS input demonstrates the feature +unambiguously, and what does the ideal rendered output look like? + +**Actions.** Invoke `/fixtures` for authoring rules; `/cg-reftest` +for the suite manifest. In short: + +- One concept per file. `paint--.html` naming. +- Probe-friendly palette (≤3 colors, round coordinates) when the + feature is pixel-precision rather than paint-rich. +- **Paint vs. layout decision.** Paint fixtures fix body size to + the preset (via `min-height`); layout fixtures let content size + itself and carry an explicit `viewport` in the suite entry. + See `fixtures/test-html/README.md`. +- Inject `hide-text.css` via `extra_css` when text is incidental + (labels for humans, not the subject under test). This is the + single biggest lever against noise. +- WPT fixtures are fair game — prefer pulling an established WPT + test into the suite over authoring one from scratch when the + section is mature. + +**Deliverable.** + +- One or more fixtures under `fixtures/test-html/L0/`. +- Entries in `fixtures/test-html/suites/L0.coverage.json`. Only + put in `L0.exact.json` after verify phase confirms 100.00%. +- For layout fixtures: the measured `viewport.height` from the cg + natural cull. 
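The measured `viewport.height` matters because the diff step needs both PNGs to have identical dimensions. A minimal sketch of a pre-diff dimension check follows, assuming the bytes of both PNGs are already in memory. `readPngDimensions` and `sameDimensions` are hypothetical helper names, not part of `@grida/reftest` or `golden_htmlcss`; the byte offsets come from the PNG file layout itself.

```ts
// Hypothetical helpers for illustration only; not an API of any tool named in this prompt.
interface Dimensions {
  width: number;
  height: number;
}

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const IHDR_TYPE = [0x49, 0x48, 0x44, 0x52]; // ASCII "IHDR"

// PNG layout: 8-byte signature, then the IHDR chunk (4-byte length, 4-byte type),
// then width and height as big-endian u32 values at byte offsets 16 and 20.
function readPngDimensions(bytes: Uint8Array): Dimensions {
  if (bytes.length < 24) throw new Error("too short to be a PNG");
  if (!PNG_SIGNATURE.every((b, i) => bytes[i] === b)) throw new Error("missing PNG signature");
  if (!IHDR_TYPE.every((b, i) => bytes[12 + i] === b)) throw new Error("first chunk is not IHDR");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, which is what PNG uses.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

// True when the cg actual and the Chromium expected have the same pixel dimensions.
function sameDimensions(actual: Uint8Array, expected: Uint8Array): boolean {
  const a = readPngDimensions(actual);
  const e = readPngDimensions(expected);
  return a.width === e.width && a.height === e.height;
}
```

In a Node script the bytes could come from `fs.readFileSync(path)`, which returns a `Buffer` (a `Uint8Array` subclass). Running a check like this before the diff turns a silent zero score into an explicit "fix the suite entry" signal.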
+ +**Exit when.** The fixture runs through both producers and +produces PNGs of identical dimensions. Dimension mismatch → stop; +the suite config is wrong and the score will be zero. + +### 4. Implement — "what code change realizes the behavior?" + +**Question.** What is the minimum set of edits in +`crates/grida-canvas/src/htmlcss/` to make the fixture match? + +**Actions.** + +- Touch the smallest surface that can possibly work. Avoid + "refactor + feature" in one commit; the reftest cannot tell you + which change caused which delta. +- Trace the pipeline end-to-end for the property: + parse → compute → layout feed → paint. A feature can fail at + any stage; diagnose before editing. +- Add unit tests where behavior is data-assertable (computed + value, resolved length, layout position). Data tests are free + and catch regressions the reftest can't (e.g. "this resolves + to `12px` in _both_ Chromium and us, for the right reason"). +- When in doubt, mirror the Blink / stylo structure. Deviations + cost reviewer attention; prior-art parity is free. + +**Deliverable.** + +- Code change scoped to the feature. +- Any new data tests for the computed-value surface. +- A one-line entry in the PR description for each user-facing + behavior change, written in spec terms, not implementation + terms. + +**Exit when.** `cargo check -p cg` is clean, existing tests pass, +and the fixture renders through `golden_htmlcss --suite` without +error. Similarity score is measured in phase 5 — do not gate on +it here. + +### 5. Verify — "does it actually match Chromium?" + +**Question.** Is the rendered output Chromium-parity at the +fixture's tolerance gate? + +**Actions.** This is `/cg-reftest`'s core loop. For each fixture +in the change: + +1. Render expecteds (Playwright Chromium) into + `target/refbrowser//expected`. +2. Render actuals (`cargo run -p cg --example golden_htmlcss -- +--suite …`). +3. Diff with `@grida/reftest`, threshold 0 (the strict default). +4. 
Read similarity against the suite's `gate.floor`. + +**Don't trust the score naively** — see "Reading the score" in the +cg-reftest skill. A 96% score on a sparse fixture can mask a +completely broken subject. Eyeball the diff PNG every time. A +single round of verification without visual inspection is not +verification. + +**Close the loop:** + +- Score ≥ `gate.floor`? Promote the fixture to `L0.exact.json` + if it reached 100.00%; otherwise leave in coverage and document + the residual delta in the PR description. +- Score < floor? Return to phase 3 (fixture too noisy / wrong + subject) or phase 4 (renderer bug) with a specific hypothesis. + Do _not_ lower the gate to fit the result; the gate exists so + regressions are loud. + +**Deliverable.** The PR description, written honestly: + +- Before/after similarity numbers for every affected fixture. +- Diff PNGs attached or linked for any score < 1.0. +- The specific divergence surface (rounding, AA, layout math, + etc.) if below 100.00%. "Renderer choice differs from Blink at + " beats "close enough." + +**Exit when.** The PR description can be read by someone who has +never seen the code and they know exactly what's now supported, +what's still broken, and what the score proves. + +--- + +## Handoffs and artifacts + +The phases are designed so an agent can stop, a second agent can +pick up, and no context is lost. 
The durable artifacts: + +| Phase | Artifact | Location | +| --------- | -------------------------------------------------------- | ------------------------------------------------------ | +| Audit | Current-state note, priority bucket | PR description / task prompt | +| Ground | Research note (spec + engine cross-ref) | PR description or `docs/wg/feat-2d/` | +| Fixture | `.html` fixture(s), suite entries, viewport measurement | `fixtures/test-html/L0/`, `fixtures/test-html/suites/` | +| Implement | Code change, data tests, behavior summary | `crates/grida-canvas/src/htmlcss/` | +| Verify | Before/after scores, diff PNG review, divergence surface | PR description | + +If a phase's artifact is missing, the phase isn't done — even if +the code "works." + +--- + +## Gate policy — the part that makes automation safe + +The only reason this loop can be automated is that phase 5 has a +**numeric, unambiguous, byte-exact** pass condition. Everything +upstream is advisory; verify is the truth. + +- `L0.exact.json`: `gate.floor = 1.0`, `threshold = 0`, `aa = off`. + Any regression is a real renderer change we made differently + from Blink. No tolerance inflation — ever. +- `L0.coverage.json`: informational scores, no gate. Landing a + fixture here is "we know about this case and intend to fix it." + Promoting to exact is "we now match Blink." + +Automation rules downstream of this prompt (CI gating, auto-merge, +etc.) must assert on the `report.json` emitted by `@grida/reftest` +and **not** on free-text agent assertions. The agent's job is to +drive the loop; the report is the contract. + +### What "destructive" means here + +A change is destructive if it: + +- Lowers `gate.floor` in `L0.exact.json`. +- Removes an entry from `L0.exact.json` without a corresponding + `coverage` entry (or documented reason). +- Increases `--threshold` or enables `--aa` to absorb real + divergence. +- Suppresses a fixture to dodge a failing score. 
+ +None of these are acceptable without explicit human approval. The +loop fails loudly instead. + +--- + +## Anti-patterns + +| Anti-pattern | Why it fails | Instead | +| ----------------------------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ | +| Skipping audit, starting with "fix this bug" | The bug is a symptom; the broken pipeline stage may be a different property. | Trace parse→compute→layout→paint first. Name the stage. | +| Skipping ground, implementing from intuition | CSS is full of non-obvious spec requirements. "Looks right" to a human ≠ spec-correct. | Read the spec. Cross-ref one real engine. | +| Combining refactor + feature in one PR | Reftest deltas can't be attributed. | Land the refactor alone first (score must not drop). | +| Raising threshold to "just pass" | Hides real bugs. Turns the harness into a rubber stamp. | Fix the divergence. If out of scope, document + leave in coverage. | +| Using text-heavy fixtures to test non-text feat | Font shaping noise dominates the score; you're measuring the wrong thing. | Inject `hide-text.css`. Or use probe-friendly fixtures. | +| Promoting to `exact` at 99.xx% | The exact suite is a byte-exact contract. Near-passes belong in coverage with a delta note. | Wait for 100.00%. Or fix the residual. | +| Claiming "verified" without reading the diff | A similarity score is a coarse index; the diff image is the truth. | Eyeball every sub-100 diff. Record the specific divergence. | +| Inventing new fixtures when WPT covers it | Duplicates work; WPT has reviewed spec-intent pass criteria. | Import the WPT fixture; cite it in the suite entry. | + +--- + +## The template — paste this to kick off a cycle + +Fill in the brackets. The agent you hand it to should produce all +five artifacts before declaring done. 
Expect to run the loop in +passes (audit+ground+fixture → implement → verify), with a +checkpoint at each pass that future-you or a reviewer can read +without the conversation. + +```text +Drive the htmlcss feature loop for: . +Follow .agents/prompts/cg-htmlcss-feature.md. + +Scope: +- Feature: +- Hypothesis: +- Expected: + +Produce, in order: + +1. Audit note: current support level, file references, before-scores. +2. Ground note: spec section(s), stylo/Blink strategy summary. +3. Fixture(s): `.html` + suite entries. Paint or layout? Declare it. +4. Implementation: minimal diff. Data tests where assertable. +5. Verify report: before/after similarity per fixture, diff PNG + review for any sub-1.0 score, promoted fixtures listed. + +Gate: L0.exact must stay at floor 1.0, threshold 0, aa off. Do not +relax the gate. If the feature doesn't reach 100.00%, leave it in +coverage with a specific divergence-surface note. + +Use /research for phase 2, /fixtures for phase 3, /cg-reftest for +phases 3 and 5. +``` diff --git a/.agents/skills/cg-reftest/SKILL.md b/.agents/skills/cg-reftest/SKILL.md index 998683440..5ecd4b19d 100644 --- a/.agents/skills/cg-reftest/SKILL.md +++ b/.agents/skills/cg-reftest/SKILL.md @@ -28,25 +28,26 @@ How to design, name, and review visual rendering tests in this repo. Use these terms precisely. Misusing them erodes trust in test results. -| Term | Definition | -| ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| **Reftest** | A test that compares renderer output against an **independent reference** (oracle) whose correctness is established outside this project — e.g. a W3C-provided PNG for an SVG test case. The oracle is the source of truth; a mismatch means our renderer is wrong. 
| -| **Independent reference / oracle** | A rendering produced by a separate, trusted implementation or defined by a specification. We do not control its content. | -| **Golden test** | A test that compares renderer output against a **previously accepted snapshot** produced by our own renderer. There is no external truth — the golden file _is_ the expected output because a human reviewed and approved it. Also called a snapshot test. | -| **Snapshot test** | Synonym for golden test. The snapshot is a frozen output that we assert has not changed. | -| **Render regression test** | Any test whose purpose is to detect _unintended changes_ in rendering output. Golden tests are regression tests. Reftests are correctness tests. | -| **Pixel diff** | Byte-level comparison of two raster images. A single differing channel value is a failure (at zero tolerance). | -| **Perceptual diff** | Comparison in a perceptual color space (e.g. YIQ via the `dify` crate). Weights differences by human visual sensitivity. More forgiving than raw pixel diff but still quantifiable. | -| **rendiff** | Rust crate (`rendiff` v0.2) for histogram-based pixel diffing. Computes a per-channel difference histogram; thresholds are expressed as `[(max_diff, max_count), ...]` pairs. Used in `flatten_rendiff.rs` for equivalence tests. Dep in `crates/grida-canvas/`. | -| **dify** | Rust crate for perceptual image comparison in YIQ color space. Used by `grida-dev reftest` for SVG reftests. Supports `--threshold` and `--aa` (anti-aliasing detection) flags. | -| **pixelmatch** | Pure-JS perceptual image comparison library. YIQ-based, AA-aware. Used by `@grida/reftest`. Zero native deps; same conceptual model as dify, slightly different threshold semantics — see parity notes below. | -| **`@grida/reftest`** | General-purpose, language-agnostic TS reftest CLI + library at `packages/grida-reftest/`. 
Takes two directories of PNGs, diffs, scores, writes the same bucket layout and JSON report as the Rust `grida-dev reftest`. Does NOT render anything — producers upstream. | -| **`grida-dev reftest`** | Rust reftest runner at `crates/grida-dev/src/reftest/`. SVG-specific: renders SVG via our own cg pipeline, then diffs against a reference PNG. Canonical for SVG. For non-SVG formats, use `@grida/reftest` with an upstream renderer. | -| **refig** | Short for "Figma reftest." Fixture suites under `fixtures/local/refig/` containing `.fig` + `document.json` + `images/` + `exports/` (oracle PNGs from Figma's Images API). Consumed by a TS render step + `@grida/reftest`. See `fixtures/local/refig/README.md`. | -| **Tolerance / fuzz** | A configured threshold below which pixel differences are ignored. Expressed as a histogram threshold (rendiff) or a YIQ distance (dify / pixelmatch). Required when rasterization is non-deterministic across platforms. | -| **Data test** | A test that asserts on the scene graph or computed values directly — no rendering needed. E.g. bounding box, resolved transform matrix, computed style. The cheapest possible assertion. | -| **Probe test** | A test that asserts correctness by reading pixel values at specific coordinates in the rendered output. Requires a purpose-built fixture with a minimal color palette and documented probe points. No full-image comparison needed. | -| **Probe-friendly fixture** | A fixture explicitly designed for probe testing: minimal colors, no decorative elements, shapes at known coordinates. Often accompanied by a `.probe.json` file declaring expected pixel values at specific points. 
| +| Term | Definition | +| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Reftest** | A test that compares renderer output against an **independent reference** (oracle) whose correctness is established outside this project — e.g. a W3C-provided PNG for an SVG test case. The oracle is the source of truth; a mismatch means our renderer is wrong. | +| **Independent reference / oracle** | A rendering produced by a separate, trusted implementation or defined by a specification. We do not control its content. | +| **Golden test** | A test that compares renderer output against a **previously accepted snapshot** produced by our own renderer. There is no external truth — the golden file _is_ the expected output because a human reviewed and approved it. Also called a snapshot test. | +| **Snapshot test** | Synonym for golden test. The snapshot is a frozen output that we assert has not changed. | +| **Render regression test** | Any test whose purpose is to detect _unintended changes_ in rendering output. Golden tests are regression tests. Reftests are correctness tests. | +| **Pixel diff** | Byte-level comparison of two raster images. A single differing channel value is a failure (at zero tolerance). | +| **Perceptual diff** | Comparison in a perceptual color space (e.g. YIQ via the `dify` crate). Weights differences by human visual sensitivity. More forgiving than raw pixel diff but still quantifiable. | +| **rendiff** | Rust crate (`rendiff` v0.2) for histogram-based pixel diffing. Computes a per-channel difference histogram; thresholds are expressed as `[(max_diff, max_count), ...]` pairs. Used in `flatten_rendiff.rs` for equivalence tests. Dep in `crates/grida-canvas/`. 
| +| **dify** | Rust crate for perceptual image comparison in YIQ color space. Used by `grida-dev reftest` for SVG reftests. Supports `--threshold` and `--aa` (anti-aliasing detection) flags. | +| **pixelmatch** | Pure-JS perceptual image comparison library. YIQ-based, AA-aware. Used by `@grida/reftest`. Zero native deps; same conceptual model as dify, slightly different threshold semantics — see parity notes below. | +| **`@grida/reftest`** | General-purpose, language-agnostic TS reftest CLI + library at `packages/grida-reftest/`. Takes two directories of PNGs, diffs, scores, writes the same bucket layout and JSON report as the Rust `grida-dev reftest`. Does NOT render anything — producers upstream. | +| **`grida-dev reftest`** | Rust reftest runner at `crates/grida-dev/src/reftest/`. SVG-specific: renders SVG via our own cg pipeline, then diffs against a reference PNG. Canonical for SVG. For non-SVG formats, use `@grida/reftest` with an upstream renderer. | +| **refig** | Short for "Figma reftest." Fixture suites under `fixtures/local/refig/` containing `.fig` + `document.json` + `images/` + `exports/` (oracle PNGs from Figma's Images API). Consumed by a TS render step + `@grida/reftest`. See `fixtures/local/refig/README.md`. | +| **refbrowser** | Short for "headless-browser reftest." HTML/CSS fixtures under `fixtures/test-html/L0/` rendered by Playwright Chromium as the oracle vs. our `cg` htmlcss renderer. Producer script: `.agents/skills/cg-reftest/scripts/refbrowser_render.ts`; diff via `@grida/reftest`. | +| **Tolerance / fuzz** | A configured threshold below which pixel differences are ignored. Expressed as a histogram threshold (rendiff) or a YIQ distance (dify / pixelmatch). Required when rasterization is non-deterministic across platforms. | +| **Data test** | A test that asserts on the scene graph or computed values directly — no rendering needed. E.g. bounding box, resolved transform matrix, computed style. The cheapest possible assertion. 
| +| **Probe test** | A test that asserts correctness by reading pixel values at specific coordinates in the rendered output. Requires a purpose-built fixture with a minimal color palette and documented probe points. No full-image comparison needed. | +| **Probe-friendly fixture** | A fixture explicitly designed for probe testing: minimal colors, no decorative elements, shapes at known coordinates. Often accompanied by a `.probe.json` file declaring expected pixel values at specific points. | --- @@ -266,6 +267,301 @@ is the threshold where the renderer needs attention. `document.json` — not from a suite-wide viewport. The render step must honor each node's preset. +### HTML/CSS — the refbrowser reftest pipeline + +HTML/CSS fixtures have **no pre-baked oracle** — the oracle is a real +browser engine. Like refig, refbrowser renders the same fixture in two +places and diffs the PNGs; unlike refig, both renders are reproducible +locally (no cloud round-trip). + +``` +fixtures/test-html/ +├── L0/.html ── fixtures +├── _reftest/hide-text.css ── shared helper stylesheets +└── suites/ + ├── L0.exact.json ── must pass 100.00%; CI gate + └── L0.coverage.json ── aspirational scope; tracks progress + + │ + ├── cargo run -p cg --example golden_htmlcss -- --suite + │ └─► $TMPDIR/grida-htmlcss-goldens/.png (cg actual) + │ + └── refbrowser_render.ts --suite + └─► target/refbrowser//expected/.png (Chromium oracle) + + ▼ + reftest --actual-dir … --expected-dir … --threshold 0 + └─► target/reftests//report.json + buckets +``` + +**Oracle**: headless Chromium via Playwright. Chromium's Blink is the +reference implementation for most CSS features; divergence from Blink +is a gap in our `cg` htmlcss pipeline (or a known difference documented +in `docs/wg/feat-2d/htmlcss.md`). 
+ +> **See also: web-platform-tests (WPT).** The W3C's +> [wpt.live](https://wpt.live) suite is the standards-body reftest +> harness — same concept as refbrowser, but cross-engine (Blink, +> WebKit, Gecko) and backed by spec-author-written fixtures with +> explicit pass criteria. Consider pulling WPT fixtures into +> `fixtures/test-html/` when a CSS feature has a mature WPT section +> and you want spec-conformance signal rather than just "matches +> Chromium." Out of scope for this skill today; refbrowser is the +> faster local loop. + +#### Suites: `L0.exact` vs `L0.coverage` + +Everything is driven by **suite JSON files** at +`fixtures/test-html/suites/`. A suite enumerates fixtures, their +per-fixture render config, and the gate policy. + +| Suite | What it contains | Gate | +| ------------------ | --------------------------------------------------------------------------------------------- | -------------------------- | +| `L0.exact.json` | Fixtures currently at 100.00% byte-exact parity with Chromium. Any drop is a real regression. | `floor: 1.0`, strict diff. | +| `L0.coverage.json` | All aspirational L0 fixtures — the full backlog. Scores land wherever they land. | Informational only. | + +**Promoting a fixture to `exact`** — once a fixture reaches 100.00% +against the current suite config, move its entry from `coverage` → +`exact`. Do **not** lower the exact suite's floor to fit new entries; +the bar exists so regressions are loud. + +Per-fixture `.reftest.json` sidecars **do not exist** anymore. All +config lives in the suite file. 
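The promotion rule above can be sketched as a small partition over a diff report. This is an illustration under stated assumptions: `tests[].similarity_score` is the only report field this skill names, so the `name` field and both function names here are hypothetical, and the real `report.json` may carry more fields.

```ts
// Assumed minimal report shape. Only `similarity_score` is named in this skill;
// `name` is a hypothetical identifier field added for illustration.
interface ReportTest {
  name: string;
  similarity_score: number; // 0.0 to 1.0, compared against gate.floor
}

interface Report {
  tests: ReportTest[];
}

interface PromotionPlan {
  promote: string[]; // reached 1.0: candidates to move from coverage to exact
  stay: string[]; // anything lower stays in coverage with a divergence note
}

// Split a coverage run into fixtures that may be promoted and fixtures that stay.
// The comparison is on the raw score, not on a rounded percentage.
function planPromotions(report: Report): PromotionPlan {
  const plan: PromotionPlan = { promote: [], stay: [] };
  for (const t of report.tests) {
    (t.similarity_score >= 1 ? plan.promote : plan.stay).push(t.name);
  }
  return plan;
}

// For a gated suite: names of fixtures whose score falls below the floor.
function belowFloor(report: Report, floor: number): string[] {
  return report.tests.filter((t) => t.similarity_score < floor).map((t) => t.name);
}
```

The point of keeping the two functions separate is that they encode two different policies: promotion is a one-way move triggered by a perfect score, while the floor check is what makes a later drop loud. Neither function ever adjusts the floor.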
+ +#### Suite JSON shape + +```json +{ + "name": "L0.exact", + "description": "Byte-exact fixtures; any drop = regression.", + "gate": { "threshold": 0, "aa": false, "floor": 1.0 }, + "defaults": { + "wait_for": ["fonts", "networkidle"], + "extra_css": ["../_reftest/hide-text.css"], + "full_page": true + }, + "fixtures": [ + { + "path": "../L0/box-dimensions.html", + "viewport": { "width": 600, "height": 522 } + } + ] +} +``` + +- `defaults` — applied to every fixture. Each fixture entry can override any field. +- `fixtures[].path` and every `extra_css[]` path resolve **relative to the suite file**. +- `viewport.height` must match cg's cull height for the diff to succeed; render cg once and read `WxH` to calibrate. +- `gate.threshold` / `gate.aa` are inputs to the pixelmatch diff; `gate.floor` is the aggregate pass bar on similarity. + +#### The three-step pipeline + +**1. Render expecteds (browser oracle)** + +```sh +# one-time: install Chromium for Playwright +pnpm --filter @grida/reftest exec playwright install chromium + +# render the whole suite +pnpm --filter @grida/reftest exec tsx \ + .agents/skills/cg-reftest/scripts/refbrowser_render.ts \ + --suite fixtures/test-html/suites/L0.exact.json \ + --out-dir target/refbrowser/L0.exact/expected +``` + +Ad-hoc single-file render (no suite, defaults only) — useful while authoring a fixture: + +```sh +pnpm --filter @grida/reftest exec tsx \ + .agents/skills/cg-reftest/scripts/refbrowser_render.ts \ + --fixture fixtures/test-html/L0/paint-background-solid.html \ + --out-dir /tmp/refbrowser-verify +``` + +**2. Render actuals (our pipeline)** — the `golden_htmlcss` example +reads the same suite JSON, resolves `extra_css` relative to the suite +file, and applies each stylesheet via +`htmlcss::with_extra_stylesheets` before rendering, so the cascade is +symmetric with Chromium. 
+ +```sh +cargo run -p cg --example golden_htmlcss -- \ + --suite fixtures/test-html/suites/L0.exact.json + +mkdir -p target/refbrowser/L0.exact/actual +cp "${TMPDIR:-/tmp}/grida-htmlcss-goldens/"*.png target/refbrowser/L0.exact/actual/ +``` + +**3. Diff via `@grida/reftest`** — format-agnostic, same bucket layout +and `report.json` schema as the Rust and refig runners. + +Default refbrowser diff: **`--threshold 0`** (pixelmatch strictest, +AA off). Compare each fixture's similarity against the suite's +`gate.floor` — for `L0.exact`, that's `1.0` (100.00% byte-exact). + +```sh +pnpm --filter @grida/reftest exec reftest \ + --actual-dir target/refbrowser/L0.exact/actual \ + --expected-dir target/refbrowser/L0.exact/expected \ + --output-dir target/reftests/L0.exact \ + --bg white \ + --threshold 0 +``` + +> **Gate enforcement is not yet wired into the CLI.** Today, read +> `report.json` and assert every `tests[].similarity_score ≥ +gate.floor` in a wrapper script or CI step. A `--suite` flag on +> `@grida/reftest` that does this automatically is a pending +> follow-up. + +Output: `S99/S95/S90/S75/err/` bucket directories + `report.json`. +Pass bar: the suite's `gate.floor`. For `L0.exact`, anything below +100.00% is a real divergence from Blink (rounding policy, layout +math, AA emission, etc.) — not noise. See "Reading the score" below. + +### Reading the score — do not trust it naively + +The similarity score is `1 - diff_pixels / scoring_pixels`, where +`scoring_pixels ≈ width × height` of the screenshot. **The denominator +is the whole canvas, not the subject under test.** + +This has two consequences you must internalize before reading any +report: + +1. **Background dominates the score.** A fixture that paints a + 200×200 subject on a 600×800 canvas has 92% background. A renderer + that emits _nothing_ for the subject still scores ~92%. A + renderer that paints the subject at 50% accuracy scores ~96%. + Neither number means what it naively looks like. +2.
**Small fixtures inflate. Full-bleed fixtures are honest.** A + card-in-corner composition will always look "good" on the score + even when broken; a composition that fills the viewport gives + numeric feedback proportional to real error. + +**Fixture-authoring rule:** size the fixture so the subject under +test fills as much of the canvas as practical. Viewport height +tuned to the subject's bounding box (via the suite entry's +`viewport.height`) is the usual lever. Padding/margins around the +subject are scoring dead weight — use them only when the test is +_about_ spacing. + +**Reviewing rule:** never report a similarity number without +eyeballing the diff PNG. A 96% score on a sparse fixture and a 96% +score on a full-bleed fixture are _orders of magnitude_ apart in +severity. The diff image is the source of truth; the score is a +coarse index. + +For a true "fraction of the subject that matches," author a +probe-friendly fixture (see the probe test section) and assert on +specific pixels, or mask the background to transparent so +`mask: alpha` counts only subject pixels. Plain refbrowser scores +cannot give you that signal. + +**Per-fixture fields inside a suite entry** — all optional, +defaults shown; any field set on an entry overrides `defaults`. + +```json +{ + "path": "../L0/.html", + "viewport": { "width": 600, "height": 800 }, + "wait_for": ["fonts", "networkidle"], + "extra_css": [], + "full_page": true +} +``` + +- `viewport` — Chromium viewport (px). Set height to match cg's cull + height; mismatched dims score 0.0 at diff time (`@grida/reftest` + requires identical dimensions). +- `wait_for` — `"fonts"` awaits `document.fonts.ready`, `"networkidle"` + awaits 500ms of no-network-activity. +- `extra_css` — CSS files to inject into **both** sides. Paths resolve + relative to the suite file. Playwright applies them via `addStyleTag`; + cg applies them via `htmlcss::with_extra_stylesheets` before rendering, + so the cascade is symmetric. 
Fields only meaningful to Chromium + (`viewport`, `wait_for`, `full_page`) are ignored by cg. +- `full_page` — capture full scrollable area (default) vs. viewport. + +**Pre-built helper stylesheets** under `fixtures/test-html/_reftest/`: + +| File | Effect | +| --------------- | ------------------------------------------------------------------------------------------------------------------------------ | +| `hide-text.css` | `color: transparent` + `line-height: 1`. Zeros glyph coverage and pins line-box height. Use when a fixture isn't testing text. | + +Add more helpers here as divergence patterns emerge. Keep each one +scoped to a single concern (hide text, normalize scrollbars, force +web fonts, etc.) so suites can compose them. + +**When to reach for `hide-text.css`** — any fixture whose subject is +paint, layout, box model, flex, grid, or positioning. The text in +those fixtures is typically decorative labels; its glyph rendering +and `line-height: normal` metrics diverge between Blink and Skia and +will dominate the diff otherwise. + +**When NOT to use it** — fixtures whose subject IS text: +`text-decoration`, `text-shadow`, `text-align`, bidi, `writing-mode`, +font features. For these, leave `extra_css` empty and accept a +below-100 score; the reftest's value there is human review of the +diff image, not the numeric score. + +**Authoring workflow** for a new fixture: + +1. Write the `.html` fixture under `fixtures/test-html/L0/`. +2. Add an entry to `suites/L0.coverage.json` with at least + `{ "path": "../L0/.html" }`. +3. Render it once via `--suite L0.coverage.json` on the cg side; note + the reported `WxH` in the log. +4. Set `viewport.height = H` on the entry; add `extra_css` helpers if + relevant (e.g. `hide-text.css` for non-text fixtures). `defaults` + in the suite likely already cover the common case. +5. Run the refbrowser producer + diff against the same suite. 
Review + the diff PNG — if the diff is dominated by a known divergence zone + (see below), record it in the PR description, don't suppress it. +6. If the fixture reaches 100.00% byte-exact, move its entry from + `L0.coverage.json` to `L0.exact.json`. + +**Known divergence surfaces** — areas where cg is not yet Blink-exact. +These are **backlog items, not tolerance excuses**. Do not tune +thresholds to suppress them. Document the specific divergence in the +PR description; let the score carry the truth. + +- **Alpha compositing rounding** — `rgba()` backgrounds, `opacity`. + cg and Blink choose different rounding rules (half-up vs banker's, + premul vs straight, operand order), producing 1-unit channel + deltas. Small-delta territory but still a real policy divergence. +- **Layout math under non-uniform padding / intrinsic sizing** — + block widths resolving 1-3 px off when computed through flex + children, asymmetric padding, or `width: auto` on transparent + content. Shows up as diff brackets at box edges. +- **Text** — glyph rasterization (shaper version, subpixel positioning, + hinting) and line-box metrics (`line-height: normal` ascent/descent) + diverge. For non-text fixtures inject `hide-text.css`. For + text-subject fixtures, accept a below-100 score and rely on the + diff image for review. +- **Antialiasing on curves** — rounded corners, circles, ellipses, + stroke ends. cg's path flattener emits different coverage values + than Blink's for the same geometry. +- **Percentage border-radius** — `border-radius: 50%` and the `H / V` + two-value form currently render as square in cg. Fixed-length radii + (`12px`, `9999px`) work. +- **Gradients** — linear, radial, conic, repeating. Color-stop + interpolation and color-space handling differ; banding and + transition boundaries don't match. +- **Filters and shadows** — `filter: blur`, `backdrop-filter`, + `box-shadow` with large blur radii. Kernel and sampling divergence + dominates scores. 
+- **`` fallbacks** — our `ImageProvider` renders a placeholder + rect; Chromium renders broken-image chrome. Prefer fixtures with + real image fills or none. +- **System-font fallback** — bundle fonts with `@font-face` + local + paths when the fixture specifically tests font rendering. +- **Scrollbar width** — default `full_page: true` captures document + height and sidesteps scrollbar chrome; flip only when testing + scrollbar geometry. +- **Dimension drift** — changing a fixture's layout invalidates its + `viewport.height` in the suite entry. Re-run `golden_htmlcss` with + `--suite`, update the entry's `viewport.height`, re-run refbrowser. + **Oracle type summary:** | Input format | Oracle source | Test type | @@ -274,8 +570,135 @@ is the threshold where the renderer needs attention. | SVG (arbitrary, no PNG) | resvg-rendered PNG | Reftest | | SVG (Grida extensions) | Our own prior output | Golden test | | Figma REST / .fig | Figma-exported PNG | Reftest | +| HTML / CSS (embed) | Playwright Chromium PNG | Reftest | | `.grida` native | Our own prior output | Golden test | +--- + +## Heuristic techniques (future work) + +Two techniques that scale reftesting beyond "fixture in, score out." +Both are format-agnostic — they apply anywhere we control the oracle +pipeline (refbrowser, refsvg-via-resvg), and both are **unimplemented +today**. They're documented here so the design is shared before +anyone starts building. + +### Subtree bisection — diff attribution + +> Aliases: _diff attribution_, _culprit isolation_. Delta debugging +> applied to rendering. +> +> **TODO — tooling not ready.** Manual application only today. + +A reftest gives you a single similarity score and a diff PNG. For a +minimal fixture that's enough — you eyeball the diff and the culprit +is obvious. As fixtures scale (multi-element compositions, nested +layout, overlapping subtrees), you know _that_ there's a divergence +but not _which_ element owns it. 
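To make the attribution problem concrete, here is a minimal sketch of the simplest possible pass: threshold a per-pixel delta map, take one bounding box around everything above the threshold, and list the elements whose bounds overlap it. Every name here (`Rect`, `ElementBounds`, `highDeltaBBox`, `candidateOwners`) is hypothetical and does not exist in the repo, and the input is assumed to be a row-major, one-byte-per-pixel delta map rather than the actual diff PNG format. It uses a single threshold pass, not connected components.

```typescript
// Hypothetical types: not part of @grida/reftest or any existing tool.
type Rect = { x: number; y: number; width: number; height: number };
type ElementBounds = { selector: string; bounds: Rect };

/**
 * Bounding box of all pixels whose delta exceeds `threshold`.
 * `delta` is assumed to be row-major with one value (0..255) per pixel.
 * Returns null when no pixel exceeds the threshold.
 */
function highDeltaBBox(
  delta: Uint8Array,
  width: number,
  height: number,
  threshold: number
): Rect | null {
  let minX = width;
  let minY = height;
  let maxX = -1;
  let maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if ((delta[y * width + x] ?? 0) > threshold) {
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  if (maxX < 0) return null;
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}

/** True when the two rectangles share at least one pixel. */
function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

/** Selectors of every element whose bounds overlap the high-delta region. */
function candidateOwners(region: Rect, elements: ElementBounds[]): string[] {
  return elements
    .filter((e) => intersects(region, e.bounds))
    .map((e) => e.selector);
}
```

If `candidateOwners` returns exactly one selector, the culprit is named directly; more than one means the region is ambiguous and a per-subtree re-run is needed.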
+ +**The technique** narrows "something in this fixture diverges" to +"this specific element's rendering is wrong," in two modes: + +1. **Region → element (fast path).** Extract the bbox of high-delta + regions from the diff PNG (connected-components or simple + threshold pass). Match each bbox against element bounds in the + fixture — confidently possible when elements are absolutely + positioned or when the layout tree has dumped bounds available. + One-shot lookup; names the culprit directly. + +2. **Isolation bisection (slow path).** When region→element is + ambiguous (overlapping elements, pure flow layout), generate + temporary scoped-down fixtures by injecting override CSS that + hides all siblings / cousins of a candidate subtree + (`display: none` on the rest, or `visibility: hidden` if + layout must be preserved). Re-run the reftest on each isolated + view. Iterate through the element tree to produce per-subtree + scores and converge on the offending node. + +The two-path split matters because mode (1) is O(1) in reftest runs +and mode (2) is O(log n) at best — prefer (1) whenever bbox→element +is unambiguous. + +**Applicability.** + +| Reftest | Oracle controllable? | Subtree bisection viable? | +| -------------- | -------------------- | ------------------------- | +| refbrowser | Yes (Playwright) | ✅ Yes | +| refsvg (resvg) | Yes (local CLI) | ✅ Yes | +| W3C SVG suite | No (pre-baked PNG) | ❌ No | +| refig (Figma) | No (manual export) | ❌ No | + +Figma is explicitly out: isolating a subtree would require +re-exporting from the Figma app, which is an upstream human step. + +**Tooling shape (when built).** A script that: + +1. Reads a reftest's diff PNG. +2. Extracts high-delta bounding boxes. +3. Attempts region→element match against a parsed fixture tree. +4. On ambiguity, writes override CSS for each candidate subtree, + re-runs the producer + diff, accumulates per-subtree scores. +5. 
Outputs a JSON report keyed by element selector, with a score + and a small preview diff per subtree. + +Not unique to htmlcss — the same pattern works for any tree-structured +oracle with controllable input (SVG `` subtrees, scene graph nodes +in .grida, etc.). + +### Viewport sweep — width-matrix for layout fixtures + +> Aliases: _width sweep_, _responsive sweep_, _width matrix_. +> +> **TODO — tooling not ready.** Single-width runs only today. + +A single-viewport reftest catches a layout bug at that one width. It +misses bugs that only manifest at a different width — which for CSS +layout is most bugs (flex basis resolution, wrap points, grid +`auto-fill`, `min-content` / `max-content` interaction, +percentage-sized children against unusual parent widths). + +**The technique.** Render the same layout fixture at a list of +viewport widths and diff each independently. A typical sweep: + +``` +widths: [320, 600, 768, 1024, 1280] // mobile → desktop span +``` + +Produces N PNG pairs per fixture and N similarity scores. A fixture +passes only if _every_ width passes. + +**Why width, not height.** CSS content flows vertically as a function +of the containing block's width; height is mostly an output, not an +input. Width variance exercises most layout regimes. Height variance +is relevant only for `min-height`/`max-height`/vh-based cases, which +are narrower and better covered by dedicated single-width fixtures. + +**Applicability.** Layout-category fixtures only. Paint fixtures +(color, opacity, shadow, gradient, border-radius) render a fixed-size +subject inside a fixed canvas — sweeping widths adds no signal and +just multiplies work. + +**Tooling shape (when built).** Suite schema grows a `widths` array +on layout entries: + +```json +{ + "path": "../L0/box-dimensions.html", + "widths": [320, 600, 1024] +} +``` + +Producers loop over `widths`, emitting PNGs named +`@.png`. `@grida/reftest` treats each as a separate +test. 
No per-width `viewport.height` — let each width produce its +natural cull height (the measurement _is_ the output). + +This technique is also format-agnostic — responsive SVG, responsive +refbrowser, and responsive .grida scenes all benefit from the same +width-sweep harness. + +--- + ### Golden tests — native/proprietary/internal formats Use when **no external truth exists**: @@ -582,6 +1005,38 @@ pnpm --filter @grida/reftest exec reftest \ In a PR: _"refig(refig-standard): auto-layout row spacing fix, average similarity 0.81 → 0.94, 612 tests S75→S95."_ +### True reftest — HTML/CSS refbrowser against Playwright Chromium + +```bash +# Pre-requisite: Chromium installed for Playwright +pnpm --filter @grida/reftest exec playwright install chromium + +# 1. Render expecteds via Playwright Chromium +pnpm --filter @grida/reftest exec tsx .agents/skills/cg-reftest/scripts/refbrowser_render.ts \ + --suite fixtures/test-html/suites/L0.exact.json \ + --out-dir target/refbrowser/expected + +# 2. Render actuals via our cg pipeline +cargo run -p cg --example golden_htmlcss -- \ + --suite fixtures/test-html/suites/L0.exact.json +mkdir -p target/refbrowser/actual +cp "${TMPDIR:-/tmp}/grida-htmlcss-goldens/"*.png target/refbrowser/actual/ + +# 3. Diff actuals against Chromium oracle, write bucketed report +pnpm --filter @grida/reftest exec reftest \ + --actual-dir target/refbrowser/actual \ + --expected-dir target/refbrowser/expected \ + --output-dir target/reftests/htmlcss \ + --bg white + +# Result: target/reftests/htmlcss/report.json + S99/S95/S90/S75/err/ buckets. +# A score < 1.0 means our htmlcss renderer diverges from Chromium. +# This is a genuine reftest — Playwright Chromium is the oracle. 
+``` + +In a PR: _"refbrowser(htmlcss): background-repeat space/round landed, +average similarity 0.72 → 0.91 across 6 repeat fixtures."_ + ### Golden/regression test — custom effect ```bash diff --git a/.agents/skills/cg-reftest/scripts/refbrowser_render.ts b/.agents/skills/cg-reftest/scripts/refbrowser_render.ts new file mode 100644 index 000000000..46b88178e --- /dev/null +++ b/.agents/skills/cg-reftest/scripts/refbrowser_render.ts @@ -0,0 +1,314 @@ +#!/usr/bin/env -S pnpm dlx tsx +/** + * refbrowser_render.ts — headless Chromium oracle for HTML/CSS reftests. + * + * Renders each fixture in a suite through Playwright's Chromium and + * writes a PNG per fixture to `--out-dir`. The output is the reference + * oracle for cg's htmlcss renderer — Chromium is the ground truth. + * + * ┌────────────────┐ ┌─────────────────────┐ ┌──────────────────┐ + * │ .json │ → │ Playwright Chromium │ → │ expected/.png │ + * │ + helper CSS │ │ (full-page screen) │ │ │ + * └────────────────┘ └─────────────────────┘ └──────────────────┘ + * + * Pair with `cargo run -p cg --example golden_htmlcss --suite` on the + * actual side, then diff via `@grida/reftest`. + * + * ## Usage + * + * ```sh + * pnpm --filter @grida/reftest exec tsx \ + * .agents/skills/cg-reftest/scripts/refbrowser_render.ts \ + * --suite fixtures/test-html/suites/L0.exact.json \ + * --out-dir target/refbrowser/L0.exact/expected + * ``` + * + * Ad-hoc single-file render (no suite, defaults only): + * + * ```sh + * pnpm --filter @grida/reftest exec tsx \ + * .agents/skills/cg-reftest/scripts/refbrowser_render.ts \ + * --fixture fixtures/test-html/L0/paint-background-solid.html \ + * --out-dir /tmp/refbrowser-verify + * ``` + * + * ## Dependencies + * + * - Node 20+. + * - `@playwright/test` (devDependency of `@grida/reftest`). 
+ * - Chromium binary (one-time): + * `pnpm --filter @grida/reftest exec playwright install chromium` + * + * ## Suite JSON shape + * + * ```json + * { + * "name": "L0.exact", + * "gate": { "threshold": 0, "aa": false, "floor": 1.0 }, + * "defaults": { + * "viewport": { "width": 600, "height": 800 }, + * "wait_for": ["fonts", "networkidle"], + * "extra_css": ["../_reftest/hide-text.css"], + * "full_page": true + * }, + * "fixtures": [ + * { "path": "../L0/box-dimensions.html", + * "viewport": { "width": 600, "height": 522 } } + * ] + * } + * ``` + * + * Per-fixture entries inherit and override `defaults` field-by-field. + * All paths (`fixtures[].path`, `extra_css[]`) resolve **relative to + * the suite file**. `gate` is consumed by the diff step, not here. + * + * ## Caveats + * + * - `document.fonts.ready` waits for ``/inline `@font-face` loads; + * system-font fallbacks still differ from Skia. Inject + * `_reftest/hide-text.css` for non-text fixtures. + * - Each fixture is rendered in a fresh incognito context. + */ +import { promises as fs } from "node:fs"; +import * as path from "node:path"; +import { fileURLToPath, pathToFileURL } from "node:url"; +// `@playwright/test` re-exports `chromium` from `playwright-core` and is +// the package actually installed in this repo (via `editor`). Using it +// here avoids depending on a separate `playwright` package. 
+import { chromium, type Browser, type BrowserContext } from "@playwright/test"; + +type FixtureConfig = { + viewport?: { width?: number; height?: number }; + wait_for?: Array<"fonts" | "networkidle">; + extra_css?: string[]; + full_page?: boolean; +}; + +type FixtureEntry = FixtureConfig & { path: string }; + +type SuiteFile = { + name?: string; + description?: string; + gate?: unknown; // consumed by the diff step, not here + defaults?: FixtureConfig; + fixtures: FixtureEntry[]; +}; + +type ResolvedConfig = { + viewport: { width: number; height: number }; + wait_for: Array<"fonts" | "networkidle">; + extra_css: string[]; + full_page: boolean; +}; + +const DEFAULTS: ResolvedConfig = { + viewport: { width: 600, height: 800 }, + wait_for: ["fonts", "networkidle"], + extra_css: [], + full_page: true, +}; + +function mergeConfig( + defaults: FixtureConfig | undefined, + entry: FixtureConfig +): ResolvedConfig { + const pick = <K extends keyof ResolvedConfig>(key: K): ResolvedConfig[K] => { + const a = entry[key] as ResolvedConfig[K] | undefined; + const b = defaults?.[key] as ResolvedConfig[K] | undefined; + return (a ?? b ?? DEFAULTS[key]) as ResolvedConfig[K]; + }; + const vp = entry.viewport ?? defaults?.viewport ?? DEFAULTS.viewport; + return { + viewport: { + width: vp?.width ?? DEFAULTS.viewport.width, + height: vp?.height ??
DEFAULTS.viewport.height, + }, + wait_for: pick("wait_for"), + extra_css: pick("extra_css"), + full_page: pick("full_page"), + }; +} + +type Resolved = { + htmlPath: string; + stem: string; + config: ResolvedConfig; +}; + +async function resolveSuite(suitePath: string): Promise<Resolved[]> { + const raw = await fs.readFile(suitePath, "utf8"); + const suite = JSON.parse(raw) as SuiteFile; + if (!Array.isArray(suite.fixtures)) { + throw new Error(`suite ${suitePath}: missing fixtures[]`); + } + const suiteDir = path.dirname(path.resolve(suitePath)); + return suite.fixtures.map((entry) => { + const htmlPath = path.resolve(suiteDir, entry.path); + const merged = mergeConfig(suite.defaults, entry); + // Resolve extra_css paths relative to the suite file. + const extra_css = merged.extra_css.map((rel) => + path.resolve(suiteDir, rel) + ); + const stem = path.basename(entry.path).replace(/\.html?$/i, ""); + return { htmlPath, stem, config: { ...merged, extra_css } }; + }); +} + +async function loadCssCached( + cache: Map<string, string>, + abs: string +): Promise<string | null> { + const hit = cache.get(abs); + if (hit !== undefined) return hit; + try { + const content = await fs.readFile(abs, "utf8"); + cache.set(abs, content); + return content; + } catch (e) { + console.error(` warn: failed to read ${abs}: ${(e as Error).message}`); + return null; + } +} + +async function renderOne( + ctx: BrowserContext, + r: Resolved, + outDir: string, + cssCache: Map<string, string> +): Promise<{ file: string; cssCount: number }> { + const { htmlPath, stem, config } = r; + + const page = await ctx.newPage(); + await page.setViewportSize(config.viewport); + + // `file://` URL so relative resources resolve from the fixture's dir. + // `pathToFileURL` handles Windows drive letters and percent-encodes + // spaces/non-ASCII, which plain string concatenation does not.
+ const fileUrl = pathToFileURL(path.resolve(htmlPath)).href; + await page.goto(fileUrl, { waitUntil: "load" }); + + if (config.wait_for.includes("networkidle")) { + await page.waitForLoadState("networkidle"); + } + if (config.wait_for.includes("fonts")) { + // Await inside the page context; `document.fonts.ready` resolves + // to a `FontFaceSet` which Playwright cannot serialize across the + // boundary. Return void so the wait is effective and typed. + await page.evaluate(async () => { + await document.fonts.ready; + }); + } + + let cssCount = 0; + for (const abs of config.extra_css) { + const content = await loadCssCached(cssCache, abs); + if (content !== null) { + await page.addStyleTag({ content }); + cssCount++; + } + } + + const outPath = path.join(outDir, `${stem}.png`); + await fs.mkdir(outDir, { recursive: true }); + + await page.screenshot({ + path: outPath, + fullPage: config.full_page, + animations: "disabled", + caret: "hide", + }); + + await page.close(); + return { file: outPath, cssCount }; +} + +function parseArgs(argv: string[]): { + suite?: string; + fixture?: string; + outDir: string; +} { + const args: Record<string, string> = {}; + for (let i = 0; i < argv.length; i += 2) { + const key = argv[i]?.replace(/^--/, ""); + const val = argv[i + 1]; + if (key && val) args[key] = val; + } + if (!args["out-dir"]) { + throw new Error("--out-dir is required"); + } + if (!args["suite"] && !args["fixture"]) { + throw new Error("must pass --suite or --fixture "); + } + return { + suite: args["suite"], + fixture: args["fixture"], + outDir: path.resolve(args["out-dir"]), + }; +} + +async function main() { + const args = parseArgs(process.argv.slice(2)); + let resolved: Resolved[]; + if (args.suite) { + resolved = await resolveSuite(args.suite); + console.log( + `refbrowser: rendering ${resolved.length} fixture(s) from ${args.suite}` + ); + } else { + const htmlPath = path.resolve(args.fixture!); + const stem = path.basename(htmlPath).replace(/\.html?$/i, ""); + resolved =
[{ htmlPath, stem, config: DEFAULTS }]; + console.log(`refbrowser: rendering 1 fixture (ad-hoc, defaults only)`); + } + console.log(` out-dir: ${args.outDir}`); + + let browser: Browser | null = null; + const cssCache = new Map(); + try { + browser = await chromium.launch(); + + for (const r of resolved) { + const rel = path.relative(process.cwd(), r.htmlPath); + // Fresh incognito context per fixture — no cookie/storage/SW + // leakage between fixtures, so order can't mask real renderer + // changes. + let ctx: BrowserContext | null = null; + try { + ctx = await browser.newContext({ + // Deterministic: force light color-scheme, standard locale/timezone. + colorScheme: "light", + locale: "en-US", + timezoneId: "UTC", + reducedMotion: "reduce", + }); + const { file, cssCount } = await renderOne( + ctx, + r, + args.outDir, + cssCache + ); + const hint = cssCount > 0 ? ` [+${cssCount} css]` : ""; + console.log(` ${rel} → ${file}${hint}`); + } catch (e) { + console.error(` ${rel}: FAILED`); + console.error(e); + process.exitCode = 1; + } finally { + await ctx?.close(); + } + } + } finally { + await browser?.close(); + } +} + +// Only run when invoked directly (not when imported). +const invoked = + process.argv[1] && + fileURLToPath(import.meta.url) === path.resolve(process.argv[1]); +if (invoked) { + main().catch((e) => { + console.error(e); + process.exit(1); + }); +} diff --git a/.agents/skills/fixtures/SKILL.md b/.agents/skills/fixtures/SKILL.md index be1fcee83..953c090a5 100644 --- a/.agents/skills/fixtures/SKILL.md +++ b/.agents/skills/fixtures/SKILL.md @@ -43,6 +43,16 @@ edge case** that the codebase supports or intends to support. This includes: filename alone should tell you what's being tested. - **Labeled specimens.** Within a fixture, label each test case with the value being exercised so both humans and heuristics can identify regions. 
+- **Match the fixture's subject to the viewport policy.** For refbrowser + fixtures under `fixtures/test-html/`, **paint / visual-property** + fixtures should size their root to a preset viewport (via `min-height`) + so cg's cull and Chromium's screenshot have identical dimensions. + **Layout** fixtures (box-model, flex, grid, intrinsic sizing) must + NOT force a body size — the output dimensions _are_ what the test + measures; a `min-height` hack contaminates the result. See + [`fixtures/test-html/README.md`](../../../fixtures/test-html/README.md) + for the preset list, the paint-vs-layout rule, and the per-fixture + `viewport` workflow for layout tests. - **Don't duplicate.** Before adding a fixture, check if an existing one already covers the behavior. Extend or split rather than duplicate. diff --git a/crates/grida-canvas/examples/golden_htmlcss.rs b/crates/grida-canvas/examples/golden_htmlcss.rs index 38da7915b..81ada4db3 100644 --- a/crates/grida-canvas/examples/golden_htmlcss.rs +++ b/crates/grida-canvas/examples/golden_htmlcss.rs @@ -4,28 +4,127 @@ /// temporary directory (printed to stderr) so generated images don't /// bloat the repository. /// -/// Usage: +/// ## Usage +/// +/// cargo run -p cg --example golden_htmlcss -- \ +/// --suite fixtures/test-html/suites/L0.exact.json +/// /// cargo run -p cg --example golden_htmlcss -- [FILE_OR_DIR...] /// -/// If no arguments given, renders built-in test fixtures. -/// If a directory is given, renders all .html/.htm files in it. +/// If no arguments given, renders built-in L0 fixtures. +/// If FILE_OR_DIR is given, renders ad-hoc (no sidecar config). +/// +/// ## Suite JSON shape +/// +/// { +/// "defaults": { +/// "viewport": { "width": 600, "height": 800 }, +/// "extra_css": ["../_reftest/hide-text.css"] +/// }, +/// "fixtures": [ +/// { "path": "../L0/box-dimensions.html", +/// "viewport": { "width": 600, "height": 522 } } +/// ] +/// } +/// +/// Per-fixture entries inherit and override `defaults`. 
All paths +/// (`fixtures[].path`, `extra_css[]`) resolve **relative to the suite +/// file**. `gate` and other fields unknown to this tool are ignored. use cg::htmlcss; use cg::resources::ByteStore; use cg::runtime::font_repository::FontRepository; +use serde::Deserialize; use skia_safe::{surfaces, Color}; +use std::collections::HashMap; use std::path::{Path, PathBuf}; use std::sync::{Arc, Mutex}; -fn fonts() -> FontRepository { +fn build_fonts() -> FontRepository { let mut repo = FontRepository::new(Arc::new(Mutex::new(ByteStore::new()))); repo.enable_system_fallback(); repo } -fn render_to_png(html: &str, width: f32, name: &str, out_dir: &Path) { - let fonts = fonts(); +#[derive(Debug, Default, Clone, Copy, Deserialize)] +#[serde(default)] +struct Viewport { + width: Option<f32>, + height: Option<f32>, +} + +#[derive(Debug, Default, Deserialize)] +#[serde(default)] +struct FixtureConfig { + extra_css: Vec<String>, + viewport: Viewport, +} + +#[derive(Debug, Deserialize)] +struct SuiteEntry { + path: String, + #[serde(default)] + extra_css: Option<Vec<String>>, + #[serde(default)] + viewport: Option<Viewport>, +} + +#[derive(Debug, Default, Deserialize)] +#[serde(default)] +struct SuiteFile { + defaults: FixtureConfig, + fixtures: Vec<SuiteEntry>, +} + +const DEFAULT_WIDTH: f32 = 600.0; +const DEFAULT_HEIGHT: f32 = 600.0; + +/// Resolve a fixture entry against suite defaults. Suite-relative +/// paths are anchored at `suite_dir`. Viewport width/height inherit +/// from `defaults` and fall back to the built-in defaults.
+fn resolve_entry( + entry: &SuiteEntry, + defaults: &FixtureConfig, + suite_dir: &Path, +) -> (PathBuf, Vec<PathBuf>, f32, f32) { + let html = suite_dir.join(&entry.path); + let css_rel: &[String] = entry.extra_css.as_deref().unwrap_or(&defaults.extra_css); + let css_abs: Vec<PathBuf> = css_rel.iter().map(|r| suite_dir.join(r)).collect(); + let vp = entry.viewport.unwrap_or(defaults.viewport); + let width = vp + .width + .or(defaults.viewport.width) + .unwrap_or(DEFAULT_WIDTH); + let height = vp + .height + .or(defaults.viewport.height) + .unwrap_or(DEFAULT_HEIGHT); + (html, css_abs, width, height) +} + +/// Populate `cache[abs]` if absent. Missing files warn; absent keys +/// are treated as a no-op at injection time. +fn ensure_css_cached(cache: &mut HashMap<PathBuf, String>, abs: &Path) { + if cache.contains_key(abs) { + return; + } + match std::fs::read_to_string(abs) { + Ok(s) => { + cache.insert(abs.to_path_buf(), s); + } + Err(e) => eprintln!(" warn: failed to read {}: {e}", abs.display()), + } +} + +fn render_to_png( + html: &str, + width: f32, + height: f32, + name: &str, + out_dir: &Path, + fonts: &FontRepository, +) { let picture = - htmlcss::render(html, width, 600.0, &fonts, &htmlcss::NoImages).expect("render failed"); + htmlcss::render(html, width, height, fonts, &htmlcss::NoImages).expect("render failed"); let cull = picture.cull_rect(); let w = cull.width().max(1.0) as i32; let h = cull.height().max(1.0) as i32; @@ -44,42 +143,177 @@ fn render_to_png(html: &str, width: f32, name: &str, out_dir: &Path) { eprintln!(" {name}: {w}x{h} → {}", path.display()); } -fn render_html_file(path: &Path, out_dir: &Path) { - let html = std::fs::read_to_string(path).expect("failed to read HTML file"); - let name = path +fn render_with_extras( + html_path: &Path, + extras_abs: &[PathBuf], + width: f32, + height: f32, + out_dir: &Path, + fonts: &FontRepository, + css_cache: &mut HashMap<PathBuf, String>, +) { + let html = match std::fs::read_to_string(html_path) { + Ok(s) => s, + Err(e) => { + eprintln!(" warn: failed
to read {}: {e}", html_path.display()); + return; + } + }; + let name = html_path .file_stem() .map(|s| s.to_string_lossy().to_string()) .unwrap_or_else(|| "unknown".to_string()); - render_to_png(&html, 600.0, &name, out_dir); + + for abs in extras_abs { + ensure_css_cached(css_cache, abs); + } + let extras: Vec<&str> = extras_abs + .iter() + .filter_map(|p| css_cache.get(p).map(String::as_str)) + .collect(); + let html = if extras.is_empty() { + html + } else { + htmlcss::with_extra_stylesheets(&html, &extras) + }; + + render_to_png(&html, width, height, &name, out_dir, fonts); +} + +fn render_suite(suite_path: &Path, out_dir: &Path, fonts: &FontRepository) { + let raw = std::fs::read_to_string(suite_path) + .unwrap_or_else(|e| panic!("failed to read {}: {e}", suite_path.display())); + let suite: SuiteFile = serde_json::from_str(&raw) + .unwrap_or_else(|e| panic!("failed to parse {}: {e}", suite_path.display())); + let suite_dir = suite_path.parent().unwrap_or(Path::new(".")); + + eprintln!( + "Rendering {} fixture(s) from suite {}", + suite.fixtures.len(), + suite_path.display() + ); + let mut css_cache: HashMap<PathBuf, String> = HashMap::new(); + for entry in &suite.fixtures { + let (html_path, extras_abs, width, height) = + resolve_entry(entry, &suite.defaults, suite_dir); + render_with_extras( + &html_path, + &extras_abs, + width, + height, + out_dir, + fonts, + &mut css_cache, + ); + } +} + +fn render_directory(dir: &Path, out_dir: &Path, fonts: &FontRepository) { + let mut entries: Vec<PathBuf> = std::fs::read_dir(dir) + .expect("failed to read directory") + .filter_map(|e| e.ok().map(|e| e.path())) + .filter(|p| { + p.extension() + .map(|ext| ext == "html" || ext == "htm") + .unwrap_or(false) + }) + .collect(); + entries.sort(); + + eprintln!( + "Rendering {} HTML files from {}", + entries.len(), + dir.display() + ); + let mut css_cache: HashMap<PathBuf, String> = HashMap::new(); + for path in &entries { + render_with_extras( + path, + &[], + DEFAULT_WIDTH, + DEFAULT_HEIGHT, + out_dir, + fonts, +
&mut css_cache, + ); + } +} + +/// Parse `argv` into (`suite_path`, positional args). If `--suite P` +/// is present, those two tokens are removed from the positional list. +fn parse_args(argv: &[String]) -> (Option<String>, Vec<String>) { + let mut suite: Option<String> = None; + let mut positional: Vec<String> = Vec::new(); + let mut i = 0; + while i < argv.len() { + let a = &argv[i]; + if a == "--suite" { + let v = argv + .get(i + 1) + .unwrap_or_else(|| panic!("--suite requires a path argument")); + suite = Some(v.clone()); + i += 2; + } else if a.starts_with("--") { + // Unknown long flag. If the next token looks like a value + // (doesn't start with `-`) swallow it too, so `--foo bar` + // doesn't leak `bar` into the positional stream and get + // treated as a file path. + match argv.get(i + 1) { + Some(next) if !next.starts_with('-') => i += 2, + _ => i += 1, + } + } else { + positional.push(a.clone()); + i += 1; + } + } + (suite, positional) } fn main() { - let args: Vec<String> = std::env::args().skip(1).collect(); + let argv: Vec<String> = std::env::args().skip(1).collect(); + let (suite, positional) = parse_args(&argv); // Output to system temp directory let out_dir = std::env::temp_dir().join("grida-htmlcss-goldens"); std::fs::create_dir_all(&out_dir).expect("failed to create output directory"); eprintln!("Output: {}", out_dir.display()); - if args.is_empty() { - // Render built-in test fixtures from fixtures/test-html/L0/ + let fonts = build_fonts(); + + if let Some(suite_path) = suite { + render_suite(Path::new(&suite_path), &out_dir, &fonts); + eprintln!("Done.
Files in: {}", out_dir.display()); + return; + } + + if positional.is_empty() { let fixture_dir = PathBuf::from(concat!( env!("CARGO_MANIFEST_DIR"), "/../../fixtures/test-html/L0" )); if fixture_dir.is_dir() { - render_directory(&fixture_dir, &out_dir); + render_directory(&fixture_dir, &out_dir, &fonts); } else { eprintln!("No fixture directory found at {}", fixture_dir.display()); - eprintln!("Pass HTML files as arguments instead."); + eprintln!("Pass --suite or HTML files as arguments."); } } else { - for arg in &args { + let mut css_cache: HashMap<PathBuf, String> = HashMap::new(); + for arg in &positional { let path = PathBuf::from(arg); if path.is_dir() { - render_directory(&path, &out_dir); + render_directory(&path, &out_dir, &fonts); } else if path.is_file() { - render_html_file(&path, &out_dir); + render_with_extras( + &path, + &[], + DEFAULT_WIDTH, + DEFAULT_HEIGHT, + &out_dir, + &fonts, + &mut css_cache, + ); } else { eprintln!("Skipping {}: not a file or directory", path.display()); } @@ -88,25 +322,3 @@ fn main() { eprintln!("Done. Files in: {}", out_dir.display()); } - -fn render_directory(dir: &Path, out_dir: &Path) { - let mut entries: Vec<PathBuf> = std::fs::read_dir(dir) - .expect("failed to read directory") - .filter_map(|e| e.ok().map(|e| e.path())) - .filter(|p| { - p.extension() - .map(|ext| ext == "html" || ext == "htm") - .unwrap_or(false) - }) - .collect(); - entries.sort(); - - eprintln!( - "Rendering {} HTML files from {}", - entries.len(), - dir.display() - ); - for path in &entries { - render_html_file(path, out_dir); - } -} diff --git a/crates/grida-canvas/src/htmlcss/mod.rs b/crates/grida-canvas/src/htmlcss/mod.rs index 8ce3d9042..a69adc5b1 100644 --- a/crates/grida-canvas/src/htmlcss/mod.rs +++ b/crates/grida-canvas/src/htmlcss/mod.rs @@ -128,6 +128,45 @@ impl ImageProvider for PreloadedImages { } } +/// Inject one or more author stylesheets into an HTML document string.
+/// +/// Concatenates `css_bodies` into a single `"); + + // HTML tag names are ASCII, so lowercasing the haystack is safe. + // One pass, correct for any casing of ``. + if let Some(idx) = html.to_ascii_lowercase().find("") { + let mut out = String::with_capacity(html.len() + combined.len()); + out.push_str(&html[..idx]); + out.push_str(&combined); + out.push_str(&html[idx..]); + return out; + } + format!("{combined}{html}") +} + /// Render HTML+CSS to a Skia Picture. /// /// Images referenced by `` or `background-image: url()` are diff --git a/fixtures/test-html/L0/paint-background-solid.html b/fixtures/test-html/L0/paint-background-solid.html index 8cbc4d3ba..e63bd3f7c 100644 --- a/fixtures/test-html/L0/paint-background-solid.html +++ b/fixtures/test-html/L0/paint-background-solid.html @@ -4,6 +4,11 @@ Paint: Solid Background