I built Screenslop because AI can now generate SwiftUI faster than I can make coffee, which is both impressive and mildly suspicious. The problem is that "it compiles" is a very low bar for Apple UI. A screen can compile, launch, and still feel like it was assembled by a tutorial screenshot with ambition.
Screenslop reviews Apple app UI from runtime evidence. It runs or connects to the app, captures the actual screen, reads the accessibility tree, inspects logs and source hints, then produces findings an agent can fix and verify with a fresh capture.
Works with Codex, Claude Code, Cursor, plain terminal workflows, Baguette-backed capture, and XcodeBuildMCP build/run support. Lower-level xcodebuild / simctl capture fallback is planned, not shipped yet.
```bash
npm install -g github:gabelul/screenslop#v0.1.0
screenslop doctor
```

To work from a source checkout instead:

```bash
git clone https://github.com/gabelul/screenslop.git
cd screenslop
npm install
node bin/screenslop.mjs doctor
npm test
```

You can also run it without linking:

```bash
node bin/screenslop.mjs see --dry-run --json
```

The CLI and the agent skill are separate. Install the CLI first, then install the skill so Codex, Claude Code, Cursor, or another agent knows the runtime-first loop.

Preview the packaged skill:

```bash
npx skills add gabelul/screenslop --list
```

Install it:

```bash
npx skills add gabelul/screenslop --skill screenslop
```

Manual paths and scope notes live in Skill installation.
Screenslop is the boring, scriptable engine that sits between AI agents and real Apple UI evidence.
- Set up the target: detect project metadata with `screenslop setup`, then create or migrate `.screenslop/config.json` with scheme, bundle ID, source root, device, surface, and artifact folder settings.
- Capture the screen: use Baguette for the shipped live capture path to write screenshot, AX tree, logs, manifest, and summary files.
- Critique the evidence: produce deterministic findings with proof, not vague taste complaints.
- Fix selected findings: patch only narrow, high-confidence SwiftUI issues.
- Verify with fresh evidence: recapture, critique again, and compare the old finding against the new evidence.
- Stress the first matrix: write a bounded six-cell device/settings report with one evidence bundle per cell.
The current MVP is intentionally conservative. It does not pretend a fixture test proves your real app is fixed. It does not edit the whole codebase because one button has a bad accessibility name. It does the small proven thing, then asks for fresh evidence. Annoying? Slightly. Correct? Yes.
```bash
screenslop setup --json --dry-run
screenslop setup --json --yes
screenslop doctor
screenslop see --surface Settings --json
screenslop critique artifacts/<baseline-run> --json
screenslop fix artifacts/<baseline-run> \
  --finding <finding-id> \
  --source-root <app-source-root> \
  --apply \
  --yes \
  --label "Save settings" \
  --json
screenslop see --surface Settings --json
screenslop critique artifacts/<fresh-run> --json
screenslop verify artifacts/<baseline-run> \
  --fresh-bundle artifacts/<fresh-run> \
  --finding <finding-id> \
  --fix-session artifacts/<baseline-run>/fix-session.json \
  --json
```

No fresh capture means no verified fix claim. That rule saves a lot of nonsense.
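If an agent script drives that loop, it can help to assemble the commands as data before running anything. The sketch below is a hypothetical helper, not something the CLI ships. It only uses flags that appear in the commands above. It takes the baseline and fresh run paths as inputs, because this README does not show how a `see` run reports its run ID.

```javascript
// Hypothetical helper, not part of the Screenslop CLI.
// It only assembles argv arrays from flags shown in the commands above.
function buildLoopCommands({ surface, baselineRun, freshRun, findingId, sourceRoot, label }) {
  const capture = ["screenslop", "see", "--surface", surface, "--json"];
  return [
    capture, // baseline capture
    ["screenslop", "critique", baselineRun, "--json"],
    ["screenslop", "fix", baselineRun,
      "--finding", findingId,
      "--source-root", sourceRoot,
      "--apply", "--yes",
      "--label", label,
      "--json"],
    [...capture], // fresh capture after the fix
    ["screenslop", "critique", freshRun, "--json"],
    ["screenslop", "verify", baselineRun,
      "--fresh-bundle", freshRun,
      "--finding", findingId,
      "--fix-session", `${baselineRun}/fix-session.json`,
      "--json"],
  ];
}

// Placeholder values for illustration only.
const loopSteps = buildLoopCommands({
  surface: "Settings",
  baselineRun: "artifacts/baseline-run",
  freshRun: "artifacts/fresh-run",
  findingId: "example-finding",
  sourceRoot: "MyApp",
  label: "Save settings",
});
for (const argv of loopSteps) console.log(argv.join(" "));
```

Each array can be handed to Node's `child_process.execFileSync(argv[0], argv.slice(1))`, which avoids shell quoting problems with values like the label.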
| Command | Status | What it does |
|---|---|---|
| `screenslop setup` | MVP | Detects project metadata and plans first-use private config. |
| `screenslop instructions` | MVP | Prints the coding-agent contract and local skill status. |
| `screenslop self-update` | MVP | Updates the global CLI after confirmation. |
| `screenslop init` | MVP | Creates or migrates local project config. |
| `screenslop doctor` | MVP | Checks Baguette, XcodeBuildMCP, Xcode, simctl, Swift, Node, and CLI freshness. |
| `screenslop see` | MVP | Captures screenshot, accessibility tree, logs, manifest, and summary. |
| `screenslop critique` | MVP | Turns evidence into findings with proof. |
| `screenslop fix` | MVP | Plans or applies selected safe SwiftUI fixes. |
| `screenslop verify` | MVP | Compares baseline findings against fresh critique output. |
| `screenslop matrix` | MVP | Writes a bounded six-cell matrix report and evidence bundles. |
| `screenslop learn` | MVP | Learns, checks, and refreshes the private design profile. |
| `screenslop watch` | Future | Placeholder for the live review loop. Not shipped yet. |
Full command notes live in docs/commands.md.
Screenslop prefers real runtime evidence whenever it can get it:
- Baguette — shipped live capture path for simulator screenshots, AX tree, logs, and runtime control.
- XcodeBuildMCP — shipped build/run path for the sample smoke and matrix live cells.
- xcodebuild + simctl — planned lower-level fallback for local machines.
- Manual evidence — screenshot/source evidence when automation is not available.
The rule is simple: do not critique Apple UI from source alone when runtime evidence can be captured. In v0.1, real `see` capture still needs Baguette; the rest of the stack is build/run support or future fallback work.
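As a rough illustration of that rule, here is a minimal sketch of how an agent script might decide where its evidence comes from in v0.1. The tool identifiers and the return shape are hypothetical; only the decision itself follows the notes above.

```javascript
// Hypothetical tool identifiers; the rule mirrors the v0.1 notes above.
function chooseEvidenceSource(availableTools) {
  const tools = new Set(availableTools);
  if (tools.has("baguette")) {
    // Baguette is the only shipped live capture path in v0.1.
    return { source: "baguette", runtime: true };
  }
  // XcodeBuildMCP builds and runs but does not capture in v0.1,
  // and the xcodebuild + simctl fallback is not shipped yet.
  return { source: "manual", runtime: false };
}

console.log(chooseEvidenceSource(["xcodebuildmcp", "baguette"]));
console.log(chooseEvidenceSource(["xcodebuildmcp"]));
```

The second call falls back to manual evidence on purpose: having a build/run tool is not the same as having a capture.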
A `see` run writes a bundle like this:

```text
artifacts/<run-id>/
  evidence.json
  screenshot.jpg
  accessibility.json
  logs.ndjson
  summary.md
```
`critique`, `fix`, and `verify` add their own artifacts next to the evidence so agents can pass a single bundle around without losing context.
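Before handing a bundle to another step, an agent may want to confirm the capture files are all there. A minimal sketch, assuming the five file names listed above are the complete `see` output. The helper name is hypothetical and is not a CLI feature.

```javascript
// File names copied from the bundle layout above.
const EXPECTED_SEE_FILES = [
  "evidence.json",
  "screenshot.jpg",
  "accessibility.json",
  "logs.ndjson",
  "summary.md",
];

// Hypothetical helper: returns the expected files that are not in the listing.
function missingEvidenceFiles(presentFiles) {
  const present = new Set(presentFiles);
  return EXPECTED_SEE_FILES.filter((name) => !present.has(name));
}

// Example listing with the logs file absent.
console.log(missingEvidenceFiles([
  "evidence.json",
  "screenshot.jpg",
  "accessibility.json",
  "summary.md",
]));
```

In a real script the listing could come from `fs.readdirSync` on the run folder. Extra files from `critique`, `fix`, or `verify` are ignored by this check.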
Local target metadata lives in `.screenslop/config.json` and is intentionally ignored by git because it can contain private app paths and bundle IDs.
Important fields:
```json
{
  "schemaVersion": 1,
  "preferredRuntime": "baguette",
  "defaultSurface": "Settings",
  "defaultScheme": "MyApp",
  "defaultBundleId": "com.example.MyApp",
  "defaultDevice": "iPhone 17",
  "workspacePath": "MyApp.xcworkspace",
  "sourceRoot": "MyApp",
  "designSources": ["../SharedDesignSystem"],
  "artifactsDir": "artifacts"
}
```

`schemaVersion: 1` is the v0.1 generation. During 0.x, config changes are allowed, but they need an explicit migration path. No silent drift.
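A quick shape check can catch a hand-edited config before a capture fails on it. The sketch below is hypothetical and is not the CLI's real schema. It assumes every field in the example above is required and has the type shown there, which the actual schema may not enforce.

```javascript
// Field types inferred from the example config above (an assumption,
// not the published schema).
const EXAMPLE_FIELD_TYPES = {
  schemaVersion: "number",
  preferredRuntime: "string",
  defaultSurface: "string",
  defaultScheme: "string",
  defaultBundleId: "string",
  defaultDevice: "string",
  workspacePath: "string",
  sourceRoot: "string",
  designSources: "array",
  artifactsDir: "string",
};

// Hypothetical helper: returns a list of human-readable problems.
function checkConfig(config) {
  const problems = [];
  for (const [field, expected] of Object.entries(EXAMPLE_FIELD_TYPES)) {
    const value = config[field];
    if (value === undefined) {
      problems.push(`missing ${field}`);
      continue;
    }
    const actual = Array.isArray(value) ? "array" : typeof value;
    if (actual !== expected) {
      problems.push(`${field} should be ${expected}, got ${actual}`);
    }
  }
  // v0.1 configs use schemaVersion 1; anything else needs a migration.
  if (typeof config.schemaVersion === "number" && config.schemaVersion !== 1) {
    problems.push(`schemaVersion ${config.schemaVersion} needs a migration`);
  }
  return problems;
}

const exampleConfig = {
  schemaVersion: 1,
  preferredRuntime: "baguette",
  defaultSurface: "Settings",
  defaultScheme: "MyApp",
  defaultBundleId: "com.example.MyApp",
  defaultDevice: "iPhone 17",
  workspacePath: "MyApp.xcworkspace",
  sourceRoot: "MyApp",
  designSources: ["../SharedDesignSystem"],
  artifactsDir: "artifacts",
};
console.log(checkConfig(exampleConfig));
```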
Before claiming the repo is healthy:
```bash
node bin/screenslop.mjs doctor
npm test
npm run --silent smoke:e2e -- --fresh-mode fixed
node bin/screenslop.mjs matrix --dry-run --json
node bin/screenslop.mjs matrix --profile examples/matrix/default.json --json
node bin/screenslop.mjs matrix --profile examples/matrix/phone-sizes.json --json
node bin/screenslop.mjs matrix --profile examples/matrix/phone-sizes.json --critique --design --agent-packet --json
npm run cleanup:macos:dry
npm pack --dry-run
npm run --silent smoke:package
```

When Apple runtime tools are available:

```bash
npm run smoke:runtime
```

That smoke builds and launches `examples/runtime-smoke-app`, captures Baguette-backed baseline and fresh evidence, applies one narrow fix, and verifies the selected finding. It proves the sample app loop. Your app still needs its own capture because reality insists on being specific.
For non-interactive mobile-size checks, use the phone-size matrix profile:
```bash
node bin/screenslop.mjs matrix --profile examples/matrix/phone-sizes.json --critique --json
```

That profile targets small, normal, and large iPhone classes. If the exact simulator names are not installed, copy the profile and update the device values from `baguette list --json`.
Use that matrix before calling layout-sensitive UI work done. One see capture is fine for a tiny copy, icon, or accessibility fix; it is not enough for SwiftUI spacing, onboarding, paywalls, checkout, settings, compact sheets, tab bars, scroll views, or anything where small and large phones can disagree with each other. Phones are annoying like that.
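That guidance can be written down as a small decision helper for agent scripts. This is a sketch under assumptions: the change-kind identifiers are made up for illustration, and treating any kind not on the single-capture list as matrix-worthy is a conservative choice of this example, not a documented CLI behavior.

```javascript
// Change kinds where one `see` capture is fine, per the paragraph above.
// The identifiers themselves are hypothetical.
const SINGLE_CAPTURE_OK = new Set(["copy", "icon", "accessibility"]);

// Returns "matrix" as soon as any change kind is layout-sensitive or unknown.
function captureScope(changeKinds) {
  const needsMatrix = changeKinds.some((kind) => !SINGLE_CAPTURE_OK.has(kind));
  return needsMatrix ? "matrix" : "single-capture";
}

console.log(captureScope(["copy", "icon"]));
console.log(captureScope(["copy", "spacing"]));
console.log(captureScope(["paywall"]));
```

Defaulting unknown kinds to the matrix keeps the helper from quietly approving a layout change on the strength of a single phone size.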
Screenslop is the public engine repo:
- CLI
- core runtime/finding/fix/verify logic
- schemas
- docs
- agent skill/integration files
- sample app and smoke tests
Screenslop Studio is the future private Mac app wrapper. Studio should consume this engine, not duplicate the critique logic in another corner of the universe because apparently one source of truth was too relaxing.
Studio is deliberately blocked until the engine proves the boring stuff:
- JSON and schema contracts for agent-facing commands
- package smoke from the packed npm tarball
- sample runtime smoke with fresh capture and verified fix
- six-cell matrix output with clear setting status
- configured-target preflight with redacted failures
- one private dogfood finding verified as `verified-fixed` from fresh real-app evidence
- machine-checked redaction before any dogfood lesson becomes public
- agent docs that match what the CLI actually ships
So no apps/mac/ placeholder here, no private wrapper scaffold, and no second
critique engine hiding in a corner. Studio can be pretty later. The engine has
to be trustworthy first.
Read more in docs/repo-strategy.md.
- Getting started
- Agent playbook
- Skill installation
- Command model
- Architecture
- Agent integrations
- Repo strategy
- Known limitations
- Release checklist
- Changelog
Screenslop stands on practical runtime tooling and sibling quality tools:
- Baguette — iOS simulator capture, runtime control, and device-farm work.
- XcodeBuildMCP — agent-native Xcode build/run support.
- Pixelslop — browser-first visual QA sibling.
Formal credit lives in NOTICE.
Other tools for agents that care about quality:
- slopbuster — AI text cleanup for prose, comments, and documentation that sound a bit too machine-polished.
- pixelslop — browser-first visual quality checks for web UI.
- stitch-kit — design skills and workflows around Google Stitch MCP.
- claude-code-skill-activator — skill auto-detection for Claude Code.
Apache-2.0.
Built by Gabi @ Booplex.com because AI-generated UI should still have to pass the "does this feel like a real app?" test.
The current shipped engine stays deterministic by default. It measures runtime evidence and reports findings that can be proved again after a fresh capture.
Design Intelligence is the next module boundary. The `learn` command now writes and refreshes a project-local design profile at `.screenslop/design-profile.json`, scans configured `designSources`, skips build/checkouts and localization/generated noise, and extracts lightweight tokens from Markdown design docs plus common SwiftUI design-system constants. Token records carry confidence metadata, and `profileGaps` stay visible when credible core tokens are still missing. `critique --design` can load that profile, report profile gaps, write an agent packet, and import agent-produced design findings. That layer is for app-specific judgment: hierarchy that feels weak for the product, typography that drifts from the app language, copy or badges that contradict visible state, and stale or missing design-system context.

The proof rule stays the same: only measured findings can become `verified-fixed` automatically. Design findings carry their kind and proof level, then verify as `improved`, `unchanged`, `regressed`, or `needs-human-review` after fresh evidence plus a fresh design review.
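To make the proof rule concrete, here is a minimal sketch of how a script consuming verify output might gate its claims. The field names (`kind`, `hasFreshCapture`, `stillPresentInFresh`, `hasFreshReview`, `verdict`) and the `"measured"` value are hypothetical; the real contracts live in the design-intelligence docs. Falling back to `needs-human-review` when evidence or review is missing is an assumption of this example.

```javascript
// Outcomes a design finding can verify as, per the paragraph above.
const DESIGN_OUTCOMES = ["improved", "unchanged", "regressed", "needs-human-review"];

// Only a measured finding that is gone from a fresh capture
// may be claimed as verified-fixed automatically.
function canClaimVerifiedFixed({ kind, hasFreshCapture, stillPresentInFresh }) {
  return kind === "measured" && hasFreshCapture === true && stillPresentInFresh === false;
}

// Design findings never auto-verify; without fresh evidence and a fresh
// review this sketch defers to a human (an assumption, see lead-in).
function designOutcome({ hasFreshCapture, hasFreshReview, verdict }) {
  if (!hasFreshCapture || !hasFreshReview) return "needs-human-review";
  return DESIGN_OUTCOMES.includes(verdict) ? verdict : "needs-human-review";
}

console.log(canClaimVerifiedFixed({ kind: "measured", hasFreshCapture: true, stillPresentInFresh: false }));
console.log(canClaimVerifiedFixed({ kind: "design", hasFreshCapture: true, stillPresentInFresh: false }));
console.log(designOutcome({ hasFreshCapture: true, hasFreshReview: true, verdict: "improved" }));
```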
Read the contracts in docs/design-intelligence.md and docs/design-profile-format.md.
