The core engine behind AdSpendSaver — automated landing page QA for paid media. Catches broken purchase flows, dead links, missing UTM parameters, and cross-browser rendering issues — before ad spend is wasted on pages that don't convert.
Status: Early alpha. APIs and DSL schema are subject to change.
Brands spend thousands on ads pointing to product pages that silently break: an "Add to Cart" button doesn't work on Safari mobile, a variant selector fails after a site deploy, UTM parameters get stripped at checkout. These failures are invisible in analytics (the user just bounces) and expensive to diagnose manually across browser/device/variant combinations.
Given a product page URL, the engine autonomously:
- Discovers the purchase flow: an LLM navigates the live page via browser automation, identifies every interactive element (add-to-cart, variant selectors, checkout buttons), and outputs a structured flow definition (DSL).
- Validates the flow: a headless browser verifies that every extracted selector actually exists on the page. If selectors are stale or wrong, it feeds the errors back to the LLM and re-extracts (up to MAX_SANITY_RETRIES times, 3 by default).
- Executes across 6 browser/device combos: the validated flow runs deterministically (no LLM in the loop) on Chrome, Firefox, and Safari across desktop and mobile viewports.
- Sweeps product variants: if the page has color/size selectors, it tests every available combination.
- Verifies failures with vision: each failed step gets a screenshot that a vision model analyzes to confirm whether the failure is real or a locator flake.
- Produces a structured report: pass/fail per combo, broken images, console errors, UTM preservation checks, severity classification, and recommended actions.
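
The variant sweep in the list above amounts to a cartesian product over the available option values. The sketch below is one way to enumerate it; the `VariantOptions` shape and both function names are assumptions for illustration, not the engine's actual types.

```typescript
// Hypothetical shape: option name -> available values found on the page.
type VariantOptions = Record<string, string[]>;

// Build every combination of the available option values (cartesian product).
function variantCombinations(options: VariantOptions): Record<string, string>[] {
  let combos: Record<string, string>[] = [{}];
  for (const [name, values] of Object.entries(options)) {
    const next: Record<string, string>[] = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [name]: value });
      }
    }
    combos = next;
  }
  return combos;
}

// Render a combination as a label in the "Red / M" style.
function comboLabel(combo: Record<string, string>): string {
  return Object.values(combo).join(" / ");
}

const sweep = variantCombinations({ color: ["Red", "Blue"], size: ["S", "M"] });
console.log(sweep.map(comboLabel));
// → ["Red / S", "Red / M", "Blue / S", "Blue / M"]
```

Note that an option with no available values yields zero combinations, so a real sweep would likely filter out sold-out or empty options first.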

                      ┌───────────────────────────────────┐
    POST /api/run     │ HTTP Server (Bun)                 │
    ─────────────────►│                                   │
                      └─────────────────┬─────────────────┘
                                        │
                      ┌─────────────────▼─────────────────┐
                      │ Stage 1: LLM Extraction           │
                      │ Claude + Playwright MCP           │
                      │ → Navigates page, outputs DSL     │
                      └─────────────────┬─────────────────┘
                                        │ FlowConfig JSON
                      ┌─────────────────▼─────────────────┐
                      │ Stage 2: Sanity Check             │
                      │ Direct Playwright (no LLM)        │
                      │ → Verifies all selectors exist    │
                      │ → Feeds back missing → retry S1   │
                      └─────────────────┬─────────────────┘
                                        │ Validated config
                      ┌─────────────────▼─────────────────┐
                      │ Stage 3: Execution Engine         │
                      │ Deterministic DSL runner (no LLM) │
                      │ → 6 browser/device combos         │
                      │ → Variant sweep (all combos)      │
                      │ → Screenshot evidence             │
                      └─────────────────┬─────────────────┘
                                        │
                      ┌─────────────────▼─────────────────┐
                      │ Failure Verification              │
                      │ Gemini Vision on screenshots      │
                      │ → Confirms real vs flaky failures │
                      └─────────────────┬─────────────────┘
                                        │
                      ┌─────────────────▼─────────────────┐
                      │ Summary Generation                │
                      │ Claude → structured JSON report   │
                      │ → severity, findings, actions     │
                      └───────────────────────────────────┘

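
The Stage 1 → Stage 2 feedback loop in the diagram can be sketched roughly as follows. Every name here (`FlowConfig` fields, `Extractor`, `SanityChecker`, `extractValidatedFlow`) is hypothetical, and counting one initial attempt plus `maxRetries` re-extractions is an assumption about how the retry limit is applied.

```typescript
// Hypothetical types; the real FlowConfig schema and stage signatures may differ.
interface FlowConfig {
  steps: { selector: string }[];
}
interface SanityResult {
  ok: boolean;
  missingSelectors: string[];
}
type Extractor = (pageUrl: string, feedback: string[]) => Promise<FlowConfig>;
type SanityChecker = (config: FlowConfig) => Promise<SanityResult>;

async function extractValidatedFlow(
  pageUrl: string,
  extract: Extractor,
  check: SanityChecker,
  maxRetries = 3,
): Promise<FlowConfig> {
  let feedback: string[] = [];
  // One initial attempt, then up to maxRetries re-extractions (assumed counting).
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const config = await extract(pageUrl, feedback);
    const result = await check(config);
    if (result.ok) return config;
    // Hand the missing selectors back to the extractor for the next attempt.
    feedback = result.missingSelectors;
  }
  throw new Error(`Flow for ${pageUrl} failed sanity checks after ${maxRetries} retries`);
}
```

Passing the extractor and checker in as functions keeps the loop testable without an LLM or a browser.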
Key design decision: The LLM is only used for extraction (Stage 1) and analysis (failure verification + summary). The actual test execution is a deterministic switch/case engine running a JSON DSL — no LLM variance, no prompt sensitivity, fully reproducible results.
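
To illustrate what a deterministic switch/case engine over a JSON DSL can look like, here is a minimal sketch. The step types are invented for the example (the real FlowConfig schema may differ), and `PageLike` is a hypothetical interface loosely modeled on a few Playwright `Page` methods so the runner can be exercised without a browser.

```typescript
// Hypothetical step types; the real FlowConfig DSL may define different actions.
type Step =
  | { action: "goto"; url: string }
  | { action: "click"; selector: string }
  | { action: "select"; selector: string; value: string }
  | { action: "assertVisible"; selector: string };

// Minimal page surface the runner needs (assumed, not the engine's real type).
interface PageLike {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<unknown>;
  selectOption(selector: string, value: string): Promise<unknown>;
  isVisible(selector: string): Promise<boolean>;
}

interface StepResult {
  index: number;
  action: Step["action"];
  success: boolean;
  error?: string;
}

// Each DSL action maps to exactly one branch; no model call is involved.
async function runStep(page: PageLike, step: Step): Promise<void> {
  switch (step.action) {
    case "goto":
      await page.goto(step.url);
      return;
    case "click":
      await page.click(step.selector);
      return;
    case "select":
      await page.selectOption(step.selector, step.value);
      return;
    case "assertVisible":
      if (!(await page.isVisible(step.selector))) {
        throw new Error(`Element not visible: ${step.selector}`);
      }
      return;
  }
}

// Run steps in order and stop at the first failure, recording why it failed.
async function runFlow(page: PageLike, steps: Step[]): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (let index = 0; index < steps.length; index++) {
    const step = steps[index];
    try {
      await runStep(page, step);
      results.push({ index, action: step.action, success: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ index, action: step.action, success: false, error });
      break;
    }
  }
  return results;
}
```

Because the same steps always take the same branches, two runs against an unchanged page should produce the same result list, which is the reproducibility property described above.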
| Layer | Technology |
|---|---|
| Runtime | Bun (TypeScript, ESM) |
| LLM (extraction & summary) | Anthropic Claude (Sonnet for extraction, Haiku for summary by default) |
| LLM (failure verification) | Google Gemini 2.5 Flash |
| Browser automation | Playwright + @playwright/mcp |
| Storage | Firebase Firestore + Cloud Storage |
| Container | Docker (oven/bun:1.2-debian) |

Stage 3 runs the flow on six browser/device combinations:

| Browser | Desktop (1920x1080) | Mobile (390x844) |
|---|---|---|
| Chromium | Chrome Desktop | Chrome Android |
| Firefox | Firefox Desktop | Firefox Android |
| WebKit | Safari Desktop | Safari iOS |
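
The matrix above could be expressed in code along these lines. The `Combo` type and `buildCombos` function are hypothetical; the names, engines, and viewport sizes are taken from the table.

```typescript
type BrowserEngine = "chromium" | "firefox" | "webkit";
type DeviceClass = "desktop" | "mobile";

// Hypothetical combo shape for illustration.
interface Combo {
  comboName: string;
  browser: BrowserEngine;
  device: DeviceClass;
  viewport: { width: number; height: number };
}

// Viewport sizes from the table header.
const VIEWPORTS: Record<DeviceClass, { width: number; height: number }> = {
  desktop: { width: 1920, height: 1080 },
  mobile: { width: 390, height: 844 },
};

// Display names from the table cells.
const COMBO_NAMES: Record<BrowserEngine, Record<DeviceClass, string>> = {
  chromium: { desktop: "Chrome Desktop", mobile: "Chrome Android" },
  firefox: { desktop: "Firefox Desktop", mobile: "Firefox Android" },
  webkit: { desktop: "Safari Desktop", mobile: "Safari iOS" },
};

// Expand 3 engines x 2 device classes into the 6 combos.
function buildCombos(): Combo[] {
  const browsers: BrowserEngine[] = ["chromium", "firefox", "webkit"];
  const devices: DeviceClass[] = ["desktop", "mobile"];
  const combos: Combo[] = [];
  for (const browser of browsers) {
    for (const device of devices) {
      combos.push({
        comboName: COMBO_NAMES[browser][device],
        browser,
        device,
        viewport: VIEWPORTS[device],
      });
    }
  }
  return combos;
}

console.log(buildCombos().map((c) => c.comboName));
```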
POST /api/run runs the full pipeline: it extracts the flow, validates it, and executes it across all combos. Example request body:
{
"pageUrl": "https://example.com/products/widget?utm_source=meta&utm_campaign=summer",
"adId": "ad_123",
"clientWebsite": "example.com"
}

Re-execute using a previously extracted flow config (skips Stage 1 if the cached config passes sanity checks). Falls back to full pipeline if the cached config is stale.
Returns { "status": "ok" }.
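
As a rough client-side sketch, the pipeline request can be built and its response (format shown below) scanned for failing combos like this. The helper names are hypothetical, the interfaces cover only a subset of the documented fields, and sending a JSON `Content-Type` header is an assumption about what the server expects.

```typescript
// Request body fields as documented for POST /api/run.
interface RunRequest {
  pageUrl: string;
  adId: string;
  clientWebsite: string;
}

// Subset of the response fields, enough to list failures.
interface SimulationResult {
  comboName: string;
  variantCombination: string;
  overallSuccess: boolean;
  adParamsPreserved: boolean;
}
interface RunResponse {
  success: boolean;
  simulationResults: SimulationResult[];
}

// Build the URL and fetch options for a pipeline run (hypothetical helper).
function buildRunRequest(baseUrl: string, body: RunRequest) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/run`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// List combos that failed the flow or lost ad parameters along the way.
function failingCombos(response: RunResponse): string[] {
  return response.simulationResults
    .filter((r) => !r.overallSuccess || !r.adParamsPreserved)
    .map((r) => `${r.comboName} (${r.variantCombination})`);
}

// Usage, assuming the server runs locally on the default port:
//   const { url, init } = buildRunRequest("http://localhost:3000", { ... });
//   const report = (await (await fetch(url, init)).json()) as RunResponse;
//   console.log(failingCombos(report));
```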
{
"success": true,
"adId": "ad_123",
"pageUrl": "https://example.com/products/widget",
"flowConfigs": [],
"simulationResults": [
{
"comboName": "Chrome Desktop",
"browser": "chromium",
"device": "desktop",
"variantCombination": "Red / M",
"overallSuccess": true,
"adParamsPreserved": true,
"steps": [],
"consoleErrors": []
}
],
"additionalChecks": {
"is404": false,
"brokenImages": [],
"consoleErrors": [],
"adParamCheck": { "preserved": true, "missingParams": [] }
},
"summary": {
"overallStatus": "pass",
"severity": "none",
"findings": [],
"recommendedActions": []
},
"durationMs": 142000
}

- Bun v1.2+
- Anthropic API key
- Google Gemini API key
- Firebase service account JSON + Storage bucket
cp .env.example .env
# Fill in API keys and Firebase credentials
bun install
bun run dev

cp .env.example .env
# Fill in credentials
docker compose up --build

The container installs Chromium, Firefox, and WebKit browsers and requires 2GB shared memory for headless rendering.
All configuration is done via environment variables (see .env.example):
| Variable | Required | Default | Description |
|---|---|---|---|
| ANTHROPIC_API_KEY | Yes | none | Claude API key |
| GEMINI_API_KEY | Yes | none | Gemini API key |
| FIREBASE_SERVICE_ACCOUNT_JSON | Yes | none | Path to JSON file or inline JSON |
| FIREBASE_STORAGE_BUCKET | Yes | none | GCS bucket for evidence screenshots |
| PORT | No | 3000 | HTTP server port |
| EXPLORATION_MODEL | No | claude-sonnet-4-6 | Model for page exploration |
| SUMMARY_MODEL | No | claude-haiku-4-5 | Model for report generation |
| GEMINI_MODEL | No | gemini-2.5-flash | Model for failure verification |
| MAX_SANITY_RETRIES | No | 3 | Stage 1→2 retry attempts |
| PIPELINE_TIMEOUT_MS | No | 900000 | Full pipeline timeout (15 min) |
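
A validation step for these variables might look like the sketch below. The `EngineConfig` shape and the `loadConfig`, `required`, and `intOrDefault` helpers are hypothetical; only the variable names and defaults come from the table.

```typescript
// Hypothetical config shape; field names are illustrative.
interface EngineConfig {
  anthropicApiKey: string;
  geminiApiKey: string;
  firebaseServiceAccountJson: string;
  firebaseStorageBucket: string;
  port: number;
  explorationModel: string;
  summaryModel: string;
  geminiModel: string;
  maxSanityRetries: number;
  pipelineTimeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Fail fast when a required variable is unset or empty.
function required(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

// Parse a non-negative integer, falling back to the documented default.
function intOrDefault(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return parsed;
}

function loadConfig(env: Env): EngineConfig {
  return {
    anthropicApiKey: required(env, "ANTHROPIC_API_KEY"),
    geminiApiKey: required(env, "GEMINI_API_KEY"),
    firebaseServiceAccountJson: required(env, "FIREBASE_SERVICE_ACCOUNT_JSON"),
    firebaseStorageBucket: required(env, "FIREBASE_STORAGE_BUCKET"),
    port: intOrDefault(env, "PORT", 3000),
    explorationModel: env.EXPLORATION_MODEL || "claude-sonnet-4-6",
    summaryModel: env.SUMMARY_MODEL || "claude-haiku-4-5",
    geminiModel: env.GEMINI_MODEL || "gemini-2.5-flash",
    maxSanityRetries: intOrDefault(env, "MAX_SANITY_RETRIES", 3),
    pipelineTimeoutMs: intOrDefault(env, "PIPELINE_TIMEOUT_MS", 900000),
  };
}
```

At startup this would typically be called once with `process.env`, so a missing key stops the server before any pipeline run begins.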
src/
├── index.ts # HTTP server
├── pipeline.ts # Orchestrator (Stage 1→2→3 loop)
├── config/
│ ├── env.ts # Environment variable validation
│ └── firebase.ts # Firebase init
├── engine/
│ ├── executor.ts # Deterministic DSL engine
│ ├── actions.ts # Playwright action implementations
│ ├── evidence.ts # Screenshot capture + upload
│ └── failure-verifier.ts # Gemini vision verification
├── stages/
│ ├── stage1-extract.ts # LLM + MCP flow extraction
│ ├── stage2-sanity.ts # Selector validation
│ ├── stage3-execute.ts # Cross-browser execution
│ └── stage3b-variant-sweep.ts # Variant combination sweep
├── mcp/
│ ├── client.ts # Playwright MCP client
│ └── prompts.ts # LLM prompt templates
├── firebase/
│ ├── firestore.ts # Run/result persistence
│ └── storage.ts # Evidence screenshot upload
├── types/
│ ├── flow-config.ts # FlowConfig DSL schema
│ ├── pipeline.ts # Pipeline I/O types
│ ├── simulation.ts # Execution result types
│ ├── checks.ts # Additional check types
│ └── variant-sweep.ts # Variant sweep types
└── utils/
    ├── devices.ts # Browser/device combo definitions
    ├── logger.ts # Structured logging
    ├── snapshot-diff.ts # DOM diff utilities
    └── url.ts # URL manipulation helpers
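
As an example of the kind of helper that could live in a URL utility module, the UTM preservation check can be sketched as a comparison of query parameters between the landing URL and the URL reached at the end of the flow. The function name is hypothetical, and treating only `utm_`-prefixed keys as ad parameters is an assumption; a real check may also track click IDs or compare values, not just presence.

```typescript
// Mirrors the { preserved, missingParams } shape of adParamCheck in the report.
interface AdParamCheck {
  preserved: boolean;
  missingParams: string[];
}

// Report which utm_* keys from the landing URL are absent from the final URL.
function checkAdParams(landingUrl: string, finalUrl: string): AdParamCheck {
  const landed = new URL(landingUrl).searchParams;
  const arrived = new URL(finalUrl).searchParams;
  const missingParams: string[] = [];
  landed.forEach((_value, key) => {
    // Assumption: only utm_-prefixed keys count as ad parameters.
    if (key.startsWith("utm_") && !arrived.has(key) && !missingParams.includes(key)) {
      missingParams.push(key);
    }
  });
  return { preserved: missingParams.length === 0, missingParams };
}

console.log(
  checkAdParams(
    "https://example.com/products/widget?utm_source=meta&utm_campaign=summer",
    "https://example.com/checkout?utm_source=meta",
  ),
);
// → { preserved: false, missingParams: ["utm_campaign"] }
```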