Autonoma-AI · tomaspiaggio · Apr 23, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -16,8 +16,11 @@ agents/
   scenario-generator.md        # Step 3: Scenarios
   env-factory-generator.md     # Step 4: Environment Factory implementation
   scenario-validator.md        # Step 5: Scenario lifecycle validation
+  auth-login-validator.md      # Step 5: agentic fallback for login probe
   test-case-generator.md       # Step 6: E2E tests
   focused-test-case-generator.md
+skills/agent-browser/SKILL.md        # agent-browser CLI reference
+skills/validate-auth-login/SKILL.md  # headless browser login probe
 hooks/
   hooks.json
   pipeline-kickoff.sh
@@ -54,7 +57,8 @@ Validators are in `hooks/validators/`.
 | `validate_endpoint_implemented.py` | `*/autonoma/.endpoint-implemented` | handler path and factory integrity |
 | `validate_creation_file_immutable.py` | `*/autonoma/.endpoint-implemented` | accepted audit creation files were not rewritten unsafely |
 | `validate_factory_fidelity.py` | `*/autonoma/.endpoint-implemented` | semantic per-model factory fidelity |
-| `validate_scenario_validation.py` | `*/autonoma/.scenario-validation.json` | Step 5 terminal-state contract |
+| `validate_scenario_validation.py` | `*/autonoma/.scenario-validation.json` | Step 5 terminal-state contract (incl. `loginProbe`) |
+| `login_probe.py` | invoked by `scenario-validator.md` between `up` and `down` | headless-browser login verification via `agent-browser` |
 | `validate_scenario_recipes.py` | `*/autonoma/scenario-recipes.json` | recipe schema |
 | `validate_test_index.py` | `*/autonoma/qa-tests/INDEX.md` | test totals and folder sums |
 | `validate_directory_structure.py` | `*/autonoma/qa-tests/INDEX.md` | test directory structure |
@@ -76,5 +80,5 @@ pytest
 
 - Step 4 implements the Environment Factory and may edit target backend code.
 - Step 4 writes `autonoma/.endpoint-implemented` only after discover smoke and factory-integrity checks pass.
-- Step 5 validates signed `discover` / `up` / `down` for every scenario and may fix handler bugs or reconcile `scenarios.md`.
+- Step 5 validates signed `discover` / `up` / `down` for every scenario and may fix handler bugs or reconcile `scenarios.md`. Between `up` and `down` on the first auth-carrying scenario it also runs the login probe (`hooks/validators/login_probe.py`), which drives headless Chrome via [`agent-browser`](https://github.com/vercel-labs/agent-browser) to prove the returned credentials actually reach a logged-in page. Install via `brew install agent-browser` or `npm install -g agent-browser`; the probe skips cleanly if the binary is missing.
 - Step 6 is gated on `autonoma/.endpoint-validated`.
diff --git a/agents/auth-login-validator.md b/agents/auth-login-validator.md
@@ -0,0 +1,47 @@
+---
+description: >
+  Login probe subagent invoked via `claude -p` from the scenario-validator
+  (Step 5). Uses the agent-browser CLI to verify that auth credentials
+  returned by the Environment Factory's `up` action actually reach an
+  authenticated page. Headless only.
+tools:
+  - Bash
+  - Read
+maxTurns: 20
+---
+
+# Auth Login Validator
+
+You are a login probe. You have `agent-browser` available as a Bash CLI.
+Read the `skills/agent-browser/SKILL.md` reference if you need the full
+command surface. Your job: verify that the auth payload you receive actually
+produces a logged-in browser session.
+
+## Rules
+
+- Always use `--session login-probe-<label>` and `--json`.
+- Always headless — never pass `--headed`.
+- Use `agent-browser snapshot -i` to discover form fields when selectors
+  are unknown — don't guess selectors.
+- Close the session when done: `agent-browser --session ... close`
+- Do not modify any files outside `autonoma/.login-probe/`.
+
+## Output
+
+Print EXACTLY one JSON object to stdout when done — no markdown fences, no
+extra text before or after.
+
+Success:
+```json
+{"ok": true, "mode": "cookies|token|form", "evidence": {"final_url": "...", "screenshot": "..."}, "scenario": "<label>"}
+```
+
+Failure:
+```json
+{"ok": false, "mode": "cookies|token|form", "failure": {"category": "<cat>", "detail": "one sentence", "screenshot_path": "..."}, "evidence": {}}
+```
+
+Categories: `redirected_to_login`, `cookie_not_sent`, `marker_missing`,
+`bad_credentials`, `open_failed`, `fill_failed`, `submit_failed`, `unknown_ui`.
+
+Take a screenshot before reporting.
diff --git a/agents/scenario-validator.md b/agents/scenario-validator.md
@@ -118,8 +118,54 @@ Repeat until all three actions succeed for every scenario OR you exhaust 5 itera
       - **Auth check**: `auth` MUST be non-null and contain at least one of `{ cookies, headers, token, user }`. If empty, the auth callback is not wired — fix it and restart.
       - **Refs check**: every top-level model in the `create` tree MUST appear in `refs`.
    5. Verify DB state with a read-only `SELECT` for at least one refs id.
-   6. POST `{action:"down", refsToken}`. Expect `{ok:true}`.
-   7. Verify the refs rows are gone.
+   6. **Login probe** (once per run — on the first scenario whose `auth`
+      carries credentials, then remember the verdict):
+      Drive a real headless Chrome session through `hooks/validators/login_probe.py`
+      to prove that the returned `auth` actually reaches a logged-in page. This
+      catches subtle auth-callback bugs (wrong cookie domain, missing CSRF seed,
+      token not honored, Set-Cookie attrs stripped) that lifecycle checks miss.
+      ```bash
+      # $KB is autonoma/AUTONOMA.md. Extract loginPath + protectedPath from the
+      # `flows: login` section there. `markerText` is optional — if the KB lists
+      # a known post-login text fragment ("Dashboard", username echo, etc.) pass it.
+      python3 "$(cat /tmp/autonoma-plugin-root)/hooks/validators/login_probe.py" \
+        --input - <<JSON
+      {
+        "baseUrl": "$BASE_URL",
+        "loginPath": "$LOGIN_PATH",
+        "protectedPath": "$PROTECTED_PATH",
+        "markerText": "$MARKER_TEXT",
+        "screenshotDir": "autonoma/.login-probe",
+        "label": "$SCENARIO_NAME",
+        "auth": $AUTH_JSON
+      }
+      JSON
+      ```
+      Interpret the JSON verdict (same file — `{ok, mode, failure.category, ...}`):
+        - `ok: true` → record `loginProbe: { ok: true, mode, scenario: "$SCENARIO_NAME", evidence }`
+          in the terminal artifact (step 7) and skip the probe for the remaining
+          scenarios — one successful probe per run is sufficient signal.
+        - `skipped: true` (no cookies/headers/user OR `agent-browser` not installed) →
+          record the skip payload verbatim and continue. Do not treat as failure.
+        - `ok: false` → this is a **handler bug** (path 3a above). The failure
+          `category` tells you what to fix:
+            - `redirected_to_login` → the cookie/token reached the server but was
+              rejected. Check the auth callback's signing/session secret and cookie
+              value format.
+            - `cookie_not_sent` → browser refused to attach the cookie. Check
+              `domain`, `path`, `Secure`, `SameSite`, `HttpOnly` attrs on Set-Cookie.
+            - `marker_missing` → redirect succeeded but page didn't render the
+              expected post-login marker. Either the marker is wrong (update KB)
+              or a downstream load fails (inspect screenshot, fix handler).
+            - `bad_credentials` → form submit with the user's password didn't
+              authenticate. The `user` payload the auth callback returns doesn't
+              match the real credentials stored in the DB.
+            - `open_failed` / `fill_failed` / `submit_failed` → browser-side
+              infrastructure issue, inspect `failure.detail`.
+          Fix the handler and restart the loop. Do NOT move on to `down` for this
+          scenario — the session artifacts from a broken `up` aren't trustworthy.
+   7. POST `{action:"down", refsToken}`. Expect `{ok:true}`.
+   8. Verify the refs rows are gone.
 
 5. After every scenario passes cleanly, emit the scenario recipes.
 
@@ -208,10 +254,20 @@ Repeat until all three actions succeed for every scenario OR you exhaust 5 itera
      "blockingIssues": [],
      "recipePath": "autonoma/scenario-recipes.json",
      "validationMode": "endpoint-lifecycle",
-     "endpointUrl": "http://localhost:3000/api/autonoma"
+     "endpointUrl": "http://localhost:3000/api/autonoma",
+     "loginProbe": {
+       "ok": true,
+       "mode": "cookies",
+       "scenario": "standard",
+       "evidence": { "final_url": "http://localhost:3000/dashboard" }
+     }
    }
    ```
 
+   `loginProbe` is REQUIRED when `status == "ok"`. Use the verdict from step 4.6.
+   If the probe was skipped (no auth material or `agent-browser` unavailable) record
+   `{ "ok": false, "skipped": true, "reason": "..." }` — that satisfies the schema.
+
    On failure keep the same shape with `status: "failed"`, `preflightPassed: false` when
    preflight did not pass, populated `failedScenarios`, and concrete `blockingIssues`.