8 changes: 6 additions & 2 deletions CLAUDE.md
@@ -16,8 +16,11 @@ agents/
scenario-generator.md # Step 3: Scenarios
env-factory-generator.md # Step 4: Environment Factory implementation
scenario-validator.md # Step 5: Scenario lifecycle validation
auth-login-validator.md # Step 5: agentic fallback for login probe
test-case-generator.md # Step 6: E2E tests
focused-test-case-generator.md
skills/agent-browser/SKILL.md # agent-browser CLI reference
skills/validate-auth-login/SKILL.md # headless browser login probe
hooks/
hooks.json
pipeline-kickoff.sh
@@ -54,7 +57,8 @@ Validators are in `hooks/validators/`.
| `validate_endpoint_implemented.py` | `*/autonoma/.endpoint-implemented` | handler path and factory integrity |
| `validate_creation_file_immutable.py` | `*/autonoma/.endpoint-implemented` | accepted audit creation files were not rewritten unsafely |
| `validate_factory_fidelity.py` | `*/autonoma/.endpoint-implemented` | semantic per-model factory fidelity |
-| `validate_scenario_validation.py` | `*/autonoma/.scenario-validation.json` | Step 5 terminal-state contract |
+| `validate_scenario_validation.py` | `*/autonoma/.scenario-validation.json` | Step 5 terminal-state contract (incl. `loginProbe`) |
+| `login_probe.py` | invoked by `scenario-validator.md` between `up` and `down` | headless-browser login verification via `agent-browser` |
| `validate_scenario_recipes.py` | `*/autonoma/scenario-recipes.json` | recipe schema |
| `validate_test_index.py` | `*/autonoma/qa-tests/INDEX.md` | test totals and folder sums |
| `validate_directory_structure.py` | `*/autonoma/qa-tests/INDEX.md` | test directory structure |
@@ -76,5 +80,5 @@ pytest

- Step 4 implements the Environment Factory and may edit target backend code.
- Step 4 writes `autonoma/.endpoint-implemented` only after discover smoke and factory-integrity checks pass.
-- Step 5 validates signed `discover` / `up` / `down` for every scenario and may fix handler bugs or reconcile `scenarios.md`.
+- Step 5 validates signed `discover` / `up` / `down` for every scenario and may fix handler bugs or reconcile `scenarios.md`. Between `up` and `down` on the first auth-carrying scenario it also runs the login probe (`hooks/validators/login_probe.py`), which drives headless Chrome via [`agent-browser`](https://github.com/vercel-labs/agent-browser) to prove the returned credentials actually reach a logged-in page. Install via `brew install agent-browser` or `npm install -g agent-browser`; the probe skips cleanly if the binary is missing.
- Step 6 is gated on `autonoma/.endpoint-validated`.
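
The "skips cleanly if the binary is missing" behaviour noted for Step 5 could be sketched roughly as follows. This is not the actual contents of `login_probe.py`: the function name `skip_payload_if_missing` and the wording of the `reason` string are hypothetical, and the payload shape is borrowed from the skip record documented for `.scenario-validation.json`.

```python
import shutil
from typing import Optional


def skip_payload_if_missing(binary: str = "agent-browser") -> Optional[dict]:
    """Return a skip verdict when `binary` is not on PATH, otherwise None.

    Hypothetical helper: the real probe may detect the CLI differently.
    """
    if shutil.which(binary) is None:
        # Shape assumed from the documented skip record: ok=false, skipped=true.
        return {"ok": False, "skipped": True, "reason": f"{binary} not found on PATH"}
    return None
```

A caller would print the returned payload and exit successfully when it is not `None`, so that a missing CLI is recorded as a skip rather than a failure.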
47 changes: 47 additions & 0 deletions agents/auth-login-validator.md
@@ -0,0 +1,47 @@
---
description: >
Login probe subagent invoked via `claude -p` from the scenario-validator
(Step 5). Uses the agent-browser CLI to verify that auth credentials
returned by the Environment Factory's `up` action actually reach an
authenticated page. Headless only.
tools:
- Bash
- Read
maxTurns: 20
---

# Auth Login Validator

You are a login probe. You have `agent-browser` available as a Bash CLI.
Read the `skills/agent-browser/SKILL.md` reference if you need the full
command surface. Your job: verify that the auth payload you receive actually
produces a logged-in browser session.

## Rules

- Always use `--session login-probe-<label>` and `--json`.
- Always headless — never pass `--headed`.
- Use `agent-browser snapshot -i` to discover form fields when selectors
are unknown — don't guess selectors.
- Close the session when done: `agent-browser --session ... close`
- Do not modify any files outside `autonoma/.login-probe/`.

## Output

Print EXACTLY one JSON object to stdout when done — no markdown fences, no
extra text before or after.

Success:
```json
{"ok": true, "mode": "cookies|token|form", "evidence": {"final_url": "...", "screenshot": "..."}, "scenario": "<label>"}
```

Failure:
```json
{"ok": false, "mode": "cookies|token|form", "failure": {"category": "<cat>", "detail": "one sentence", "screenshot_path": "..."}, "evidence": {}}
```

Categories: `redirected_to_login`, `cookie_not_sent`, `marker_missing`,
`bad_credentials`, `open_failed`, `fill_failed`, `submit_failed`, `unknown_ui`.

Take a screenshot before reporting.
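
On the calling side, the single-JSON-object contract above can be checked before the verdict is trusted. The sketch below is one possible validator, not code from this PR: `parse_verdict` is a hypothetical name, and it assumes that `"cookies|token|form"` in the examples means exactly one of those three strings.

```python
import json

# Assumption: "cookies|token|form" in the examples means one of these values.
ALLOWED_MODES = {"cookies", "token", "form"}
FAILURE_CATEGORIES = {
    "redirected_to_login", "cookie_not_sent", "marker_missing", "bad_credentials",
    "open_failed", "fill_failed", "submit_failed", "unknown_ui",
}


def parse_verdict(stdout: str) -> dict:
    """Parse the subagent's stdout and check it against the documented shape."""
    # json.loads raises a ValueError subclass on markdown fences or trailing text.
    verdict = json.loads(stdout.strip())
    if not isinstance(verdict, dict) or not isinstance(verdict.get("ok"), bool):
        raise ValueError("verdict must be a JSON object with a boolean 'ok'")
    if verdict.get("mode") not in ALLOWED_MODES:
        raise ValueError(f"unexpected mode: {verdict.get('mode')!r}")
    if not verdict["ok"]:
        failure = verdict.get("failure")
        category = failure.get("category") if isinstance(failure, dict) else None
        if category not in FAILURE_CATEGORIES:
            raise ValueError(f"unexpected failure category: {category!r}")
    return verdict
```

Rejecting fenced or chatty output early keeps a malformed subagent reply from being mistaken for a passing probe.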
62 changes: 59 additions & 3 deletions agents/scenario-validator.md
@@ -118,8 +118,54 @@ Repeat until all three actions succeed for every scenario OR you exhaust 5 itera
- **Auth check**: `auth` MUST be non-null and contain at least one of `{ cookies, headers, token, user }`. If empty, the auth callback is not wired — fix it and restart.
- **Refs check**: every top-level model in the `create` tree MUST appear in `refs`.
5. Verify DB state with a read-only `SELECT` for at least one refs id.
-6. POST `{action:"down", refsToken}`. Expect `{ok:true}`.
-7. Verify the refs rows are gone.
6. **Login probe** (once per run — on the first scenario whose `auth`
carries credentials, then remember the verdict):
Drive a real headless Chrome session through `hooks/validators/login_probe.py`
to prove that the returned `auth` actually reaches a logged-in page. This
catches subtle auth-callback bugs (wrong cookie domain, missing CSRF seed,
token not honored, Set-Cookie attrs stripped) that lifecycle checks miss.
```bash
# $KB is autonoma/AUTONOMA.md. Extract loginPath + protectedPath from the
# `flows: login` section there. `markerText` is optional — if the KB lists
# a known post-login text fragment ("Dashboard", username echo, etc.) pass it.
python3 "$(cat /tmp/autonoma-plugin-root)/hooks/validators/login_probe.py" \
--input - <<JSON
{
"baseUrl": "$BASE_URL",
"loginPath": "$LOGIN_PATH",
"protectedPath": "$PROTECTED_PATH",
"markerText": "$MARKER_TEXT",
"screenshotDir": "autonoma/.login-probe",
"label": "$SCENARIO_NAME",
"auth": $AUTH_JSON
}
JSON
```
Interpret the JSON verdict (same file — `{ok, mode, failure.category, ...}`):
- `ok: true` → record `loginProbe: { ok: true, mode, scenario: "$SCENARIO_NAME", evidence }`
in the terminal artifact (step 7) and skip the probe for the remaining
scenarios — one successful probe per run is sufficient signal.
- `skipped: true` (no cookies/headers/user OR `agent-browser` not installed) →
record the skip payload verbatim and continue. Do not treat as failure.
- `ok: false` → this is a **handler bug** (path 3a above). The failure
`category` tells you what to fix:
- `redirected_to_login` → the cookie/token reached the server but was
rejected. Check the auth callback's signing/session secret and cookie
value format.
- `cookie_not_sent` → browser refused to attach the cookie. Check
`domain`, `path`, `Secure`, `SameSite`, `HttpOnly` attrs on Set-Cookie.
- `marker_missing` → redirect succeeded but page didn't render the
expected post-login marker. Either the marker is wrong (update KB)
or a downstream load fails (inspect screenshot, fix handler).
- `bad_credentials` → form submit with the user's password didn't
authenticate. The `user` payload the auth callback returns doesn't
match the real credentials stored in the DB.
- `open_failed` / `fill_failed` / `submit_failed` → browser-side
infrastructure issue, inspect `failure.detail`.
Fix the handler and restart the loop. Do NOT move on to `down` for this
scenario — the session artifacts from a broken `up` aren't trustworthy.
7. POST `{action:"down", refsToken}`. Expect `{ok:true}`.
8. Verify the refs rows are gone.

5. After every scenario passes cleanly, emit the scenario recipes.
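
The heredoc in the login-probe step interpolates shell variables straight into JSON, so a value containing a double quote or backslash (a marker text, for instance) would produce an invalid document. One way to avoid that is to build the input with a JSON serializer, as in this sketch. The helper name `build_probe_input` is hypothetical, the field names are taken from the heredoc, and omitting `markerText` when it is empty assumes the probe tolerates a missing key.

```python
import json
from typing import Any, Optional


def build_probe_input(
    base_url: str,
    login_path: str,
    protected_path: str,
    label: str,
    auth: Any,
    marker_text: Optional[str] = None,
    screenshot_dir: str = "autonoma/.login-probe",
) -> str:
    """Serialize the login-probe input; json.dumps handles all escaping."""
    payload = {
        "baseUrl": base_url,
        "loginPath": login_path,
        "protectedPath": protected_path,
        "screenshotDir": screenshot_dir,
        "label": label,
        "auth": auth,  # the `auth` object returned by the `up` action, passed through as-is
    }
    if marker_text:
        # Assumption: the probe accepts input without `markerText` (it is documented as optional).
        payload["markerText"] = marker_text
    return json.dumps(payload)
```

The resulting string can be piped to `login_probe.py --input -` on stdin in place of the heredoc.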

@@ -208,10 +254,20 @@ Repeat until all three actions succeed for every scenario OR you exhaust 5 itera
"blockingIssues": [],
"recipePath": "autonoma/scenario-recipes.json",
"validationMode": "endpoint-lifecycle",
"endpointUrl": "http://localhost:3000/api/autonoma"
"endpointUrl": "http://localhost:3000/api/autonoma",
"loginProbe": {
"ok": true,
"mode": "cookies",
"scenario": "standard",
"evidence": { "final_url": "http://localhost:3000/dashboard" }
}
}
```

`loginProbe` is REQUIRED when `status == "ok"`. Use the verdict from step 4.6.
If the probe was skipped (no auth material or `agent-browser` unavailable) record
`{ "ok": false, "skipped": true, "reason": "..." }` — that satisfies the schema.
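
As an illustration of how a probe verdict maps onto the `loginProbe` field, consider the sketch below. `login_probe_record` is a hypothetical helper rather than part of the plugin; it assumes a skipped verdict carries `skipped: true` and that a failed probe should block an `ok` artifact instead of being recorded in it.

```python
def login_probe_record(verdict: dict, scenario: str) -> dict:
    """Build the `loginProbe` value for the terminal artifact from a probe verdict."""
    if verdict.get("skipped"):
        # Skips are recorded verbatim and do not count as failures.
        return verdict
    if verdict.get("ok") is True:
        return {
            "ok": True,
            "mode": verdict["mode"],
            "scenario": scenario,
            "evidence": verdict.get("evidence", {}),
        }
    # A failed probe is a handler bug: fix the handler and restart the loop
    # rather than writing an artifact with status "ok".
    category = (verdict.get("failure") or {}).get("category", "unknown")
    raise ValueError(f"login probe failed ({category}); fix the handler and restart")
```

Raising on failure is a design choice in this sketch: it keeps a broken login from ever reaching an artifact whose `status` is `"ok"`.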

On failure keep the same shape with `status: "failed"`, `preflightPassed: false` when
preflight did not pass, populated `failedScenarios`, and concrete `blockingIssues`.
