diff --git a/.gitignore b/.gitignore index 241bbddc..c3688147 100644 --- a/.gitignore +++ b/.gitignore @@ -28,6 +28,14 @@ # committed; its output is not. /qa/findings.jsonl +# Local QA artifact index (qa/scripts/backfill_index.py output) — auto-built catalog of +# every playtest run + play-state + transcript. Per-developer (each local tree has its own +# artifacts). The indexer/backfill/find_run scripts and INDEX_SCHEMA.md ARE committed; +# the INDEX.jsonl itself is not. Query with `qa/scripts/find_run.py`. +/qa/INDEX.jsonl +/qa/INDEX.jsonl.new +/qa/INDEX.jsonl.lock + # Privately-imported, user-owned adventures — NEVER commit (may be copyrighted). # Only original / CC-licensed content belongs under content/campaigns/. /content/campaigns/_imported/ diff --git a/AGENTS.md b/AGENTS.md index 913be649..3131b1f0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -29,6 +29,18 @@ - The VM cannot prove Mac-only surfaces. `WorldOS.app` build/launch, native #356, and built-app UI play evidence stay on this Mac or macOS CI. - VM artifacts can feed RRI only when `run.json`, `score.json`, `session_surface.final.json`, network/image evidence, palette-live evidence, and build SHA are explicit. Otherwise the result remains partial/harness-contaminated. +## QA Artifact Index + +- Past playtest runs, play-state snapshots, and transcripts are indexed at `qa/INDEX.jsonl` (gitignored, per-dev). Query with `qa/scripts/find_run.py`, not raw `ls`/grep. +- On a fresh clone (or after manual artifact moves): `python3 qa/scripts/backfill_index.py` rebuilds the local index. Idempotent. Takes <1s. 
+- Common queries: + - `qa/scripts/find_run.py --since 2026-05-25 --kind run --failed` — recent failed runs + - `qa/scripts/find_run.py --sha ` — every artifact for a commit + - `qa/scripts/find_run.py --persona newbie --gave-up` — runs where the persona gave up + - `qa/scripts/find_run.py --scored` — runs that also have a curated `qa/scores.db` verdict +- Two layers, two stores: `qa/INDEX.jsonl` is the RAW artifact catalog (~800 rows); `qa/scores.db` (rendered to `qa/scores_ledger.md`) is the CURATED quality verdict layer (~69 hand-validated rows). INDEX entries cross-link to ledger rows via `scored_in_ledger`. +- Schema, naming convention, and rebuild recipes: [`qa/INDEX_SCHEMA.md`](qa/INDEX_SCHEMA.md). + ## GitHub And Reviews - Use branch prefix `codex/` for new branches unless instructed otherwise. diff --git a/qa/INDEX_SCHEMA.md b/qa/INDEX_SCHEMA.md new file mode 100644 index 00000000..2bfc7fe4 --- /dev/null +++ b/qa/INDEX_SCHEMA.md @@ -0,0 +1,259 @@ +# QA Index — Schema & Usage + +`qa/INDEX.jsonl` is an append-only index of QA artifacts. One JSON object per +line, three artifact kinds (`run`, `play-state`, `transcript`). Built by: + +- **`qa/scripts/indexer.py`** — extract metadata from a single dir/file +- **`qa/scripts/backfill_index.py`** — walk all artifact dirs, rebuild from scratch +- **`qa/scripts/find_run.py`** — query helper (the agent-facing surface) + +The index is **gitignored** (matches the `qa/ui_playtest_runs/` ignore pattern) +— each developer keeps a local index of their own artifacts. The scripts and +schema are committed; the data is not. 
+
+## Canonical run-name format
+
+New playtest runs use:
+
+```
+<TS>-<sha7>-<world>-<persona>-<provider>-<scenario>
+```
+
+Example:
+
+```
+20260602T053015Z-9545383-baldurs-gate-newbie-claude-codex-native
+```
+
+Components:
+
+| Field      | Source                          | Notes                                                |
+|------------|---------------------------------|------------------------------------------------------|
+| `TS`       | `date -u +%Y%m%dT%H%M%SZ`       | UTC, sortable                                        |
+| `sha7`     | `git rev-parse --short HEAD`    | 7-char commit                                        |
+| `world`    | runner arg `$2`                 | e.g. `baldurs-gate`                                  |
+| `persona`  | runner arg `$3`                 | `newbie` / `veteran` / `adversarial` / ...           |
+| `provider` | env `WOS_APP_SELECTED_PROVIDER` | `claude` / `codex`                                   |
+| `scenario` | runner-fixed suffix             | `play` (ui_playtest.sh), `app` (ui_playtest_app.sh)  |
+
+Legacy names (`nb1`, `gate-96c0401-newbie`, `handoff---`,
+etc.) are parsed best-effort by `indexer.parse_canonical_name`: timestamp
+extracted by regex, sha by 7-hex match, persona/world by substring match against
+known values. Whatever isn't parseable falls back to the dir mtime plus `null`
+fields; the parser never crashes.
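A minimal sketch of how a canonical name splits into its six components, assuming a pattern with one named group per row of the Components table. The group names (`ts`, `sha`, `world`, `persona`, `provider`, `scenario`) and the helper `parse_run_name` are illustrative assumptions, not a copy of `indexer.parse_canonical_name`.

```python
import re

# Assumed pattern: six hyphen-separated fields, named after the Components table.
# `world` and `scenario` may themselves contain hyphens; `persona` and `provider` may not.
CANONICAL = re.compile(
    r"^(?P<ts>\d{8}T\d{6}Z)-(?P<sha>[0-9a-f]{7,12})-(?P<world>[a-z0-9-]+)"
    r"-(?P<persona>[a-z0-9]+)-(?P<provider>[a-z0-9]+)-(?P<scenario>[a-z0-9-]+)$"
)


def parse_run_name(name: str) -> dict:
    """Return the six canonical fields, or {} when the name is not canonical."""
    m = CANONICAL.match(name)
    return m.groupdict() if m else {}


# A single-word scenario splits as the table describes.
plain = parse_run_name("20260602T053015Z-9545383-baldurs-gate-newbie-claude-play")
print(plain["world"], plain["persona"], plain["provider"], plain["scenario"])

# A hyphenated scenario is ambiguous: the greedy `world` group absorbs the persona.
hyphenated = parse_run_name(
    "20260602T053015Z-9545383-baldurs-gate-newbie-claude-codex-native"
)
print(hyphenated["world"], hyphenated["persona"], hyphenated["scenario"])

# Legacy names simply do not match.
print(parse_run_name("nb1"))
```

Under this assumed pattern, a hyphenated world combined with a hyphenated scenario does not split uniquely, so the second name yields `world="baldurs-gate-newbie"`. Checking the parsed world against a list of known worlds, or preferring `run.json` metadata over the name, would be one way to disambiguate.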
+ +## INDEX.jsonl row schema + +Every row has these fields: + +```json +{ + "kind": "run | play-state | transcript", + "id": "", + "path": "", + "timestamp_iso": "2026-06-02T05:30:15Z", + "commit_sha": "9545383", + "indexed_at": "2026-06-02T14:00:00Z", + "source": "runner | backfill" +} +``` + +### `kind: "run"` adds + +```json +{ + "version": "v1.0.3-126-g1057234", + "world": "baldurs-gate", + "persona": "newbie", + "provider": "claude", + "scenario": "codex-native", + "surface": "BUILT dist/WorldOS.app (part A) + ...", + "beats_cap": 6, + "budget_usd": 4.0, + "canonical_name": false, + + "part": "A | B | AB", + "part_a_result": "PASS | FAIL | skipped", + "part_b_persona_loop": "ran | skipped | backend_not_ready | ...", + "part_b_score_pass": true, + "spend_usd": 0.42, + "minted_run_dir": "play-20260602043338", + "player_agent": "claude", + "player_cost_usd": 0.25, + "player_rc": 0, + "port": 8765, + + "score": { + "completed_intro_flow": true, + "reached_play_screen": true, + "actions_total": 11, + "in_story_turns": 2, + "console_errors": 0, + "network_failures": 0, + "image_404s": 0, + "gave_up": true, + "persona_satisfaction": 4, + "satisfaction_source": "derived | self-reported", + "pass": false + }, + "bug_counts": { + "critical": 0, "major": 2, "minor": 0, "trivial": 0, + "total": 2, "ndjson_lines": 2 + }, + "summary_md": "qa/ui_playtest_runs//summary.md", + + "linked_transcripts": ["qa/transcripts/.chat.jsonl", ...], + "linked_play_state": "play-state/", + "linked_rubrics": { + "tolkien": "qa/transcripts/<...>.tolkien.json", + "angrydm": "qa/transcripts/<...>.angrydm.json", + "score": "qa/transcripts/<...>.score.json", + "state": "qa/transcripts/<...>.state.json" + }, + + "scored_in_ledger": { + "story_overall": 4.1, "mech_overall": 4.1, "angrydm_overall": 3.2, + "behavioral": "GREEN", "rri": null, "critical_bugs": 0, + "image_render_rate": null, "pass": 1, + "surface": "engine-duo", "dm_model": "sonnet", "scorer_model": "claude" + } +} +``` + 
+`scored_in_ledger` is populated only when `qa/scores.db` has a row with the +matching `run_id`. It's the curated quality verdict (69 rows total, across all +surfaces and worktrees); the raw `INDEX.jsonl` row is the raw artifact. + +### `kind: "play-state"` adds + +```json +{ + "world": null, "persona": null, "canonical_name": false, + "campaign_count": 1, + "chat_lines": 7, + "player_moves": 2, + "linked_run": "qa/ui_playtest_runs/" +} +``` + +### `kind: "transcript"` adds + +```json +{ + "run_id": "gate-96c0401-duo", + "role": "chat | dm | player | null", + "suffix": "chat.jsonl | dm..jsonl | ...", + "line_count": 5, + "linked_run": "qa/ui_playtest_runs/" +} +``` + +## Common queries (`find_run.py`) + +```bash +# All runs since a date +qa/scripts/find_run.py --since 2026-05-25 + +# Failed runs only (score.pass != true) +qa/scripts/find_run.py --kind run --failed + +# Runs where the persona gave up +qa/scripts/find_run.py --gave-up + +# Runs by exact commit +qa/scripts/find_run.py --sha 1057234 + +# Just paths, for scripting +qa/scripts/find_run.py --persona newbie --paths-only --limit 20 + +# Runs that have a curated scores.db row +qa/scripts/find_run.py --scored + +# Runs that DON'T have a curated row (mostly exploratory) +qa/scripts/find_run.py --unscored --kind run + +# Runs with story-craft / angry-DM rubrics available +qa/scripts/find_run.py --has-rubric + +# Raw JSONL, pipe to jq +qa/scripts/find_run.py --since 2026-06-01 --jsonl | jq '.id, .score.persona_satisfaction' + +# Just count matches +qa/scripts/find_run.py --failed --count +``` + +Full flag reference: `qa/scripts/find_run.py --help`. 
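The `--failed` comment above reads `score.pass != true`. As `find_run.py` applies it, that also covers run rows that carry no `score` object at all (for example, a run dir without `score.json`). A minimal sketch of that predicate, with `is_failed` as a hypothetical helper name and made-up sample rows:

```python
def is_failed(entry: dict) -> bool:
    """True for a run row whose score.pass is anything other than literal True."""
    if entry.get("kind") != "run":
        return False  # the failed filter only ever matches run rows
    score = entry.get("score") or {}  # missing or null score becomes an empty dict
    return score.get("pass") is not True


# Illustrative rows, not real index data.
rows = [
    {"kind": "run", "id": "a", "score": {"pass": True}},
    {"kind": "run", "id": "b", "score": {"pass": False}},
    {"kind": "run", "id": "c"},  # no score was indexed for this run
    {"kind": "transcript", "id": "d"},
]
failed_ids = [r["id"] for r in rows if is_failed(r)]
print(failed_ids)
```

So a `--failed --count` total can include runs that never produced a score, not only runs that scored and failed.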
+
+## Grep and one-liner recipes (no `find_run.py` required)
+
+```bash
+# Rows indexed today, UTC (prefix match on indexed_at)
+grep -F "\"indexed_at\": \"$(date -u +%Y-%m-%d)" qa/INDEX.jsonl
+
+# Find an id substring
+grep -F '"id": "handoff-20260602' qa/INDEX.jsonl | python3 -c 'import sys,json; [print(json.loads(l)["path"]) for l in sys.stdin]'
+
+# All gave-up runs
+python3 -c '
+import json
+for line in open("qa/INDEX.jsonl"):
+    e = json.loads(line)
+    if (e.get("score") or {}).get("gave_up"):
+        print(e["id"], "→", e["path"])
+'
+```
+
+## Rebuilding the index
+
+If the index gets out of sync (file deleted, runner skipped the append, fresh
+clone), rebuild from scratch:
+
+```bash
+python3 qa/scripts/backfill_index.py
+```
+
+Idempotent; safe to re-run. Backfill writes to `qa/INDEX.jsonl.new`, then atomically
+renames it over the index, so a partial run doesn't leave a corrupt index.
+
+## Auto-append on every run
+
+The two playtest runners write to `INDEX.jsonl` automatically:
+
+- `qa/ui_playtest.sh`: appends after `score.json` write
+- `qa/ui_playtest_app.sh`: appends after `run.json` write
+- `qa/release_gate.sh`: inherits via its per-persona `ui_playtest_app.sh` calls
+
+The append is wrapped in `|| true` so a broken indexer never fails a real
+playtest. The indexer is also idempotent: re-running on an already-indexed dir
+updates the existing row (matched by `(kind, id)`) rather than appending a
+duplicate.
+
+## Why JSONL, not SQLite
+
+- ~800 rows total, growing slowly. SQLite is overkill.
+- Append-only is failsafe: a half-written line at EOF is recoverable; a
+  half-written sqlite write can corrupt the db.
+- Grep/jq/awk-friendly. No client library needed.
+- `qa/scores.db` already exists for the *curated* scores layer; JSONL covers
+  the *raw* artifact layer. Two different concerns, two different stores.
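The three properties described above (a torn final line is skippable, rows are matched by `(kind, id)`, and rewrites go through a `.new` file plus an atomic rename) can be sketched together. This is an illustration under stated assumptions, not the indexer's actual code: `read_rows` and `upsert_row` are hypothetical names, and the sketch omits any file locking the real indexer may do.

```python
import json
import os
import tempfile
from pathlib import Path


def read_rows(index_path: Path) -> list[dict]:
    """Load rows, skipping blank or undecodable lines (e.g. a torn final line)."""
    rows: list[dict] = []
    if not index_path.exists():
        return rows
    with index_path.open() as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # half-written line: drop it rather than fail
    return rows


def upsert_row(index_path: Path, entry: dict) -> None:
    """Replace the row with the same (kind, id), or append, then swap atomically."""
    key = (entry.get("kind"), entry.get("id"))
    rows = [r for r in read_rows(index_path) if (r.get("kind"), r.get("id")) != key]
    rows.append(entry)
    tmp = index_path.with_suffix(index_path.suffix + ".new")
    with tmp.open("w") as out:
        for r in rows:
            out.write(json.dumps(r, ensure_ascii=False) + "\n")
    os.replace(tmp, index_path)  # readers see the old file or the new one, never a mix


# Demo on a throwaway index: one good row followed by a torn line.
with tempfile.TemporaryDirectory() as d:
    idx = Path(d) / "INDEX.jsonl"
    idx.write_text('{"kind": "run", "id": "nb1", "path": "a"}\n{"kind": "run", "id": "tor')
    upsert_row(idx, {"kind": "run", "id": "nb1", "path": "b"})
    final_rows = read_rows(idx)
print(final_rows)
```

Rewriting the whole file on each upsert is affordable at roughly 800 rows. With concurrent runners, some form of locking around the read-modify-write would be needed.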
+ +## Relationship to `qa/scores.db` + +| Layer | Store | Rows | Source | Purpose | +|------------------|--------------------|-------|---------------|----------------------------------------------------------| +| Raw artifacts | `qa/INDEX.jsonl` | ~800 | runners + backfill | Every playtest dir + play-state + transcript | +| Curated quality | `qa/scores.db` | ~69 | hand-validated (`scores_db.py --add`) | Headline scored runs across surfaces | + +INDEX rows with a matching `run_id` in `scores.db` get a `scored_in_ledger` +field pointing at the curated verdict. Use `--scored` / `--unscored` on +`find_run.py` to filter either way. + +## Sister surfaces (not yet indexed) + +- Engine-side play-state writes (engine server is Python, separate concern) — + backfill catches existing dirs; runtime auto-append would need an engine + hook. Filed as a follow-up issue. +- Cross-worktree artifacts (other devs' / CI's runs landing under + `/private/tmp/wos-*/qa/...`) — each worktree has its own index. If a shared + catalog becomes useful later, promote milestone rows into a committed + `qa/MILESTONES.jsonl`. 
diff --git a/qa/QA_TOOLS.md b/qa/QA_TOOLS.md index c17432b3..f5e69197 100644 --- a/qa/QA_TOOLS.md +++ b/qa/QA_TOOLS.md @@ -20,6 +20,10 @@ Default local paths: | `qa/export_app_evidence.py` | Normalize a live app or run dir into a reviewable evidence bundle | `manifest.json`, status/session snapshots, screenshots, traces, logs | You are trying to prove behavior without first running a gate | | `qa/app_failure_buckets.py` | Classify harness failures into the stable app bucket list | Bucket JSON / shell-readable output | You need product fixes; this only labels failures | | `qa/app_handoff_hooks.js` | Static/same-port hook probe for core agent-driving controls | Hook-check JSON inside handoff evidence | You need human exploratory testing; this is a bounded locator check | +| `qa/scripts/find_run.py` | Search past QA artifacts (playtest runs, play-states, transcripts) by date / sha / persona / gate state / failed / gave-up. Reads `qa/INDEX.jsonl` — the local artifact catalog built by `backfill_index.py` and auto-appended by the runners. | Stdout: matching entries with paths | You want curated headline quality verdicts — use `qa/scores_ledger.md` (rendered from `qa/scores.db`) instead | +| `qa/scripts/backfill_index.py` | Rebuild `qa/INDEX.jsonl` from scratch (one-time on fresh clone, or after manual artifact moves). Idempotent. | `qa/INDEX.jsonl` (gitignored, per-dev) | The index is already current — the runners auto-append on every playtest | + +Don't grep `qa/ui_playtest_runs/` directly to find past runs — use `qa/scripts/find_run.py`. Full schema and recipes in [`qa/INDEX_SCHEMA.md`](INDEX_SCHEMA.md). Copy-paste fast handoff command: diff --git a/qa/README.md b/qa/README.md new file mode 100644 index 00000000..48da9557 --- /dev/null +++ b/qa/README.md @@ -0,0 +1,41 @@ +# qa/ — QA harness + +Routing pointers (see also: [`QA_TOOLS.md`](QA_TOOLS.md), [`SCORECARD.md`](SCORECARD.md), [`SCORING.md`](SCORING.md)). 
+
+## Finding past QA runs
+
+Don't `ls` / grep `qa/ui_playtest_runs/`, `play-state/`, or `qa/transcripts/` directly; there are 800+ artifacts with mixed naming. Query the index instead:
+
+```bash
+qa/scripts/find_run.py --since 2026-05-25 --failed
+qa/scripts/find_run.py --persona newbie --paths-only --limit 20
+qa/scripts/find_run.py --sha 1057234
+```
+
+Full schema, query recipes, and the canonical naming format for new runs: [`INDEX_SCHEMA.md`](INDEX_SCHEMA.md).
+
+On a fresh clone (or when the index is stale):
+
+```bash
+python3 qa/scripts/backfill_index.py
+```
+
+Idempotent. Writes `qa/INDEX.jsonl` (gitignored, per-developer). The two playtest runners (`ui_playtest.sh`, `ui_playtest_app.sh`) auto-append to the index on every successful run.
+
+## Layered stores
+
+- **Raw artifact catalog**: `qa/INDEX.jsonl` (this directory). Every playtest dir, play-state, transcript. Auto-built.
+- **Curated quality verdicts**: `qa/scores.db` rendered to [`scores_ledger.md`](scores_ledger.md). Hand-validated headline runs across surfaces. Append via `qa/scores_db.py --add`.
+
+INDEX rows that match a curated ledger row get a `scored_in_ledger` field linking the two.
+ +## Other key docs in this dir + +| File | Purpose | +|---|---| +| [`QA_TOOLS.md`](QA_TOOLS.md) | Command map for agents — which tool for which surface | +| [`SCORECARD.md`](SCORECARD.md) | Run-level evidence ledger (rendered from `scores.db`) | +| [`SCORING.md`](SCORING.md) | Lens scoring spec (story-craft, mechanical, angry-DM) | +| [`UI_PLAYTEST.md`](UI_PLAYTEST.md) | UI playtest harness (player + DM) | +| [`GUI_WORKBOOK.md`](GUI_WORKBOOK.md) | GUI-built-app surface notes | +| [`INDEX_SCHEMA.md`](INDEX_SCHEMA.md) | Artifact index schema + naming + queries | diff --git a/qa/scripts/__init__.py b/qa/scripts/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/qa/scripts/backfill_index.py b/qa/scripts/backfill_index.py new file mode 100755 index 00000000..5f9f4e53 --- /dev/null +++ b/qa/scripts/backfill_index.py @@ -0,0 +1,164 @@ +#!/usr/bin/env python3 +"""Rebuild qa/INDEX.jsonl from scratch by walking all QA artifact dirs. + +Idempotent: re-running produces the same INDEX.jsonl modulo `indexed_at` +timestamps. Writes to .new, then atomic-renames over. +""" + +from __future__ import annotations + +import argparse +import json +import sys +import time +from pathlib import Path +from typing import Optional + +# Local import works when run as `python3 qa/scripts/backfill_index.py` +# or `python3 -m qa.scripts.backfill_index`. 
+_HERE = Path(__file__).resolve().parent +if str(_HERE) not in sys.path: + sys.path.insert(0, str(_HERE)) +import indexer # noqa: E402 + + +def _walk_runs(repo_root: Path, verbose: bool) -> list[dict]: + """Walk qa/ui_playtest_runs/* and produce one entry per subdir.""" + base = repo_root / "qa" / "ui_playtest_runs" + out: list[dict] = [] + if not base.exists(): + return out + for d in sorted(base.iterdir()): + if not d.is_dir() or d.name.startswith("."): + continue + entry = indexer.extract_run(d, repo_root) + if entry is None: + if verbose: + print(f" skip (no entry): {d.name}", file=sys.stderr) + continue + entry["source"] = "backfill" + out.append(entry) + return out + + +def _walk_play_states(repo_root: Path, verbose: bool) -> list[dict]: + base = repo_root / "play-state" + out: list[dict] = [] + if not base.exists(): + return out + for d in sorted(base.iterdir()): + if not d.is_dir() or d.name.startswith("."): + continue + entry = indexer.extract_play_state(d, repo_root) + if entry is None: + if verbose: + print(f" skip (no entry): {d.name}", file=sys.stderr) + continue + entry["source"] = "backfill" + out.append(entry) + return out + + +def _walk_transcripts(repo_root: Path, verbose: bool) -> list[dict]: + base = repo_root / "qa" / "transcripts" + out: list[dict] = [] + if not base.exists(): + return out + for f in sorted(base.glob("*.jsonl")): + if not f.is_file(): + continue + entry = indexer.extract_transcript(f, repo_root) + if entry is None: + if verbose: + print(f" skip (no entry): {f.name}", file=sys.stderr) + continue + entry["source"] = "backfill" + out.append(entry) + return out + + +def backfill(repo_root: Path, index_path: Path, kinds: set[str], verbose: bool) -> dict: + """Rebuild INDEX.jsonl. 
Returns counts per kind.""" + t0 = time.monotonic() + entries: list[dict] = [] + counts: dict[str, int] = {} + + if "run" in kinds: + runs = _walk_runs(repo_root, verbose) + entries.extend(runs) + counts["run"] = len(runs) + + if "play-state" in kinds: + ps = _walk_play_states(repo_root, verbose) + entries.extend(ps) + counts["play-state"] = len(ps) + + if "transcript" in kinds: + tr = _walk_transcripts(repo_root, verbose) + entries.extend(tr) + counts["transcript"] = len(tr) + + index_path.parent.mkdir(parents=True, exist_ok=True) + tmp_path = index_path.with_suffix(index_path.suffix + ".new") + with tmp_path.open("w") as out: + for e in entries: + out.write(json.dumps(e, ensure_ascii=False)) + out.write("\n") + import os + os.replace(tmp_path, index_path) + + counts["total"] = len(entries) + counts["elapsed_sec"] = round(time.monotonic() - t0, 2) + return counts + + +def opaque_summary(repo_root: Path, kinds: set[str]) -> list[str]: + """Return ids of entries we could only minimally parse (no sha, no canonical).""" + opaque: list[str] = [] + if "run" in kinds: + for d in (repo_root / "qa" / "ui_playtest_runs").iterdir() if (repo_root / "qa" / "ui_playtest_runs").exists() else []: + if not d.is_dir(): + continue + parsed = indexer.parse_canonical_name(d.name) + has_meta = (d / "run.json").exists() or (d / "meta.json").exists() + if not has_meta and not parsed.get("sha"): + opaque.append(f"run/{d.name}") + return opaque + + +def main(argv: Optional[list[str]] = None) -> int: + parser = argparse.ArgumentParser(description="Rebuild qa/INDEX.jsonl.") + parser.add_argument("--root", help="Repo root (default: $PWD)") + parser.add_argument("--index", help="Path to INDEX.jsonl (default: /qa/INDEX.jsonl)") + parser.add_argument("--kinds", default="run,play-state,transcript", + help="Comma-separated kinds to index (default: all)") + parser.add_argument("-v", "--verbose", action="store_true") + args = parser.parse_args(argv) + + repo_root = Path(args.root).resolve() if 
args.root else Path.cwd().resolve() + index_path = Path(args.index).resolve() if args.index else repo_root / "qa" / "INDEX.jsonl" + kinds = {k.strip() for k in args.kinds.split(",") if k.strip()} + + if not (repo_root / "qa").exists(): + print(f"error: no qa/ dir under {repo_root}", file=sys.stderr) + return 2 + + counts = backfill(repo_root, index_path, kinds, args.verbose) + opaque = opaque_summary(repo_root, kinds) + + print(f"Backfill complete: {counts['total']} entries → {index_path}") + for k in ("run", "play-state", "transcript"): + if k in counts: + print(f" {k:<12} {counts[k]:>6}") + print(f" elapsed: {counts['elapsed_sec']}s") + if opaque: + print(f" opaque (no metadata, no sha in name): {len(opaque)}") + for o in opaque[:10]: + print(f" {o}") + if len(opaque) > 10: + print(f" ... and {len(opaque) - 10} more") + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/qa/scripts/find_run.py b/qa/scripts/find_run.py new file mode 100755 index 00000000..21f23b94 --- /dev/null +++ b/qa/scripts/find_run.py @@ -0,0 +1,237 @@ +#!/usr/bin/env python3 +"""Query qa/INDEX.jsonl for past QA artifacts. + +This is the agent-facing surface. Agents and humans use it instead of +grepping the raw filesystem. + +Examples: + qa/scripts/find_run.py --since 2026-05-25 --gate red --failed + qa/scripts/find_run.py --persona wayfarer --paths-only + qa/scripts/find_run.py --sha 1057234 + qa/scripts/find_run.py --kind play-state --since 2026-06-01 + qa/scripts/find_run.py --scored --jsonl | jq . 
+""" + +from __future__ import annotations + +import argparse +import datetime as _dt +import json +import sys +from pathlib import Path +from typing import Iterable, Optional + + +def _parse_date(s: Optional[str]) -> Optional[_dt.datetime]: + if not s: + return None + for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d"): + try: + dt = _dt.datetime.strptime(s, fmt) + return dt.replace(tzinfo=_dt.timezone.utc) + except ValueError: + continue + raise SystemExit(f"error: cannot parse date {s!r} (use YYYY-MM-DD or ISO 8601)") + + +def _entry_ts(entry: dict) -> Optional[_dt.datetime]: + ts = entry.get("timestamp_iso") + if not ts: + return None + try: + return _dt.datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=_dt.timezone.utc) + except ValueError: + return None + + +def iter_entries(index_path: Path) -> Iterable[dict]: + if not index_path.exists(): + raise SystemExit( + f"error: {index_path} does not exist. Run `python3 qa/scripts/backfill_index.py` first." + ) + with index_path.open() as f: + for line in f: + line = line.strip() + if not line: + continue + try: + yield json.loads(line) + except json.JSONDecodeError: + continue + + +def matches(entry: dict, args: argparse.Namespace) -> bool: + if args.kind and entry.get("kind") != args.kind: + return False + if args.sha and (entry.get("commit_sha") or "") != args.sha: + return False + if args.world and entry.get("world") != args.world: + return False + if args.persona and entry.get("persona") != args.persona: + return False + if args.provider and entry.get("provider") != args.provider: + return False + if args.scenario and entry.get("scenario") != args.scenario: + return False + if args.surface and (entry.get("surface") or "").find(args.surface) < 0: + return False + if args.id and args.id not in entry.get("id", ""): + return False + + if args.since: + ets = _entry_ts(entry) + if ets is None or ets < args.since: + return False + if args.until: + ets = _entry_ts(entry) + if ets is None or ets > args.until: + return 
False + + run_only_filter = args.failed or args.completed or args.gave_up + if run_only_filter and entry.get("kind") != "run": + return False + score = entry.get("score") or {} + if args.failed: + if score.get("pass") is True: + return False + if args.completed: + if not score.get("completed_intro_flow"): + return False + if args.gave_up: + if not score.get("gave_up"): + return False + if args.min_sat is not None: + sat = score.get("persona_satisfaction") + if sat is None or sat < args.min_sat: + return False + if args.max_sat is not None: + sat = score.get("persona_satisfaction") + if sat is None or sat > args.max_sat: + return False + if args.scored and not entry.get("scored_in_ledger"): + return False + if args.unscored and entry.get("scored_in_ledger"): + return False + if args.has_rubric and not entry.get("linked_rubrics"): + return False + if args.part_a_result: + if entry.get("part_a_result") != args.part_a_result: + return False + return True + + +def format_line(entry: dict, args: argparse.Namespace) -> str: + if args.jsonl: + return json.dumps(entry, ensure_ascii=False) + if args.paths_only: + return entry.get("path", "") + ts = (entry.get("timestamp_iso") or "? 
")[:19] + kind = entry.get("kind", "?")[:4] + sha = (entry.get("commit_sha") or " ")[:7] + persona = (entry.get("persona") or "")[:11] + world = (entry.get("world") or "")[:14] + extras = [] + if entry.get("kind") == "run": + result = entry.get("part_a_result") + if result: + extras.append(f"A:{result}") + score = entry.get("score") or {} + if score.get("gave_up"): + extras.append("GAVE_UP") + sat = score.get("persona_satisfaction") + if sat is not None: + extras.append(f"sat={sat}") + ledger = entry.get("scored_in_ledger") or {} + if ledger: + story = ledger.get("story_overall") + mech = ledger.get("mech_overall") + if story is not None: + extras.append(f"story={story}") + if mech is not None: + extras.append(f"mech={mech}") + elif entry.get("kind") == "play-state": + cc = entry.get("campaign_count") + if cc: + extras.append(f"camps={cc}") + cl = entry.get("chat_lines") + if cl: + extras.append(f"chat={cl}") + elif entry.get("kind") == "transcript": + if entry.get("role"): + extras.append(entry["role"]) + if entry.get("line_count") is not None: + extras.append(f"lines={entry['line_count']}") + extras_str = " ".join(extras) + line = f"{ts} {kind} {sha} {persona:<11} {world:<14} {entry.get('id', '')[:60]:<60} {extras_str}" + if not args.no_paths: + line += f" → {entry.get('path', '')}" + return line + + +def main(argv: Optional[list[str]] = None) -> int: + parser = argparse.ArgumentParser( + description="Query qa/INDEX.jsonl for past QA artifacts.", + epilog=( + "Examples:\n" + " find_run.py --since 2026-05-25 --failed\n" + " find_run.py --persona newbie --paths-only\n" + " find_run.py --sha 1057234 --jsonl\n" + " find_run.py --kind play-state --since 2026-06-01\n" + ), + formatter_class=argparse.RawDescriptionHelpFormatter, + ) + parser.add_argument("--index", help="INDEX.jsonl path (default: /qa/INDEX.jsonl)") + parser.add_argument("--root", help="Repo root (used to find INDEX.jsonl)") + parser.add_argument("--kind", choices=["run", "play-state", "transcript"]) + 
parser.add_argument("--since") + parser.add_argument("--until") + parser.add_argument("--sha", help="Filter by exact commit_sha (7-char)") + parser.add_argument("--world") + parser.add_argument("--persona") + parser.add_argument("--provider") + parser.add_argument("--scenario") + parser.add_argument("--surface", help="Substring match on surface (e.g. 'GUI-built-app')") + parser.add_argument("--id", dest="id", help="Substring match on entry id") + parser.add_argument("--gate", choices=["green", "red"], help="(reserved) filter by behavioral gate") + parser.add_argument("--failed", action="store_true", help="score.pass != true") + parser.add_argument("--completed", action="store_true", help="completed_intro_flow == true") + parser.add_argument("--gave-up", action="store_true", dest="gave_up") + parser.add_argument("--part-a-result", choices=["PASS", "FAIL", "skipped"], dest="part_a_result") + parser.add_argument("--min-sat", type=int, dest="min_sat") + parser.add_argument("--max-sat", type=int, dest="max_sat") + parser.add_argument("--scored", action="store_true", + help="Has a matching curated row in qa/scores.db") + parser.add_argument("--unscored", action="store_true", + help="No matching curated row in qa/scores.db") + parser.add_argument("--has-rubric", action="store_true", dest="has_rubric", + help="Has linked lens rubric (tolkien/angrydm)") + parser.add_argument("--limit", type=int, default=0) + parser.add_argument("--paths-only", action="store_true", dest="paths_only") + parser.add_argument("--no-paths", action="store_true", dest="no_paths", + help="Don't append → path to default output") + parser.add_argument("--jsonl", action="store_true", help="Raw JSONL passthrough") + parser.add_argument("--reverse", action="store_true", help="Oldest first (default newest first)") + parser.add_argument("--count", action="store_true", help="Print matching count only") + args = parser.parse_args(argv) + + repo_root = Path(args.root).resolve() if args.root else 
Path.cwd().resolve() + index_path = Path(args.index).resolve() if args.index else repo_root / "qa" / "INDEX.jsonl" + args.since = _parse_date(args.since) + args.until = _parse_date(args.until) + + matched = [e for e in iter_entries(index_path) if matches(e, args)] + matched.sort(key=lambda e: e.get("timestamp_iso") or "", reverse=not args.reverse) + if args.limit > 0: + matched = matched[: args.limit] + + if args.count: + print(len(matched)) + return 0 + + for e in matched: + print(format_line(e, args)) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/qa/scripts/indexer.py b/qa/scripts/indexer.py new file mode 100755 index 00000000..40ea49b8 --- /dev/null +++ b/qa/scripts/indexer.py @@ -0,0 +1,461 @@ +#!/usr/bin/env python3 +"""QA artifact indexer. + +Reads a single artifact dir or transcript file and returns one dict suitable +for `qa/INDEX.jsonl`. Designed to be called by: + - the playtest runners (auto-append at end of run, best-effort) + - the backfill script (rebuild INDEX.jsonl from scratch) + - find_run.py (reads only, no writes) + +Stdlib-only. Treats every extracted field as optional; missing → null, no crash. 
+"""
+
+from __future__ import annotations
+
+import argparse
+import datetime as _dt
+import fcntl
+import json
+import os
+import re
+import sqlite3
+import sys
+from pathlib import Path
+from typing import Any, Iterable, Optional
+
+CANONICAL_NAME_RE = re.compile(
+    r"^(?P<ts>\d{8}T\d{6}Z)-(?P<sha>[0-9a-f]{7,12})-(?P<world>[a-z0-9-]+)-(?P<persona>[a-z0-9]+)-(?P<provider>[a-z0-9]+)-(?P<scenario>[a-z0-9-]+)$"
+)
+ISO_TS_RE = re.compile(r"(\d{8}T\d{6}Z)")
+SHA7_RE = re.compile(r"\b([0-9a-f]{7})\b")
+KNOWN_PERSONAS = ("newbie", "veteran", "adversarial", "narrative", "optimizer")
+KNOWN_WORLDS = ("baldurs-gate", "tidal-commonwealth", "sundered-reach")
+
+
+def _read_json(path: Path) -> Optional[dict]:
+    try:
+        with path.open() as f:
+            return json.load(f)
+    except (FileNotFoundError, json.JSONDecodeError, OSError):
+        return None
+
+
+def _ndjson_count(path: Path) -> int:
+    if not path.exists():
+        return 0
+    try:
+        with path.open() as f:
+            return sum(1 for line in f if line.strip())
+    except OSError:
+        return 0
+
+
+def _mtime_iso(path: Path) -> Optional[str]:
+    try:
+        ts = path.stat().st_mtime
+    except OSError:
+        return None
+    return _dt.datetime.fromtimestamp(ts, tz=_dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def parse_canonical_name(name: str) -> dict:
+    """Extract structured fields from a dir/file name.
+
+    Returns whatever it can parse. Canonical form populates all fields; other
+    forms get partial credit (ts/sha extracted by regex; persona/world by substring match).
+ """ + out: dict[str, Any] = {} + m = CANONICAL_NAME_RE.match(name) + if m: + out.update(m.groupdict()) + out["canonical"] = True + return out + out["canonical"] = False + ts_match = ISO_TS_RE.search(name) + if ts_match: + out["ts"] = ts_match.group(1) + sha_match = SHA7_RE.search(name) + if sha_match: + out["sha"] = sha_match.group(1) + for persona in KNOWN_PERSONAS: + if persona in name.lower(): + out["persona"] = persona + break + for world in KNOWN_WORLDS: + if world in name.lower(): + out["world"] = world + break + return out + + +def _ts_to_iso(ts: str) -> Optional[str]: + """Convert YYYYMMDDTHHMMSSZ → 2026-06-02T05:30:15Z.""" + try: + dt = _dt.datetime.strptime(ts, "%Y%m%dT%H%M%SZ") + except ValueError: + return None + return dt.strftime("%Y-%m-%dT%H:%M:%SZ") + + +def _scores_db_lookup(run_id: str, repo_root: Path) -> Optional[dict]: + """Look up curated scores from qa/scores.db if a row matches run_id.""" + db_path = repo_root / "qa" / "scores.db" + if not db_path.exists(): + return None + try: + with sqlite3.connect(f"file:{db_path}?mode=ro", uri=True) as conn: + conn.row_factory = sqlite3.Row + cur = conn.execute( + "SELECT story_overall, mech_overall, angrydm_overall, " + "behavioral, rri, critical_bugs, image_render_rate, pass, " + "surface, dm_model, scorer_model " + "FROM runs WHERE run_id = ? LIMIT 1", + (run_id,), + ) + row = cur.fetchone() + if row is None: + return None + return {k: row[k] for k in row.keys()} + except sqlite3.Error: + return None + + +def _linked_transcripts(run_id: str, repo_root: Path) -> list[str]: + """Find qa/transcripts/*.jsonl files.""" + transc_dir = repo_root / "qa" / "transcripts" + if not transc_dir.exists() or not run_id: + return [] + matches = sorted(transc_dir.glob(f"{run_id}*.jsonl")) + return [str(p.relative_to(repo_root)) for p in matches] + + +def _linked_lens_rubrics(run_id: str, commit_sha: Optional[str], repo_root: Path) -> dict: + """Find lens scoring artifacts in qa/transcripts/. 
+ + Tries exact run_id match first (e.g. .tolkien.json), then falls back + to sha-proximity (e.g. gate--duo.tolkien.json shares the sha). + Returns paths only — agents read the file for full scores. + """ + transc_dir = repo_root / "qa" / "transcripts" + if not transc_dir.exists(): + return {} + out: dict[str, str] = {} + for lens in ("tolkien", "angrydm", "score", "state"): + exact = transc_dir / f"{run_id}.{lens}.json" + if exact.exists(): + out[lens] = str(exact.relative_to(repo_root)) + continue + if commit_sha: + duo = transc_dir / f"gate-{commit_sha}-duo.{lens}.json" + if duo.exists(): + out[lens] = str(duo.relative_to(repo_root)) + return out + + +def _linked_play_state(run_json: Optional[dict], run_id: str, repo_root: Path) -> Optional[str]: + """Find a play-state dir linked to this run. + + Prefer run.json.part_a.minted_run_dir; fall back to play-state/. + """ + ps_dir = repo_root / "play-state" + if not ps_dir.exists(): + return None + if run_json: + minted = (run_json.get("part_a") or {}).get("minted_run_dir") + if minted: + cand = ps_dir / minted + if cand.exists(): + return str(cand.relative_to(repo_root)) + direct = ps_dir / run_id + if direct.exists(): + return str(direct.relative_to(repo_root)) + return None + + +def extract_run(run_dir: Path, repo_root: Path) -> Optional[dict]: + """Build an INDEX entry for one qa/ui_playtest_runs/.""" + if not run_dir.is_dir(): + return None + run_id = run_dir.name + run_json = _read_json(run_dir / "run.json") + meta_json = _read_json(run_dir / "meta.json") + score_json = _read_json(run_dir / "score.json") + parsed_name = parse_canonical_name(run_id) + + src = run_json or meta_json or {} + build_sha = src.get("build_sha") or parsed_name.get("sha") + persona = src.get("persona") or parsed_name.get("persona") + world = src.get("world") or parsed_name.get("world") + timestamp_iso = ( + src.get("at") + or src.get("finished_at") + or _ts_to_iso(parsed_name.get("ts", "")) + or _mtime_iso(run_dir) + ) + + entry: 
dict[str, Any] = { + "kind": "run", + "id": run_id, + "path": str(run_dir.relative_to(repo_root)), + "timestamp_iso": timestamp_iso, + "commit_sha": build_sha[:7] if build_sha else None, + "version": src.get("version"), + "world": world, + "persona": persona, + "provider": parsed_name.get("provider") + or ((run_json.get("part_b") or {}).get("provider") if run_json else None), + "scenario": parsed_name.get("scenario"), + "surface": src.get("surface"), + "beats_cap": src.get("beats_cap"), + "budget_usd": src.get("budget_usd"), + "canonical_name": parsed_name.get("canonical", False), + } + + if run_json: + part_a = run_json.get("part_a") or {} + part_b = run_json.get("part_b") or {} + entry["part"] = run_json.get("part") + entry["part_a_result"] = part_a.get("result") + entry["part_b_persona_loop"] = part_b.get("persona_loop") + entry["part_b_score_pass"] = part_b.get("score_pass") + entry["spend_usd"] = (run_json.get("spend_usd") or {}).get("total") + entry["minted_run_dir"] = part_a.get("minted_run_dir") + entry["provider"] = entry["provider"] or part_b.get("provider") + entry["player_agent"] = part_b.get("player_agent") + + if meta_json: + entry.setdefault("player_cost_usd", meta_json.get("player_cost_usd")) + entry.setdefault("player_rc", meta_json.get("player_rc")) + entry.setdefault("port", meta_json.get("port")) + + if score_json: + entry["score"] = { + "completed_intro_flow": score_json.get("completed_intro_flow"), + "reached_play_screen": score_json.get("reached_play_screen"), + "actions_total": score_json.get("actions_total"), + "in_story_turns": score_json.get("in_story_turns"), + "console_errors": score_json.get("console_errors"), + "network_failures": score_json.get("network_failures"), + "image_404s": score_json.get("image_404s"), + "gave_up": score_json.get("gave_up"), + "persona_satisfaction": score_json.get("persona_satisfaction"), + "satisfaction_source": score_json.get("satisfaction_source"), + "pass": score_json.get("pass"), + } + 
entry["bug_counts"] = { + "critical": score_json.get("bug_reports_critical"), + "major": score_json.get("bug_reports_major"), + "minor": score_json.get("bug_reports_minor"), + "trivial": score_json.get("bug_reports_trivial"), + "total": score_json.get("bug_reports_total"), + } + + bugs_path = run_dir / "bugs.ndjson" + if bugs_path.exists(): + entry.setdefault("bug_counts", {}) + entry["bug_counts"].setdefault("ndjson_lines", _ndjson_count(bugs_path)) + + summary = run_dir / "summary.md" + if summary.exists(): + entry["summary_md"] = str(summary.relative_to(repo_root)) + + entry["linked_transcripts"] = _linked_transcripts(run_id, repo_root) + entry["linked_play_state"] = _linked_play_state(run_json, run_id, repo_root) + rubrics = _linked_lens_rubrics(run_id, entry.get("commit_sha"), repo_root) + if rubrics: + entry["linked_rubrics"] = rubrics + + scored = _scores_db_lookup(run_id, repo_root) + if scored: + entry["scored_in_ledger"] = scored + + entry["indexed_at"] = _dt.datetime.now(tz=_dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + return entry + + +def extract_play_state(ps_dir: Path, repo_root: Path) -> Optional[dict]: + """Build an INDEX entry for one play-state/.""" + if not ps_dir.is_dir(): + return None + name = ps_dir.name + parsed = parse_canonical_name(name) + timestamp_iso = _ts_to_iso(parsed.get("ts", "")) or _mtime_iso(ps_dir) + + entry: dict[str, Any] = { + "kind": "play-state", + "id": name, + "path": str(ps_dir.relative_to(repo_root)), + "timestamp_iso": timestamp_iso, + "commit_sha": parsed.get("sha"), + "world": parsed.get("world"), + "persona": parsed.get("persona"), + "canonical_name": parsed.get("canonical", False), + } + + campaigns_dir = ps_dir / "campaigns" + if campaigns_dir.exists(): + entry["campaign_count"] = sum(1 for _ in campaigns_dir.iterdir() if _.is_dir()) + + chat_jsonl = ps_dir / "chat.jsonl" + if chat_jsonl.exists(): + entry["chat_lines"] = _ndjson_count(chat_jsonl) + + moves_jsonl = ps_dir / "player_moves.jsonl" + if 
moves_jsonl.exists(): + entry["player_moves"] = _ndjson_count(moves_jsonl) + + linked_run = (repo_root / "qa" / "ui_playtest_runs" / name).exists() + if linked_run: + entry["linked_run"] = f"qa/ui_playtest_runs/{name}" + else: + for run_dir in (repo_root / "qa" / "ui_playtest_runs").glob("*"): + run_json = _read_json(run_dir / "run.json") + if run_json and (run_json.get("part_a") or {}).get("minted_run_dir") == name: + entry["linked_run"] = str(run_dir.relative_to(repo_root)) + break + + entry["indexed_at"] = _dt.datetime.now(tz=_dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + return entry + + +def extract_transcript(transc_path: Path, repo_root: Path) -> Optional[dict]: + """Build an INDEX entry for one qa/transcripts/.""" + if not transc_path.is_file(): + return None + name = transc_path.name + parts = name.split(".", 1) + run_id = parts[0] + suffix = parts[1] if len(parts) > 1 else "" + + parsed = parse_canonical_name(run_id) + timestamp_iso = _ts_to_iso(parsed.get("ts", "")) or _mtime_iso(transc_path) + + role = None + if ".chat." in name: + role = "chat" + elif ".dm." in name: + role = "dm" + elif ".player." 
in name: + role = "player" + + entry: dict[str, Any] = { + "kind": "transcript", + "id": name, + "path": str(transc_path.relative_to(repo_root)), + "timestamp_iso": timestamp_iso, + "commit_sha": parsed.get("sha"), + "run_id": run_id, + "role": role, + "suffix": suffix, + "line_count": _ndjson_count(transc_path), + } + linked = (repo_root / "qa" / "ui_playtest_runs" / run_id) + if linked.exists(): + entry["linked_run"] = f"qa/ui_playtest_runs/{run_id}" + entry["indexed_at"] = _dt.datetime.now(tz=_dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + return entry + + +def _iter_existing(index_path: Path) -> Iterable[dict]: + if not index_path.exists(): + return + with index_path.open() as f: + for line in f: + line = line.strip() + if not line: + continue + try: + yield json.loads(line) + except json.JSONDecodeError: + continue + + +def append_or_update(entry: Optional[dict], index_path: Path) -> str: + """Append entry to INDEX.jsonl, or replace an existing entry with the same kind and id. + + Returns "appended", "updated", or "skipped" (entry is None). Locks the file for concurrent runners.
+ """ + if entry is None: + return "skipped" + index_path.parent.mkdir(parents=True, exist_ok=True) + lock_path = index_path.with_suffix(index_path.suffix + ".lock") + with lock_path.open("w") as lockfile: + fcntl.flock(lockfile.fileno(), fcntl.LOCK_EX) + try: + existing = list(_iter_existing(index_path)) + key = (entry.get("kind"), entry.get("id")) + replaced = False + for i, row in enumerate(existing): + if (row.get("kind"), row.get("id")) == key: + existing[i] = entry + replaced = True + break + if not replaced: + existing.append(entry) + tmp_path = index_path.with_suffix(index_path.suffix + ".new") + with tmp_path.open("w") as out: + for row in existing: + out.write(json.dumps(row, ensure_ascii=False)) + out.write("\n") + os.replace(tmp_path, index_path) + return "updated" if replaced else "appended" + finally: + fcntl.flock(lockfile.fileno(), fcntl.LOCK_UN) + + +def _repo_root_from(path: Path) -> Path: + """Walk up to find repo root (contains both qa/ and .git); fall back to the resolved path.""" + p = path.resolve() + for ancestor in [p] + list(p.parents): + if (ancestor / "qa").is_dir() and (ancestor / ".git").exists(): + return ancestor + return p + + +def main(argv: Optional[list[str]] = None) -> int: + parser = argparse.ArgumentParser(description="Index a QA artifact dir or file.") + parser.add_argument("path", help="Artifact dir or transcript file to index") + parser.add_argument("--append", action="store_true", + help="Append/update INDEX.jsonl (default: print to stdout)") + parser.add_argument("--index", help="Path to INDEX.jsonl (default: /qa/INDEX.jsonl)") + parser.add_argument("--root", help="Repo root (default: walk up from path)") + args = parser.parse_args(argv) + + target = Path(args.path).resolve() + repo_root = Path(args.root).resolve() if args.root else _repo_root_from(target) + index_path = Path(args.index).resolve() if args.index else repo_root / "qa" / "INDEX.jsonl" + + if target.is_dir(): + try: + rel = target.relative_to(repo_root) + except ValueError: + print(f"error: {target} is
not under repo root {repo_root}", file=sys.stderr) + return 2 + parts = rel.parts + if parts and parts[0] == "qa" and len(parts) >= 2 and parts[1] == "ui_playtest_runs": + entry = extract_run(target, repo_root) + elif parts and parts[0] == "play-state": + entry = extract_play_state(target, repo_root) + else: + print(f"error: unrecognized artifact dir kind under {target}", file=sys.stderr) + return 2 + elif target.is_file() and target.suffix == ".jsonl": + entry = extract_transcript(target, repo_root) + else: + print(f"error: {target} is not a directory or .jsonl file", file=sys.stderr) + return 2 + + if entry is None: + print(f"error: could not extract entry from {target}", file=sys.stderr) + return 1 + + if args.append: + result = append_or_update(entry, index_path) + print(f"{result}: {entry['kind']}:{entry['id']} → {index_path}") + else: + print(json.dumps(entry, ensure_ascii=False, indent=2)) + return 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/qa/ui_playtest.sh b/qa/ui_playtest.sh index b82f636d..d77a73e4 100755 --- a/qa/ui_playtest.sh +++ b/qa/ui_playtest.sh @@ -27,11 +27,14 @@ set -uo pipefail ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"; cd "$ROOT" || exit 1 . "$ROOT/qa/lib_beat_driver.sh" # worldos_env + shared helpers -RUN="${1:-play-$(date +%H%M%S)}" WORLD="${2:-baldurs-gate}" PERSONA="${3:-newbie}" BEATS="${4:-30}" # max player palette actions (soft cap) BUDGET="${5:-3.00}" # USD cap for the PLAYER agent +# Canonical run name: ----- +# Indexer (qa/scripts/indexer.py) parses this form; older ad-hoc names still work. +_DEFAULT_SHA="$(git -C "$ROOT" rev-parse --short HEAD 2>/dev/null || echo nogit)" +RUN="${1:-$(date -u +%Y%m%dT%H%M%SZ)-${_DEFAULT_SHA}-${WORLD}-${PERSONA}-claude-play}" PW_DIR="$ROOT/qa/playwright" PW_CHANNEL="$(worldos_env UIPT_CHANNEL "")" # "" = bundled chromium; "chrome" = system Chrome DM_MODEL="$(worldos_env DM_MODEL sonnet)" @@ -229,4 +232,10 @@ echo "[uipt] done. 
dir=$RUNDIR" if [ -f "$RUNDIR/summary.md" ]; then echo "----- summary.md -----"; cat "$RUNDIR/summary.md" fi + +# --- auto-index this run for qa/INDEX.jsonl (best-effort, never blocks) ------ +if [ -f "$ROOT/qa/scripts/indexer.py" ]; then + python3 "$ROOT/qa/scripts/indexer.py" --append "$RUNDIR" --root "$ROOT" 2>&1 || true +fi + exit "$SCORE_RC" diff --git a/qa/ui_playtest_app.sh b/qa/ui_playtest_app.sh index 127eda85..cd206d9c 100755 --- a/qa/ui_playtest_app.sh +++ b/qa/ui_playtest_app.sh @@ -61,11 +61,14 @@ set -uo pipefail ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"; cd "$ROOT" || exit 1 . "$ROOT/qa/lib_beat_driver.sh" # worldos_env + shared helpers (snapshot path, cost, etc.) -RUN="${1:-app-$(date +%H%M%S)}" WORLD="${2:-baldurs-gate}" PERSONA="${3:-newbie}" BEATS="${4:-6}" BUDGET="${5:-4.00}" +# Canonical run name: ----- +# Indexer (qa/scripts/indexer.py) parses this form; older ad-hoc names still work. +_DEFAULT_SHA="$(git -C "$ROOT" rev-parse --short HEAD 2>/dev/null || echo nogit)" +RUN="${1:-$(date -u +%Y%m%dT%H%M%SZ)-${_DEFAULT_SHA}-${WORLD}-${PERSONA}-${WOS_APP_SELECTED_PROVIDER:-claude}-app}" PART="$(worldos_env APP_PART "${WOS_APP_PART:-AB}")" KEEP_MINTED_BACKEND="${WOS_APP_KEEP_MINTED_BACKEND:-0}" SELECTED_PROVIDER="${WOS_APP_SELECTED_PROVIDER:-}" @@ -893,6 +896,11 @@ log "part A (#356 gate): $PART_A_RESULT part B (persona loop): $PART_B_RESULT" log "spend: DM ~\$$FINAL_DM_SPEND + player ~\$$PART_B_PLAYER_COST = ~\$$TOTAL_SPEND (budget \$$BUDGET)" [ -f "$RUNDIR/run.json" ] && { echo "----- run.json -----"; cat "$RUNDIR/run.json"; } +# --- auto-index this run for qa/INDEX.jsonl (best-effort, never blocks) ------ +if [ -f "$ROOT/qa/scripts/indexer.py" ]; then + python3 "$ROOT/qa/scripts/indexer.py" --append "$RUNDIR" --root "$ROOT" 2>&1 || true +fi + EXIT_OK=1 case "$PART" in A)