Skip to content

spike(p4_beat): structured Pydantic html: str — 5/5 acceptance (HOM-243)#136

Open
sidorovanthon wants to merge 3 commits into
mainfrom
a/hom-243-spike
Open

spike(p4_beat): structured Pydantic html: str — 5/5 acceptance (HOM-243)#136
sidorovanthon wants to merge 3 commits into
mainfrom
a/hom-243-spike

Conversation

@sidorovanthon
Copy link
Copy Markdown
Owner

Summary

HOM-243 pre-migration spike. Hard-gates the HOM-230 state-first-artifacts epic (HOM-231..242) — 5/5 paid dispatches must succeed with structured-output extraction on ~30 KB body before the migration plan proceeds.

Outcome: PASS. Migration proceeds with state-channel storage; the §6.0 BaseStore fallback is NOT taken.

Results — 6/6 paid dispatches succeeded

# html_chars retries wall_s tokens_out
pilot 7238 0 140 (logged)
1 5524 0 76 4632
2 6430 0 134 10450
3 5838 0 114 8940
4 7022 0 84 5786
5 6236 0 72 4850
  • Tier: `claude-opus-4-7` (production "expensive").
  • Zero `SchemaValidationError`. Zero retries. Zero truncation. Zero JSON-escape corruption.
  • Total spend ≈ $9 (within $10 cap).
  • HOM-134 retreat (`p4_captions_layer.py:172-180`) was on an older tier; current tier handles 30 KB structured-output reliably.

Evidence: `docs/spikes/hom-243-results.json`, `docs/spikes/hom-243-pilot.log`, `docs/spikes/hom-243-full.log`.

Code changes

  • `graph/src/edit_episode_graph/nodes/p4_beat.py` — adds `BeatOutput { html: str }` Pydantic schema; `_build_node()` wires `output_schema=BeatOutput`, `allowed_tools=["Read"]` (drops `Write`), `result_key="_beat_html_spike"`. `_CACHE_VERSION` 7 → 8.
  • `graph/src/edit_episode_graph/briefs/p4_beat.j2` — minimal edit: output-shape instruction + Process steps 4/5 now require structured `BeatOutput` JSON instead of `Write` to `scene_html_path`. Canon read list, hard rules, anti-patterns, palette/typography sections untouched (CLAUDE.md §"Decomposition via brief-references-canon" item 1).
  • `tests/snapshots/briefs/p4_beat.txt` — refreshed snapshot.
  • `scripts/spike_hom243.py` — standalone measurement script. Bypasses `SqliteCache` by construction (calls `LLMNode.call` directly with a real router); flagged paid; `--limit N` for risk-cap pilots.

Production-break warning — DO NOT MERGE before HOM-231 lands

`p4_beat` no longer writes `scene_html_path` to disk, but `p4_assemble_index.py:588` still reads it via `read_text`. Merging before Step A (`compose.scenes` reducer) + Step B (assemble switches to state read) would break Phase 4 production. The spec calls for spike PR to land as Step 0; in practice this PR's evidence value is `docs/spikes/` and the schema/brief edits should be either:

  • (a) merged together with HOM-231 / HOM-232 (Step A reducer + Step B `p4_beat` migration) as a coordinated cutover, or
  • (b) merged standalone with a follow-up commit re-adding a transient `Write` dual-path so the tree stays green during HOM-231 development.

Defer merge decision to operator. Evidence stands either way.

Test plan

  • Pilot dispatch — 1/1 success, html=7238, 0 retries.
  • Full acceptance — 5/5 success, html ≥ 5524, 0 retries.
  • (deferred to HOM-231) brief-snapshot check via `pytest tests/test_brief_snapshots.py` — refreshed in commit 4330859 but not re-validated post-merge since topology unchanged.
  • (deferred to HOM-232) replay smoke re-record — current `tests/test_graph_replay.py::test_p4_beat_smoke` skips with `requires_fixture_cache`; re-record happens during Step B per spec.

🤖 Generated with Claude Code

anticodeguy and others added 3 commits May 10, 2026 19:32
…M-243)

Reverses the HOM-134 FS-source-of-truth retreat for `p4_beat` only,
gated on a 5/5 acceptance run (see scripts/spike_hom243.py + spec
docs/superpowers/specs/2026-05-10-state-first-artifacts.md §10 Step 0).

- Add `BeatOutput { html: str }` Pydantic schema in nodes/p4_beat.py.
- Wire `output_schema=BeatOutput` + drop `Write` from `allowed_tools`
  in `_build_node()`. `result_key` becomes `_beat_html_spike` — fan-out
  reducer is OUT OF SCOPE per spec; acceptance inspects the returned
  dict directly, production fan-in remains FS-driven.
- `p4_beat_node` no longer pre-creates the scene HTML directory (no
  `Write` to land in it).
- Update brief output-shape + Process steps to instruct the sub-agent
  to return the scene fragment as the `html` field of the structured
  response (single fenced ```json``` block; the orchestrator's
  `_schema_extract` accepts the first valid fenced JSON). Canon read
  list, hard rules, anti-patterns, palette/typography, density and
  exit-pair sections are UNTOUCHED — brief still references canon by
  path, never embeds (CLAUDE.md §"Decomposition via brief-references-canon").
- Refresh `tests/snapshots/briefs/p4_beat.txt`.
- Bump `_CACHE_VERSION` 7 → 8 with HOM-243 rationale comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…HOM-243)

Measurement script for the HOM-243 pre-migration spike. Runs N (default 5)
sequential paid dispatches of `p4_beat` against the canonical fixture's
`hook` beat, records per-attempt success/html_chars/retry_count/exception/
wall_time, writes summary to `docs/spikes/hom-243-results.json`, exits
non-zero on acceptance failure (5/5 success + every html_chars >= 5_000 +
every retry_count == 0).

State is reconstructed from the committed fixture cache.db via raw SQLite
+ LangGraph's `JsonPlusSerializer` (same path as
`tests/_helpers/replay_dispatch`) — no graph runtime, no LLM cache hit
risk. The dispatch goes straight through `LLMNode.__call__` so the
production `CachePolicy` is bypassed by construction; every iteration is
a real paid call.

DO NOT execute under replay/$0 conditions. Operator authorisation
required (~$10 cap). Invocation:

    $env:HOMESTUDIO_PROJECT_ROOT = "$PWD\tests\fixtures"
    $env:PYTHONPATH = "graph\src"
    graph\.venv\Scripts\python.exe scripts\spike_hom243.py

`docs/spikes/README.md` documents the spike directory convention.
The results JSON is intentionally NOT committed in this PR — operator
populates after the paid run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pilot: 1/1, html=7238 chars, 0 retries, 140s.
Full:  5/5, html ∈ [5524, 7022] chars, 0 retries, 72-134s/dispatch.

Tier: claude-opus-4-7 (production "expensive"). Total spend ≈ $9
(6 dispatches at ~$1.50). No SchemaValidationError, no truncation,
no JSON-escape corruption observed across any of 6 paid attempts.

Acceptance gate per spec docs/superpowers/specs/2026-05-10-state-first-artifacts.md
§10 Step 0 — PASSES. HOM-230 epic proceeds with state-channel
storage path (HOM-231..242). BaseStore §6.0 fallback is NOT taken.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sidorovanthon pushed a commit that referenced this pull request May 10, 2026
…0 status

- §10 Step 0: add 2026-05-10 status note recording HOM-243 spike PASS
  (6/6 paid dispatches, 0 SchemaValidationError, html ∈ [5524, 7238]).
  Re-trigger condition documented (tier downgrade or structural brief
  mutation).
- §10b: rename "sub-issue HOM-243 candidate" → "future sub-issue (number
  TBD)" — HOM-243 is the spike PR #136, not the atomic-record protocol.
- Section numbering: orphaned sizing-budget table now lives under a
  proper "## 11. Sizing budget" heading; cascading renumber of the
  duplicate §12 ("Open questions" → §13), §13 → §14 (Acceptance), §14
  → §15 (References), §15 → §16 (CLAUDE.md amendments). Cross-reference
  on line 119 updated (§13 → §11).

Per PR #135 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sidorovanthon added a commit that referenced this pull request May 10, 2026
…utputs (#135)

* docs(spec): state-first artifacts — single-source-of-truth for node outputs

* docs(spec): revise state-first-artifacts per independent review (HOM-230)

- Honestly scope to 4/7 incidents; flag schema-evolution + process-discipline as orthogonal
- Add §6.0 Considered alternatives — evaluate LangGraph BaseStore vs state-channel
- Add pre-migration spike on p4_beat structured output (HOM-134 prior)
- Atomic-record protocol as complementary fix
- Split step D into D1 (read switch) + D2 (strip dual-write + git rm)
- Move compose.scenes reducer + test into step A
- Add §Risks & rollback section

* docs(spec): apply review nits — HOM-243 collision, §-numbering, Step 0 status

- §10 Step 0: add 2026-05-10 status note recording HOM-243 spike PASS
  (6/6 paid dispatches, 0 SchemaValidationError, html ∈ [5524, 7238]).
  Re-trigger condition documented (tier downgrade or structural brief
  mutation).
- §10b: rename "sub-issue HOM-243 candidate" → "future sub-issue (number
  TBD)" — HOM-243 is the spike PR #136, not the atomic-record protocol.
- Section numbering: orphaned sizing-budget table now lives under a
  proper "## 11. Sizing budget" heading; cascading renumber of the
  duplicate §12 ("Open questions" → §13), §13 → §14 (Acceptance), §14
  → §15 (References), §15 → §16 (CLAUDE.md amendments). Cross-reference
  on line 119 updated (§13 → §11).

Per PR #135 review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: anticodeguy <anticodeguy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant