feat: puzzle quality system — writer-judge loop #232 (Open)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add --label ralph to skill PR template and CLAUDE.md agent policy.
Writer-judge pattern for .pzl puzzles: structured reasoning chain for the writer, 5-dimension scoring rubric for the judge, 3-round revision loop, and findings log for telemetry.
5-task plan: judge agent, upgraded writer skill, findings log, calibration against existing puzzles, end-to-end validation. Also fixes the bolt-face.pzl expected score range in the spec (12-14 → 10-13).
…dings log
- puzzle-judge: 5-dimension scoring rubric (Focus, Determinism, Signal, Minimality, Documentation), structured verdicts, read-only evaluation
- write-puzzle: 7-step reasoning chain, narrative comments, auto judge dispatch with a 3-round revision loop, findings log telemetry
- puzzle-findings.md: append-only quality telemetry log
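The revision loop can be sketched as follows. This is a minimal illustration only (the real write-puzzle skill is a prompt, not code): write, revise and judge are hypothetical callables standing in for the writer steps and the puzzle-judge dispatch, and counting every judge call, including the one after a REJECT restart, as one of the three rounds is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ROUNDS = 3  # from the PR: "3-round revision loop"


@dataclass
class Verdict:
    score: int  # total out of 15
    label: str  # "PASS", "NEEDS_REVISION" or "REJECT"
    notes: str = ""


def run_revision_loop(
    write: Callable[[], str],
    revise: Callable[[str, Verdict], str],
    judge: Callable[[str], Verdict],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[str, List[Verdict]]:
    """Judge up to max_rounds drafts and stop early on PASS.

    Hypothetical sketch: NEEDS_REVISION feeds the verdict back to revise();
    REJECT discards the draft and starts over from step 1 via write().
    Assumption: each judge call consumes one round.
    """
    draft = write()
    verdicts: List[Verdict] = []
    for round_no in range(1, max_rounds + 1):
        verdict = judge(draft)
        verdicts.append(verdict)
        if verdict.label == "PASS" or round_no == max_rounds:
            break
        if verdict.label == "REJECT":
            draft = write()  # fresh attempt from step 1
        else:
            draft = revise(draft, verdict)  # targeted revision of the same draft
    return draft, verdicts
```

Capping the loop at three judged drafts bounds the cost per puzzle; the full verdict list is returned so the caller can record every round, not just the last one.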
All scores fall within their expected ranges, so the rubric is well-calibrated:
- legend-rule: 14/15 PASS
- bolt-face: 10/15 NEEDS_REVISION
- fdn-keyword-combat: 7/15 REJECT
- lands-only: 5/15 REJECT
- prowess-buff: 7/15 REJECT
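A sketch of how five dimension scores could roll up into the /15 totals and verdicts listed above. Two assumptions not stated in the PR: each dimension is scored 0-3 (five dimensions x 3 = 15), and the verdict cutoffs (PASS at 12+, NEEDS_REVISION at 9-11, REJECT at 8 or below) are hypothetical values chosen only because they reproduce the calibration results. The actual puzzle-judge agent may apply different rules.

```python
from typing import Dict

DIMENSIONS = ("Focus", "Determinism", "Signal", "Minimality", "Documentation")
MAX_PER_DIMENSION = 3  # assumption: 5 dimensions x 3 = 15
PASS_AT = 12  # hypothetical cutoff
REVISE_AT = 9  # hypothetical cutoff


def total_score(scores: Dict[str, int]) -> int:
    """Validate and sum the five dimension scores."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in DIMENSIONS:
        if not 0 <= scores[dim] <= MAX_PER_DIMENSION:
            raise ValueError(f"{dim} out of range: {scores[dim]}")
    return sum(scores[dim] for dim in DIMENSIONS)


def verdict_for(total: int) -> str:
    """Map a /15 total to a verdict label using the hypothetical cutoffs."""
    if total >= PASS_AT:
        return "PASS"
    if total >= REVISE_AT:
        return "NEEDS_REVISION"
    return "REJECT"


# Totals and verdicts reported by the calibration run.
CALIBRATION = {
    "legend-rule": (14, "PASS"),
    "bolt-face": (10, "NEEDS_REVISION"),
    "fdn-keyword-combat": (7, "REJECT"),
    "lands-only": (5, "REJECT"),
    "prowess-buff": (7, "REJECT"),
}

for name, (score, expected) in CALIBRATION.items():
    assert verdict_for(score) == expected, name
```

Any PASS cutoff from 11 to 14 and REJECT ceiling from 7 to 9 would fit the same five data points, so the calibration set alone cannot pin the thresholds down.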
Score: 14/15 PASS. Typhoid Rats (1/1 deathtouch) blocks Juggernaut (5/3 must-attack) at 3 life. Single win path, zero AI decisions, full design narrative embedded.
Writer skill now instructs the caller to extract total_tokens from judge agent dispatch results and record per-round and total token counts in the findings log entry. This gives cost visibility per puzzle.
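One way to turn those per-dispatch counts into a findings-log line, sketched under assumptions: each judge dispatch result is modeled as a dict with a total_tokens field, and the line produced by format_findings_entry (a hypothetical helper) is illustrative, not the format docs/puzzle-findings.md actually uses.

```python
from typing import Dict, List, Tuple


def summarize_tokens(dispatch_results: List[Dict[str, int]]) -> Tuple[List[int], int]:
    """Per-round token counts (one judge dispatch per round) and their sum."""
    per_round = [int(result["total_tokens"]) for result in dispatch_results]
    return per_round, sum(per_round)


def format_findings_entry(puzzle_file: str, verdict: str,
                          dispatch_results: List[Dict[str, int]]) -> str:
    """Render one hypothetical log line with per-round and total tokens."""
    per_round, total = summarize_tokens(dispatch_results)
    rounds = ", ".join(f"r{i}={n}" for i, n in enumerate(per_round, start=1))
    return (f"- {puzzle_file}: {verdict} after {len(per_round)} round(s); "
            f"tokens {rounds}; total={total}")
```

Writing the returned line with the file opened in append mode ("a") keeps the log append-only, and keeping per-round counts alongside the total shows whether revisions, not the first judgment, drive the cost.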
…tion
- Card selection guidance: prefer recording-referenced cards, verify before designing, curated palette for common roles
- Step 7: writer appends friction/surprise/improvement notes to docs/puzzle-writer-learnings.md after each puzzle
- Learnings log: self-improving feedback loop; notes that repeat 3+ times graduate into skill rules
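The graduation rule (notes that repeat 3+ times become skill rules) can be sketched as a counter. Assumption: a "repeat" is approximated here as identical note text after lowercasing and collapsing whitespace, whereas the writer agent presumably judges repeats by meaning; notes_ready_to_graduate is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List

GRADUATION_THRESHOLD = 3  # from the PR: "3+ repeats graduate into skill rules"


def normalize(note: str) -> str:
    """Crude repeat key: lowercase and collapse runs of whitespace."""
    return " ".join(note.lower().split())


def notes_ready_to_graduate(notes: Iterable[str],
                            threshold: int = GRADUATION_THRESHOLD) -> List[str]:
    """Return normalized notes seen at least `threshold` times, sorted."""
    counts = Counter(normalize(n) for n in notes)
    return sorted(note for note, count in counts.items() if count >= threshold)
```

Exact-match counting is conservative: it misses paraphrased repeats, but it never promotes a note that was not actually repeated.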
CI Report (Gate)
- Tests: 825/825 passed (190 skipped)
- Coverage: 11.0% (642/5861 lines)
- Slow tests (>3s): 1
Summary
Writer-judge pattern for producing high-quality .pzl puzzles with structured reasoning, scoring, and telemetry.
- Judge agent (.claude/agents/puzzle-judge.md): 5-dimension scoring rubric (Focus, Determinism, Signal, Minimality, Documentation), structured verdicts (PASS/NEEDS_REVISION/REJECT)
- Writer skill (.claude/skills/write-puzzle/SKILL.md): … .pzl content, design narrative embedded as # comments, auto-dispatches the judge with a 3-round revision loop, REJECT recovery (fresh attempt from step 1)
- Findings log (docs/puzzle-findings.md): quality telemetry with token tracking per judge dispatch
- Learnings log (docs/puzzle-writer-learnings.md): friction/surprise/improvement notes accumulate into skill improvements
- … deathtouch-kill.pzl scored 14/15 PASS

Artifacts
- .claude/agents/puzzle-judge.md
- .claude/skills/write-puzzle/SKILL.md
- docs/puzzle-findings.md
- docs/puzzle-writer-learnings.md
- matchdoor/src/test/resources/puzzles/deathtouch-kill.pzl
- docs/superpowers/specs/2026-03-23-puzzle-quality-system-design.md
- docs/superpowers/plans/2026-03-23-puzzle-quality-system.md

Test plan
- just puzzle-check passes on the new puzzle

🤖 Generated with Claude Code