The gap
Today the self-judge reads ONE skill's SKILL.md to ground its rubric. But Claude often produces output via a chain — e.g. humanizer followed by a voice-style skill, or brand-guidelines overlaid on internal-comms. The judge sees only one skill's intent and misses the other's.
Proposed direction
When the calling agent invokes Second Pass, it should be able to pass a list of skills used, not just one. The judge then reads each SKILL.md, derives a unified intent (with explicit conflict resolution if rules clash), and grades against the merged spec.
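One way the merge step could look, as a rough sketch rather than a proposed implementation. It assumes each SKILL.md has already been reduced to simple rule/setting pairs (in practice the judge would derive these itself), and it assumes a "last applied wins" precedence, which is only one candidate. `SkillSpec`, `MergedIntent`, and `merge_skills` are hypothetical names, and the rule contents are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SkillSpec:
    # Hypothetical container: `rules` stands in for whatever the judge
    # extracts from a SKILL.md (rule name -> required setting).
    name: str
    rules: dict[str, str]


@dataclass
class MergedIntent:
    rules: dict[str, str] = field(default_factory=dict)
    # Each conflict is (rule, overridden skill, winning skill).
    conflicts: list[tuple[str, str, str]] = field(default_factory=list)


def merge_skills(skills: list[SkillSpec]) -> MergedIntent:
    """Merge skills in the order they were applied.

    Assumed precedence: the skill applied last wins a clash, and the
    clash is recorded so the judge can mention it in its grade.
    """
    merged = MergedIntent()
    owner: dict[str, str] = {}  # rule -> skill that currently sets it
    for skill in skills:
        for rule, setting in skill.rules.items():
            if rule in merged.rules and merged.rules[rule] != setting:
                merged.conflicts.append((rule, owner[rule], skill.name))
            merged.rules[rule] = setting
            owner[rule] = skill.name
    return merged


# Rule contents below are invented for illustration only.
chain = [
    SkillSpec("humanizer", {"em_dashes": "ban", "tone": "casual"}),
    SkillSpec("brand-guidelines", {"em_dashes": "use"}),
]
result = merge_skills(chain)
print(result.rules)
print(result.conflicts)
```

Recording conflicts instead of silently overwriting keeps the resolution explicit: the merged spec still feeds a single rubric, but the judge can report which skill's rule was overridden.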
Edge cases to handle
- Skill rules conflict (e.g. one says "use em-dashes," another bans them). Define a precedence rule.
- Token budget on multiple SKILL.md reads. Cache or summarize.
- Order matters: was humanizer applied first or last? The answer may affect which rules dominate.
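For the token-budget item above, a minimal caching sketch, under stated assumptions: the budget is counted in characters as a crude stand-in for tokens, the cap of 4000 is arbitrary, and plain truncation stands in for summarization. `load_skill_text` and `MAX_CHARS_PER_SKILL` are hypothetical names.

```python
from functools import lru_cache
from pathlib import Path

# Assumed budget, measured in characters as a rough proxy for tokens.
MAX_CHARS_PER_SKILL = 4000


@lru_cache(maxsize=64)
def load_skill_text(path: str) -> str:
    """Read a SKILL.md once per process and cap its length."""
    text = Path(path).read_text(encoding="utf-8")
    if len(text) > MAX_CHARS_PER_SKILL:
        # Truncation is the simplest cap; a summarizer could replace it.
        text = text[:MAX_CHARS_PER_SKILL]
    return text
```

Note that this cache never notices an edited SKILL.md within the same process; calling `load_skill_text.cache_clear()` or keying on the file's modification time would be needed if skills change while the judge is running.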
Out of scope
Per-skill custom rubrics. The merged intent should still feed the universal A-F rubric.