Follow-up from PR #157 (eval suite landing) and a fresh waza analysis run on 2026-06-04.
Why now
PR #157 shipped the first eval suite for azure-policy-advisor. Three independent signals converged on the same finding: the SKILL.md is verbose AND under-bounded on routing.
Signal 1 — Empirical (smoke trial in PR #157)
The negative-naming-question task originally failed with a trigger score of 0.57 (just over the 0.50 threshold). The naming prompt mentioned resource, compliance, and governance — overlapping vocabulary with this skill's description. Worked around in PR #157 by rewriting the negative prompt to pure naming-string trivia, but the underlying ambiguity is in the SKILL.md, not the eval.
Signal 2 — LLM judge (waza quality)
| Dimension | Score |
| --- | --- |
| Completeness | 5 / 5 |
| Clarity | 4 / 5 |
| Trigger precision | 3 / 5 ← lowest score |
| Scope coverage | 4 / 5 |
| Anti-patterns | 4 / 5 |
| Overall | 4.0 / 5 |
Judge verdict on trigger precision:
The 'When to Use' section lists four reasonable triggers, but there is no 'DO NOT USE FOR' section — omitting exclusions makes it hard to distinguish this skill from the security-analyzer or RBAC skills, and risks mis-routing requests like 'check if my template is secure' that belong elsewhere.
Signal 3 — Token budget (waza check / waza tokens)
- SKILL.md is 6,233 tokens vs the 500-token soft target / 1,300-token hard fallback in `.waza.yaml`.
- `waza tokens suggest` identifies ~3,582 reclaimable tokens, dominated by:
  - Line 508: 108-line JSON output schema block (~1,568 tokens)
  - Line 303: 97-line Part 2 markdown report template (~1,392 tokens)
  - 71 decorative emojis (~142 tokens)
- 637-line body exceeds the 500-line progressive-disclosure heuristic; 2 code blocks exceed 50 lines.
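To make the two line-count heuristics concrete, here is a minimal Python sketch that flags an over-long body and over-long fenced code blocks in a markdown file. It is not waza's implementation; the function names are hypothetical, and only the thresholds (500 body lines, 50 lines per code block) come from the numbers quoted above.

```python
FENCE = "`" * 3  # markdown code-fence marker, built this way to avoid a literal fence here


def exceeds_body_budget(markdown_text, max_body_lines=500):
    """True when the document has more lines than the body heuristic allows."""
    return len(markdown_text.splitlines()) > max_body_lines


def find_long_code_blocks(markdown_text, max_block_lines=50):
    """Return (fence_start_line, body_line_count) for each over-long fenced block.

    Assumption: fences are plain triple-backtick lines and are never nested.
    """
    long_blocks = []
    fence_start = None
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        if not line.lstrip().startswith(FENCE):
            continue
        if fence_start is None:
            fence_start = line_no  # opening fence
        else:
            body_lines = line_no - fence_start - 1  # lines between the two fences
            if body_lines > max_block_lines:
                long_blocks.append((fence_start, body_lines))
            fence_start = None  # closing fence
    return long_blocks
```

Running `find_long_code_blocks` over SKILL.md would give candidate line numbers to compare against the two blocks reported at lines 508 and 303.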
Scope
Land via /skill-improve azure-policy-advisor (per umbrella #93 contributor loop).
Concrete edits
- Add a `## When NOT to Use` (or `DO NOT USE FOR:`) section listing adjacent skills and what they own. Suggested phrasing:
  - For per-resource security configuration assessment → use `azure-security-analyzer`
  - For RBAC role recommendations → use `azure-role-selector`
  - For CAF naming abbreviations / name-string constraints → use `azure-naming-research`
  - For pricing / cost estimation → use `azure-cost-estimator`
  - For generating new ARM templates → use `azure-template-generator`
- Extract giant code blocks to `references/` (waza's progressive-disclosure pattern):
  - `references/policy-recommendations-schema.json` ← the 108-line JSON schema currently at line 508
  - `references/policy-assessment-template.md` ← the 97-line Part 2 markdown template currently at line 303
  - SKILL.md references them with a short link + the smallest illustrative excerpt
- Clarify scope on existing resources: `argument-hint` says "ARM template JSON or resource types to assess", but the procedure focuses entirely on template parsing. Add a short note on whether/how the skill assesses live deployed resources (or split into a separate skill if not).
- Add discovery guidance for placeholders: `{subscription-id}` and `{mg-name}` appear in CLI snippets without instruction on how the agent should obtain them when not provided by the user.
- Trim decorative emojis: 73 emojis in the file (~142 tokens). Keep the severity tier icons (🔴🟠🟡🔵) and status markers (✅⚠️) which carry semantic weight; drop the rest.
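The emoji trim in the last bullet could be scripted roughly as follows. This is a sketch under stated assumptions, not part of the `/skill-improve` tooling: the function name is hypothetical, the keep-set is the six icons named above, and the regex covers only the most common emoji code-point ranges, so the result should be reviewed by hand.

```python
import re

# Icons that carry semantic weight and must survive the trim.
# The warning sign is stored without its U+FE0F variation selector.
KEEP = {"🔴", "🟠", "🟡", "🔵", "✅", "\u26A0"}

# Approximate emoji matcher (an assumption, not a complete Unicode emoji list):
# main pictograph planes, misc symbols and dingbats, plus stars and similar shapes,
# each optionally followed by the emoji variation selector.
EMOJI_RE = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\u2B00-\u2BFF]\uFE0F?")


def trim_decorative_emojis(text, keep=KEEP):
    """Remove matched emojis unless their base character is in `keep`."""

    def replace(match):
        base = match.group(0).rstrip("\uFE0F")
        return match.group(0) if base in keep else ""

    return EMOJI_RE.sub(replace, text)
```

The sketch leaves surrounding whitespace untouched, so a heading that began with a decorative emoji keeps a leading space that may need a follow-up cleanup. Arrows such as → and ← fall outside the matched ranges and are preserved.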
Acceptance
- `waza check .github/skills/azure-policy-advisor` compliance score moves from Low → Medium or higher.
- `waza quality` trigger_precision dimension improves from 3/5 → 4/5 or 5/5.
- The original `negative-naming-question` wording (with 'resource' / 'compliance' / 'governance') should now score < 0.50 on the trigger heuristic.
Out of scope
Related
- `azure-policy-advisor`
- #108
- `.github/prompts/skill-improve.prompt.md`