Trim and re-trigger `azure-policy-advisor` SKILL.md — add DO NOT USE FOR + cut tokens

Follow-up from PR #157 (eval suite landing) and a fresh `waza` analysis run on 2026-06-04.

## Why now

PR #157 shipped the first eval suite for `azure-policy-advisor`. Three independent signals converged on the same finding: **the SKILL.md is verbose AND under-bounded on routing.**

### Signal 1 — Empirical (smoke trial in PR #157)

The `negative-naming-question` task originally failed with a trigger score of **0.57** (just over the 0.50 threshold). The naming prompt mentioned `resource`, `compliance`, and `governance` — overlapping vocabulary with this skill's description. Worked around in PR #157 by rewriting the negative prompt to pure naming-string trivia, but the underlying ambiguity is in the SKILL.md, not the eval.
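
To illustrate why shared vocabulary can push a negative prompt over the threshold, here is a toy sketch. The real trigger heuristic is not described in this issue, so the scoring function, the keyword set, and both sample prompts below are hypothetical; only the 0.50 threshold comes from the eval.

```python
import re

# Hypothetical keyword set standing in for vocabulary in the skill description.
SKILL_KEYWORDS = {"policy", "resource", "compliance", "governance", "assess"}

THRESHOLD = 0.50  # threshold quoted in this issue


def toy_trigger_score(prompt: str, keywords: set[str] = SKILL_KEYWORDS) -> float:
    """Fraction of skill keywords that appear in the prompt (toy metric only)."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return len(words & keywords) / len(keywords)


# Hypothetical prompts: one reuses the skill's vocabulary, one is pure naming trivia.
overlapping = "What naming rules apply to a resource for compliance and governance?"
trivia = "What is the CAF abbreviation for a storage account?"

for prompt in (overlapping, trivia):
    score = toy_trigger_score(prompt)
    verdict = "triggers" if score > THRESHOLD else "does not trigger"
    print(f"{score:.2f} {verdict}: {prompt}")
```

In this toy model the only ways to bring the first prompt under the threshold are to change the prompt (the PR #157 workaround) or to give the skill explicit exclusions, which is what this issue proposes.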

### Signal 2 — LLM judge (`waza quality`)

| Dimension | Score |
|---|---|
| Completeness | 5 / 5 |
| Clarity | 4 / 5 |
| **Trigger precision** | **3 / 5** ← lowest score |
| Scope coverage | 4 / 5 |
| Anti-patterns | 4 / 5 |
| **Overall** | **4.0 / 5** |

> Judge verdict on trigger precision:
> > The 'When to Use' section lists four reasonable triggers, but there is no 'DO NOT USE FOR' section — omitting exclusions makes it hard to distinguish this skill from the security-analyzer or RBAC skills, and risks mis-routing requests like 'check if my template is secure' that belong elsewhere.

### Signal 3 — Token budget (`waza check` / `waza tokens`)

- SKILL.md is **6,233 tokens** vs the 500-token soft target / 1,300-token hard fallback in `.waza.yaml`.
- `waza tokens suggest` identifies **~3,582 reclaimable tokens**, dominated by:
  - Line 508: 108-line JSON output schema block (~1,568 tokens)
  - Line 303: 97-line Part 2 markdown report template (~1,392 tokens)
  - 71 decorative emojis (~142 tokens)
- 637-line body exceeds the 500-line progressive-disclosure heuristic; 2 code blocks exceed 50 lines.
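
As a quick sanity check on the budget, the arithmetic below uses only the figures quoted in this section. The variable names are illustrative, and the reclaimable total is the tool's approximation, so treat the results as rough.

```python
# Figures quoted in this issue.
CURRENT_TOKENS = 6_233
RECLAIMABLE_TOKENS = 3_582  # approximate, per `waza tokens suggest`
SOFT_TARGET = 500
HARD_FALLBACK = 1_300

# Largest items named in the breakdown above.
itemized = {
    "json_schema_block": 1_568,
    "report_template": 1_392,
    "decorative_emojis": 142,
}

projected = CURRENT_TOKENS - RECLAIMABLE_TOKENS
itemized_total = sum(itemized.values())

print(f"projected size after suggested cuts: ~{projected} tokens")
print(f"itemized savings: ~{itemized_total} of ~{RECLAIMABLE_TOKENS}")
print(f"over hard fallback by ~{projected - HARD_FALLBACK}")
print(f"over soft target by ~{projected - SOFT_TARGET}")
```

With these numbers the suggested cuts alone land at roughly 2,650 tokens, so reaching the 1,300-token fallback would need trimming beyond what `waza tokens suggest` lists.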

## Scope

Land via `/skill-improve azure-policy-advisor` (per umbrella #93 contributor loop).

### Concrete edits

1. **Add a `## When NOT to Use` (or `DO NOT USE FOR:`) section** listing adjacent skills and what they own. Suggested phrasing:
   - For per-resource security configuration assessment → use `azure-security-analyzer`
   - For RBAC role recommendations → use `azure-role-selector`
   - For CAF naming abbreviations / name-string constraints → use `azure-naming-research`
   - For pricing / cost estimation → use `azure-cost-estimator`
   - For generating new ARM templates → use `azure-template-generator`
2. **Extract giant code blocks to `references/`** (waza's progressive-disclosure pattern):
   - `references/policy-recommendations-schema.json` ← the 108-line JSON schema currently at line 508
   - `references/policy-assessment-template.md` ← the 97-line Part 2 markdown template currently at line 303
   - SKILL.md references them with a short link + the smallest illustrative excerpt
3. **Clarify scope on existing resources** — `argument-hint` says "ARM template JSON or resource types to assess", but the procedure focuses entirely on template parsing. Add a short note on whether/how the skill assesses live deployed resources (or split into a separate skill if not).
4. **Add discovery guidance for placeholders** — `{subscription-id}`, `{mg-name}` appear in CLI snippets without instruction on how the agent should obtain them when not provided by the user.
5. **Trim decorative emojis** (~142 tokens per `waza tokens suggest`). Keep the severity tier icons (🔴🟠🟡🔵) and status markers (✅⚠️), which carry semantic weight; drop the rest.
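
For edit 2, a small helper can list the fenced blocks that are candidates for extraction. This is a minimal sketch, not waza's implementation: the function name is hypothetical, the 50-line cutoff is the one cited in Signal 3, and only plain backtick fences are handled.

```python
FENCE = "`" * 3  # built programmatically so this example nests cleanly in markdown


def find_long_fences(markdown: str, max_lines: int = 50) -> list[tuple[int, int]]:
    """Return (opening_fence_line, body_length) for fenced blocks over max_lines.

    Line numbers are 1-based. Simplification: any line starting with three
    backticks toggles between "inside a block" and "outside a block".
    """
    results = []
    open_line = None
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            if open_line is None:
                open_line = number  # opening fence
            else:
                body_length = number - open_line - 1  # lines between the fences
                if body_length > max_lines:
                    results.append((open_line, body_length))
                open_line = None  # closing fence
    return results


# Synthetic document: one 60-line block and one 1-line block.
sample = "\n".join(
    ["# Demo", FENCE + "json"] + ["{}"] * 60 + [FENCE, "text", FENCE, "short", FENCE]
)
print(find_long_fences(sample))
```

Running it against the real SKILL.md (for example with `pathlib.Path(...).read_text()`) should surface the two blocks called out above, assuming they use backtick fences.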

### Acceptance

- [ ] `waza check .github/skills/azure-policy-advisor` compliance score moves from Low → Medium or higher.
- [ ] Token count under 3,000 as a realistic intermediate target, and ideally under the 1,300 hard fallback (the full 500-token soft target is unrealistic for this skill's procedural depth).
- [ ] `waza quality` trigger_precision dimension improves from 3/5 → 4/5 or 5/5.
- [ ] Re-run the PR #157 eval suite: the original `negative-naming-question` wording (with 'resource' / 'compliance' / 'governance') should now score < 0.50 on trigger heuristic.
- [ ] PR description includes before/after waza scores.

## Out of scope

- Eval suite changes — PR #157's suite was designed against the current SKILL.md and should not need updates beyond reverting the naming-prompt workaround.
- Procedure logic — the 7-step assessment flow is sound (judge scored completeness 5/5); this issue is purely about routing precision and token efficiency.

## Related

- Eval suite (just landed): PR #157
- Original eval-authoring issue: #108
- Umbrella: #93
- Skill-improve playbook: `.github/prompts/skill-improve.prompt.md`
