feat: LLM-backed BDD judge + automated release pipeline#13

Merged: aks-builds merged 1 commit into main from feat/llm-judge-release-pipeline on May 18, 2026

Conversation

@aks-builds
Owner

bdd-llm-reviewer (the first v2 change to ship as code):

  • cli/src/bdd/judge.js: opt-in LLM judge using @anthropic-ai/sdk with Claude Haiku 4.5. Uses tool_use for structured output and sets cache_control: ephemeral on the proposal system block, so re-runs across multiple specs in one feature reuse the cache. Parsing is defensive: malformed responses degrade to a bdd/llm-parse-error warning and never throw. Emits three new rule ids: bdd/llm-contradiction (cross-spec contradictions), bdd/llm-missing-coverage (success criteria with no scenario), bdd/llm-vague-then (observable verb but no checkable outcome).
  • --llm flag wired into propose, sync, validate. judge.enabled config in .openspecpm/config.json also activates it. Failures in propose are soft; failures in sync hard-gate unless --force; failures in validate degrade into findings.
  • doctor always shows a [judge] section reporting ANTHROPIC_API_KEY with a remediation URL when unset.
  • cli/src/audit.js: record() now accepts an optional meta field; judge usage (model, input_tokens, output_tokens, cache_creation / cache_read input tokens) is logged per LLM call so cache hit rate is auditable from .openspecpm/audit.log.
  • audit scrubber regex tightened: segment-based check instead of substring, so input_tokens / cache_read_input_tokens no longer trigger false positives in the secret redactor. api_token / JIRA_PASSWORD are still scrubbed.
  • cli/tests/judge.test.js: 8 new tests with a stub Anthropic client (no network). cli/tests/audit.test.js: 1 new test for the meta field. 91 -> 100 tests, all green.
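
To illustrate the scrubber change in the list above, here is a minimal sketch of substring versus segment-based matching. The word list and function names are hypothetical; the actual regex in cli/src/audit.js is not shown in this PR and may differ.

```javascript
// Hypothetical list of sensitive words; the real scrubber's list is an assumption.
const SECRET_WORDS = new Set(['token', 'password', 'secret']);

// Assumed old behavior: flag a key if any sensitive word appears anywhere in it.
function looksSecretSubstring(key) {
  const lower = key.toLowerCase();
  return [...SECRET_WORDS].some((word) => lower.includes(word));
}

// Assumed new behavior: split the key on non-alphanumeric separators and
// flag it only when a whole segment equals a sensitive word.
function looksSecretSegment(key) {
  return key
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .some((segment) => SECRET_WORDS.has(segment));
}

// Redact values whose keys look secret under the segment-based check.
function scrub(meta) {
  const out = {};
  for (const [key, value] of Object.entries(meta)) {
    out[key] = looksSecretSegment(key) ? '[REDACTED]' : value;
  }
  return out;
}
```

Under these assumptions, `input_tokens` splits into `input` and `tokens`, neither of which equals `token`, so usage counters pass through, while `api_token` and `JIRA_PASSWORD` each contain a whole matching segment and are redacted.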

Automated release pipeline (PR-based, two-workflow):

  • .github/workflows/release.yml: workflow_dispatch bumps version on a release/vX.Y.Z branch, rolls CHANGELOG, opens a PR, enables squash auto-merge. No direct push to main.
  • .github/workflows/publish.yml: triggers on release/* PR merge. Reads version from package.json, publishes to npm with sigstore provenance, syncs latest dist-tag for pre-1.0, tags the merge commit, creates the GitHub release with notes from CHANGELOG.
  • .github/workflows/auto-approve.yml: header comment updated to document the new APPROVER_PAT secret (reusable workflow extended in aks-builds/workflows@b5021d9 to support a secondary-account PAT approver alongside the existing GitHub App path).
  • CONTRIBUTING.md: Releasing section rewritten for the new flow, approver-secret matrix documented. Test count corrected (91 -> 100).
  • CHANGELOG.md: [Unreleased] block describes both the judge feature and the release pipeline.
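
As a rough sketch of two steps the publish workflow above relies on, the helpers below recognize a release/vX.Y.Z branch name and pull one version's notes out of a changelog. Both function names are hypothetical, and the `## [X.Y.Z]` heading format is an assumption (Keep a Changelog style); the actual workflow steps may be implemented differently, for example in shell.

```javascript
// Return "X.Y.Z" for a branch named release/vX.Y.Z, or null for anything else.
function versionFromReleaseBranch(branch) {
  const match = /^release\/v(\d+\.\d+\.\d+)$/.exec(branch);
  return match ? match[1] : null;
}

// Extract the body of the "## [version]" section from changelog text.
// Assumes each release is introduced by a line starting with "## [".
function extractReleaseNotes(changelog, version) {
  const lines = changelog.split(/\r?\n/);
  const start = lines.findIndex((line) => line.startsWith(`## [${version}]`));
  if (start === -1) return null;
  // The section ends at the next release heading, or at end of file.
  let end = lines.findIndex((line, i) => i > start && line.startsWith('## ['));
  if (end === -1) end = lines.length;
  return lines.slice(start + 1, end).join('\n').trim();
}
```

Returning null instead of throwing lets a calling step decide whether a missing section should fail the release or fall back to an empty body.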

Screenshots overhaul:

  • 4 new captures (propose, decompose, fan-out, search) + 1 curated synthetic capture for the LLM judge (a real run requires a network call, so it can't be regenerated deterministically).
  • 5 regenerated captures: doctor now shows [judge], status/next/blocked/validate reflect the current fixture state.
  • docs/screenshots/render.ps1: new capture blocks, cwd-stripped paths on propose/decompose so widths stay sane.
  • README "In action" section grew from 6 to 11 items in workflow order.

Doc sweep (per user-level CLAUDE.md):

  • README command-reference table flags --llm on propose / sync / validate rows.
  • SKILL.md script-first table mirrors the same.
  • references/conventions.md adds ANTHROPIC_API_KEY under Secrets.

Dependencies:

  • @anthropic-ai/sdk ^0.65.0 added to dependencies.
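
For readers unfamiliar with the new dependency, the following is a minimal sketch of how a judge call could be shaped with the SDK's Messages API: a cached system block, a forced tool call for structured output, and parsing that never throws. The model id string, the tool name `report_findings`, the schema, and all function names are assumptions for illustration; cli/src/bdd/judge.js may look quite different.

```javascript
// Assumed values for illustration; not taken from judge.js.
const MODEL = 'claude-haiku-4-5';
const TOOL_NAME = 'report_findings';
const RULE_IDS = new Set([
  'bdd/llm-contradiction',
  'bdd/llm-missing-coverage',
  'bdd/llm-vague-then',
]);

// Build request params. The proposal goes in a system block marked
// cache_control: ephemeral so later specs in the same feature can reuse it.
function buildJudgeRequest(proposalText, specText) {
  return {
    model: MODEL,
    max_tokens: 1024,
    system: [
      { type: 'text', text: proposalText, cache_control: { type: 'ephemeral' } },
    ],
    tools: [{
      name: TOOL_NAME,
      description: 'Report BDD findings for the spec.',
      input_schema: {
        type: 'object',
        properties: {
          findings: {
            type: 'array',
            items: {
              type: 'object',
              properties: {
                rule: { type: 'string', enum: [...RULE_IDS] },
                message: { type: 'string' },
              },
              required: ['rule', 'message'],
            },
          },
        },
        required: ['findings'],
      },
    }],
    // Force the model to answer through the tool.
    tool_choice: { type: 'tool', name: TOOL_NAME },
    messages: [{ role: 'user', content: specText }],
  };
}

// Turn a response into findings; anything malformed becomes a warning.
function parseJudgeResponse(response) {
  const parseError = (detail) => ({
    rule: 'bdd/llm-parse-error',
    severity: 'warning',
    message: detail,
  });
  try {
    const blocks = Array.isArray(response?.content) ? response.content : [];
    const block = blocks.find((b) => b?.type === 'tool_use' && b.name === TOOL_NAME);
    const raw = block?.input?.findings;
    if (!Array.isArray(raw)) return [parseError('no usable tool_use block in response')];
    const valid = raw.filter(
      (f) => f && RULE_IDS.has(f.rule) && typeof f.message === 'string',
    );
    const findings = valid.map((f) => ({ rule: f.rule, message: f.message }));
    if (valid.length < raw.length) {
      findings.push(parseError(`${raw.length - valid.length} finding(s) dropped as malformed`));
    }
    return findings;
  } catch (err) {
    return [parseError(String(err))];
  }
}

// `client` is anything exposing messages.create, such as an Anthropic
// SDK instance or a test stub.
async function judgeSpec(client, proposalText, specText) {
  const response = await client.messages.create(buildJudgeRequest(proposalText, specText));
  return parseJudgeResponse(response);
}
```

With the real SDK, `client` would be an `Anthropic` instance, which reads ANTHROPIC_API_KEY from the environment by default. Accepting the client as a parameter is what makes a no-network stub, like the one the new tests describe, straightforward to substitute.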

What & why

Type of change

  • feat — new behavior
  • fix — bug fix
  • refactor — no behavior change
  • docs — docs / comments / README
  • chore — tooling / CI / deps
  • test — tests only

Checklist

  • npm test passes locally
  • Tests added/updated for the behavior change (or N/A — explain below)
  • CHANGELOG.md updated under [Unreleased]
  • README command-reference and/or SKILL.md trigger phrases updated (if commands changed)
  • Adapter changes update the field-mapping table in skill/openspecpm/references/sync.md
  • No secrets, PATs, or API tokens in code, tests, or fixtures
  • BDD scenarios (if added/changed) pass cli/src/bdd/linter.js heuristics

Backend coverage

  • GitHub
  • Azure DevOps
  • Jira
  • Linear
  • GitLab
  • N/A — non-backend change

Screenshots / output

Anything reviewers should know

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aks-builds self-assigned this on May 18, 2026

@aks-reviewes (Collaborator) left a comment:

Auto-approved by aks-builds secondary account - PR opened by the sole codeowner.

@aks-codeowner-bot (Bot) left a comment:

Auto-approved by aks-codeowner-bot - PR opened by the sole codeowner.

aks-builds merged commit 2fb46c3 into main on May 18, 2026
3 checks passed
aks-builds deleted the feat/llm-judge-release-pipeline branch on May 18, 2026 at 06:07