feat: LLM-backed BDD judge + automated release pipeline by aks-builds · Pull Request #13 · aks-builds/openspecpm

aks-builds · 2026-05-18T06:06:41Z

bdd-llm-reviewer (the first v2 change to ship as code):

cli/src/bdd/judge.js: opt-in LLM judge using @anthropic-ai/sdk with Claude Haiku 4.5. Tool_use for structured output; cache_control: ephemeral on the proposal system block so re-runs across multiple specs in one feature reuse the cache. Defensive parsing: malformed responses degrade to a bdd/llm-parse-error warning, never throw. Emits three new rule ids: bdd/llm-contradiction (cross-spec contradictions), bdd/llm-missing-coverage (success criteria with no scenario), bdd/llm-vague-then (observable verb but no checkable outcome).
--llm flag wired into propose, sync, validate. judge.enabled config in .openspecpm/config.json also activates it. Failures in propose are soft; failures in sync hard-gate unless --force; failures in validate degrade into findings.
doctor always shows a [judge] section reporting ANTHROPIC_API_KEY with a remediation URL when unset.
cli/src/audit.js: record() now accepts an optional meta field; judge usage (model, input_tokens, output_tokens, cache_creation / cache_read input tokens) is logged per LLM call so cache hit rate is auditable from .openspecpm/audit.log.
audit scrubber regex tightened: segment-based check instead of substring, so input_tokens / cache_read_input_tokens no longer false-positive the secret redactor. api_token / JIRA_PASSWORD still scrub.
cli/tests/judge.test.js: 8 new tests with a stub Anthropic client (no network). cli/tests/audit.test.js: 1 new test for the meta field. 91 -> 100 tests, all green.
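The judge's request shape and defensive parsing could look roughly like the sketch below. It assumes the Anthropic Messages API conventions (a `system` array with `cache_control`, a forced `tool_choice`, and `tool_use` blocks in the response `content`). The names `buildJudgeRequest`, `parseJudgeResponse`, and `report_findings`, the `findings` / `rule` / `message` fields, and the `max_tokens` value are hypothetical; the PR does not show judge.js internals, and the model id is left as a parameter.

```javascript
// Sketch only: helper, tool, and field names are assumptions, not the PR's code.
const KNOWN_RULES = [
  'bdd/llm-contradiction',
  'bdd/llm-missing-coverage',
  'bdd/llm-vague-then',
];

const FINDINGS_TOOL = {
  name: 'report_findings',
  description: 'Report BDD review findings for one spec.',
  input_schema: {
    type: 'object',
    properties: {
      findings: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            rule: { type: 'string', enum: KNOWN_RULES },
            message: { type: 'string' },
          },
          required: ['rule', 'message'],
        },
      },
    },
    required: ['findings'],
  },
};

// Build Messages API params. The proposal sits in a cached system block so
// later specs in the same feature can reuse it.
function buildJudgeRequest(model, proposalText, specText) {
  return {
    model,
    max_tokens: 1024, // assumed budget
    system: [
      { type: 'text', text: proposalText, cache_control: { type: 'ephemeral' } },
    ],
    tools: [FINDINGS_TOOL],
    tool_choice: { type: 'tool', name: FINDINGS_TOOL.name },
    messages: [{ role: 'user', content: specText }],
  };
}

function parseError(reason) {
  return [{ rule: 'bdd/llm-parse-error', severity: 'warning', message: reason }];
}

// Turn a response into findings. Malformed input yields a warning, never a throw.
function parseJudgeResponse(response) {
  try {
    const blocks = Array.isArray(response?.content) ? response.content : [];
    const toolUse = blocks.find(
      (b) => b && b.type === 'tool_use' && b.name === FINDINGS_TOOL.name
    );
    if (!toolUse) return parseError('no tool_use block in response');
    const raw = toolUse.input?.findings;
    if (!Array.isArray(raw)) return parseError('findings is not an array');
    const findings = raw.filter(
      (f) => f && KNOWN_RULES.includes(f.rule) && typeof f.message === 'string'
    );
    if (findings.length !== raw.length) {
      return [...findings, ...parseError('dropped malformed findings')];
    }
    return findings;
  } catch (err) {
    return parseError(`unexpected response shape: ${err.message}`);
  }
}
```

With a real client, the params would be passed to `client.messages.create(...)` and the result handed to `parseJudgeResponse`; a stub client, as in the PR's tests, only needs to return an object with a `content` array.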
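The segment-based scrubber check can be illustrated as follows. This is a sketch under assumptions: the word list, the `[REDACTED]` placeholder, and the function names are hypothetical, and the real regex in cli/src/audit.js may differ. The idea is that a key is split into whole segments before matching, so `tokens` in `input_tokens` does not match `token`.

```javascript
// Assumed list of sensitive words; the real scrubber's list is not shown in the PR.
const SECRET_SEGMENTS = new Set(['token', 'password', 'secret', 'key']);

// Split a key such as "cache_read_input_tokens" or "apiToken" into
// lowercase segments: ["cache", "read", "input", "tokens"], ["api", "token"].
function keySegments(key) {
  return key
    .replace(/([a-z0-9])([A-Z])/g, '$1_$2') // break camelCase boundaries
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

// A key is secret only if a whole segment matches, not a substring.
function isSecretKey(key) {
  return keySegments(key).some((segment) => SECRET_SEGMENTS.has(segment));
}

// Recursively redact values stored under secret-looking keys.
function scrub(value) {
  if (Array.isArray(value)) return value.map(scrub);
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [
        k,
        isSecretKey(k) ? '[REDACTED]' : scrub(v),
      ])
    );
  }
  return value;
}
```

Under this sketch, `input_tokens` and `cache_read_input_tokens` pass through untouched, while `api_token` and `JIRA_PASSWORD` are redacted, which matches the behavior the PR describes.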

Automated release pipeline (PR-based, two-workflow):

.github/workflows/release.yml: workflow_dispatch bumps version on a release/vX.Y.Z branch, rolls CHANGELOG, opens a PR, enables squash auto-merge. No direct push to main.
.github/workflows/publish.yml: triggers on release/* PR merge. Reads version from package.json, publishes to npm with sigstore provenance, syncs latest dist-tag for pre-1.0, tags the merge commit, creates the GitHub release with notes from CHANGELOG.
.github/workflows/auto-approve.yml: header comment updated to document the new APPROVER_PAT secret (reusable workflow extended in aks-builds/workflows@b5021d9 to support a secondary-account PAT approver alongside the existing GitHub App path).
CONTRIBUTING.md: Releasing section rewritten for the new flow, approver-secret matrix documented. Test count corrected (91 -> 100).
CHANGELOG.md: [Unreleased] block describes both the judge feature and the release pipeline.
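For the "notes from CHANGELOG" step in publish.yml, one plausible approach is to slice the section for the version being published out of CHANGELOG.md. The sketch below assumes Keep a Changelog-style `## [x.y.z]` headings, which the `[Unreleased]` block suggests but the PR does not confirm; `extractReleaseNotes` is a hypothetical helper, not something the workflow is known to use.

```javascript
// Hypothetical helper; assumes "## [x.y.z] - date" section headings.
function extractReleaseNotes(changelog, version) {
  const lines = changelog.split(/\r?\n/);
  const start = lines.findIndex((line) => line.startsWith(`## [${version}]`));
  if (start === -1) return null; // no section for this version

  // The section ends at the next level-2 heading, or at end of file.
  let end = lines.findIndex((line, i) => i > start && line.startsWith('## '));
  if (end === -1) end = lines.length;

  return lines.slice(start + 1, end).join('\n').trim();
}
```

Returning `null` for a missing section lets the caller decide whether to fail the publish job or fall back to generated notes.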

Screenshots overhaul:

4 new captures (propose, decompose, fan-out, search) + 1 curated synthetic capture for the LLM judge (real run requires a network call, can't be regenerated deterministically).
5 regenerated captures: doctor now shows [judge], status/next/blocked/validate reflect the current fixture state.
docs/screenshots/render.ps1: new capture blocks, cwd-stripped paths on propose/decompose so widths stay sane.
README "In action" section grew from 6 to 11 items in workflow order.

Doc sweep (per user-level CLAUDE.md):

README command-reference table flags --llm on propose / sync / validate rows.
SKILL.md script-first table mirrors the same.
references/conventions.md adds ANTHROPIC_API_KEY under Secrets.

Dependencies:

@anthropic-ai/sdk ^0.65.0 added to dependencies.

What & why

Type of change

Checklist

npm test passes locally
Tests added/updated for the behavior change (or N/A — explain below)
CHANGELOG.md updated under [Unreleased]
README command-reference and/or SKILL.md trigger phrases updated (if commands changed)
Adapter changes update the field-mapping table in skill/openspecpm/references/sync.md
No secrets, PATs, or API tokens in code, tests, or fixtures
BDD scenarios (if added/changed) pass cli/src/bdd/linter.js heuristics

Backend coverage

Screenshots / output

Anything reviewers should know

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aks-reviewes

Auto-approved by aks-builds secondary account - PR opened by the sole codeowner.

aks-codeowner-bot

Auto-approved by aks-codeowner-bot - PR opened by the sole codeowner.

aks-builds self-assigned this May 18, 2026

aks-reviewes approved these changes May 18, 2026


aks-codeowner-bot Bot approved these changes May 18, 2026


aks-builds merged commit 2fb46c3 into main May 18, 2026
3 checks passed

aks-builds deleted the feat/llm-judge-release-pipeline branch May 18, 2026 06:07


aks-builds merged 1 commit into main from feat/llm-judge-release-pipeline
