feat: LLM-backed BDD judge + automated release pipeline #13
Merged
bdd-llm-reviewer (the first v2 change to ship as code):

- cli/src/bdd/judge.js: opt-in LLM judge using @anthropic-ai/sdk with
  Claude Haiku 4.5. Uses tool_use for structured output, with
  cache_control: ephemeral on the proposal system block so re-runs across
  multiple specs in one feature reuse the cache. Defensive parsing:
  malformed responses degrade to a bdd/llm-parse-error warning and never
  throw. Emits three new rule ids: bdd/llm-contradiction (cross-spec
  contradictions), bdd/llm-missing-coverage (success criteria with no
  scenario), bdd/llm-vague-then (observable verb but no checkable
  outcome).
- --llm flag wired into propose, sync, validate. judge.enabled config in
  .openspecpm/config.json also activates it. Failures in propose are
  soft; failures in sync hard-gate unless --force; failures in validate
  degrade into findings.
- doctor always shows a [judge] section reporting ANTHROPIC_API_KEY, with
  a remediation URL when unset.
- cli/src/audit.js: record() now accepts an optional meta field; judge
  usage (model, input_tokens, output_tokens, cache_creation / cache_read
  input tokens) is logged per LLM call so cache hit rate is auditable
  from .openspecpm/audit.log.
- audit scrubber regex tightened: segment-based check instead of
  substring, so input_tokens / cache_read_input_tokens no longer
  false-positive the secret redactor. api_token / JIRA_PASSWORD still
  scrub.
- cli/tests/judge.test.js: 8 new tests with a stub Anthropic client (no
  network). cli/tests/audit.test.js: 1 new test for the meta field.
  91 -> 100 tests, all green.

Automated release pipeline (PR-based, two-workflow):

- .github/workflows/release.yml: workflow_dispatch bumps version on a
  release/vX.Y.Z branch, rolls CHANGELOG, opens a PR, enables squash
  auto-merge. No direct push to main.
- .github/workflows/publish.yml: triggers on release/* PR merge. Reads
  version from package.json, publishes to npm with sigstore provenance,
  syncs latest dist-tag for pre-1.0, tags the merge commit, creates the
  GitHub release with notes from CHANGELOG.
- .github/workflows/auto-approve.yml: header comment updated to document
  the new APPROVER_PAT secret (reusable workflow extended in
  aks-builds/workflows@b5021d9 to support a secondary-account PAT
  approver alongside the existing GitHub App path).
- CONTRIBUTING.md: Releasing section rewritten for the new flow,
  approver-secret matrix documented. Test count corrected (91 -> 100).
- CHANGELOG.md: [Unreleased] block describes both the judge feature and
  the release pipeline.

Screenshots overhaul:

- 4 new captures (propose, decompose, fan-out, search) + 1 curated
  synthetic capture for the LLM judge (real run requires a network call,
  can't be regenerated deterministically).
- 5 regenerated captures: doctor now shows [judge],
  status/next/blocked/validate reflect the current fixture state.
- docs/screenshots/render.ps1: new capture blocks, cwd-stripped paths on
  propose/decompose so widths stay sane.
- README "In action" section grew from 6 to 11 items in workflow order.

Doc sweep (per user-level CLAUDE.md):

- README command-reference table flags --llm on propose / sync /
  validate rows.
- SKILL.md script-first table mirrors the same.
- references/conventions.md adds ANTHROPIC_API_KEY under Secrets.

Dependencies:

- @anthropic-ai/sdk ^0.65.0 added to dependencies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
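
For reviewers, the segment-based scrubber check described above can be sketched roughly as follows. This is a minimal illustration, not the code in cli/src/audit.js: the names SENSITIVE_SEGMENTS, segments, isSensitiveKey and scrub, the segment list, and the "[REDACTED]" placeholder are all assumptions made for the example. The idea is to split a field name into segments and redact only when a whole segment matches a sensitive word, so "tokens" in input_tokens does not match "token".

```javascript
// Hypothetical sketch of a segment-based secret check; the real
// implementation in cli/src/audit.js may differ in names and word list.
const SENSITIVE_SEGMENTS = new Set(["token", "password", "secret"]);

// Split a field name into lowercase segments on "_", "-", "." and
// camelCase boundaries, e.g. "JIRA_PASSWORD" -> ["jira", "password"].
function segments(key) {
  return key
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .split(/[_\-.]+/)
    .filter(Boolean)
    .map((s) => s.toLowerCase());
}

// A key is sensitive only if one whole segment is a sensitive word.
// A substring test would wrongly flag "input_tokens" (contains "token").
function isSensitiveKey(key) {
  return segments(key).some((s) => SENSITIVE_SEGMENTS.has(s));
}

// Return a shallow copy of meta with sensitive values replaced.
function scrub(meta) {
  const out = {};
  for (const [key, value] of Object.entries(meta)) {
    out[key] = isSensitiveKey(key) ? "[REDACTED]" : value;
  }
  return out;
}
```

Under these assumptions, scrub({ input_tokens: 12, cache_read_input_tokens: 3, api_token: "abc" }) leaves both token counts untouched and replaces only api_token, which matches the behavior the commit message describes for the audit log.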
aks-reviewes (Collaborator) approved these changes on May 18, 2026:
Auto-approved by aks-builds secondary account - PR opened by the sole codeowner.
What & why
Type of change
- feat - new behavior
- fix - bug fix
- refactor - no behavior change
- docs - docs / comments / README
- chore - tooling / CI / deps
- test - tests only
Checklist
- npm test passes locally
- CHANGELOG.md updated under [Unreleased]
- skill/openspecpm/references/sync.md
- cli/src/bdd/linter.js heuristics
Backend coverage
Screenshots / output
Anything reviewers should know