chore(hooks): render dev-test plugin via the prod generator by mfbx9da4 · Pull Request #2884 · speakeasy-api/gram

mfbx9da4 · 2026-05-16T15:11:35Z

Stacked on #2855.

Why

mise run hooks:test shipped a hand-maintained send_hook.sh stub under hooks/plugin-claude-test/ that had drifted from the templated hook.sh that server/internal/plugins.GenerateObservabilityPluginPackage emits for real publishes. Two concrete consequences:

The stub didn't send Gram-Key / Gram-Project headers, so even with --local the hook arrived unauthenticated. With OTEL session metadata not yet in Redis on the first prompt, resolveClaudeScanProjectID returned ok=false and PreToolUse enforcement silently no-op'd, so the local harness couldn't actually exercise blocking policies on the first tool call of a session.
The stub's behavior diverged from prod (no auth header injection, no codex-style empty-stdout handling, no key-prefix audit comment), so anything tested locally wasn't representative of what real users get.

The default (non---local) mode was also broken in practice: it picks up the published plugin from ~/.claude/plugins/cache/..., which bakes a gram_live_ key tied to whatever environment first published the plugin. Locally, that key 401s.

What changed

Before: two parallel implementations of send_hook.sh. One in the repo (stub, drifting), one generated by the server (canonical). Tests used the stub. Prod used the generator.

After: one implementation. test.sh provisions a hooks-scoped api key in the local DB, then calls a new server/cmd/dev-observability-plugin helper that wraps plugins.GenerateObservabilityPluginPackage to render the plugin into hooks/.local/plugin-claude/. Claude is launched with --plugin-dir against that directory. The bytes Claude executes are now identical to what prod ships, just with a locally-issued key.

Other changes:

Hardened test.sh: missing project / missing users are errors instead of warnings (the rest of the script was already assuming the key got provisioned).
Removed hooks/plugin-claude-test/ (the stale stub).
Kept --local as a deprecated no-op so existing muscle memory still works.
Gitignored hooks/.local/ since it's regenerated on every run.

hooks/plugin-claude/ and .mise-tasks/hooks/zip.sh are intentionally left alone. The zip output isn't referenced anywhere in the server or build pipeline, so that's a separate cleanup.

Test plan

mise run hooks:test --project default provisions a key, renders the plugin, launches Claude
Inspect hooks/.local/plugin-claude/hooks/hook.sh and confirm it has the locally-issued gram_local_... key + Gram-Project: default headers
Trigger a UserPromptSubmit with a string matching a blocking risk policy and confirm the hook returns 403 / the prompt is blocked
mise run hooks:test --project default --local still works (no-op flag accepted)

Every scanner now emits kebab-case rule ids and human-readable descriptions that never echo the matched value. Adds a central rule catalog, plumbs RuleContext through every Finding writer, and exposes a structured DenyReason from the shadowmcp validator. Frontend resolves unknown kebab rule ids by title-casing them so new backend rules render legibly without a dashboard release.

The detection mechanism (missing toolset id, unknown toolset, ...) is implementation detail, not the risk itself. The actionable risk is the same in every case: an unverified MCP tool call. Revert the validator DenyReason addition and the four-way rule split; emit a single 'shadow-mcp' rule_id whose description still names the tool.

Group rule ids by risk category via a dotted prefix so consumers can bucket findings without switching on `source`: secret.<rule> gitleaks pii.<rule> presidio shadow-mcp shadow_mcp (unchanged) destructive.tool destructive_tool destructive.cli-<cmd> cli_destructive pi ML classifier prompt injection pi.<heuristic> L0 heuristic prompt injection Collapses the two role-hijack subrules into a single `pi.role-hijack` and drops the `deberta-v3-classifier` model name from the rule id (model is implementation detail). cli_destructive drops the implicit `shell`/`cloud` category prefixes; `git`/`database` stay because they convey real grouping. Adds CanonicalGitleaksRuleID, CanonicalPresidioRuleID, and CanonicalCLIDestructiveRuleID helpers; the frontend canonicalizer mirrors them.

Drop the cliDestructiveCanonical indirection table in rules.go. Update cliDestructivePattern.FullName() to emit `destructive.<category>.<name>` directly (e.g. `destructive.shell.rm-rf`, `destructive.git.push-force`) so the writer can use it verbatim as the canonical rule id. The shell/git/database/cloud groupings now form a real second layer under `destructive.`, peers of `destructive.tool`.

The standalone L1 classifier rule was `pi`; heuristic sub-rules used `pi.<heuristic>`. Rename the whole family to `prompt-injection` and `prompt-injection.<heuristic>` so the prefix is self-describing and matches the source name. Consumers pattern-matching on the prefix get both the L1 verdict and the L0 heuristics under one bucket.

DETECTION_RULES now stores rule ids in the canonical form findings carry (secret./pii./prompt-injection.unknown). PolicyCenter.tsx translates canonical -> Presidio's UPPER_SNAKE wire format at the payload boundary via the new ruleIdToPresidioEntity helper. prompt_injection_rules ships the canonical rule_id verbatim; the backend's L1 opt-in key is the same string as the resulting finding rule_id. The deberta classifier verdict lands at prompt-injection.unknown — the binary model can't identify a specific attack family, so it sits alongside the L0 prompt-injection.* heuristics rather than at the standalone prompt-injection prefix.

Add ValidateRuleID + a kebab-with-optional-dots grammar regex. Normalize panics on a malformed canonical id when the binary is a test binary (testing.Testing()) or after cmd/gram wires EnableRuleIDFormatEnforcement() on for the local environment. Production runs leave the value through, so a future writer drift doesn't take down scanning. Catches writer regressions at the boundary instead of letting them contaminate risk_results, which keeps the public rule_id contract honest.

…eature flag Drop all sub-rules under the prompt-injection prefix; every prompt-injection finding now carries rule_id = "prompt-injection" regardless of whether it was produced by the L1 deberta classifier or an L0 heuristic regex/keyword pattern. The detection engine is an implementation detail, not part of the public contract. Engine selection moves from per-policy opt-in to per-org feature flag: FlagPromptInjectionUseRegex (default off) → L1 classifier (deberta) FlagPromptInjectionUseRegex on → L0 heuristics (regex) StubClassifier (no --pi-classifier-url) falls back to L0 unconditionally so local-dev still produces findings. ScanBatch/Scan now take orgID instead of a rules slice; the risk_policies.prompt_injection_rules field becomes vestigial (left on schema for backward compatibility; future PR can drop it). Frontend: DETECTION_RULES.prompt_injection is empty — the prompt-injection category is a single toggle. PolicyCenter no longer emits per-rule selection; categoriesToPayload signature is simplified.

…t-in Rename FlagPromptInjectionUseRegex -> FlagPromptInjectionUseClassifier and invert the default. Out of the box every org gets the L0 regex engine; an org opts in to the deberta classifier by flipping the feature flag. Reason: the classifier requires a sidecar deployment that not every environment runs. Defaulting to the always-available regex keeps the out-of-the-box experience working and turns the classifier into a deliberate uplift rather than an implicit dependency.

guardRuleIDs sweeps every InsertRiskResultsParams before the INSERT and runs ValidateRuleID on its rule_id. In dev/test mode an offending row panics so writer drift fails CI immediately; in production the row is dropped (and logged with risk.source / risk.rule_id) so a single misbehaving scanner can't break the whole batch. This is the second layer of defense, stacked behind the existing Normalize check at the canonical-id construction site. A DB CHECK constraint will follow in the migration PR for the third (always-on) layer.

… custom message on block PolicyCenter: - Drop the standalone deberta-v3-classifier rule from the prompt_injection category UI. The engine (regex vs deberta) is an org-level concern flipped via the prompt-injection-use-classifier feature flag, not something the policy author picks. The category becomes a single on/off checkbox. - Drop formPromptInjectionRules / setFormPromptInjectionRules state, the useRiskCapabilities + piClassifierEnabled wiring, and the special-case rendering branch that allowed per-rule selection under prompt_injection. - Render the Custom Message field only when action='block'. Flag-action policies record findings silently and never surface a user-facing reason, so the field is irrelevant there.

canonicalizeRuleId was a stop-gap from when DETECTION_RULES.id stored wire-format ids (UPPER_SNAKE Presidio entities, raw gitleaks names). After the earlier rewrite, DETECTION_RULES.id is already canonical end to end, so every call site was a no-op. Remove the function and the import; SecurityOverview's lookup maps now key on rule.id directly. ruleIdToPresidioEntity and humanizeRuleId stay — they handle the only remaining transforms (canonical -> Presidio wire format, and unknown-id display fallback). Also drop the prompt_attacks category from the policy form: it was a placeholder that did not map to a real backend source, and its entries were never actionable.

Unknown findings fall through to humanizeRuleId for their display label. Reading 'Pii Credit Card' or 'Destructive Cli Rm Rf' is jarring; this adds a small acronym set (pii, cli, mcp, api, ssn, nhs, iban, ...) that gets upper-cased instead of title-cased. Examples: pii.credit-card -> 'PII Credit Card' pii.us-ssn -> 'PII US SSN' pii.uk-nhs -> 'PII UK NHS' destructive.cli-rm-rf -> 'Destructive CLI Rm Rf'

…cy ids Risk_results rows written before this PR still carry legacy formats (MEDICAL_LICENSE, shadow_mcp.unverified_call, cli_destructive.shell/rm-rf, pi.role-hijack.you-are-now, ...). Until the data migration ships, the dashboard renders them via humanizeRuleId. Splitting on every separator (., -, _, /) means legacy ids read legibly during the overlap window instead of as Frankenstein 'Cli_destructive Shell/rm Rf' strings.

…ibe builders RuleContext was a bag of optional fields (ToolName, MatchedPattern) forwarded through a string-keyed catalog to a single Normalize entry point. Every call site passed exhaustruct-zero values for the fields it didn't care about, and adding a new scanner with new context would force the shared struct to grow — open/closed violation. Replace with one typed function per emit shape: DescribeShadowMCP(toolName) -> (ruleID, description) DescribeDestructiveTool(toolName) -> ... DescribeCLIDestructive(pattern, toolName)-> ... DescribePresidioEntity(rawEntityType) -> ... DescribePresidioDeadLetter() -> ... DescribeGitleaks(raw, upstreamDesc) -> ... DescribePromptInjection() -> ... Each builder accepts exactly the inputs it needs. A new scanner adds a new function instead of extending a shared type. guard() (kept in rules.go) still validates every canonical id against the grammar in dev/test, so the rule_id format guarantee is unchanged. ValidateRuleID, guardRuleIDs in writeResults, and the EnableRuleIDFormatEnforcement opt-in all stay.

…ult-rule-ids # Conflicts: # .speakeasy/workflow.lock # client/sdk/.speakeasy/gen.lock # server/gen/http/openapi3.json

… dev/test-only Two follow-ups on the merge: 1. Reverted the verbose risk_results attribute descriptions I added to server/design/shared/risk.go. They were internal commentary ("never leaks internal validator detail", "do not surface unredacted in public contracts") that should not appear in the public OpenAPI / SDK docs. Restored the original terse customer-facing strings; Goa + TypeScript SDK regenerated. 2. Fixed a production correctness bug in AnalyzeBatch.guardRuleIDs: the previous version dropped any row with a malformed rule_id in prod. Because writeResults first DELETEs all rows for the message batch and the unanalyzed-message query treats 'no risk_results row' as the signal to re-scan, a malformed rule_id caused the same message to be re-scanned every batch. guardRuleIDs is now dev/test- only: in those modes a malformed id panics; in production the function is a no-op pass-through, so analyzed-state semantics are preserved.

The previous refactor turned the engines into a mutually-exclusive either/or — flag off → L0 only, flag on → L1 only (with L0 as a fallback on classifier error). That was a regression: pre-refactor behavior was L0 always runs, L1 is appended when opted in. The only thing the feature flag change was supposed to do was move the L1 opt-in from policy config to an org-level feature flag. Restore the additive model: - PromptInjectionScanner.Scan / ScanBatch always run runHeuristics(text). - When FlagPromptInjectionUseClassifier is on for the org AND the classifier is non-stub, also run the classifier and append any L1 finding to the L0 set. - Classifier errors leave L0 findings intact (no swap to "fallback"). - Stub classifier is treated as "no L1 deployed" regardless of flag. Rename the gate to classifierEnabled() to match the new semantics. Tests rewritten to cover L0-only, L0+L1 (both fire), L0+L1 (only L1 fires on benign text), SAFE label, classifier error, and stub classifier paths.

rules.go keeps the central rule_id surface: grammar guard, canonical rule_id constants, and the per-source canonicalizers (CanonicalGitleaksRuleID, CanonicalPresidioRuleID). The description sentences move next to the scanner that owns them: - gitleaks.go → DescribeGitleaks - presidio.go → DescribePresidioEntity, DescribePresidioDeadLetter, presidioEntityDescriptions - pi_scanner.go → DescribePromptInjection - cli_destructive.go → DescribeCLIDestructive, cliCommandHumanForm - analyze_batch.go → DescribeShadowMCP, DescribeDestructiveTool (their writers live in this file too) Net: rule_id contract surface stays in one place; presentation logic lives with the source that produces it.

The RiskBadgePopover badge was rendering the source (`presidio`, `gitleaks`, ...) which leaks the implementation detail policy authors never think in. Switch to the category prefix of the rule id: pii.email-address → PII secret.anthropic-api-key → SECRET destructive.shell.rm-rf → DESTRUCTIVE shadow-mcp → SHADOW-MCP prompt-injection → PROMPT-INJECTION Adds a small ruleIdCategoryLabel(ruleId) helper in rule-ids.ts that takes the first dot-delimited segment (or the whole id when undotted) and uppercases it. Falls back to the uppercased source if no rule_id.

Presidio aggressively detects `::` (and the all-zero variants `::0`, `0::0`, `0:0:0:0:0:0:0:0`) as an IP_ADDRESS. It appears in code, networking configs, and stack traces but is never a meaningful PII finding, so it just noises up the security overview. Add a narrow isPresidioFalsePositive filter applied in convertPresidioFindings that drops these matches before they become a Finding. Real addresses (127.0.0.1, 2001:db8::1, ::1 loopback) still flow through.

This change adds a new `AppendBatch` function that can bulk insert outbox entries within a transaction.

The local hooks:test harness used a hand-maintained send_hook.sh stub under hooks/plugin-claude-test/ that had drifted from the templated hook.sh that server/internal/plugins.GenerateObservabilityPluginPackage emits for real publishes. The stub had no Gram-Key/Gram-Project headers, so even with --local the hook arrived unauthenticated and PreToolUse enforcement silently no-op'd before OTEL session metadata landed in Redis. Add a tiny server/cmd/dev-observability-plugin entrypoint that wraps GenerateObservabilityPluginPackage, and have test.sh provision a hooks-scoped api key, render the plugin into hooks/.local/plugin-claude/ on the fly, then exec claude --plugin-dir against it. The test path now exercises the exact bytes prod ships, just with a locally-issued key wired through. Delete the stale stub dir, ignore the generated output dir, and keep --local as a deprecated no-op so existing muscle memory still works.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

vercel · 2026-05-16T15:11:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
gram-docs-redirect	Ready	Preview, Comment	May 16, 2026 3:11pm

changeset-bot · 2026-05-16T15:11:41Z

⚠️ No Changeset found

Latest commit: dd52f1b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

mfbx9da4 and others added 27 commits May 15, 2026 10:55

chore(sdk): regenerate SDK with normalized RiskResult docs

8290274

Merge remote-tracking branch 'origin/main' into da/normalize-risk-res…

7e2e9ab

…ult-rule-ids # Conflicts: # .speakeasy/workflow.lock # client/sdk/.speakeasy/gen.lock # server/gen/http/openapi3.json

chore: add changeset for risk_result rule_id normalization

68f1c0d

chore(changeset): rewrite for API-consumer audience

3f8a3e8

chore: add outbox.AppendBatch for adding multiple outbox entries (#2870)

70a5717

This change adds a new `AppendBatch` function that can bulk insert outbox entries within a transaction.

chore: write risk result findings to outbox

2e7aa9c

mfbx9da4 requested a review from a team as a code owner May 16, 2026 15:11

claude Bot reviewed May 16, 2026

View reviewed changes

Base automatically changed from da/normalize-risk-result-rule-ids to main May 18, 2026 12:47

mfbx9da4 requested a review from a team as a code owner May 18, 2026 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(hooks): render dev-test plugin via the prod generator#2884

chore(hooks): render dev-test plugin via the prod generator#2884
mfbx9da4 wants to merge 27 commits into
mainfrom
da/hooks-test-use-prod-generator

mfbx9da4 commented May 16, 2026

Uh oh!

claude Bot left a comment

Uh oh!

vercel Bot commented May 16, 2026

Uh oh!

changeset-bot Bot commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mfbx9da4 commented May 16, 2026

Why

What changed

Test plan

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

vercel Bot commented May 16, 2026

Uh oh!

changeset-bot Bot commented May 16, 2026

⚠️ No Changeset found

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants