Skip to content

chore(hooks): render dev-test plugin via the prod generator#2884

Open
mfbx9da4 wants to merge 27 commits into
mainfrom
da/hooks-test-use-prod-generator
Open

chore(hooks): render dev-test plugin via the prod generator#2884
mfbx9da4 wants to merge 27 commits into
mainfrom
da/hooks-test-use-prod-generator

Conversation

@mfbx9da4
Copy link
Copy Markdown
Contributor

Stacked on #2855.

Why

mise run hooks:test shipped a hand-maintained send_hook.sh stub under hooks/plugin-claude-test/ that had drifted from the templated hook.sh that server/internal/plugins.GenerateObservabilityPluginPackage emits for real publishes. Two concrete consequences:

  • The stub didn't send Gram-Key / Gram-Project headers, so even with --local the hook arrived unauthenticated. With OTEL session metadata not yet in Redis on the first prompt, resolveClaudeScanProjectID returned ok=false and PreToolUse enforcement silently no-op'd, so the local harness couldn't actually exercise blocking policies on the first tool call of a session.
  • The stub's behavior diverged from prod (no auth header injection, no codex-style empty-stdout handling, no key-prefix audit comment), so anything tested locally wasn't representative of what real users get.

The default (non---local) mode was also broken in practice: it picks up the published plugin from ~/.claude/plugins/cache/..., which bakes a gram_live_ key tied to whatever environment first published the plugin. Locally, that key 401s.

What changed

Before: two parallel implementations of send_hook.sh. One in the repo (stub, drifting), one generated by the server (canonical). Tests used the stub. Prod used the generator.

After: one implementation. test.sh provisions a hooks-scoped api key in the local DB, then calls a new server/cmd/dev-observability-plugin helper that wraps plugins.GenerateObservabilityPluginPackage to render the plugin into hooks/.local/plugin-claude/. Claude is launched with --plugin-dir against that directory. The bytes Claude executes are now identical to what prod ships, just with a locally-issued key.

Other changes:

  • Hardened test.sh: missing project / missing users are errors instead of warnings (the rest of the script was already assuming the key got provisioned).
  • Removed hooks/plugin-claude-test/ (the stale stub).
  • Kept --local as a deprecated no-op so existing muscle memory still works.
  • Gitignored hooks/.local/ since it's regenerated on every run.

hooks/plugin-claude/ and .mise-tasks/hooks/zip.sh are intentionally left alone. The zip output isn't referenced anywhere in the server or build pipeline, so that's a separate cleanup.

Test plan

  • mise run hooks:test --project default provisions a key, renders the plugin, launches Claude
  • Inspect hooks/.local/plugin-claude/hooks/hook.sh and confirm it has the locally-issued gram_local_... key + Gram-Project: default headers
  • Trigger a UserPromptSubmit with a string matching a blocking risk policy and confirm the hook returns 403 / the prompt is blocked
  • mise run hooks:test --project default --local still works (no-op flag accepted)

mfbx9da4 and others added 27 commits May 15, 2026 10:55
Every scanner now emits kebab-case rule ids and human-readable
descriptions that never echo the matched value. Adds a central rule
catalog, plumbs RuleContext through every Finding writer, and exposes a
structured DenyReason from the shadowmcp validator. Frontend resolves
unknown kebab rule ids by title-casing them so new backend rules render
legibly without a dashboard release.
The detection mechanism (missing toolset id, unknown toolset, ...) is
implementation detail, not the risk itself. The actionable risk is the
same in every case: an unverified MCP tool call. Revert the validator
DenyReason addition and the four-way rule split; emit a single
'shadow-mcp' rule_id whose description still names the tool.
Group rule ids by risk category via a dotted prefix so consumers can
bucket findings without switching on `source`:

  secret.<rule>         gitleaks
  pii.<rule>            presidio
  shadow-mcp            shadow_mcp (unchanged)
  destructive.tool      destructive_tool
  destructive.cli-<cmd> cli_destructive
  pi                    ML classifier prompt injection
  pi.<heuristic>        L0 heuristic prompt injection

Collapses the two role-hijack subrules into a single `pi.role-hijack`
and drops the `deberta-v3-classifier` model name from the rule id
(model is implementation detail). cli_destructive drops the implicit
`shell`/`cloud` category prefixes; `git`/`database` stay because they
convey real grouping.

Adds CanonicalGitleaksRuleID, CanonicalPresidioRuleID, and
CanonicalCLIDestructiveRuleID helpers; the frontend canonicalizer
mirrors them.
Drop the cliDestructiveCanonical indirection table in rules.go. Update
cliDestructivePattern.FullName() to emit `destructive.<category>.<name>`
directly (e.g. `destructive.shell.rm-rf`, `destructive.git.push-force`)
so the writer can use it verbatim as the canonical rule id. The
shell/git/database/cloud groupings now form a real second layer under
`destructive.`, peers of `destructive.tool`.
The standalone L1 classifier rule was `pi`; heuristic sub-rules used
`pi.<heuristic>`. Rename the whole family to `prompt-injection` and
`prompt-injection.<heuristic>` so the prefix is self-describing and
matches the source name. Consumers pattern-matching on the prefix get
both the L1 verdict and the L0 heuristics under one bucket.
DETECTION_RULES now stores rule ids in the canonical form findings carry
(secret./pii./prompt-injection.unknown). PolicyCenter.tsx translates
canonical -> Presidio's UPPER_SNAKE wire format at the payload boundary
via the new ruleIdToPresidioEntity helper. prompt_injection_rules ships
the canonical rule_id verbatim; the backend's L1 opt-in key is the same
string as the resulting finding rule_id. The deberta classifier verdict
lands at prompt-injection.unknown — the binary model can't identify a
specific attack family, so it sits alongside the L0 prompt-injection.*
heuristics rather than at the standalone prompt-injection prefix.
Add ValidateRuleID + a kebab-with-optional-dots grammar regex. Normalize
panics on a malformed canonical id when the binary is a test binary
(testing.Testing()) or after cmd/gram wires
EnableRuleIDFormatEnforcement() on for the local environment. Production
runs leave the value through, so a future writer drift doesn't take
down scanning.

Catches writer regressions at the boundary instead of letting them
contaminate risk_results, which keeps the public rule_id contract
honest.
…eature flag

Drop all sub-rules under the prompt-injection prefix; every prompt-injection
finding now carries rule_id = "prompt-injection" regardless of whether it
was produced by the L1 deberta classifier or an L0 heuristic regex/keyword
pattern. The detection engine is an implementation detail, not part of the
public contract.

Engine selection moves from per-policy opt-in to per-org feature flag:

  FlagPromptInjectionUseRegex (default off) → L1 classifier (deberta)
  FlagPromptInjectionUseRegex on            → L0 heuristics (regex)

StubClassifier (no --pi-classifier-url) falls back to L0 unconditionally so
local-dev still produces findings.

ScanBatch/Scan now take orgID instead of a rules slice; the
risk_policies.prompt_injection_rules field becomes vestigial (left on
schema for backward compatibility; future PR can drop it).

Frontend: DETECTION_RULES.prompt_injection is empty — the prompt-injection
category is a single toggle. PolicyCenter no longer emits per-rule
selection; categoriesToPayload signature is simplified.
…t-in

Rename FlagPromptInjectionUseRegex -> FlagPromptInjectionUseClassifier and
invert the default. Out of the box every org gets the L0 regex engine; an
org opts in to the deberta classifier by flipping the feature flag.

Reason: the classifier requires a sidecar deployment that not every
environment runs. Defaulting to the always-available regex keeps the
out-of-the-box experience working and turns the classifier into a
deliberate uplift rather than an implicit dependency.
guardRuleIDs sweeps every InsertRiskResultsParams before the INSERT and
runs ValidateRuleID on its rule_id. In dev/test mode an offending row
panics so writer drift fails CI immediately; in production the row is
dropped (and logged with risk.source / risk.rule_id) so a single
misbehaving scanner can't break the whole batch.

This is the second layer of defense, stacked behind the existing
Normalize check at the canonical-id construction site. A DB CHECK
constraint will follow in the migration PR for the third (always-on)
layer.
… custom message on block

PolicyCenter:
- Drop the standalone deberta-v3-classifier rule from the prompt_injection
  category UI. The engine (regex vs deberta) is an org-level concern flipped
  via the prompt-injection-use-classifier feature flag, not something the
  policy author picks. The category becomes a single on/off checkbox.
- Drop formPromptInjectionRules / setFormPromptInjectionRules state, the
  useRiskCapabilities + piClassifierEnabled wiring, and the special-case
  rendering branch that allowed per-rule selection under prompt_injection.
- Render the Custom Message field only when action='block'. Flag-action
  policies record findings silently and never surface a user-facing reason,
  so the field is irrelevant there.
canonicalizeRuleId was a stop-gap from when DETECTION_RULES.id stored
wire-format ids (UPPER_SNAKE Presidio entities, raw gitleaks names).
After the earlier rewrite, DETECTION_RULES.id is already canonical end
to end, so every call site was a no-op. Remove the function and the
import; SecurityOverview's lookup maps now key on rule.id directly.
ruleIdToPresidioEntity and humanizeRuleId stay — they handle the only
remaining transforms (canonical -> Presidio wire format, and unknown-id
display fallback).

Also drop the prompt_attacks category from the policy form: it was a
placeholder that did not map to a real backend source, and its entries
were never actionable.
Unknown findings fall through to humanizeRuleId for their display label.
Reading 'Pii Credit Card' or 'Destructive Cli Rm Rf' is jarring; this
adds a small acronym set (pii, cli, mcp, api, ssn, nhs, iban, ...) that
gets upper-cased instead of title-cased.

Examples:
  pii.credit-card  -> 'PII Credit Card'
  pii.us-ssn       -> 'PII US SSN'
  pii.uk-nhs       -> 'PII UK NHS'
  destructive.cli-rm-rf -> 'Destructive CLI Rm Rf'
…cy ids

Risk_results rows written before this PR still carry legacy formats
(MEDICAL_LICENSE, shadow_mcp.unverified_call, cli_destructive.shell/rm-rf,
pi.role-hijack.you-are-now, ...). Until the data migration ships, the
dashboard renders them via humanizeRuleId. Splitting on every separator
(., -, _, /) means legacy ids read legibly during the overlap window
instead of as Frankenstein 'Cli_destructive Shell/rm Rf' strings.
…ibe builders

RuleContext was a bag of optional fields (ToolName, MatchedPattern)
forwarded through a string-keyed catalog to a single Normalize entry
point. Every call site passed exhaustruct-zero values for the fields
it didn't care about, and adding a new scanner with new context would
force the shared struct to grow — open/closed violation.

Replace with one typed function per emit shape:

  DescribeShadowMCP(toolName)              -> (ruleID, description)
  DescribeDestructiveTool(toolName)        -> ...
  DescribeCLIDestructive(pattern, toolName)-> ...
  DescribePresidioEntity(rawEntityType)    -> ...
  DescribePresidioDeadLetter()             -> ...
  DescribeGitleaks(raw, upstreamDesc)      -> ...
  DescribePromptInjection()                -> ...

Each builder accepts exactly the inputs it needs. A new scanner adds a
new function instead of extending a shared type.

guard() (kept in rules.go) still validates every canonical id against
the grammar in dev/test, so the rule_id format guarantee is unchanged.
ValidateRuleID, guardRuleIDs in writeResults, and the
EnableRuleIDFormatEnforcement opt-in all stay.
…ult-rule-ids

# Conflicts:
#	.speakeasy/workflow.lock
#	client/sdk/.speakeasy/gen.lock
#	server/gen/http/openapi3.json
… dev/test-only

Two follow-ups on the merge:

1. Reverted the verbose risk_results attribute descriptions I added to
   server/design/shared/risk.go. They were internal commentary ("never
   leaks internal validator detail", "do not surface unredacted in
   public contracts") that should not appear in the public OpenAPI /
   SDK docs. Restored the original terse customer-facing strings; Goa
   + TypeScript SDK regenerated.

2. Fixed a production correctness bug in AnalyzeBatch.guardRuleIDs:
   the previous version dropped any row with a malformed rule_id in
   prod. Because writeResults first DELETEs all rows for the message
   batch and the unanalyzed-message query treats 'no risk_results row'
   as the signal to re-scan, a malformed rule_id caused the same
   message to be re-scanned every batch. guardRuleIDs is now dev/test-
   only: in those modes a malformed id panics; in production the
   function is a no-op pass-through, so analyzed-state semantics are
   preserved.
The previous refactor turned the engines into a mutually-exclusive
either/or — flag off → L0 only, flag on → L1 only (with L0 as a fallback
on classifier error). That was a regression: pre-refactor behavior was
L0 always runs, L1 is appended when opted in. The only thing the
feature flag change was supposed to do was move the L1 opt-in from
policy config to an org-level feature flag.

Restore the additive model:

- PromptInjectionScanner.Scan / ScanBatch always run runHeuristics(text).
- When FlagPromptInjectionUseClassifier is on for the org AND the
  classifier is non-stub, also run the classifier and append any L1
  finding to the L0 set.
- Classifier errors leave L0 findings intact (no swap to "fallback").
- Stub classifier is treated as "no L1 deployed" regardless of flag.

Rename the gate to classifierEnabled() to match the new semantics.
Tests rewritten to cover L0-only, L0+L1 (both fire), L0+L1 (only L1
fires on benign text), SAFE label, classifier error, and stub
classifier paths.
rules.go keeps the central rule_id surface: grammar guard, canonical
rule_id constants, and the per-source canonicalizers
(CanonicalGitleaksRuleID, CanonicalPresidioRuleID). The description
sentences move next to the scanner that owns them:

- gitleaks.go      → DescribeGitleaks
- presidio.go      → DescribePresidioEntity, DescribePresidioDeadLetter,
                     presidioEntityDescriptions
- pi_scanner.go    → DescribePromptInjection
- cli_destructive.go → DescribeCLIDestructive, cliCommandHumanForm
- analyze_batch.go → DescribeShadowMCP, DescribeDestructiveTool
  (their writers live in this file too)

Net: rule_id contract surface stays in one place; presentation logic
lives with the source that produces it.
The RiskBadgePopover badge was rendering the source (`presidio`,
`gitleaks`, ...) which leaks the implementation detail policy authors
never think in. Switch to the category prefix of the rule id:

  pii.email-address    → PII
  secret.anthropic-api-key → SECRET
  destructive.shell.rm-rf → DESTRUCTIVE
  shadow-mcp           → SHADOW-MCP
  prompt-injection     → PROMPT-INJECTION

Adds a small ruleIdCategoryLabel(ruleId) helper in rule-ids.ts that
takes the first dot-delimited segment (or the whole id when undotted)
and uppercases it. Falls back to the uppercased source if no rule_id.
Presidio aggressively detects `::` (and the all-zero variants `::0`,
`0::0`, `0:0:0:0:0:0:0:0`) as an IP_ADDRESS. It appears in code,
networking configs, and stack traces but is never a meaningful PII
finding, so it just noises up the security overview.

Add a narrow isPresidioFalsePositive filter applied in
convertPresidioFindings that drops these matches before they become a
Finding. Real addresses (127.0.0.1, 2001:db8::1, ::1 loopback) still
flow through.
This change adds a new `AppendBatch` function that can bulk insert
outbox entries within a transaction.
The local hooks:test harness used a hand-maintained send_hook.sh stub
under hooks/plugin-claude-test/ that had drifted from the templated
hook.sh that server/internal/plugins.GenerateObservabilityPluginPackage
emits for real publishes. The stub had no Gram-Key/Gram-Project
headers, so even with --local the hook arrived unauthenticated and
PreToolUse enforcement silently no-op'd before OTEL session metadata
landed in Redis.

Add a tiny server/cmd/dev-observability-plugin entrypoint that wraps
GenerateObservabilityPluginPackage, and have test.sh provision a
hooks-scoped api key, render the plugin into hooks/.local/plugin-claude/
on the fly, then exec claude --plugin-dir against it. The test path
now exercises the exact bytes prod ships, just with a locally-issued
key wired through.

Delete the stale stub dir, ignore the generated output dir, and keep
--local as a deprecated no-op so existing muscle memory still works.
@mfbx9da4 mfbx9da4 requested a review from a team as a code owner May 16, 2026 15:11
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
gram-docs-redirect Ready Ready Preview, Comment May 16, 2026 3:11pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 16, 2026

⚠️ No Changeset found

Latest commit: dd52f1b

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Base automatically changed from da/normalize-risk-result-rule-ids to main May 18, 2026 12:47
@mfbx9da4 mfbx9da4 requested a review from a team as a code owner May 18, 2026 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants