feat(serve): block HITL on dashboard approval; stop caching human verdicts#54

Merged
fafaisland merged 2 commits into main from claude/happy-hopper-d86bda on May 12, 2026

@fafaisland fafaisland commented May 12, 2026

Collaborator

Closes #55.

Summary

In non-calibration mode (serve --ui --port 9090), a fresh HumanInTheLoop verdict from real risk scoring now parks the /check request, surfaces in the dashboard's Approvals queue with full risk context, and resumes with the human's verdict once they click Allow or Deny. Previously the call returned HITL immediately and the agent's action was simply lost — the Approvals tab stayed empty and there was no path for a human to intervene. See #55 for the full problem statement.

Type

  • New pack
  • Pack update
  • Core feature
  • Bug fix
  • Documentation

What changed

crates/permit0-cli/src/cmd/serve.rs:

  • New block_for_advisory_approval parks /check on fresh HITL verdicts (Scorer / AgentReviewer source only) until the human resolves the approval, then resumes with the human's verdict. If nobody decides within the existing 5-minute ApprovalManager window, the request times out with HTTP 408, the same shape as calibration mode.
  • UnknownFallback HITLs are deliberately excluded — those mean "permit0 has no opinion about this tool" and should be fixed with a normalizer or allowlist entry rather than flooding the queue.
  • Both the advisory path and the calibration path cache the human's Allow/Deny in the policy cache (engine.store().policy_cache_set), so subsequent identical calls don't re-prompt. HITL verdicts submitted by the reviewer are not cached, matching the engine's own behavior. Session-aware cache bypass is the right model long-term, but session chains aren't supported yet — revisit then.
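The park-and-resume behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in serve.rs: `park_until_decided`, `simulate_human`, and the `Outcome` type are hypothetical names, and the sketch uses a blocking standard-library channel to stay dependency-free, whereas the real handler presumably waits asynchronously through the ApprovalManager.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Verdict names taken from the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
    HumanInTheLoop,
}

/// Hypothetical result of a parked /check request.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The human resolved the approval inside the window.
    Resolved(Verdict),
    /// Nobody answered in time; the PR says this surfaces as HTTP 408.
    TimedOut { status: u16 },
}

/// Block until a decision arrives or `window` elapses.
/// Assumption of this sketch: a dropped sender is treated like a timeout.
pub fn park_until_decided(decisions: &Receiver<Verdict>, window: Duration) -> Outcome {
    match decisions.recv_timeout(window) {
        Ok(verdict) => Outcome::Resolved(verdict),
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
            Outcome::TimedOut { status: 408 }
        }
    }
}

/// Stand-in for the dashboard: a thread that "clicks" a verdict after `delay`.
pub fn simulate_human(verdict: Verdict, delay: Duration) -> Receiver<Verdict> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        // The parked request may already have timed out, so a send error is ignored.
        let _ = tx.send(verdict);
    });
    rx
}
```

Calling `park_until_decided(&simulate_human(Verdict::Allow, delay), window)` yields `Outcome::Resolved(Verdict::Allow)` when `delay` is shorter than `window`, and `Outcome::TimedOut { status: 408 }` otherwise.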

Behavior matrix (non-calibration mode, with --ui)

| Engine source | Verdict | Behavior |
| --- | --- | --- |
| Scorer / AgentReviewer | Allow / Deny | Returned immediately. Cached (perf). |
| Scorer / AgentReviewer | HumanInTheLoop | Parks /check → Approvals queue → resumes with human verdict. Cached on Allow/Deny. |
| UnknownFallback | HumanInTheLoop | Returned immediately (unchanged). Not queued. |
| PolicyCache / Allowlist / Denylist / prior HumanReviewer | any | Pass-through. |
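One way to read the matrix is as a single routing function. The sketch below is illustrative only: the `Source` and `Verdict` variant names come from the PR text, while `Handling`, `route`, and the `ui_enabled` flag are hypothetical. Combinations the matrix does not list (for example UnknownFallback with Allow) fall into the pass-through arm, which is an assumption of this sketch.

```rust
/// Where the engine's verdict came from (names from the PR text).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Scorer,
    AgentReviewer,
    UnknownFallback,
    PolicyCache,
    Allowlist,
    Denylist,
    HumanReviewer,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
    HumanInTheLoop,
}

/// Hypothetical labels for the "Behavior" column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Handling {
    /// Returned immediately and cached for performance.
    ReturnAndCache,
    /// /check is parked and the entry appears in the Approvals queue.
    ParkForApproval,
    /// Returned immediately, never queued.
    ReturnUnqueued,
    /// The engine's answer is returned as-is.
    PassThrough,
}

/// Map (source, verdict) to a handling, following the matrix row by row.
pub fn route(source: Source, verdict: Verdict, ui_enabled: bool) -> Handling {
    // Without --ui there is no approval manager, so nothing can be parked.
    if !ui_enabled {
        return Handling::PassThrough;
    }
    match (source, verdict) {
        (Source::Scorer | Source::AgentReviewer, Verdict::Allow | Verdict::Deny) => {
            Handling::ReturnAndCache
        }
        (Source::Scorer | Source::AgentReviewer, Verdict::HumanInTheLoop) => {
            Handling::ParkForApproval
        }
        (Source::UnknownFallback, Verdict::HumanInTheLoop) => Handling::ReturnUnqueued,
        // PolicyCache, Allowlist, Denylist, prior HumanReviewer, and unlisted pairs.
        _ => Handling::PassThrough,
    }
}
```

Keeping the routing in one `match` makes the UnknownFallback exclusion explicit instead of leaving it as an implicit fall-through.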

serve --port 9090 without --ui: no approval manager → pass-through, same as before.

Tradeoff to flag for reviewers

/check now holds the HTTP connection open up to 5 min waiting on a human (matches the existing ApprovalManager::DEFAULT_APPROVAL_TIMEOUT). HTTP clients with shorter default timeouts (most do — 30–60s is common) will give up before the human responds and the call fails on the agent side. Calibration mode has had this property since it shipped, so this is consistent with the existing model. If we want a different default here (60s? configurable per-request?), say so and I'll add it.
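To make the timeout mismatch concrete, here is a small sketch of the arithmetic. The 5-minute value comes from the PR text; the constant and function names are local to this example and are not part of permit0.

```rust
use std::time::Duration;

/// The 5-minute approval window described in the PR (name is local to this sketch).
pub const APPROVAL_WINDOW: Duration = Duration::from_secs(5 * 60);

/// A client timeout that outlasts the approval window by `margin`,
/// so the client always sees either the human's verdict or the 408.
pub fn safe_client_timeout(margin: Duration) -> Duration {
    APPROVAL_WINDOW + margin
}

/// True when the client abandons the request before the daemon responds.
/// The daemon responds when the human decides or when the window expires,
/// whichever comes first.
pub fn client_gives_up(client_timeout: Duration, human_latency: Duration) -> bool {
    let daemon_responds_at = human_latency.min(APPROVAL_WINDOW);
    client_timeout < daemon_responds_at
}
```

With a 30-second client timeout and a human who takes 90 seconds, `client_gives_up` returns `true`; a client configured with `safe_client_timeout` never gives up first, because the daemon answers no later than the end of the window.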

Test plan

  • cargo build -p permit0-cli
  • cargo clippy -p permit0-cli --all-targets -- -D warnings
  • cargo fmt --all --check
  • cargo test -p permit0-cli (14 + 6 pass)
  • cargo test -p permit0-ui (77 pass)
  • Manual end-to-end against a live daemon on port 9091:
    1. POST /api/v1/check for gmail_send parks.
    2. GET /api/v1/approvals returns the entry with MEDIUM tier, score 43, entities {to, body, subject, domain, recipient_scope}, flags [GOVERNANCE, OUTBOUND, PRIVILEGE, EXPOSURE, MUTATION].
    3. POST /api/v1/approvals/decide with permission: allow → original /check resumes and returns {"permission":"allow","source":"HumanReviewer"}.
    4. Identical second /check returns in ~9ms with source: PolicyCache — confirms cache write after human approval.
    5. Third /check with a different body parks again — confirms cache keys on the full normalized action.
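The cache behavior exercised in steps 1 through 5 can be modeled in memory as a rough sketch. Everything here is hypothetical: `Daemon`, `check`, and `decide` are stand-ins, the string key stands in for the normalized action, and the sketch assumes every uncached action scores as HumanInTheLoop.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
    HumanInTheLoop,
}

/// What a /check call does in this simplified model.
#[derive(Debug, PartialEq, Eq)]
pub enum CheckResult {
    /// No cached verdict: the request waits for a human.
    Parked,
    /// A prior human Allow/Deny answers the call without prompting.
    FromCache(Verdict),
}

/// Toy daemon holding only a policy cache keyed on the full normalized action.
#[derive(Default)]
pub struct Daemon {
    cache: HashMap<String, Verdict>,
}

impl Daemon {
    /// Look up the action; uncached actions are assumed to need approval.
    pub fn check(&self, normalized_action: &str) -> CheckResult {
        match self.cache.get(normalized_action) {
            Some(verdict) => CheckResult::FromCache(*verdict),
            None => CheckResult::Parked,
        }
    }

    /// Record the human's decision. Only Allow/Deny are cached;
    /// a HumanInTheLoop verdict from the reviewer is deliberately not stored.
    pub fn decide(&mut self, normalized_action: &str, verdict: Verdict) {
        if matches!(verdict, Verdict::Allow | Verdict::Deny) {
            self.cache.insert(normalized_action.to_owned(), verdict);
        }
    }
}
```

In this model an approved action is answered from the cache on its second call, while the same tool with a different body misses the cache and parks again, since the key covers the whole normalized action.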

In non-calibration mode, fresh HumanInTheLoop verdicts from real risk
scoring (Scorer / AgentReviewer) now park the /check request and appear
in the dashboard's Approvals tab. The request resumes with the human's
verdict once they click Allow or Deny, so the agent's call actually
proceeds based on human input instead of just being told "blocked"
and dropped.

Also removes the policy-cache writes from human verdicts (both
calibration and the new advisory flow). Every HITL goes through fresh
approval, including byte-identical repeats — previously, approving
once cached the verdict and silently auto-allowed all future identical
calls. Engine-determined Allow/Deny from the scorer still cache for
performance; only human-mediated decisions stop persisting.

UnknownFallback HITLs are intentionally excluded from the approval
queue — those represent "permit0 has no opinion about this tool" and
should be addressed by adding a normalizer or allowlisting the
norm_hash, not by flooding the queue with every unrecognized call.

Caveat: the daemon now holds /check connections open up to 5 min
(the existing approval timeout) waiting on a human. HTTP clients
with shorter timeouts will give up before the human responds.

@fafaisland fafaisland requested a review from AnissL93 as a code owner May 12, 2026 19:45

Restore policy_cache_set for human-approved verdicts in both the
advisory and calibration paths. Caching keeps the perf win for
repeated identical calls — approve once, future identical sends
don't re-prompt. Session-aware cache bypass is the right model for
when session chains are supported, but they aren't yet, so the
straightforward cache restoration matches current engine semantics.

HumanInTheLoop verdicts submitted by the reviewer are still not
cached (matches the engine's own behavior at engine.rs:303), so a
human can deliberately pass the call through without poisoning the
cache.

@fafaisland fafaisland merged commit 059bd14 into main May 12, 2026
8 checks passed
@fafaisland fafaisland deleted the claude/happy-hopper-d86bda branch May 12, 2026 21:54

Development

Successfully merging this pull request may close these issues.

[Bug] HITL decisions in non-calibration mode have no human review path