feat(serve): block HITL on dashboard approval; stop caching human verdicts#54
Merged
Conversation
In non-calibration mode, fresh HumanInTheLoop verdicts from real risk scoring (Scorer / AgentReviewer) now park the /check request and appear in the dashboard's Approvals tab. The request resumes with the human's verdict once they click Allow or Deny, so the agent's call actually proceeds based on human input instead of just being told "blocked" and dropped. Also removes the policy-cache writes from human verdicts (both calibration and the new advisory flow). Every HITL goes through fresh approval, including byte-identical repeats — previously, approving once cached the verdict and silently auto-allowed all future identical calls. Engine-determined Allow/Deny from the scorer still cache for performance; only human-mediated decisions stop persisting. UnknownFallback HITLs are intentionally excluded from the approval queue — those represent "permit0 has no opinion about this tool" and should be addressed by adding a normalizer or allowlisting the norm_hash, not by flooding the queue with every unrecognized call. Caveat: the daemon now holds /check connections open up to 5 min (the existing approval timeout) waiting on a human. HTTP clients with shorter timeouts will give up before the human responds.
Restore policy_cache_set for human-approved verdicts in both the advisory and calibration paths. Caching keeps the perf win for repeated identical calls — approve once, future identical sends don't re-prompt. Session-aware cache bypass is the right model for when session chains are supported, but they aren't yet, so the straightforward cache restoration matches current engine semantics. HumanInTheLoop verdicts submitted by the reviewer are still not cached (matches the engine's own behavior at engine.rs:303), so a human can deliberately pass the call through without poisoning the cache.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #55.
Summary
In non-calibration mode (
serve --ui --port 9090), a freshHumanInTheLoopverdict from real risk scoring now parks the/checkrequest, surfaces in the dashboard's Approvals queue with full risk context, and resumes with the human's verdict once they click Allow or Deny. Previously the call returned HITL immediately and the agent's action was simply lost — the Approvals tab stayed empty and there was no path for a human to intervene. See #55 for the full problem statement.Type
What changed
crates/permit0-cli/src/cmd/serve.rs:block_for_advisory_approvalparks/checkon fresh HITL verdicts (Scorer/AgentReviewersource only) until the human resolves the approval, then resumes with the human's verdict. Times out after the existing 5-minuteApprovalManagerwindow with 408 — same shape as calibration mode.UnknownFallbackHITLs are deliberately excluded — those mean "permit0 has no opinion about this tool" and should be fixed with a normalizer or allowlist entry rather than flooding the queue.engine.store().policy_cache_set), so subsequent identical calls don't re-prompt. HITL verdicts submitted by the reviewer are not cached, matching the engine's own behavior. Session-aware cache bypass is the right model long-term, but session chains aren't supported yet — revisit then.Behavior matrix (non-calibration mode, with
--ui)Scorer/AgentReviewerAllow/DenyScorer/AgentReviewerHumanInTheLoop/check→ Approvals queue → resumes with human verdict. Cached on Allow/Deny.UnknownFallbackHumanInTheLoopPolicyCache/Allowlist/Denylist/ priorHumanReviewerserve --port 9090without--ui: no approval manager → pass-through, same as before.Tradeoff to flag for reviewers
/checknow holds the HTTP connection open up to 5 min waiting on a human (matches the existingApprovalManager::DEFAULT_APPROVAL_TIMEOUT). HTTP clients with shorter default timeouts (most do — 30–60s is common) will give up before the human responds and the call fails on the agent side. Calibration mode has had this property since it shipped, so this is consistent with the existing model. If we want a different default here (60s? configurable per-request?), say so and I'll add it.Test plan
cargo build -p permit0-clicargo clippy -p permit0-cli --all-targets -- -D warningscargo fmt --all --checkcargo test -p permit0-cli(14 + 6 pass)cargo test -p permit0-ui(77 pass)POST /api/v1/checkforgmail_sendparks.GET /api/v1/approvalsreturns the entry with MEDIUM tier, score 43, entities{to, body, subject, domain, recipient_scope}, flags[GOVERNANCE, OUTBOUND, PRIVILEGE, EXPOSURE, MUTATION].POST /api/v1/approvals/decidewithpermission: allow→ original/checkresumes and returns{"permission":"allow","source":"HumanReviewer"}./checkreturns in ~9ms withsource: PolicyCache— confirms cache write after human approval./checkwith a differentbodyparks again — confirms cache keys on the full normalized action.