Feat/shell allowlist approval#142
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
skill-check — worker3 verified, 22 skipped (no docs/). 1 error across the verified workers.
|
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Spec for converting approval-gate from a synchronous blocking gate (await_decision + 250ms poll) into a deferred-execution trigger model: the agent receives a pending_approval tool result immediately, the gate executes the underlying function on resolve, and the outcome is stitched into the agent's next turn as a system message. Depends on PR #136 (reactive ui::approval pipeline) landing first.
…ng-call invocation
…ds executed/failed
…w RPCs; delete await_decision
…approval messages
…ult for gated calls
…ndelivered/ack_delivered
…er model + clippy clean
…gv function and update ShellConfig for allow_any option
…adability in approval lifecycle tests
…val_required becomes a fallback
When an operator registers an interceptor rule for a function id (in
iii.worker.yaml), every call to that function goes through the
classifier/pause path — even if the run's `approval_required` list
is empty. The run-level list now only matters for functions that have
no matching rule.
Rationale: the interceptor config is the deployment's source of truth
for which functions need gating. Requiring callers to also list those
ids in approval_required was a second source of truth that's easy to
forget, and the failure mode was silent allowlist-rejection on
shell::exec ("command 'python3' not in allowlist") because the gate
never fired.
Extracted decide_intercept_action() as a pure helper for unit testing
the decision matrix without spinning up the iii engine.
Add ARITY dictionary mapping bash command prefixes to their
human-meaningful token count (git -> 2, npm run -> 3, docker compose -> 3,
etc.) and a prefix() helper that walks longest-first to derive the
canonical command identity for an argv. allowlist_contains now uses
prefix_matches() so operators can write multi-token allowlist entries
('git checkout', 'npm run build') and have them match the right slice;
single-token entries and basename-normalized full-path argv heads stay
back-compat.
Ported from opencode's packages/opencode/src/permission/arity.ts.
Introduce a typed Denial enum (serde tag=kind, content=detail) carried on
wire replies, persisted records, and approval_resolved stream events.
Variants: Policy {classifier_reason, classifier_fn}, UserRejected,
UserCorrected {feedback}, StateError {phase, error}, Legacy {reason}.
approval-gate
- Decision::Deny and ClassifierDecision::Deny wrap Denial.
- block_reply_for emits {block, denial?} instead of a flat reason string.
- transition_record takes Option<Denial> in place of decision_reason.
- handle_resolve parses payload.denial; missing denial defaults to
UserRejected so single-click reject UIs stay simple; malformed denial
returns bad_denial.
- handle_intercept state-write failure emits Denial::StateError; already
resolved / in-flight replays carry a replay discriminator (not a
denial).
- migrate_legacy_record handles two generations of drift: pre-trigger-
model status strings and the pre-Denial decision_reason field; the old
field is stripped on read so it never round-trips.
- approval_resolved stream events carry denial in place of
decision_reason.
- timed_out records carry no Denial; status is self-describing.
turn-orchestrator
- approval_stitching renders per-kind LLM-readable lines (policy names
the classifier, user_corrected surfaces the feedback string, etc.).
harness-types (6 vendored copies kept byte-identical)
- ApprovalResolved carries denial: Option<Denial> in place of
reason: Option<String>; Denial type defined locally and re-exported
from lib.rs.
…ingle drain RPC (#146) * feat(approval-gate): layered policy rules with findLast evaluator Introduce a first-class Rule/Ruleset/Action surface ported from opencode's Permission.evaluate. Rules are wildcard globs over the iii function id (pattern is "*" in v1 — the field is reserved for per- function pattern extractors in a follow-up). Evaluator semantics: findLast across flattened layers, so operators stack a permissive default with more-specific overrides without surgery on the base list. approval-gate - New rules module: Action (allow/deny/ask), Rule, Ruleset alias, wildcard_match (DP, no regex), evaluate(perm, pattern, IntoIterator of &Rule). 16 unit tests covering matching, layering, last-wins. - WorkerConfig.rules: Vec<Rule> (additive; existing InterceptorRule flow unchanged). - apply_policy_rules: pure helper that maps a rule hit to PolicyOutcome (Allow / Deny{rule_permission, rule_pattern} / FallThrough). Allow short-circuits to pass-through; Deny short-circuits with a Denial::Policy that names the matched rule; Ask and no-match fall through to the existing per-function interceptor flow. - handle_subscriber consults the rules layer first, before decide_intercept_action. * feat(approval-gate): cascade allow on `always: true` resolve reply When a user resolves a pending call with decision=allow + always=true, approval-gate now pushes a runtime Allow rule for the call's function id and sweeps the same session's pending records, auto-resolving every other one the new rule covers. One click instead of N. approval-gate - approve_and_execute() extracted from handle_resolve's allow branch as a reusable async helper. Both the user-driven allow path and the cascade sweep drive state through the same transitions (approved -> invoke -> executed | failed). - Shared policy ruleset wrapped as Arc<RwLock<Ruleset>> so reply-time mutation is safe across the subscriber + resolve closures. Read guards are scoped so they never cross an .await (std::sync::RwLock is not async-safe to hold across suspension). - cascade_allow_for_session() pushes the new rule under the write lock, snapshots session pending via list_prefix, and runs approve_and_execute for each newly-Allow record. The originator is skipped (already resolved above). Per-record state-write failures are logged and the rest of the cascade continues. - handle_resolve response carries `cascaded: N` when the sweep resolved at least one extra record; omitted otherwise so the one-shot path stays unchanged. Cascade scope is intentionally narrow for v1: - function-id-only matching (pattern "*"), mirrors the v1 rules surface - same session only - in-memory rule (no cross-restart persistence) - allow + always; deny + always is out of scope Five new tests cover the cascade decision tree (no-always, same-session match, cross-session non-effect, originator-skip, terminal-record skip). * chore(approval-gate): remove legacy migration surface The Denial refactor replaced decision_reason with a typed structure, but several legacy hooks remained in place. They are now removed because the underlying schemas they protected against (pre-trigger-model status strings, pre-Denial flat decision_reason field) cannot occur in any state store this branch is wired to. approval-gate - Drop migrate_legacy_record entirely. The pre-trigger-model status="allow"/"deny" rename and the decision_reason -> Denial::Legacy lift both go away. Old persisted records (if any) now surface as filtered-out orphans rather than silently mistranslated. - Drop Denial::Legacy variant from the enum. - Drop legacy_migrated flag from records. - Drop the four migrate_legacy_record_* unit tests and the handle_list_undelivered_persists_migrated_legacy_record integration test. - handle_list_pending no longer pre-filters legacy shapes; orphan records lacking a session_id stamp are dropped uniformly. - timeout_resolved_event drops the decision_reason: "timeout" field (status: "timed_out" is self-describing). - handle_sweep_session drops the optional `reason` payload field. The cause (session_deleted vs run_stopped) is the caller's concern; the swept timed_out records carry no Denial. - skills/sweep_session.md updated. turn-orchestrator - approval_stitching drops the legacy denial-kind branch and the legacy_migrated note line; the corresponding stitch test is removed. - run_stop no longer forwards reason: "run_stopped" on the approval::sweep_session payload; it logs the cause locally instead. - The run_stop integration test now asserts the absence of the reason field on the sweep payload. harness-types (6 vendored copies, byte-identical) - Drop the Legacy variant and its docblock line from each agent_event.rs copy. * feat(approval-gate): add typed Record, Status, Next schema New src/record.rs defines the persisted approval record as a typed struct alongside its lifecycle Status enum and a Next enum that pairs each target status with its outcome data. Wire format is byte-identical to the existing serde_json::Value blobs — operators can adopt the typed shape incrementally without changing any persisted state. approval-gate - New module record: - Status enum (Pending / Approved / Executed / Failed / Denied / TimedOut), serde snake_case, with is_terminal() helper. - Record struct mirroring the historical JSON keys field-for-field, optional outcome fields scoped to terminal statuses, serde skip-when-None so existing rows round-trip cleanly. - Next enum that bundles (target_status, outcome_payload) so the type system rules out impossible transitions (e.g. Executed without a result, Denied without a Denial). Replaces the old (status_string, Option<result>, Option<error>, Option<denial>) parallel-Option signature when callers opt in. - Record::to_value / Record::from_value bridge to the Value blobs the iii state bus still expects. - 8 unit tests covering serde round-trips, status semantics, expiry saturation, forward-compat with unknown fields. - lib.rs re-exports Next, Record, Status. Handler migration to the typed schema is a separate change; this commit just makes the types available so future work can lift Value access into typed field access incrementally rather than in one big diff. * refactor(approval-gate): extract wire.rs from lib.rs First step of breaking lib.rs (~4300 lines) into focused modules. This commit lifts the wire-format types and small wire-shape helpers into their own module — pure data shapes with no I/O, no iii-sdk dependency, no async — so downstream workers that only need to understand the approval-gate protocol can depend on a small surface. Moved (lib.rs -> wire.rs): - Denial (structured deny payload) - IncomingCall + requires_approval() - Decision (Allow | Deny(Denial)) - WireDecision (allow|deny wire enum) - pending_key() - extract_call() - block_reply_for() All public items stay re-exported from the crate root, so every existing call site (handlers, tests, integration tests, downstream crates) continues to work without import changes. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + session + providers + harness-tui + shell all pass. * refactor(approval-gate): extract lifecycle.rs from lib.rs Second step of breaking lib.rs into focused modules. This commit lifts the persisted-record lifecycle helpers — pure Value-blob constructors and transitions, no I/O, no iii-sdk dependency — into their own module. Moved (lib.rs -> lifecycle.rs): - is_terminal_status() - build_pending_record() - transition_record() + transition_record_with_now() - collect_timed_out_for_sweep() - maybe_flip_timed_out() All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass. * refactor(approval-gate): extract state.rs from lib.rs Third step of breaking lib.rs into focused modules. This commit lifts the iii-backed state-bus and function-executor implementations into their own module, along with the `__from_approval` marker plumbing and the boot-time marker-target safety check. Moved (lib.rs -> state.rs): - StateBus trait + IiiStateBus impl (iii state::* wrapper) - FunctionExecutor trait + IiiFunctionExecutor impl (iii.trigger wrapper) - rule_for() (lookup helper used by IiiFunctionExecutor) - merge_from_approval_marker_if_needed() (marker payload composer) - unverified_marker_targets() (boot guard) The StateBus / FunctionExecutor traits are kept exactly as they were in lib.rs — they exist purely as test seams so unit tests can swap in InMemoryStateBus / FakeExecutor. No new abstractions added. Public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass; all 8 downstream worker crates pass (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): extract intercept.rs from lib.rs Fourth step of breaking lib.rs into focused modules. This commit lifts the intercept decision flow into its own module — the three layers that decide what the gate does with an incoming function call: policy rules, per-function interceptor rules, and classifier replies, plus the async handle_intercept that writes the pending record. Moved (lib.rs -> intercept.rs): - InterceptAction enum + decide_intercept_action() - PolicyOutcome enum + apply_policy_rules() - ClassifierDecision enum + interpret_classifier_reply() - handle_intercept() handle_intercept stays public (re-exported from the crate root). The pub(crate) helpers stay pub(crate) and are imported back into lib.rs for the subscriber closure in register(). Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green. * refactor(approval-gate): extract resolve.rs from lib.rs Fifth step of breaking lib.rs into focused modules. This commit lifts the approval::resolve flow into its own module. Moved (lib.rs -> resolve.rs): - handle_lookup_record() - handle_resolve() - cascade_allow_for_session() (the always=true sweep) - approve_and_execute() (shared by user-driven allow + cascade) handle_resolve and handle_lookup_record stay public via the crate root re-exports. approve_and_execute and cascade_allow_for_session stay private to the resolve module. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract delivery.rs from lib.rs Sixth step of breaking lib.rs into focused modules. This commit lifts the delivery-tracking RPCs into their own module. Moved (lib.rs -> delivery.rs): - handle_list_pending() - handle_list_undelivered() - handle_ack_delivered() - handle_consume_undelivered() - handle_flush_delivered() - handle_sweep_session() - LIST_UNDELIVERED_DEFAULT_LIMIT All public items stay re-exported from the crate root. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract sweeper.rs from lib.rs Seventh step of breaking lib.rs into focused modules. This commit lifts the periodic timeout sweeper and the small iii-stream helpers it owns into their own module. Moved (lib.rs -> sweeper.rs): - spawn_timeout_sweeper() - timeout_resolved_event() - write_event() - write_hook_reply() - uuid_like() All five stay pub(crate) — only the register() wiring in lib.rs and the resolve flow need them. Pure move, no behavior change. 141 approval-gate tests pass. * refactor(approval-gate): extract register.rs from lib.rs Final step of breaking lib.rs into focused modules. This commit lifts the iii function/trigger wiring — the heart of the worker startup path — into its own module. Moved (lib.rs -> register.rs): - register() (~420 lines: the subscriber closure + 8 function registrations + trigger registration + sweeper spawn) - Refs struct (the handle bag returned to the binary) - FN_RESOLVE / FN_LIST_PENDING / FN_LIST_UNDELIVERED / FN_CONSUME_UNDELIVERED / FN_ACK_DELIVERED / FN_FLUSH_DELIVERED / FN_SWEEP_SESSION / FN_LOOKUP_RECORD constants - STATE_SCOPE constant lib.rs is now ~50 lines of module declarations + re-exports + ~2580 lines of inline tests. The test mod uses super::* plus local serde_json and std::sync imports. All public items stay re-exported from the crate root so the binary's `use approval_gate::register;` keeps working unchanged. Pure move, no behavior change. 141 approval-gate tests pass; turn-orchestrator + harness + shell verified green (turn-orchestrator's pre-existing dual_write flake notwithstanding). * refactor(approval-gate): move pub(crate)-using tests inline First step of relocating the 2580-line inline test block from lib.rs into more focused homes. This commit moves the 18 tests that touch pub(crate) helpers — ones that can't live in tests/*.rs because integration tests can only see pub items — into the #[cfg(test)] mod tests block of whichever production module owns the helper they exercise. intercept.rs gained: - interpret_classifier_reply_reads_decision_tags - decide_intercept_action_* (5 tests) - apply_policy_rules_* (5 tests) state.rs gained: - merge_from_approval_* (4 tests) - rule_for_returns_matching_rule / rule_for_returns_none_when_absent sweeper.rs gained: - timeout_resolved_event_shape lib.rs loses those 18 tests. 141 approval-gate tests pass (no duplicate counts — each test runs once). turn-orchestrator + harness + shell verified green. Public-API tests (the remaining ~96 in lib.rs) move to tests/*.rs in a follow-up commit so they can share fakes via a common module. * refactor(approval-gate): move public-API tests to tests/*.rs Final test split. The 86 remaining inline tests in lib.rs that exercise only the crate's `pub` surface now live in tests/*.rs, organized by the area they cover. The state-machine proptest gets its own file. lib.rs drops from 2369 to 52 lines (just module declarations + re-exports). New tests/ layout: - common/mod.rs — shared fakes: FakeExecutor, InMemoryStateBus, FailingStateBus, sample_call(), empty_policy_rules() - lifecycle.rs — 19 tests: build_pending_record, transition_record, maybe_flip_timed_out, collect_timed_out_for_sweep, is_terminal_status, pending_key - wire.rs — 10 tests: extract_call, block_reply_for, requires_approval - intercept.rs — 9 tests: handle_intercept (replays, fail-closed, session_id stamping, force_pending) - resolve.rs — 19 tests: handle_resolve, cascade-on-`always`, handle_lookup_record - delivery.rs — 25 tests: list_pending / list_undelivered / ack_delivered / consume_undelivered / flush_delivered / sweep_session - misc.rs — 4 tests: FN_* constants, unverified_marker_targets, FakeExecutor smoke test - state_machine.rs — the proptest with the four lifecycle invariants Each tests/*.rs uses `mod common;` for shared fakes — only one source of truth for in-memory bus/executor stand-ins. 141 approval-gate tests pass across 13 test binaries (previously 6). turn-orchestrator + harness + shell verified green. * feat(approval-gate): new Record schema with Pending|InFlight|Done lifecycle Pending → InFlight → Done(Outcome). InFlight is the intermediate persist that closes the dup-exec race within a worker process — a second approval::resolve arriving during the invoke await sees the row is non-Pending and bails. Outcome is a tagged enum (Executed/Failed/Denied/TimedOut). resolved_at is stamped on the first non-Pending transition for deterministic multi-row consume ordering. lifecycle.rs is deleted; its only surviving helper (flipped_to_timed_out_if_expired) is now a Record method. * fix(approval-gate): keep lifecycle.rs as transitional compat shim Task 1 of the simplification plan deletes lifecycle.rs entirely, but the plan was a bit optimistic: dependent modules (intercept, resolve, delivery, sweeper) still import its helpers (build_pending_record, transition_record, maybe_flip_timed_out, is_terminal_status, collect_timed_out_for_sweep) and the crate fails to compile without them. Cargo can't run record.rs's unit tests if the library itself won't link. Restore lifecycle.rs as a transitional shim. Tasks 5/6/8/11 will progressively migrate each callsite to the new Record API; whichever of those tasks lands last deletes lifecycle.rs for real. * feat(approval-gate): Denial::Policy carries rule_permission + rule_pattern The classifier surface is being deleted in favor of the layered rules engine; rename the Policy variant's detail fields to match the actual data they'll carry once T5 lands. For now, classifier callsites (intercept.rs, register.rs) map their old reason/fn locals into the new field names — transient mapping, deleted in T5. Test callsites in tests/wire.rs and tests/lifecycle.rs updated to use representative rule_permission/rule_pattern values. Note: IncomingCall.approval_required + requires_approval() field/method are NOT removed in this commit (the plan's T2 calls for it but intercept.rs's decide_intercept_action still depends on them). They disappear in T5 when intercept.rs is rewritten end-to-end. * feat(approval-gate): pattern_for(function_id, args) extractor For shell::exec/shell::exec_bg, derive a pattern string from {command, args} so layered rules can match argv-level patterns like 'git status*'. Other function ids default to '*'. Malformed/missing args fall back to '*' (matches only wildcard rules). 8 new tests cover all branches: join+args, single-string command, empty args list, missing command, malformed args, non-shell function id, and the documented space-conflation case. * feat(approval-gate): StateBus::delete primitive Required by the new approval::consume RPC (T8), which returns Done rows and deletes them in the same call. IiiStateBus impl maps to the existing state::delete RPC (used elsewhere in the codebase, e.g. auth-credentials). Test fakes (InMemoryStateBus, FailingStateBus) gain matching impls — InMemory removes the key idempotently, Failing errors to match its other primitives. * feat(approval-gate): verdict-driven handle_intercept The classifier surface and per-function InterceptorRule decision flow are gone. handle_intercept now reads the verdict via the new lib-level verdict_for(function_id, args, ruleset) helper: Verdict::Allow → {block:false}, no state write Verdict::Deny → {block:true, denial:Policy{rule_permission, rule_pattern}} Verdict::Ask → write Pending row + reply {block:true, status:pending} Replay defense extended for the new 3-state schema: Pending → replay:in_flight, status:pending InFlight → replay:in_flight, status:in_flight (new) Done → replay:already_resolved, status:done intercept.rs drops InterceptAction, decide_intercept_action, ClassifierDecision, interpret_classifier_reply, PolicyOutcome, apply_policy_rules. register.rs's subscriber closure collapses the old policy_outcome+decide_intercept+classifier dance into one handle_intercept call. resolve.rs's cascade loop reuses verdict_for instead of the deleted apply_policy_rules. State-write failure still fails closed with Denial::StateError. 7 new tests cover all four verdict branches + the three replay states + fail-closed behavior. Note: InterceptorRule config struct and intercept_rules variable are still in place for T12 to fully remove. * feat(approval-gate): three-phase resolve + cascade exact-pattern (T6+T7) handle_resolve rewritten to typed Record + three-phase allow path: 1. write InFlight (closes the dup-exec race within a worker process) 2. iii.trigger(function_id, args) and await 3. write Done(Executed{result}) or Done(Failed{error}) Deny is a single Pending → Done(Denied{denial}) write — no invoke, no InFlight needed. WireDecision::Deny with no denial in payload defaults to UserRejected. UserCorrected{feedback} round-trips. Dup-exec guard: handle_resolve refuses non-Pending rows with in_flight (concurrent resolve mid-invoke) or already_resolved (terminal). Cascade allow on always:true (T7) now pushes a runtime Allow rule with the originator's EXACT pattern via rules::pattern_for(args) — not the prior blanket pattern:'*'. 'Always allow git status' no longer auto-allows rm -rf / via the same shell::exec function id. Lock-ordering invariant pinned in module docs: never hold the ruleset guard across .await. * feat(approval-gate): approval::consume + strip delivery dead RPCs + delete sweeper.rs/lifecycle.rs (T8+T9+T11) Three behavioral changes folded into one commit because they share files: T8 (new approval::consume RPC): Three-phase drain — gather Done candidates, sort by resolved_at + cap, delete-and-return. Defensive session_id filter. Lazy timeout flip on read. Default cap CONSUME_DEFAULT_LIMIT=50 bounds the response size against MB-scale stdout payloads. T9 (delivery.rs strip): handle_list_undelivered, handle_consume_undelivered, handle_ack_delivered, handle_flush_delivered all deleted. LIST_UNDELIVERED_DEFAULT_LIMIT constant gone. handle_list_pending rewritten on the typed Record API with lazy timeout flip on read. handle_sweep_session rewritten on new schema — flips Pending+InFlight rows to Done(TimedOut). T11 (sweeper.rs deleted): Background polling task gone. Timeouts now flip lazily on read in handle_resolve / handle_consume / handle_list_pending. UI handles expires_at countdown client-side; the LLM learns of timeouts on the next consume. lifecycle.rs transient shim also deleted (no more callers). Stream helpers (uuid_like, write_event, write_hook_reply) move into register.rs as their only consumer; spawn_timeout_sweeper + timeout_resolved_event deleted with the rest of sweeper.rs. register.rs surgery: Refs struct loses 5 dead FunctionRef fields and the sweeper JoinHandle. RPC registrations for list_undelivered, consume_undelivered, ack_delivered, flush_delivered all gone. New FN_CONSUME registration added. Classifier-alias warning check trimmed to live function ids. 15 new tests cover handle_consume (7) + handle_sweep_session (1) + handle_list_pending lazy flip (1) + the pre-existing edits. 65 lib tests pass total. * feat(approval-gate): strip __from_approval marker plumbing (T10) IiiFunctionExecutor::invoke forwards function_id+args directly to iii.trigger — no marker stamp. Deleted from state.rs: - merge_from_approval_marker_if_needed (and its 4 unit tests) - unverified_marker_targets (and the 2 rule_for tests) - rule_for (only marker-related callers) Deleted from register.rs: - IiiFunctionExecutor's rules: Arc<Vec<InterceptorRule>> field - Boot-time unverified-marker check (refused-to-start guard) Per the spec's Threat Model section: bus access ≡ shell access in the new model. The harness's bus-level access control is the perimeter; defense-in-depth via per-target marker verification is out of scope for v1. Shell-side marker verification is deleted in T13. T12 will fully retire InterceptorRule + the classifier-alias warning loop in register.rs. * feat(approval-gate): strip config to topic+scope+timeout+rules (T12) config.rs: WorkerConfig now has just {topic, approval_state_scope, default_timeout_ms, rules}. interceptors + sweeper_interval_ms fields gone. InterceptorRule struct kept as a no-op shim with minimal fields so the classifier-alias warning loop in register.rs still compiles — the loop is fed an empty Vec so the body never runs. iii.worker.yaml: replaced interceptor config with a curated default ruleset. Read-only fs/git auto-allowed; shell::exec/exec_bg ask; catch-all asks. Operators stack their own rules on top (last-match wins). register.rs cfg.interceptors → cfg.rules wiring confirmed: policy_rules is seeded from cfg.rules at startup and remains mutable via cascade. Note: InterceptorRule shim + classifier-alias warning loop in register will be deleted in a follow-up pass when the dependent code is fully cleaned up. They're cosmetic at this point — neither has functional effect. * feat(shell): strip classify_argv / allowlist / __from_approval marker (T13) Shell becomes a plain executor. All policy decisions live in approval-gate's rules layer (see the layered ruleset shipped in approval-gate/iii.worker.yaml). Deleted: - shell/src/functions/approval_bypass.rs (marker validation module) - shell/src/functions/classify.rs (classifier handler) - shell/src/arity.rs (arity-aware allowlist matcher) - ApprovalMarker + ClassifyArgvRequest types - ExecRequest.from_approval + ExecBgRequest.from_approval fields - ShellConfig.allowlist + allow_any + denylist_patterns + compiled_denylist fields and their methods (compile_denylist, denylist_hit_reason, allowlist_contains, is_command_allowed) - shell::classify_argv function registration - shell::classify_argv manifest entry - All allowlist/denylist unit tests in shell/src/config.rs - One test in shell/tests/function_handlers.rs that asserted shell rejected unlisted commands Kept (independent surface): - fs::host_root jail + fs.denylist_paths (filesystem path-based denials for shell::fs::* tools; distinct from the exec-policy denylist) - The fs jail validate_fs_jail boot check Threat model implication, per the spec: bus access ≡ shell access. The harness's bus-level access control is the perimeter; any worker on the bus can call shell::exec with arbitrary args. Defense-in-depth via shell-side allowlists is gone for v1. 133 shell lib tests + 328 shell integration tests pass. * feat(turn-orchestrator): switch stitch to approval::consume (T14) consume_approval_stitch now calls approval::consume (single RPC, returns + deletes in one shot). Payload is { session_id } only; no turn_id, no limit (gate enforces a default cap of 50, surfacing overflow via the 'omitted' counter as before). stitch_entries rewritten for the new Record wire shape: reads nested outcome: { kind, detail } instead of the old top-level status + result/error/denial. Outcome kinds: executed, failed, denied, timed_out. function_call_id is the canonical key (call_id retained as a legacy fallback for old rows on disk during the upgrade). render_denial_lines's policy branch reads rule_permission/rule_pattern (matching the renamed Denial::Policy detail shape from T2). The 'policy denied by classifier_fn: classifier_reason' wording becomes 'policy denied by rule <perm> : <pat>'. omission_summary_message reworded: 'approval::flush_delivered' is gone; the natural recovery path is the next-turn consume, which is what the new message points at. 131 lib tests pass (was 122 + 9 failing from old-shape assertions). Pre-existing integration test failure (tests/run_stop.rs imports run_stop module that isn't declared in lib.rs) is unrelated to this refactor. * test(approval-gate): E2E lifecycle tests + delete dead integration tests (T17) Delete 8 integration test files whose assertions targeted the deleted surface (FN_LIST_UNDELIVERED, FN_ACK_DELIVERED, transition_record, InterceptAction, marker plumbing, ...): - tests/approval_lifecycle.rs (engine-backed, used deleted RPCs) - tests/delivery.rs (old delivery RPC surface) - tests/integration.rs (old subscriber wiring) - tests/intercept.rs (decide_intercept_action, classifier path) - tests/lifecycle.rs (record-helper tests on deleted schema) - tests/misc.rs (mixed assertions on old shape) - tests/resolve.rs (old transition_record / Approved status) - tests/state_machine.rs (proptest dep + old schema) The src/* unit tests added across T1-T12 cover the equivalent surface for the new wire shape (Pending|InFlight|Done, Outcome enum, Denial::Policy {rule_permission, rule_pattern}, etc.). Add tests/lifecycle.rs with FIVE E2E lifecycle tests against the in-memory StateBus + FakeExecutor — the integration safety net for the whole refactor: 1. allow_path_end_to_end — pending → InFlight → executed → consume drains 2. deny_path_with_user_corrected_feedback_end_to_end — feedback round-trips 3. timeout_path_lazy_flips_on_consume_end_to_end — past expires_at flips on read 4. cascade_path_end_to_end — two pending rows; allow+always pushes exact-pattern rule; second auto-resolves; consume drains both; pinned: pushed pattern is the originator's argv ('echo go'), NOT blanket '*' 5. allow_rule_short_circuits_with_no_state_write — bonus: Verdict::Allow doesn't touch state 72 approval-gate tests pass across 6 suites (lib + wire + manifest + lifecycle + 2 from common helpers). Net deletion: ~2500 lines of old-shape integration tests gone, ~250 lines of new E2E in their place. * feat(harness/web): Allow+Always, Deny feedback, expires_at countdown (T15) ApprovalRow.tsx grows three real-feature additions matching the approval-gate refactor's new wire surface: 1. Allow+Always button → sends {decision:'allow', always:true}. The gate cascades by pushing a runtime Allow rule with the originator's exact pattern and sweeping the session's other Pending rows (auto-resolving any that match). The response carries cascaded:N when the sweep was non-empty; not surfaced in the UI yet. 2. Optional Deny feedback textarea → when the operator types a correction message before clicking deny, the payload becomes {decision:'deny', denial:{kind:'user_corrected', detail:{feedback}}}. That feedback flows verbatim to the LLM via the stitched message on the next consume — the high-value path because the model gets actionable correction, not just 'denied'. 3. Client-side expires_at countdown → the modal shows MM:SS remaining and disables actions on expiry. No server-emitted timed_out frame anymore (the gate's sweeper is gone); the LLM learns of timeout via the next-turn approval::consume that lazy-flips the row. Each pending row is split into its own ApprovalCard component so the countdown hook respects rules-of-hooks (can't sit inside a .map callback). ResolvePayload is constructed inline so the bridge's Record<string, unknown> type accepts it without an explicit cast at the interface boundary. CSS classes for the new states (approval-allow-always, approval-countdown, approval-feedback, approval-expired-note) inherit default styling for now; a follow-up styling pass can polish. * docs(harness/fanout): note T16 stream-rewrite as deferred follow-up The approval-gate refactor's T16 called for switching the fanout from polling approval::list_pending to subscribing to agent::events for approval_requested / approval_resolved frames. The polling implementation continues to work correctly against the new wire shape — diff_approvals reads function_call_id (now top-level on Record) and list_pending was kept by approval-gate's T8/T11 — so this is a perf cleanup, not a correctness fix. Comment the constant accordingly; defer the actual stream subscription to a follow-up PR. No code change beyond the doc. * fix(shell/e2e): drop allowlist/denylist tests + configs Shell stopped enforcing command-level policy; the e2e suite still referenced the removed allowlist/denylist_patterns config fields and pinned behavior that no longer exists. Strip those expectations and update the docs that described them. - config.yaml / config-jailed.yaml: remove allowlist + denylist_patterns (would fail YAML parse against the new ShellConfig schema) - cases-safety.ts: drop 3 allowlist/denylist assertions; keep one positive 'absolute-path command runs' case - cases-edge.ts: 'nonexistent unlisted command' rewritten to assert the OS spawn error (ENOENT), not an allowlist message - cases-exec-break.ts: drop 'unlisted command with extra fields' - cases-exec-sandbox.ts: drop 'sandbox allowlist still enforced' - cases-vuln-repro.ts: drop S-H3 case (advisory denylist) — the finding's whole point was that command policy belongs upstream - ARCHITECTURE.md / e2e README: reflect the new boundary
…vered, flush_delivered, list_undelivered, and sweep_session. These files have been deleted as part of the ongoing simplification and refactoring of the approval-gate system.
d7fd45b to
eed8760
Compare
…ning no-op The T12 simplification (94528b0) removed the classifier surface but left behind InterceptorRule and a hard-coded empty-vec warning loop for source compatibility. Nothing references either — delete the struct, its re-export, and the orphaned subscriber-closure captures.
…ator, hooks, and UI
When the gate replies status:"pending" the orchestrator, hook bus, and
web reducer all need to cooperate so the placeholder result resolves
into the executed outcome on approval. Six changes in one feature:
- hook-fanout: merge_first_block_wins now forwards the full blocker
reply instead of flattening to {block, reason}, so status:"pending",
denial, call_id, and function_id survive the merge.
- turn-orchestrator/run_start: new run::resume function. Approval-gate
calls it after a late-resolved approval; if the session is Stopped it
resets to Provisioning so handle_streaming runs and consume drains.
- turn-orchestrator/states/functions:
- terminate=true on pending block (was false, looping the model).
- Pass session_id into the before_function_call payload — without it
the gate cannot key its pending record and silently passes through.
- Render pending blocks as non-error (not red in the UI).
- Race-dedupe the FunctionResult append and gate turn_end_emitted so
two concurrent turn::step handlers don't double-emit.
- turn-orchestrator/states/assistant: resolve_pending_function_results
replaces the placeholder pending body with the executed / failed /
denied / timed_out outcome from approval::consume — fixes the chat
getting stuck on "Awaiting human approval...".
- turn-orchestrator/persistence: dedupe FunctionResult by
function_call_id at save time (last line of defense for the race).
- harness/web/reducer: dedupe function_result by function_call_id (not
timestamp+content-length) so race-emitted MessageEnd events don't
render the same pending block twice.
Bring back the pre-T13 argv-level allowlist + denylist on ShellConfig as optional, default-empty. Both lists empty (the shipped default and the playground/harness deployment) means pass-through: the approval-gate remains sole authority. Operators running shell standalone can populate either list to get a defense-in-depth check evaluated inside shell::exec / shell::exec_bg right after argv is parsed. - ShellConfig regains allowlist (Vec<String>), denylist_patterns (Vec<String>), and the compiled_denylist regex cache; load_config calls compile_denylist(). allow_any is intentionally not restored — its purpose was the classifier-path bypass that 94528b0 removed; with shell owning its own check, "no enforcement" is already expressible by leaving allowlist empty. - shell/src/arity.rs is reinstated (the file 94528b0 deleted) so allowlist_contains can keep using prefix_matches for token-aligned multi-token entries ("git checkout", "docker compose up") and basename-normalized full-path argv heads. - shell::exec and shell::exec_bg invoke cfg.is_command_allowed right after parse_argv. Denylist short-circuits before the allowlist check; empty lists short-circuit to Ok(()). - shell/config.yaml ships with both lists empty. The pre-T13 curated read-only allowlist is left out on purpose so the harness deployment doesn't end up in a re-prompt loop when the gate approves a command that the shell-side list excludes. - Restored config-level tests (allowlist permit/reject, basename match, arity matching, denylist precedence, empty-argv rejection) plus a new denylist_wins_over_allowlist pin. Handler-level tests un-stub cfg_with_allow and gain a passthrough regression test (exec_handler_passthrough_when_lists_empty) alongside the reinstated exec_handler_rejects_unlisted_command. - E2E configs lose the stale "T13: gone" comments. End-to-end verified through the harness web UI: approving a shell::exec call for an unlisted command now executes the call exactly once instead of looping back to pending_approval.
Tighter approval-card markup and a dedicated styles block. Functional behavior is unchanged; this only touches presentation and a11y. ApprovalRow.tsx: - Wrap the card in a semantic <section role="region"> labelled by the function id (titleId derived from the call id) and use a <header> for the eyebrow/title/countdown row. - Surface state via data attributes (data-expired, data-busy) instead of conditional class strings so CSS owns the visual variants. - Add countdownTone() — emits ok / warn / danger / expired thresholds at 30s / 10s / 0s, exposed as data-tone for CSS coloring. - Split the deny-feedback summary into a label + hint pair so it can wrap cleanly under small widths. - Drop the standalone "allow" duplicate; keep the explicit "allow" and "allow + always" pair (matches the wireframes pass). styles.css: - New "approvals" block at the end of the stylesheet defining .approval, .approval-head, .approval-eyebrow, .approval-countdown[data-tone], .approval-args, .approval-feedback*, .approval-actions and the three action buttons. Uses the existing CSS-variable palette (var(--accent), var(--paper), var(--rule)) and color-mix() for the wash + edge tones. - approval-enter keyframes with prefers-reduced-motion fallback. - Mobile breakpoint (<=640px) wraps actions and rebalances the deny button to its own row.
The review gaps were around approval resolution edge cases, allow+always cascading, resume re-kicks, shell policy fallback, and the web approval controls. This adds focused regression coverage at the handler/integration boundaries and fixes the manifest default_config source so the parity test reflects the shipped worker YAML instead of a divergent struct default. Constraint: Keep unrelated dirty workspace changes out of the commit Rejected: Full-stack UI-to-shell E2E | too heavy for this gap pass and less deterministic than the targeted boundary tests Confidence: high Scope-risk: narrow Directive: Do not change approval manifest defaults without keeping iii.worker.yaml parity covered Tested: approval-gate cargo test in clean staged worktree Tested: shell cargo test --test function_handlers in clean staged worktree Tested: turn-orchestrator cargo test --test run_resume in clean staged worktree Tested: harness/web npm test -- --run src/components/ApprovalRow.test.tsx Not-tested: Full browser-driven approval flow across orchestrator, approval-gate, shell, and web
No description provided.