feat(agent-guard): deny commands that disable InnerWarden itself #1127
Merged
The check-command guard (POST /api/agent/check-command, the MCP innerwarden_check_command tool, and the agent proxy) scored InnerWarden self-disable commands as allow/risk 0, so an AI coding agent wired in-path could be talked into turning the monitor off. Add a security_tooling_tamper signal (score 60 -> deny): systemctl stop/disable/mask/kill innerwarden, pkill/killall innerwarden, innerwarden uninstall/disable, and rm/truncate of InnerWarden's binaries/config/data/pinned-eBPF (destructive verb AND an IW path), plus the universal defense-evasion verbs (stop auditd, setenforce 0, auditctl -e 0, disable apparmor). Status reads and restarts stay allowed. Closes the command-layer half of the self-tamper gap from the 2026-06-27 AI-coding-agent guardrail evaluation. Tests pin deny on the tamper + host-monitor sets and allow on the benign reads/restart. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
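The rule in the commit message combines two checks. The first is a list of self-disable and defense-evasion command fragments. The second is a path rule that fires only when a destructive verb (rm/truncate) and an InnerWarden path appear together. Below is a minimal Rust sketch of that shape. It assumes plain substring and token matching. Every name in it (check_security_tamper_sketch, Verdict, TAMPER_INDICATORS, DESTRUCTIVE_VERBS, IW_SELF_PATHS) is hypothetical, as is every path other than /etc/innerwarden. None of this is the actual crates/agent-guard code.

```rust
// Hypothetical sketch of a security_tooling_tamper check. Type/function names
// and the exact indicator and path lists are illustrative assumptions, NOT the
// real innerwarden-agent-guard API.

/// Score the PR assigns to the security_tooling_tamper signal (maps to deny).
const TAMPER_SCORE: u32 = 60;

/// Command fragments that stop, kill, mask, or remove security tooling.
/// Assumed list, modeled on the examples in the commit message.
const TAMPER_INDICATORS: &[&str] = &[
    "systemctl stop innerwarden",
    "systemctl disable innerwarden",
    "systemctl mask innerwarden",
    "systemctl kill innerwarden",
    "pkill innerwarden",
    "pkill -f innerwarden",
    "killall innerwarden",
    "innerwarden uninstall",
    "innerwarden disable",
    // Universal defense-evasion verbs (MITRE T1562/T1489).
    "systemctl stop auditd",
    "auditctl -e 0",
    "setenforce 0",
    "systemctl stop apparmor",
    "systemctl disable apparmor",
];

/// Verbs that destroy files; only flagged together with an InnerWarden path.
const DESTRUCTIVE_VERBS: &[&str] = &["rm", "truncate"];

/// Path prefixes owned by InnerWarden. Only /etc/innerwarden appears in the
/// PR; the other prefixes are hypothetical placeholders.
const IW_SELF_PATHS: &[&str] = &[
    "/etc/innerwarden",
    "/usr/local/bin/innerwarden", // hypothetical binary location
    "/var/lib/innerwarden",       // hypothetical data directory
    "/sys/fs/bpf/innerwarden",    // hypothetical pinned-eBPF location
];

#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Allow,
    Deny { signal: &'static str, score: u32 },
}

/// Collapse runs of whitespace and lowercase, so "systemctl  STOP auditd"
/// matches the same indicator as "systemctl stop auditd".
fn normalize(cmd: &str) -> String {
    cmd.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

fn check_security_tamper_sketch(cmd: &str) -> Verdict {
    let norm = normalize(cmd);

    // Rule 1: a known self-disable or defense-evasion fragment.
    let hits_indicator = TAMPER_INDICATORS.iter().any(|&ind| norm.contains(ind));

    // Rule 2: a destructive verb AND an InnerWarden path in the same command,
    // so read-only commands that merely mention a path stay allowed.
    let tokens: Vec<&str> = norm.split(' ').collect();
    let destructive = tokens.iter().any(|t| DESTRUCTIVE_VERBS.contains(t));
    let touches_self = tokens
        .iter()
        .any(|t| IW_SELF_PATHS.iter().any(|&p| t.starts_with(p)));

    if hits_indicator || (destructive && touches_self) {
        Verdict::Deny { signal: "security_tooling_tamper", score: TAMPER_SCORE }
    } else {
        Verdict::Allow
    }
}

fn main() {
    let cases = [
        "sudo systemctl stop innerwarden-agent",
        "rm -rf /etc/innerwarden",
        "setenforce 0",
        "cat /etc/innerwarden/config.toml",
        "systemctl restart innerwarden-agent",
    ];
    for cmd in cases {
        println!("{cmd:<40} -> {:?}", check_security_tamper_sketch(cmd));
    }
}
```

Requiring both conditions for rm/truncate is what lets cat /etc/innerwarden/config.toml through while rm -rf /etc/innerwarden is denied. Status reads and restarts match neither rule. Plain substring matching like this is easy to sidestep with shell quoting or aliases, so treat it as an outline of the rule, not a hardened matcher.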
Codecov Report: ✅ All modified and coverable lines are covered by tests.
esteves-uk approved these changes on Jun 27, 2026.
maiconburn added a commit that referenced this pull request on Jun 28, 2026:
…#1132) Bump workspace version 0.15.29 -> 0.15.30 and roll [Unreleased] into [0.15.30]. Contents:
- agent-guard now denies commands that disable InnerWarden itself (#1127)
- innerwarden agent install-hook: enforcing PreToolUse hook for Claude Code (#1129)
- admin-action audit filename in UTC, not local time (#1128)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
The in-path command guard (POST /api/agent/check-command, the MCP innerwarden_check_command tool, and agent proxy --mode guard) previously returned allow / risk 0 for commands that disable InnerWarden: systemctl stop/mask innerwarden-*, pkill -f innerwarden, innerwarden uninstall, and rm/truncate of its binaries/config/data/pinned-eBPF. As a result, an AI coding agent wired through the guardrail could be talked into turning the monitor off without the guard objecting.
Change
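New security_tooling_tamper signal (score 60 → deny) in crates/agent-guard: threats::check_security_tamper + SECURITY_TAMPER_INDICATORS + INNERWARDEN_SELF_PATHS, wired into analyze_command.
- Also covers the universal defense-evasion verbs (stop auditd, setenforce 0, auditctl -e 0, disable AppArmor; MITRE T1562/T1489).
- Path matches require a destructive verb AND an InnerWarden path, so non-destructive commands that touch paths such as /etc/innerwarden stay allowed.
- Status reads and restarts (innerwarden get status, systemctl status/restart innerwarden-agent) are not flagged.
Why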
Closes the command-layer half of the self-tamper gap found in the 2026-06-27 AI-coding-agent guardrail evaluation (a maintenance-framed systemctl stop was scored allow). The kernel-side mitre_hunt uid-0 self-stop carve-out is tracked separately.
Tests
- analyze_command_flags_innerwarden_self_disable: deny + high + security_tooling_tamper across service-stop/mask, pkill/killall, CLI uninstall/disable, and rm/truncate of IW paths.
- analyze_command_flags_host_monitor_disable: deny on auditd/AppArmor/SELinux disable.
- analyze_command_allows_innerwarden_status_read: no deny / no tamper signal on status reads and restart.
- make test green; cargo fmt --all and clippy -p innerwarden-agent-guard -D warnings clean.
🤖 Generated with Claude Code