feat(agent-guard): deny commands that disable InnerWarden itself #1127

Merged
maiconburn merged 1 commit into main from feat/agent-guard-self-tamper-detect
Jun 27, 2026

Conversation

@maiconburn

Collaborator

What

The in-path command guard (POST /api/agent/check-command, the MCP innerwarden_check_command tool, and agent proxy --mode guard) previously returned allow / risk 0 for commands that disable InnerWarden: systemctl stop/mask innerwarden-*, pkill -f innerwarden, innerwarden uninstall, and rm/truncate of its binaries/config/data/pinned-eBPF. As a result, an AI coding agent wired through the guardrail could be talked into turning the monitor off without the guard objecting.

Change

New security_tooling_tamper signal (score 60 → deny) in crates/agent-guard:

  • threats::check_security_tamper + SECURITY_TAMPER_INDICATORS + INNERWARDEN_SELF_PATHS, wired into analyze_command.
  • Denies InnerWarden self-disable/removal and the universal defense-evasion verbs (stop auditd, setenforce 0, auditctl -e 0, disable AppArmor; MITRE T1562/T1489).
  • File removal requires a destructive verb AND an InnerWarden path, so reads/greps under /etc/innerwarden stay allowed.
  • Status reads and restarts (innerwarden get status, systemctl status/restart innerwarden-agent) are not flagged.
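
To illustrate the rules listed above, here is a minimal sketch of how such a check could look. It is not the code merged in this PR: the function and constant names come from the description, but the contents of SECURITY_TAMPER_INDICATORS and INNERWARDEN_SELF_PATHS, the DESTRUCTIVE_VERBS list, the Signal and Decision types, the simplified analyze_command, and every path other than /etc/innerwarden are assumptions made for illustration.

```rust
/// Hypothetical signal type; the real crate's type is not shown in this PR.
#[derive(Debug, PartialEq, Eq)]
pub struct Signal {
    pub name: &'static str,
    pub score: u32,
}

/// Hypothetical verdict type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

// Assumed entries: the PR names this constant but does not list its contents.
const SECURITY_TAMPER_INDICATORS: &[&str] = &[
    "systemctl stop innerwarden",
    "systemctl disable innerwarden",
    "systemctl mask innerwarden",
    "systemctl kill innerwarden",
    "pkill -f innerwarden",
    "killall innerwarden",
    "innerwarden uninstall",
    "innerwarden disable",
    "systemctl stop auditd",
    "setenforce 0",
    "auditctl -e 0",
];

// Only /etc/innerwarden appears in the PR text; the second path is a guess.
const INNERWARDEN_SELF_PATHS: &[&str] = &["/etc/innerwarden", "/var/lib/innerwarden"];

// Assumed verb list; the PR mentions rm and truncate.
const DESTRUCTIVE_VERBS: &[&str] = &["rm", "truncate"];

/// Returns a tamper signal when the command matches a known indicator, or when
/// it combines a destructive verb with an InnerWarden path.
pub fn check_security_tamper(command: &str) -> Option<Signal> {
    // Collapse runs of whitespace and lowercase so matching is stable.
    let normalized = command
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase();
    let tokens: Vec<&str> = normalized.split(' ').collect();

    let indicator_hit = SECURITY_TAMPER_INDICATORS
        .iter()
        .any(|i| normalized.contains(*i));
    // Both conditions must hold, so reads under /etc/innerwarden stay allowed.
    let destructive = tokens
        .iter()
        .any(|t| DESTRUCTIVE_VERBS.iter().any(|v| *v == *t));
    let touches_self = INNERWARDEN_SELF_PATHS
        .iter()
        .any(|p| normalized.contains(*p));

    if indicator_hit || (destructive && touches_self) {
        Some(Signal { name: "security_tooling_tamper", score: 60 })
    } else {
        None
    }
}

/// Simplified stand-in for analyze_command: a score of 60 or more is a deny.
pub fn analyze_command(command: &str) -> Decision {
    match check_security_tamper(command) {
        Some(signal) if signal.score >= 60 => Decision::Deny,
        _ => Decision::Allow,
    }
}
```

Under these assumptions, systemctl stop innerwarden-agent and rm -rf /etc/innerwarden are denied, while grep -r token /etc/innerwarden and systemctl restart innerwarden-agent are allowed. Plain substring and token matching like this is easy to evade or to trip by accident (for example with shell quoting or chained commands), so a real check would likely need proper command parsing.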

Why

Closes the command-layer half of the self-tamper gap found in the 2026-06-27 AI-coding-agent guardrail evaluation (a maintenance-framed systemctl stop was scored allow). The kernel-side mitre_hunt uid-0 self-stop carve-out is tracked separately.

Tests

  • analyze_command_flags_innerwarden_self_disable — deny + high + security_tooling_tamper across service-stop/mask, pkill/killall, CLI uninstall/disable, rm/truncate of IW paths.
  • analyze_command_flags_host_monitor_disable — deny on auditd/AppArmor/SELinux disable.
  • analyze_command_allows_innerwarden_status_read — no deny / no tamper signal on status reads + restart.

make test is green; cargo fmt --all and clippy -p innerwarden-agent-guard -D warnings are clean.

🤖 Generated with Claude Code

The check-command guard (POST /api/agent/check-command, the MCP
innerwarden_check_command tool, and the agent proxy) scored InnerWarden
self-disable commands as allow/risk 0, so an AI coding agent wired
in-path could be talked into turning the monitor off. Add a
security_tooling_tamper signal (score 60 -> deny): systemctl
stop/disable/mask/kill innerwarden, pkill/killall innerwarden,
innerwarden uninstall/disable, and rm/truncate of InnerWarden's
binaries/config/data/pinned-eBPF (destructive verb AND an IW path), plus
the universal defense-evasion verbs (stop auditd, setenforce 0,
auditctl -e 0, disable apparmor). Status reads and restarts stay
allowed. Closes the command-layer half of the self-tamper gap from the
2026-06-27 AI-coding-agent guardrail evaluation. Tests pin deny on the
tamper + host-monitor sets and allow on the benign reads/restart.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@maiconburn maiconburn requested a review from esteves-uk as a code owner June 27, 2026 23:05
@codecov

codecov Bot commented Jun 27, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

@maiconburn maiconburn merged commit d34feba into main Jun 27, 2026
21 checks passed
@maiconburn maiconburn deleted the feat/agent-guard-self-tamper-detect branch June 27, 2026 23:34
maiconburn added a commit that referenced this pull request Jun 28, 2026
…#1132)

Bump workspace version 0.15.29 -> 0.15.30 and roll [Unreleased] into
[0.15.30]. Contents:
- agent-guard now denies commands that disable InnerWarden itself (#1127)
- innerwarden agent install-hook: enforcing PreToolUse hook for Claude Code (#1129)
- admin-action audit filename in UTC, not local time (#1128)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2 participants