fix: refactor to use eBPF over iptables for better routing control#249
fix: refactor to use eBPF over iptables for better routing control#249schmitthub merged 32 commits intomainfrom
Conversation
…es ordering When multiple config layers define SSH rules for different Git providers (e.g., github.com and gitlab.com both on port 22), the store insertion order determined which container could connect — first-match-wins in iptables meant only the first rule's Envoy listener received traffic. FormatPortMappings now prepends rules from the most-local config layer before the store rules, so normalizeAndDedup's first-seen-wins behavior gives the current project's SSH rules iptables priority. Also renames internal/docker/dockertest to internal/docker/mock and adds a mock_moby.go helper for lightweight moby client mocking without whail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to ensure the current project’s TCP/SSH firewall rules take priority in iptables ordering (so local SSH access works even when the global/store rules contain overlapping ports), while also renaming the Docker test fake package to internal/docker/mock and introducing a lightweight Moby client mocking utility for tests.
Changes:
- Adjust firewall TCP mapping generation to prepend the most-local project config layer rules before store rules (first-seen-wins dedup → local priority).
- Rename
internal/docker/dockertest→internal/docker/mockand update call sites across tests. - Add firewall testing utilities (
firewall/mocks.NewTestManager) plus new tests validating TCP mapping priority behavior.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test/e2e/harness/factory.go | Updates imports/usage to the renamed Docker fake client package. |
| internal/firewall/priority_test.go | Adds tests asserting local-layer TCP/SSH rule ordering wins over stored rules. |
| internal/firewall/mocks/stubs.go | Adds a test helper that builds a real firewall.Manager backed by a mocked Moby client. |
| internal/firewall/manager.go | Changes TCP mapping computation to prioritize local-layer rules; exposes FormatPortMappings and adds local-layer rule extraction helper. |
| internal/docker/wirebuildkit_test.go | Updates to use the renamed Docker fake client package. |
| internal/docker/mock/mock_moby.go | Adds an HTTP-transport-based mock helper for Moby client creation/version negotiation. |
| internal/docker/mock/helpers.go | Renames package to mock. |
| internal/docker/mock/fake_client.go | Renames package to mock. |
| internal/docker/mock/fake_client_test.go | Updates tests for the renamed Docker fake client package. |
| internal/cmd/loop/shared/runner_test.go | Updates imports/types to internal/docker/mock. |
| internal/cmd/loop/shared/concurrency_test.go | Updates fixtures and fake client usage to internal/docker/mock. |
| internal/cmd/image/build/build_progress_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/wait/wait_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/update/update_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/unpause/unpause_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/top/top_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/stop/stop_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/stats/stats_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/start/start_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/shared/init_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/run/run_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/restart/restart_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/rename/rename_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/remove/remove_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/pause/pause_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/logs/logs_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/list/list_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/kill/kill_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/inspect/inspect_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/exec/exec_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/create/create_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/cp/cp_test.go | Updates fake Docker client usage to internal/docker/mock. |
| internal/cmd/container/attach/attach_test.go | Updates fake Docker client usage to internal/docker/mock. |
Comments suppressed due to low confidence (3)
internal/docker/mock/fake_client.go:15
- The package-level doc comment still refers to "Package dockertest" and shows usage with dockertest.NewFakeClient / dockertest.RunningContainerFixture, but the package was renamed to internal/docker/mock. Update the comment and examples to use the new package name to avoid confusion for future test authors.
internal/cmd/container/stop/stop_test.go:306 - The helper function parameter is named
mock, which shadows the importedmockpackage identifier in this scope and makes it harder to read/extend the helper (and can cause accidental compile errors if the package is referenced inside). Rename the parameter to something specific likesbMock/bridgeMockto avoid shadowing.
// --- Per-package test helpers ---
func testFactory(t *testing.T, fake *mock.FakeClient, mock *sockebridgemocks.SocketBridgeManagerMock) (*cmdutil.Factory, *bytes.Buffer, *bytes.Buffer, *bytes.Buffer) {
t.Helper()
tio, in, out, errOut := iostreams.Test()
f := &cmdutil.Factory{
IOStreams: tio,
Logger: func() (*logger.Logger, error) { return logger.Nop(), nil },
Client: func(_ context.Context) (*docker.Client, error) {
return fake.Client, nil
},
Config: func() (config.Config, error) {
return configmocks.NewBlankConfig(), nil
},
}
if mock != nil {
f.SocketBridge = func() socketbridge.SocketBridgeManager {
return mock
}
internal/cmd/container/remove/remove_test.go:327
- The helper function parameter is named
mock, which shadows the importedmockpackage identifier in this scope and makes the helper harder to read/extend. Rename the parameter (e.g.,sbMock/bridgeMock) to avoid identifier shadowing.
// --- Per-package test helpers ---
func testFactory(t *testing.T, fake *mock.FakeClient, mock *sockebridgemocks.SocketBridgeManagerMock) (*cmdutil.Factory, *bytes.Buffer, *bytes.Buffer, *bytes.Buffer) {
t.Helper()
tio, in, out, errOut := iostreams.Test()
f := &cmdutil.Factory{
IOStreams: tio,
Logger: func() (*logger.Logger, error) { return logger.Nop(), nil },
Client: func(_ context.Context) (*docker.Client, error) {
return fake.Client, nil
},
Config: func() (config.Config, error) {
return configmocks.NewBlankConfig(), nil
},
}
if mock != nil {
f.SocketBridge = func() socketbridge.SocketBridgeManager {
return mock
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…affic routing Replace all iptables-based firewall routing with eBPF cgroup hooks that attach per-container from outside, enabling DNS-aware per-domain TCP routing and eliminating the iptables ordering problem for multi-provider SSH. New internal/ebpf/ package: - 5 BPF programs (connect4, sendmsg4, connect6, sendmsg6, sock_create) - Go manager with Load/OpenPinned/Enable/Disable/Bypass/UpdateDNSCache - Container binary (ebpf-manager) for init/enable/disable/bypass commands - Compiled BPF bytecode checked in (arm64 + x86) Firewall manager changes: - Enable/Disable/Bypass rewritten to use eBPF via docker exec ebpf-manager - eBPF manager container added to EnsureRunning alongside Envoy/CoreDNS - emitAgentMapping (Loki push hack) deleted — eBPF metrics replace it - FormatPortMappings/localLayerFirewallRules deleted — ordering problem gone Container image changes: - firewall.sh deleted (306 lines of iptables rules) - iptables/ipset packages removed from Dockerfile template - No container capabilities needed (no NET_ADMIN/NET_RAW) - Entrypoint simplified (eBPF attaches from outside) Documentation updated across 20+ files (design docs, CLAUDE.md files, Mintlify docs, CLI reference, README, SECURITY, plugin reference). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 99 out of 103 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:12
- The package doc comment and examples refer to
mock(singular) but the actual package name ismocks. This makes godoc misleading and the usage snippet won’t compile as written; update the comment/examples to usemocks.NewFakeClient/mocks.RunningContainerFixture.
internal/docker/mocks/helpers.go:522 - The example in the
SetupBuildKitdoc comment usesmock.NewFakeClient, which doesn’t match the package name (mocks). Update the snippet so it compiles and matches other docs (mocks.NewFakeClient).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…er.c Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ll lifecycle fixes The eBPF firewall was completely broken on ARM64 (Docker Desktop) due to IPToUint32/CIDRToAddrMask using binary.BigEndian instead of NativeEndian. ctx->user_ip4 stores network-order bytes as a native uint32 — BigEndian produced a different value, causing every IP comparison to fail and all traffic to misroute through the Envoy catch-all. Key fixes: - IPToUint32/Uint32ToIP/CIDRToAddrMask: BigEndian → NativeEndian - Route JSON tags: ebpf.Route fields lacked json tags, causing zero-value deserialization (SSH routes silently dropped) - SO_MARK loop prevention in connect4/sendmsg4 (iximiuz pattern) - Host proxy IP resolved from Docker network (not macOS host) via ebpf-manager resolve command, sourced from project config not container env - Stale BPF link cleanup on Load() — removes all pinned links before loading new programs - recvmsg4 added to Disable() link cleanup list - dns_cache pre-seeded during Enable() for per-domain TCP routing (stopgap until CoreDNS plugin is built) Firewall lifecycle: - firewall down: waits for daemon exit, then belt-and-suspenders Stop() - EnsureDaemon: health-probes Envoy before trusting PID — kills and respawns unhealthy daemons (fixes restart race) - container stop/remove: calls Firewall.Disable() to clean BPF map entries - ebpf container: StopTimeout=1s (sleep infinity ignores SIGTERM) - make restart: full rebuild + nuke firewall containers/image Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 102 out of 108 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:15
- The package comment/example still refers to
mock(e.g.,fake := mock.NewFakeClient(cfg)), but the package ismocks. This will confuse users and appears incorrect in generated docs; update the text/examples to usemocks.NewFakeClientandmocks.RunningContainerFixture.
internal/docker/mocks/helpers.go:522 - The example in this comment uses
mock.NewFakeClient(cfg), but the package ismocks. Update the example tomocks.NewFakeClient(cfg)to keep docs accurate.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CoreDNS plugin that intercepts DNS responses and populates the eBPF
dns_cache map with resolved IP → domain hash mappings in real time.
This replaces the racy dns_cache seeding approach where IPs resolved
at Enable() time could differ from the IPs containers actually connect
to (DNS round-robin).
Plugin architecture:
- Handler: nonwriter pattern to capture responses from the forward plugin,
extracts A records, writes IP→{domain_hash, TTL} to pinned BPF map
- MapWriter interface: enables test injection (recordingMap fake)
- BPFMap: production implementation wrapping cilium/ebpf pinned map
- Zone-based domain hashing: wildcard zones (.example.com) hash to the
zone name, not the subdomain, matching the route_map key format
- Graceful degradation: nil/failed BPF map → DNS resolution continues,
map writes silently skipped
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create the build infrastructure for a custom CoreDNS binary that includes the dnsbpf plugin (real-time BPF dns_cache population from DNS responses). - cmd/coredns-clawker/main.go: custom CoreDNS entrypoint importing stock CoreDNS + dnsbpf plugin, prepended to dnsserver.Directives - Dockerfile.coredns: multi-stage build for standalone/CI builds - internal/firewall/coredns_embed.go: go:embed for pre-compiled binary - Makefile: coredns-binary target, wired into clawker-build and all cross-platform build targets - internal/firewall/manager.go: unified embeddedImageSpec pattern (replaces separate ebpf/coredns build functions), ensureCorednsImage wired into EnsureRunning, SHA-pinned Alpine base images in inline Dockerfiles, errdefs.IsNotFound discrimination for ImageInspect errors, JSON stream error checking for ImageBuild Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the custom CoreDNS image (with real-time BPF dns_cache population)
into the firewall stack, replacing the stock CoreDNS image and the
startup-time DNS seed approach.
Container config changes:
- corednsContainerConfig: switched to corednsImageTag (custom build),
added /sys/fs/bpf bind mount + CAP_BPF + CAP_SYS_ADMIN capabilities
- Removed stock corednsImage constant (no longer used)
Lifecycle ordering:
- EnsureRunning: eBPF container + init now runs BEFORE CoreDNS starts
(dnsbpf plugin opens the pinned dns_cache map on startup)
- regenerateAndRestart: added ebpfExec("init") before CoreDNS restart
to maintain the same BPF map existence guarantee on reload
Corefile generation:
- Added dnsbpf directive to per-domain forward zones and internal host
zones (after log, before forward). Catch-all zone excluded (NXDOMAIN
responses have no A records)
Cleanup:
- Removed dns_cache seed code from ebpf-manager runEnable (replaced by
real-time dnsbpf plugin population)
- Removed Domain field from ebpfRoute and ebpf.Route (only used for
now-removed seed code)
- Fixed restartContainer nil-error wrapping bug (%!w(<nil>))
- Fixed dnsbpf sync.Once + OnShutdown lifecycle bug (FD invalidation
after Corefile reload)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three architectural fixes found during UAT: BPF: Global route_map (not per-container) - route_key no longer includes cgroup_id — routes are shared across all enforced containers. container_map presence = enforcement check. - Eliminates 1×N route duplication and enables live rule sync. BPF: Fix connect6 IPv4-mapped bypass (security fix) - Dual-stack sockets (SSH, curl, Node.js) use AF_INET6 with ::ffff:x.x.x.x addresses. Previously connect6 just returned 1 (allow) for these, completely bypassing the firewall. Now applies the same routing logic as connect4: DNS redirect, loopback/subnet checks, dns_cache + route_map lookup, catch-all to Envoy egress. Global route sync: - New ebpf-manager sync-routes command replaces per-container route population in enable. Called from EnsureRunning (startup), regenerateAndRestart (rule changes), and Enable (per-container safety). - firewall add/remove/reload now immediately update BPF routing for all running containers. CoreDNS: Import standard plugins explicitly - coremain.Run() does NOT transitively import plugins from zplugin.go. Added explicit imports for forward, health, log, reload, template. Avoids the full core/plugin import which drags in Azure/AWS/K8s deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three critical firewall bugs discovered during UAT:
BPF: sendmsg6 IPv4-mapped bypass (security fix)
- cgroup/sendmsg6 had the same IPv4-mapped bypass as connect6 did
previously — unconnected UDP via AF_INET6 dual-stack sockets to
::ffff:127.0.0.11:53 (glibc resolver, nslookup) hit Docker's
embedded DNS directly, bypassing CoreDNS entirely.
- Fixed sendmsg6 to apply the same routing as sendmsg4 for
IPv4-mapped addresses: DNS redirect to CoreDNS, loopback/subnet
allow, deny other UDP.
- Added new cgroup/recvmsg6 program (mirrors recvmsg4) that rewrites
DNS response source from CoreDNS back to ::ffff:127.0.0.11 so the
resolver accepts it. Attached via ebpf.AttachCGroupUDP6Recvmsg.
- Updated header comment to reflect 7 cgroup programs.
firewall: remove init from hot-reload paths (critical correctness fix)
- Enable() and regenerateAndRestart() were calling ebpfExec("init")
which runs Load() → cleanupAllLinks(). This detached BPF programs
from ALL running containers, disabling enforcement completely.
Symptoms: firewall add/remove would wipe every container's BPF
attachments, causing traffic to bypass the firewall entirely until
the container was restarted.
- Programs are loaded by EnsureRunning at daemon startup and pinned
to the host bpffs — they persist for the daemon lifetime. No
re-init needed for rule changes or per-container enable.
- syncRoutes alone is sufficient for rule updates; it uses OpenPinned,
not Load, so it doesn't touch link attachments.
docs: refresh architecture and design for dnsbpf/global-route_map era
- Updated .claude/docs/ARCHITECTURE.md firewall section to describe
the 3-container stack (envoy + custom coredns + ebpf manager),
embedded image pattern, global route_map, sync-routes flow, stale
pin recovery, and custom CoreDNS CAP_BPF requirements.
- Added internal/dnsbpf and internal/ebpf package descriptions.
- Updated Package Import DAG to include ebpf (foundation) and dnsbpf
(domain).
- Updated .claude/docs/DESIGN.md §7.2 to reflect custom CoreDNS,
global route_map, dual-stack IPv6 handling, startup ordering
invariant, live hot-reload via syncRoutes, and the CoreDNS
container's CAP_BPF + CAP_SYS_ADMIN + /sys/fs/bpf mount.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track known work items with enough context for fresh agents to pick up: - firewall enable --agent cgroup path bug (small fix) - Native IPv6 support (large initiative) - eBPF container as a proper service (replace sleep infinity) - OTel monitoring from eBPF (metrics_map scrape + event stream) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 116 out of 122 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:15
- Package/docs naming mismatch: this file declares
package mocks, but the doc comment and examples refer toPackage mockandmock.NewFakeClient(...). Update the comment/examples to usemocksso godoc and copy/paste usage are correct.
internal/docker/mocks/helpers.go:522 - The example in this comment uses
mock.NewFakeClient(...), but the package name ismocks(and there is nomockidentifier). Update the example so it compiles when copied (e.g.,mocks.NewFakeClient(...)).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
CI failures all share the same root cause: internal/firewall embeds
assets/ebpf-manager and assets/coredns-clawker via go:embed, both
gitignored but never built by the test/lint/docs targets. A stale
ebpf-manager binary lingered in git, masking the issue for that file
until coredns-clawker was added.
- Makefile: test/test-ci/docs/docs-check now depend on ebpf-binary
and coredns-binary so a clean checkout compiles
- .github/workflows/lint.yml: build the embedded binaries before
golangci-lint so typecheck can resolve the embed directives
- Drop the stale tracked ebpf-manager binary (gitignored, built on
demand)
- internal/firewall/manager.go: replace len(args)+1 preallocation in
ebpfExec/ebpfExecOutput with a literal-prefixed append to silence
the CodeQL "size computation may overflow" high-severity alert
- internal/cmd/firewall/down.go: nil-check opts.Firewall before use
so unit tests with a partial Factory don't panic on the
belt-and-suspenders container cleanup
- internal/ebpf/bpf/common.h: correct the dns_entry.expire_ts comment
to reflect that userspace writes/reads it as wall-clock seconds
(the BPF hot path never inspects it — only GarbageCollectDNS does)
- internal/ebpf/bpf/{clawker.c,common.h}: document the GPL-2.0 SPDX
license identifier (required for the BPF helpers used; the rest of
the repo stays MIT; the .o is loaded at runtime, not linked)
- internal/ebpf/gen.go: pin bpf2go to v0.21.0 to match go.mod per the
repo's version-pinning policy
- internal/ebpf/types.go: remove duplicate IPToUint32 doc line
- README.md: reword the "one domain per TCP/SSH port" bullet to drop
the self-contradictory "per-domain" phrasing
- .claude/docs/TESTING-REFERENCE.md: fix stale mock/ reference
- internal/cmd/container/run/run_test.go: update stale dockertest
reference to internal/docker/mocks.FakeClient
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…irewall daemons Previously EnsureDaemon silently discarded StopDaemon errors and blindly spawned a new daemon, which could result in two firewall daemons running concurrently if SIGTERM delivery failed or the old process didn't exit in time. Return the error from StopDaemon and re-check IsDaemonRunning after WaitForDaemonExit so we refuse to spawn a second daemon when the old one is still alive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion tests Follow-up to the e770cf2 competing-daemon fix: a full audit surfaced the same swallow-error pattern across the firewall shutdown path and several adjacent code paths. This commit closes the cluster and locks the behavior in with regression tests for the three critical zero-coverage gaps. Silent failures closed: - Daemon.Run teardown now propagates manager.Stop errors instead of returning nil — previously a failed stack teardown would exit the daemon cleanly, PID file removed, letting the next EnsureDaemon spawn a second daemon that collides with leftover containers. - Manager.stopAndRemove propagates ContainerList errors instead of returning nil, plugging the upstream leak that feeds Daemon.Run's competing-daemon race. - Manager.isContainerRunning gains an error-returning isContainerRunningE variant used by Status/regenerateAndRestart; public IsRunning() keeps the bool contract but now logs Warn on swallow. - Manager.cgroupDriver returns (string, error) and propagates — no more silent fallback to cgroupfs on systemd hosts when Info() blips. - Manager.Enable propagates touchSignalFile and host-proxy resolve errors instead of leaving the container's entrypoint hanging forever on a signal file that will never appear. - firewall down no longer early-returns on a missing PID file; the belt-and-suspenders fwMgr.Stop() always runs so crashed-daemon stale containers get cleaned up before the next firewall up. - firewall down surfaces real Stop errors to stderr with a warning icon instead of the previous misleading `_ = fwMgr.Stop(ctx)`. - Container stop/remove surface a visible warning on ios.ErrOut when firewall Disable fails, so users learn about leaked BPF link pins instead of only seeing them in the daemon log. - ebpf Manager.Enable's BypassMap.Delete routes through a new clearBypass helper that gates on ErrKeyNotExist and Warn-logs any other error — matches the pre-existing Disable pattern. - ebpf Manager.SyncRoutes collects per-key Delete/Update errors via errors.Join instead of reporting success on partial failure. - daemon.watchContainers exits after N consecutive ContainerList errors instead of looping forever on a permanently broken Docker connection. - e2e harness dockerListByLabel returns ([]string, error); all four callers t.Logf failures so test-run leaks become attributable. - e2e harness cleanupTestEnvironment builds the firewall label via cfg.LabelPurpose()/cfg.PurposeFirewall() instead of hardcoding dev.clawker.purpose=firewall. Regression coverage for the three zero-coverage critical production fixes: - internal/firewall/daemon_test.go (new): 7 ensureDaemonWith cases via a new daemonDeps DI seam locking in the competing-daemon guard (StopDaemon error → no spawn, hung daemon → refuse second daemon, clean-exit recovery), plus WaitForDaemonExitReport semantics with real subprocess lifecycle. - internal/firewall/manager_test.go (new): full coverage of resolveContainerID name→ID resolution via whailtest.FakeAPIClient with stubbed ContainerInspect; Bypass/Enable/Disable tests assert the downstream ebpfExec sees the canonical long ID after a friendly name input. 64-hex short-circuit asserts ContainerInspect is not called. Regression tests for all the new error-propagation paths (Stop/Status/cgroupDriver/touchSignalFile/host-proxy resolve). - internal/ebpf/manager_test.go: new bypassMap interface seam + fakeBypassMap fake exercise clearBypass with existing/missing/ EPERM/EINVAL/ENOMEM/wrapped-EPERM inputs; TestCgroupID_RejectsMaliciousPath asserts the sanitizer fires before os.Open on 8 adversarial paths. - internal/cmd/firewall/down_test.go (new): 5 tests exercising downRun directly — daemon-running (real child process), daemon- not-running (no orphans, orphans-cleaned, cleanup-error), and firewall-factory-error paths. - Container stop/remove tests: TestStopRun_FirewallDisableErrorDoesNotFailStop and its remove sibling now assert errOut contains the "firewall disable failed" / "BPF resources may leak" warning strings. Test surface: 4466 unit tests pass (up from 4462), 0 failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 124 out of 129 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:15
- Package-level doc comment and examples refer to
mock.*(package name + constructor), but the actual package ismocksandNewFakeClientrequires aconfig.Configargument. Update the comment/examples so they match the exported API (mocks.NewFakeClient(cfg)/mocks.RunningContainerFixture(...)).
internal/docker/mocks/helpers.go:523 - This example uses
mock.NewFakeClient(cfg), but the package name ismocks(and callers typically usemocks.NewFakeClient(cfg)or justNewFakeClient(cfg)within the package). As written, the example won’t compile if copied.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Two follow-ups to the silent-failure cluster fix in 2840de2 — same class of bug the reviewers flagged as out-of-scope, now addressed. Manager.Status (internal/firewall/manager.go): The discoverNetwork error was silently dropped via `netInfo, _ :=`, leaving Status reporting an empty network block on any Docker API failure. After the C2/C4 fixes in 2840de2 propagated isContainerRunningE and cgroupDriver errors, this drop became the last silent-fail on the Status path and was visually inconsistent with its surroundings. The legitimate "firewall not brought up yet" case (network missing) uses a NotFound error from NetworkInspect and is the ONLY branch we silently continue through — every other NetworkInspect error (daemon unreachable, permission denied, ...) propagates. This matches the pattern used by ensureNetwork at manager.go:69. ebpf Manager.GarbageCollectDNS (internal/ebpf/manager.go): Return value now reflects entries actually cleared from the map, not `len(expired)` (the iterate-pass count). Previously a run where every delete hit EPERM would still claim "N entries collected", making the metric a lie and hiding ongoing failures from metrics consumers. ErrKeyNotExist counts as cleared (end-state matches intent — the entry is gone, typically because the dnsbpf CoreDNS plugin raced us between Iterate and Delete); real delete failures (EPERM, ENOMEM, ...) are logged at Debug and excluded from the count. Extracted the delete loop into deleteExpiredDNSEntries for testability, reusing the existing bypassMap interface seam since both ebpf maps satisfy the structural Delete(key any) error contract. Regression tests: - TestStatus_PropagatesDiscoverNetworkError: non-NotFound NetworkInspect errors must reach the caller (new). - TestStatus_NetworkNotFoundIsNotAnError: the inverse — NotFound must NOT become an error, so `firewall status` still works before the first `firewall up` (new). - TestDeleteExpiredDNSEntries_CountsOnlyRealAndENOENTSuccess: forced EPERM path yields 0 cleared, happy path yields expected count (new). - TestDeleteExpiredDNSEntries_EmptyReturnsZero: boundary case for the no-expired-entries branch (new). - fakeDNSMap: new test fake for uint32-keyed BPF maps (the existing fakeBypassMap only handles uint64 bypass keys). Test surface: 4470 unit tests pass (4466 + 4 new), 0 failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The connect4 / connect6 / sendmsg4 / sendmsg6 / recvmsg4 / recvmsg6
programs were near-verbatim copies of each other, differing only in
whether the redirect target landed in ctx->user_ip4 or ctx->user_ip6[3].
Every future edit to routing logic had to be made in two places, and
metric emission had already drifted (bypass metrics fired in connect*
but were silently dropped in sendmsg*).
Consolidate the shared logic behind three helpers in common.h:
* enter_enforced() — uid/cgroup/bypass/container_map preamble,
with a check_bypass knob so recvmsg* skips
bypass accounting while everything else emits
ACTION_BYPASS consistently (fixes the drift)
* decide_connect() — full IPv4 TCP/UDP routing decision, returned
as a route_result the caller applies to the
appropriate ctx field
* decide_sendmsg() — UDP-only subset for sendmsg4 / sendmsg6
Each program in clawker.c is now a thin shim that calls enter_enforced,
calls decide_* (or the recvmsg source-rewrite helper), and translates
the verdict into a return value via apply_v4 / apply_v6_mapped. Magic
numbers 0x7f00000b, 0x0000ffff, and 53 are replaced with named constants
(DOCKER_EMBEDDED_DNS, IPV4_MAPPED_PREFIX, DNS_PORT). is_ipv4_mapped()
and is_ipv6_loopback() replace duplicated inline prefix checks.
While here, drop the committed vmlinux.h entirely. Clawker's BPF
programs touch only stable kernel UAPI (struct bpf_sock_addr,
struct bpf_sock, BPF_MAP_TYPE_*, LIBBPF_PIN_BY_NAME) and never use
CO-RE relocations (BPF_CORE_READ, preserve-access-index), so the 6 MB
kernel BTF dump is unused overhead. common.h now pulls types from
<linux/bpf.h> + <linux/types.h> + <stdbool.h> via linux-libc-dev,
which is pinned via apt in the new builder image.
To anchor the committed BPF bytecode to a reproducible recipe instead
of trust-on-first-use, add Dockerfile.bpf-builder with all inputs
pinned (debian:bookworm-slim by SHA256, clang/libbpf-dev/linux-libc-dev
by exact apt versions, golang toolchain by digest, bpf2go version via
gen.go). Two new Makefile targets drive it:
* make bpf-regenerate — runs go generate in the pinned image
* make bpf-verify — regenerates and diffs against committed
bytecode; fails on any drift
A new bpf-reproducibility job in .github/workflows/test.yml runs
bpf-verify on every PR so the committed .o files are continuously
verified against the pinned recipe. The job skips gracefully while
the base image digest is still a PIN_ME_BEFORE_MERGE placeholder so
the rest of the PR can land before the digest refresh; see the entry
added to .serena/memories/outstanding-features.md for the two
remaining follow-ups (pin the digest, wire SLSA attestation into the
release workflow).
internal/ebpf/REPRODUCIBILITY.md documents the full provenance chain
and the coordinated pin-update procedure. CLAUDE.md's version-pinning
table picks up a new row calling out that BPF bytecode regeneration
goes through Dockerfile.bpf-builder and is enforced by bpf-verify.
All 4470 unit tests pass after regeneration. E2E firewall tests
(test/e2e/firewall_test.go) exercise the BPF fast path end-to-end
with real Docker containers and must be re-run locally before merge
to confirm the refactored routing preserves kernel-side behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…BPF artifacts
The previous commit in this PR introduced Dockerfile.bpf-builder as a
standalone BPF regeneration image and kept the committed clawker_*_bpfel.{go,o}
artifacts, then policed drift with a make bpf-verify CI gate. That was
wrong on both axes:
* Committed generated artifacts are trust-on-first-use no matter how
well they're anchored by a verify step — a reviewer has no realistic
way to audit a .o blob in a diff, and the gate becomes the only thing
standing between the repo and a tampered binary
* A separate Dockerfile.bpf-builder duplicated the pin surface with
Dockerfile.ebpf / Dockerfile.coredns, fracturing "what version of
clang/libbpf/Go ships in clawker" across three files
Replace the whole thing with a single reproducible build chain:
* Dockerfile.firewall consolidates Dockerfile.ebpf, Dockerfile.coredns,
and the would-be Dockerfile.bpf-builder into one file with six stages:
bpf-builder → ebpf-manager-builder → ebpf-manager-extract,
coredns-builder → coredns-extract, and bpf-bindings-extract. The
bpf-builder stage is shared — its output is COPY --from'd into both
ebpf-manager-builder and coredns-builder so coredns-clawker (which
imports internal/ebpf and needs the bpf2go types to compile) gets
the same pinned bindings as the ebpf-manager build.
* Every pin lives in one file: debian:bookworm-slim base image digest,
clang / llvm (for llvm-strip, previously missing) / libbpf-dev /
linux-libc-dev apt versions, golang:alpine toolchain digest, and
bpf2go version (the last still pinned in gen.go). All four pin layers
have to move together — the CLAUDE.md version-pinning row points at
internal/ebpf/REPRODUCIBILITY.md for the coordinated update recipe.
* Makefile ebpf-binary, coredns-binary, and bpf-bindings targets invoke
docker buildx build --target=<extract-stage> --output=type=local,dest=...
to drop the final artifacts into the host tree. Make's source→target
dep graph (via EBPF_BINARY_DEPS / COREDNS_BINARY_DEPS / BPF_BINDING_DEPS)
ensures buildx only runs when inputs actually change. Cold builds
populate buildkit layer cache; cached runs are <1s.
* Dockerfile.bpf-builder, Dockerfile.ebpf, Dockerfile.coredns, and the
committed clawker_*_bpfel.{go,o} files are all deleted. The bpf2go
Go wrappers are gitignored but extracted to the host tree via the
bpf-bindings-extract stage so gopls / go test / golangci-lint can
still analyze internal/ebpf/manager.go. Fresh clones need `make`
to run once before IDE tooling works on the ebpf package.
* make bpf-verify / bpf-regenerate / bpf-builder-image targets and the
bpf-reproducibility CI job are deleted. Nothing to verify when the
committed state has no generated artifacts to diff against — the
build itself is the reproducibility gate.
* outstanding-features entry for the reproducibility chain is rewritten
to reflect that the build-side story is complete. Only the SLSA
provenance attestation on the release binary is still outstanding,
plus a new note about replacing the long-running clawker-ebpf
container with direct control-plane BPF management when the upcoming
control plane lands.
Also corrects the "sidecar" terminology that an earlier agent slipped into
the clawker docs. The firewall stack containers are not sidecars: they
don't share network / PID / mount namespaces with user containers, one
stack serves all clawker-managed containers on the host (1:N), and the
eBPF programs live in kernel state pinned under /sys/fs/bpf/clawker/
independently of the container that loaded them. Fixed in
internal/firewall/CLAUDE.md, docs/firewall.mdx, .claude/docs/ARCHITECTURE.md,
internal/cmd/factory/default.go, two container test comments, and a
serena memory reference table. The two .claude-level memories that
include historical "sidecar" references (ebpf_firewall_docker_desktop_uat,
initiative_coredns-dnsbpf-plugin) are left alone as point-in-time
snapshots of past work.
Verified locally with a cold buildkit cache: `docker buildx prune -f &&
make test` rebuilds from scratch (base image pulls + pinned apt install +
go generate + three buildx extractions + 4470 unit tests) in ~57 seconds.
Subsequent `make test` runs with warm cache finish in ~12 seconds.
E2E firewall tests (test/e2e/firewall_test.go) still need a local
pre-merge run to confirm the refactored routing and new build path
produce a kernel-loadable binary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 124 out of 126 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:12
- Package-level doc comment and examples refer to a non-existent
mockpackage (// Package mock ...,mock.NewFakeClient,mock.RunningContainerFixture). The actual package name ismocks, so these examples are misleading and can confuse users/search (and break copy/paste). Update the doc comment and examples to usemocksconsistently.
internal/docker/mocks/helpers.go:522 - The SetupBuildKit usage example uses
mock.NewFakeClient(cfg), but the package isinternal/docker/mocksand callers should usemocks.NewFakeClient. This example currently won’t compile when copy/pasted.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Document in outstanding-features.md that the release pipeline (.github/workflows/release.yml + .goreleaser.yaml) is broken under the new Dockerfile.firewall build chain and lists the exact breakages, the recommended fix (GoReleaser `builder: prebuilt` + Makefile cross-compile), and the things that keep working unchanged (SLSA attestation, cosign, SBOMs, homebrew tap). Must land before the next tag push. The SLSA provenance attestation step is already wired via actions/attest-build-provenance; the gap was never attestation itself, only that the build producing the artifacts it attests over was going to bypass the pinned Dockerfile.firewall entirely. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two CI workflows were quietly broken by the switch to Dockerfile.firewall:
both govulncheck and the licenses freshness check load every Go package
in the module, which means `internal/firewall`'s go:embed targets
(assets/ebpf-manager, assets/coredns-clawker) and `internal/ebpf`'s
bpf2go-generated wrappers must exist on disk. They didn't — the runner
was checking out a tree with no generated artifacts and immediately
failing at compile time.
Fix both by routing through the existing `make ebpf-binary coredns-binary`
chain, same pattern lint.yml already uses:
* security.yml govulncheck job gains a `setup-go` step (it was missing
anyway — the previous govulncheck-action was setting up Go itself)
and a `Build embedded firewall binaries` step before the scan
* Makefile `licenses` and `licenses-check` targets now declare
`ebpf-binary coredns-binary` as prereqs, so `make licenses-check`
(which is what the licenses.yml workflow runs) transitively triggers
the pinned Dockerfile.firewall build before calling
gen-notice.sh → go-licenses report ./...
test.yml, lint.yml, and docs.yml already route through the Make chain
correctly (test → ebpf-binary coredns-binary, explicit build step,
docs-check → docs → ebpf-binary coredns-binary respectively) so they
weren't affected.
release.yml remains broken but is scoped for its own follow-up PR per
the existing note in .serena/memories/outstanding-features.md — it has
additional complications around per-target-arch cross-compilation that
security.yml and licenses.yml don't.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous attempt to unbreak govulncheck added `make ebpf-binary
coredns-binary` as a step before `golang/govulncheck-action`, and on its
face that should have worked. It didn't, because the govulncheck-action
does its own `actions/checkout` internally as part of its composite
implementation — which wipes the working tree after the make step
materializes `internal/firewall/assets/{ebpf-manager,coredns-clawker}`
and `internal/ebpf/clawker_*_bpfel.{go,o}`. By the time
`govulncheck ./...` actually runs the tree is back to gitignored-state
and compilation fails with `undefined: clawkerRouteKey` etc.
Two fixes, both landed here:
1. Extract the "build embedded firewall binaries" pattern into a
reusable composite action at `.github/actions/build-firewall-binaries/`.
It handles setup-go + `make ebpf-binary coredns-binary` as a single
step so every current and future job that needs those binaries can
`uses: ./.github/actions/build-firewall-binaries` with no duplication.
The action's description documents the "must run after checkout and
before any tool that re-checkouts" ordering constraint so future agents
understand why govulncheck gets the manual treatment.
2. Stop using `golang/govulncheck-action` entirely in security.yml — its
internal re-checkout is incompatible with pre-built artifacts living
on disk. Install `govulncheck` manually and run `govulncheck ./...`
directly, keeping the working tree intact throughout.
Also refactors lint.yml to consume the new composite action instead of
duplicating the setup-go + make pattern inline. test.yml, docs.yml, and
licenses.yml are left alone — they all route through Make targets
(`make test-ci`, `make docs-check`, `make licenses-check`) that
transitively depend on ebpf-binary + coredns-binary, so they don't need
the explicit composite action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 125 out of 127 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (2)
internal/docker/mocks/fake_client.go:5
- The package doc comment doesn’t match the actual package name. Go stylecheck (ST1000) expects the comment to start with
Package mocks ...(and the usage snippet should referencemocks.NewFakeClient/mocks.RunningContainerFixture). As written (Package mock ...), this is likely to fail lint and is also confusing for readers.
internal/docker/mocks/helpers.go:521 - The usage example in this comment uses
mock.NewFakeClient, but the package isinternal/docker/mocks(plural). Update the example tomocks.NewFakeClientso it’s copy/pasteable and consistent with the renamed package.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… pin rule
Go 1.25.8 is vulnerable and the Go security team published 1.25.9 as the
fix. Bump every pinned reference project-wide:
* go.mod `go 1.25.9`
* test/adversarial/attacker-server/go.mod `go 1.25.9`
* test/adversarial/CLAUDE.md doc reference
* .clawker.yaml install hook `go install golang.org/dl/go1.25.9@latest`
* Dockerfile.firewall — three `golang:1.25.9-alpine@sha256:...` references
(bpf-builder COPY, ebpf-manager-builder FROM, coredns-builder FROM),
all pinned to the new manifest-list digest
sha256:7a00384194cf2cb68924bbb918d675f1517357433c8541bac0ab2f929b9d5447
* internal/bundler/dockerfile.go `DefaultGoBuilderImage` constant — used
to generate clawker-managed user container Dockerfiles — same digest
* internal/ebpf/REPRODUCIBILITY.md pin update section
docs/threat-model.mdx is intentionally left at `go 1.25.8` references
because it's a verification guide pinned to released tag v0.7.6, which
was built with 1.25.8. Changing it would break the reproduction recipe
for anyone verifying that specific release. The guide will be bumped
alongside the next tagged release.
Also adds a "multi-arch image pin rule" to CLAUDE.md's version-pinning
section, and an explicit manifest-list check in the REPRODUCIBILITY.md
pin update recipe. Every `@sha256:...` container image pin in the repo
must resolve to a multi-arch manifest list (OCI image index,
`application/vnd.oci.image.index.v1+json`), not a per-platform image
digest — otherwise cross-platform builds break because BuildKit can't
select a matching per-arch manifest.
Verified by running `docker buildx imagetools inspect` against all three
existing digests (debian:bookworm-slim, alpine:3.21, golang:1.25.9-alpine)
— all three return `MediaType: application/vnd.oci.image.index.v1+json`.
`DefaultGoBuilderImage` also points at the manifest-list digest.
`make test` rebuilds the whole pinned Docker chain from cold cache and
all 4470 unit tests pass under Go 1.25.9.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ixes Two real bugs surfaced by Copilot's PR review on #249: 1. Wildcard SSH/TCP rules silently bypassed enforcement. The route_map writer in TCPMappings hashed r.Dst verbatim (".example.com"), while the dnsbpf CoreDNS plugin hashes the normalized Corefile zone ("example.com"), so connect4 lookups missed for every wildcard TCP/SSH rule. Fix: normalize TCPMapping.Dst in the constructor so the route_map and dns_cache hashes agree. Adds two table-driven wildcard cases plus a TestTCPMappings_WildcardDomainHashMatchesDnsbpfZone regression guard that asserts the writer/reader hash invariant directly. (TLS rules unaffected — Envoy TLS routes by SNI filter chain, not route_map.) 2. Firewall daemon TOCTOU race on startup. Daemon.Run wrote the PID file before EnsureRunning completed, so concurrent EnsureDaemon callers could observe isDaemonRunning=true + isStackHealthy=false during normal startup and SIGTERM a still-starting daemon (flap loops, orphaned envoy/coredns/ebpf containers). Fix: write PID file after EnsureRunning succeeds — post-write, "PID present" implies "stack was healthy a moment ago", so a later unhealthy observation correctly triggers restart. Tear down the stack on PID-write failure to avoid leaking the same orphans. Also from the same review: - security.yml: pin govulncheck to v1.1.4 SHA (was @latest, violates CLAUDE.md version-pinning policy) - internal/firewall/mocks/stubs.go: delete dead code (zero callers, half-wired against a transport that returns 503 for everything; FirewallManagerMock is the canonical test double per firewall/CLAUDE.md) - internal/ebpf/gen.go: rewrite stale "committed bytecode" header to match the gitignored + pinned-Docker-build reality - internal/ebpf/types.go: package doc undercounted programs (5 → 7, add recvmsg4/recvmsg6) - CLAUDE.md: dnsbpf only writes A records, not A/AAAA - .claude/memories/* + internal/cmd/container/CLAUDE.md + .serena/memories/project-overview.md: NewFakeClient now requires a config.Config arg; updated all stale snippets/references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 127 out of 129 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
internal/docker/mocks/fake_client.go:5
- Package-level GoDoc comment uses the wrong package name ("Package mock") even though the package is named
mocks. This can confuse readers and may trip linters that enforcePackage <name>prefixes; update it toPackage mocks ....
internal/docker/mocks/fake_client.go:12 - The usage example in the package comment references
mock.NewFakeClient/mock.RunningContainerFixture, but the package ismocks. Copy/pasting this example won’t compile; update the snippet to usemocks.*.
internal/docker/mocks/helpers.go:522 - This GoDoc example uses
mock.NewFakeClient(cfg)but the package ismocks. Update the snippet tomocks.NewFakeClient(cfg)so the documentation is copy/paste-correct.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…e docs, add missing package docs Drops always-loaded context from 1522 → 1208 lines (−314), trims 4 oversized package CLAUDE.md files back under the 200-line budget, refreshes 8 stale rules/docs that had drifted from their Go sources, removes 4 completed/stale serena memories, and adds CLAUDE.md docs for internal/ebpf, internal/ebpf/cmd, internal/dnsbpf, internal/cmd/settings, and the two internal/bundler subpackages. Extracts the root Key Concepts table to .claude/docs/KEY-CONCEPTS.md and splits .claude/rules/storeui.md into a thin rule + .claude/docs/STOREUI-REFERENCE.md so the heavy reference material is lazy-loaded. No code changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 151 out of 153 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
internal/docker/mocks/fake_client.go:1
- The package doc comment says
Package mockbut the actual package name ismocks. This is misleading in GoDoc and when searching for the package; update the comment toPackage mocksto match the directory/package name.
internal/docker/mocks/fake_client.go:12 - The usage example references
mock.NewFakeClient/mock.RunningContainerFixture, but the package ismocks. Update the example to usemocks.*so copy/paste works.
internal/docker/mocks/helpers.go:520 - This comment example uses
mock.NewFakeClient(cfg)but the package ismocks. Update the example tomocks.NewFakeClient(cfg)to avoid confusion.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
dockertest→docker/mock: Moves the fake client package and addsmock_moby.gofor lightweight mobyclient.APIClientmocking without whail dependency.firewall/mocks.NewTestManager(t, cfg)creates a real Manager backed by a mock moby client for rule store tests.Test plan
priority_test.goverify local-layer rules appear first in TCP mappingsmake test)🤖 Generated with Claude Code