fix: Pod IP Deletion Leak in eBPF FilterMap (#2114)
Signed-off-by: Alex Castilio dos Santos <alexsantos@microsoft.com>
Retina Code Coverage Report

Total coverage increased.

| Impacted Files | Coverage | |
|---|---|---|
| pkg/module/metrics/metrics_module.go | 79.51% → 81.79% (+2.28%) | ⬆️ |

Decreased diff

| Impacted Files | Coverage | |
|---|---|---|
| pkg/enricher/enricher.go | 57.8% → 56.4% (-1.4%) | ⬇️ |
A few gaps I noticed from my investigation that the two PRs don't cover:
Hello @aanchal22, thanks for the review; I've addressed your comments.
Pull request overview
Fixes an eBPF filtermap leak in the metrics module by ensuring pod IP DELETE events are processed and removed even when namespace/pod “interest” state changes after the IP was added.
Changes:
- Bypass the `nsOfInterest`/`podOfInterest` early-return guard for `EventTypePodDeleted` so deletes always reach `handlePodEvent`.
- Make `applyDirtyPodsDelete` always attempt `DeleteIPs` with both request metadata types (`pod` and `namespace`) for each deleted IP.
- Adjust logging of `[]net.IP` fields to avoid log encoding errors, and add unit tests covering leak scenarios and IP-reuse churn.
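The guard-bypass change can be illustrated with a self-contained toy: delete events skip the "interest" early return that still applies to other event types. All names here (`handle`, `interested`, `podDeleted`) are illustrative, not the actual Retina code:

```go
package main

import "fmt"

type eventType int

const (
	podAdded eventType = iota
	podDeleted
)

type event struct {
	typ eventType
	pod string
}

// handle stands in for PodCallBackFn; interested stands in for the
// combined nsOfInterest()/podOfInterest() checks.
func handle(e event, interested bool, handled *[]string) {
	// Delete events must always be processed, even when the namespace or
	// annotation is no longer of interest; otherwise the IP leaks in the
	// filtermap. The early-return guard applies to adds/updates only.
	if !interested && e.typ != podDeleted {
		return
	}
	*handled = append(*handled, e.pod)
}

func main() {
	var handled []string
	handle(event{podAdded, "pod-a"}, false, &handled)   // dropped by guard
	handle(event{podDeleted, "pod-a"}, false, &handled) // bypasses guard
	fmt.Println(handled) // [pod-a]
}
```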
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pkg/module/metrics/metrics_module.go | Ensures delete events aren’t dropped and deletes IPs using both metadata types; tweaks IP logging and adds an IP-reuse DELETE guard. |
| pkg/module/metrics/metrics_module_linux_test.go | Adds unit tests covering namespace/annotation changes, normal lifecycle, never-tracked deletes, and IP reuse behavior. |
```go
case cache.EventTypePodDeleted:
	// Guard against spurious DELETE events during pod churn / IP reuse.
	// The daemon cache is updated before events are published, so if a new pod
	// reused this IP the cache still contains an entry. Deleting would remove a valid IP.
	if endpoint := m.daemonCache.GetPodByIP(ip.String()); endpoint != nil {
		m.l.Debug("Ignoring DELETE for reused IP — pod still exists in cache",
			zap.String("deleted pod", pod.NamespacedName()),
			zap.String("ip", ip.String()),
			zap.String("cached pod", endpoint.NamespacedName()))
		return
	}
```
The new EventTypePodDeleted handling includes an additional guard that ignores DELETE events when daemonCache.GetPodByIP(ip) still returns an endpoint (IP reuse / churn). This behavior change isn’t mentioned in the PR description; please document it there (or in a code comment that explains the ordering assumption between cache updates and pubsub delivery) so future maintainers understand why deletes may be skipped.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
handlePodEvent uses nsOfInterest() / namespace maps when building podCacheEntry (just above this switch) without holding the module mutex. Since those maps are mutated under Module.Lock() in Reconcile()/appendIncludeList(), pod events racing with reconcile can cause concurrent map read/write panics. Consider computing Annotated/Namespaced under RLock() in PodCallBackFn and passing the booleans into handlePodEvent, or otherwise guaranteeing nsOfInterest() is only called while the module lock is held (without introducing RWMutex re-entrancy deadlocks).
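A minimal sketch of the suggested pattern, assuming the module guards its interest maps with a `sync.RWMutex`: snapshot the booleans under `RLock()` and pass them to the event handler instead of re-reading the maps there. Names (`module`, `interest`, `reconcile`) are illustrative, not the actual Retina types:

```go
package main

import (
	"fmt"
	"sync"
)

type module struct {
	mu           sync.RWMutex
	nsOfInterest map[string]bool // mutated under mu.Lock() during reconcile
}

// interest snapshots the boolean under the read lock, so the event
// callback never reads the map concurrently with a reconcile write.
func (m *module) interest(ns string) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.nsOfInterest[ns]
}

// reconcile mutates the map under the write lock.
func (m *module) reconcile(ns string, include bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.nsOfInterest[ns] = include
}

func main() {
	m := &module{nsOfInterest: map[string]bool{}}
	m.reconcile("default", true)
	// Pass the snapshot into the event handler rather than re-reading the
	// map inside it; no lock is held while the event is processed.
	namespaced := m.interest("default")
	fmt.Println(namespaced) // true
}
```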
```go
if err != nil {
	m.l.Error("Error deleting pod IP from filter manager", zap.Error(err))
}
m.l.Debug("Deleting Ips in dirty pods from filtermap", zap.String("IPs", fmt.Sprint(ipsToDelete)))
```
Log message uses "Ips"; since "IP" is an acronym, this should be "IPs" for consistency with other logs/metrics terminology and easier grepping.
```diff
- m.l.Debug("Deleting Ips in dirty pods from filtermap", zap.String("IPs", fmt.Sprint(ipsToDelete)))
+ m.l.Debug("Deleting IPs in dirty pods from filtermap", zap.String("IPs", fmt.Sprint(ipsToDelete)))
```
```go
fm.EXPECT().AddIPs([]net.IP{ip1}, gomock.Any(), moduleReqMetadata).Return(nil).Times(1)
fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), moduleReqMetadata).Return(nil).Times(1)
// Allow the extra modulePodReqMetadata delete (no-op) after fix
fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), modulePodReqMetadata).Return(nil).AnyTimes()
```
Using AnyTimes() here makes the test less strict and can hide unintended repeated DeleteIPs calls. Since the expected behavior is a single additional no-op delete attempt per lifecycle, prefer Times(1) (or otherwise asserting an upper bound) so regressions are caught.
```diff
- fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), modulePodReqMetadata).Return(nil).AnyTimes()
+ fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), modulePodReqMetadata).Return(nil).Times(1)
```
```go
// Allow the extra moduleReqMetadata delete (no-op) after fix
fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), moduleReqMetadata).Return(nil).AnyTimes()
```
Same as above: AnyTimes() can mask extra/unexpected calls. If the code should only issue one extra delete for moduleReqMetadata, consider tightening this to Times(1) (or setting an explicit max) to keep the test deterministic and regression-resistant.
```diff
- // Allow the extra moduleReqMetadata delete (no-op) after fix
- fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), moduleReqMetadata).Return(nil).AnyTimes()
+ // Allow exactly one extra moduleReqMetadata delete (no-op) after fix
+ fm.EXPECT().DeleteIPs([]net.IP{ip1}, gomock.Any(), moduleReqMetadata).Return(nil).Times(1)
```
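The reviewer's point about `Times(1)` versus `AnyTimes()` can be shown with a plain call counter that mimics a bounded gomock expectation (a hypothetical mock for illustration, not the actual test file): an exact-count expectation flags a second call, whereas an unbounded one silently accepts it.

```go
package main

import "fmt"

// strictMock mimics a gomock expectation with Times(1): exactly one call
// is allowed, and any extra call is recorded as a test failure.
type strictMock struct {
	calls, max int
	failures   []string
}

func (m *strictMock) DeleteIPs(ip string) {
	m.calls++
	if m.calls > m.max {
		m.failures = append(m.failures,
			fmt.Sprintf("unexpected extra DeleteIPs(%s) call #%d", ip, m.calls))
	}
}

func main() {
	m := &strictMock{max: 1} // Times(1); AnyTimes() would set no upper bound
	m.DeleteIPs("10.0.0.5")  // the one expected no-op delete
	m.DeleteIPs("10.0.0.5")  // a regression: the second delete is flagged
	fmt.Println(len(m.failures)) // 1
}
```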
Description
Fix: Pod IP Deletion Leak in eBPF FilterMap
Problem

Pod IPs accumulate indefinitely in the eBPF filtermap because DELETE operations fail in two ways:

1. **`PodCallBackFn` guard drops delete events:** when a namespace is removed from the include list or a pod annotation is removed, `nsOfInterest()` and `podOfInterest()` both return false, so the `PodDeleted` event is silently discarded before reaching `handlePodEvent()`.
2. **`applyDirtyPodsDelete` uses the wrong metadata:** even if the event reaches the delete path, the `Annotated` and `Namespaced` flags are re-evaluated at delete time against current state (not the state when the IP was added). The filtermanager requires a matching `(Requestor, RequestMetadata)` pair to remove a reference, so a delete with the wrong metadata is a no-op.

This causes "no space left on device" errors when the eBPF filtermap fills up (255 entries).
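The wrong-metadata failure mode can be illustrated with a toy reference-counted map keyed by (IP, metadata), mirroring how a filtermanager-style component only releases a reference when the exact key used on Add is presented on Delete. All names here are illustrative, not the actual filtermanager API:

```go
package main

import "fmt"

// key pairs an IP with the request metadata it was added under.
type key struct {
	ip, metadata string
}

// filterMap counts references per (ip, metadata) pair.
type filterMap map[key]int

func (f filterMap) Add(ip, md string) { f[key{ip, md}]++ }

func (f filterMap) Delete(ip, md string) {
	k := key{ip, md}
	if f[k] > 0 {
		f[k]--
		if f[k] == 0 {
			delete(f, k)
		}
	}
	// Unknown (ip, metadata) pairs are a no-op: nothing is released.
}

func main() {
	f := filterMap{}
	f.Add("10.0.0.5", "namespace") // added while the namespace was of interest

	f.Delete("10.0.0.5", "pod") // wrong metadata at delete time: no-op
	fmt.Println(len(f))         // 1, the IP leaks

	// Deleting with both metadata types guarantees the reference is released.
	f.Delete("10.0.0.5", "pod")
	f.Delete("10.0.0.5", "namespace")
	fmt.Println(len(f)) // 0
}
```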
Fix

Two changes in `pkg/module/metrics/metrics_module.go`:

1. **Bypass the guard for `PodDeleted` events:** `PodCallBackFn` now skips the `nsOfInterest`/`podOfInterest` check when `event.Type == EventTypePodDeleted`, ensuring delete events always reach `handlePodEvent`.
2. **Always delete with both metadata types:** `applyDirtyPodsDelete` unconditionally issues `DeleteIPs` with both `modulePodReqMetadata` ("pod") and `moduleReqMetadata` ("namespace") for every IP in the delete list. The filtermanager's `deleteIP` is a safe no-op when the metadata doesn't exist for an IP, so the extra calls cause no harm.

Additional minor fix: replaced `zap.Any` with `fmt.Sprint` for `[]net.IP` log fields to fix "unsupported value type" errors in log output.

Tests
Unit tests added and manual validation completed.
Manual validation
Scenario 1 — Namespace filter change (annotations mode)
Test:
Logs:
Scenario 2 — Namespace filter change (MetricsConfiguration CRD mode)
Test:
Logs:
Scenario 3 — Pod annotation removed then deleted
Test:
Logs:
Related Issue
#2085
Checklist
- Signed commits (`git commit -S -s ...`). See this documentation on signing commits.

Screenshots (if applicable) or Testing Completed