
fix(doctor): use live open-fd counts for accurate fd exhaustion eta #115 #202

Open
Rakshak05 wants to merge 3 commits into optiqor:main from Rakshak05:issue-#115


@Rakshak05

Contributor

Closes #115

What

This PR fixes the file descriptor exhaustion ETA estimate. For the top leaking processes, it now reads the live open-FD count from /proc/<pid>/fd and the soft limit from /proc/<pid>/limits, rather than relying on system-wide window net deltas.

How

  • Collector Updates: Extended FDEntry and FDSnapshot models to include live FD counts and limit fields, and populated them in Snapshot() for the top 5 net-delta processes.
  • Proc Parsers: Implemented safe, side-effect-free /proc parsing helpers (CountProcFDs and ReadProcFDLimit) that read from a package-level base path, which tests can override so they run reliably on non-Linux platforms.
  • Headroom Calculations: Created a shared fdHeadroom function to calculate exact remaining headroom with prioritized fallback strategies when /proc metrics are unavailable.
  • Refactoring & UI: Updated evalFDLeak and predictFDExhaustion to use fdHeadroom, appending a clear "(estimated from window delta — actual may be lower)" warning to the impact string when the live count cannot be measured.

Testing

  • go build ./... passes
  • go test ./... passes
  • go vet ./... passes
  • golangci-lint run ./... passes
  • Tested locally with: unit tests that mock /proc structures and cover the rules/predict engines.
  • N/A — pure docs/refactor
  • sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load
  • ./scripts/verify.sh passes (or specific phase: ./scripts/verify.sh quality)

Checklist

  • PR title follows Conventional Commits (feat(scope): subject)
  • All commits are DCO-signed (git commit -s)
  • No unrelated changes pulled in
  • Documentation updated where user-visible behavior changed
  • Added/updated tests for new code paths
  • If a new doctor rule, paired with a chaos scenario in scripts/verify.sh

@Rakshak05 Rakshak05 requested a review from btwshivam as a code owner June 8, 2026 08:48
@github-actions github-actions Bot added the level:advanced 200+ lines or 6+ files (auto-applied) label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026


🚀 First PR — welcome aboard!

A few things to expect:

  1. CI: every PR runs build + race tests + lint + (eventually) the kernel matrix. If something fails, the log will tell you exactly which gate.
  2. DCO: every commit needs Signed-off-by: (git commit -s adds it automatically).
  3. Conventional Commits: PR titles like feat(doctor): add new rule or fix(bpf): handle X. We squash-merge by default.
  4. Review: a maintainer will review within 72 hours. Suggestions are conversations, not orders — push back if something doesn't fit your context.

If you get stuck, reply here or jump to Discussions. We want this PR to land.

@github-actions github-actions Bot added testing Tests and test coverage area/doctor Diagnostic engine and rules labels Jun 8, 2026
Rakshak05 added 2 commits June 8, 2026 14:58
Signed-off-by: Rakshak05 <rakshakbarkur@gmail.com>
Signed-off-by: Rakshak05 <rakshakbarkur@gmail.com>

Labels

area/doctor: Diagnostic engine and rules
level:advanced: 200+ lines or 6+ files (auto-applied)
testing: Tests and test coverage


Development

Successfully merging this pull request may close these issues.

fix(doctor): FD ETA treats window net-delta as absolute fd count
