Skip to content

fix(collector): fix critical SwapUsedBytes underflow and table-drive memory tests#208

Open
ramnnn2006 wants to merge 1 commit into
optiqor:mainfrom
ramnnn2006:test/table-drive-memory-tests
Open

fix(collector): fix critical SwapUsedBytes underflow and table-drive memory tests#208
ramnnn2006 wants to merge 1 commit into
optiqor:mainfrom
ramnnn2006:test/table-drive-memory-tests

Conversation

@ramnnn2006

Copy link
Copy Markdown
Contributor

What

fixes a critical uint64 underflow bug in poll() where SwapTotal - SwapFree
silently wraps to ~18 exabytes when the kernel reports SwapFree > SwapTotal,
corrupting the SwapUsedBytes metric fed into the doctor's oom_imminent rule.
also a testing refactor converting all sequential cases in memory_test.go and
memory_collector_test.go into table-driven tests with t.Run.

Why

Fixes #137

the bug is critical because the corrupted value propagates silently into oom
detection with no visible error, either triggering false alerts or masking real
memory pressure on kernels where this condition occurs (vm snapshots, hibernate
resume, live patching).

How

  • bug fix: added swapTotal > swapFree guard before the subtraction in
    poll(), clamps to 0 instead of wrapping
  • testing refactor: TestMemoryCollectorPoll converted to a 6-case table
    covering zero total, no swap, swapfree > swaptotal (exercises the guard),
    fully used memory, exactly-at-threshold pct, and error path
  • TestParseMeminfoLine wrapped existing cases in t.Run
  • TestCollectorMemoryHighCardinalityBurst, Sustained100Kps,
    ConcurrentRecord all converted to parametrized table cases with multiple
    load and concurrency profiles

Testing

  • go build ./... passes

  • go test ./... passes

  • go vet ./... passes

  • golangci-lint run ./... passes

  • Tested locally.

  • N/A — pure docs/refactor

  • sudo ./bin/bpf-verify --read 5s confirms 6/6 programs still load

  • ./scripts/verify.sh passes (or specific phase: ./scripts/verify.sh quality)

Checklist

  • PR title follows Conventional Commits (feat(scope): subject)
  • All commits are DCO-signed (git commit -s)
  • No unrelated changes pulled in
  • Documentation updated where user-visible behavior changed
  • Added/updated tests for new code paths
  • If a new doctor rule, paired with a chaos scenario in scripts/verify.sh

convert sequential tests in both files to table-driven with t.Run(tc.name).
cover boundaries: zero total, swapfree > swaptotal underflow guard, fully
used memory, exactly-at-threshold pct, and multiple concurrency profiles.

also fix uint64 underflow in poll() when SwapFree exceeds SwapTotal.

Signed-off-by: ramnnn2006 <phrkvvp@gmail.com>
@ramnnn2006 ramnnn2006 requested a review from btwshivam as a code owner June 8, 2026 20:30
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🚀 First PR — welcome aboard!

A few things to expect:

  1. CI: every PR runs build + race tests + lint + (eventually) the kernel matrix. If something fails, the log will tell you exactly which gate.
  2. DCO: every commit needs Signed-off-by:git commit -s adds it automatically.
  3. Conventional Commits: PR titles like feat(doctor): add new rule or fix(bpf): handle X. We squash-merge by default.
  4. Review: a maintainer will review within 72 hours. Suggestions are conversations, not orders — push back if something doesn't fit your context.

If you get stuck, reply here or jump to Discussions. We want this PR to land.

@github-actions github-actions Bot added testing Tests and test coverage area/doctor Diagnostic engine and rules level:advanced 200+ lines or 6+ files (auto-applied) labels Jun 8, 2026
@ramnnn2006

Copy link
Copy Markdown
Contributor Author

hey @btwshivam ,
i've been contributing to the repo for a bit now. could you add type:bug (underflow fix in poll()), type:refactor + type:testing (table-driven rewrite) and gssoc:approved when the changes look good?

lmk if anything needs fixing, looking forward to contributing more!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/doctor Diagnostic engine and rules level:advanced 200+ lines or 6+ files (auto-applied) testing Tests and test coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test(collector): table-drive memory_test.go and memory_collector_test.go

1 participant