Skip to content

fix(bpf): widen nr_bytes to __u64 to prevent overflow on large I/O#197

Open
Kokila-chandrakar wants to merge 8 commits into
optiqor:mainfrom
Kokila-chandrakar:fix/disk-io-nr-bytes-u64-overflow
Open

fix(bpf): widen nr_bytes to __u64 to prevent overflow on large I/O#197
Kokila-chandrakar wants to merge 8 commits into
optiqor:mainfrom
Kokila-chandrakar:fix/disk-io-nr-bytes-u64-overflow

Conversation

@Kokila-chandrakar

@Kokila-chandrakar Kokila-chandrakar commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Problem

disk_io.c:55 computes nr_bytes = ctx->nr_sector * 512 in __u32 arithmetic. Any merged or discard request larger than ~8 MiB (nr_sector > 8_388_607) silently wraps, producing a garbage byte count in the ring buffer event and corrupting the throughput totals accumulated in the collector.

Regular single requests are usually well under this limit, but merged block requests and discard operations routinely exceed it.

Fix

  • c/disk_io.c — cast ctx->nr_sector to __u64 before multiplying by 512 so the product is computed in 64-bit arithmetic.
  • c/headers/kerno.h — widen struct disk_event.nr_bytes from __u32 to __u64; reorder pid before nr_bytes for natural alignment; expand _pad from [3] to [7] bytes to keep the struct size a multiple of 8 (total: 32 → 40 bytes).
  • internal/bpf/events.go — mirror the new C layout exactly in DiskEvent so binary.Read deserializes correctly.
  • internal/bpf/decode_test.go — add TestDecodeDiskEventLargeIO: encodes a 16 MiB discard event (nr_sector=32768, expected NrBytes=16_777_216) and asserts it is not truncated. This test would have failed on the old uint32 field.

Struct layout change

Before After
dev __u32 __u32
nr_bytes __u32 moved
pid __u32 __u32
nr_bytes __u64
op __u8 __u8
_pad __u8[3] __u8[7]
Total 32 bytes 40 bytes

The Go DiskEvent struct is updated to match field-for-field.

Testing

go test ./internal/bpf/... -run TestDecodeDisk

close #124

ctx->nr_sector * 512 was computed in __u32, wrapping for any merged or
discard request larger than ~8 MiB (nr_sector > 8_388_607). Cast the
operand to __u64 so the multiply is done in 64-bit arithmetic.

Widen the nr_bytes field in struct disk_event (kerno.h) from __u32 to
__u64 and re-pad _pad from [3]byte to [7]byte to keep the struct size a
multiple of 8. Mirror the new layout in the Go DiskEvent struct and
update the bpf2go-generated field order accordingly.

Add TestDecodeDiskEventLargeIO to catch regressions: a 16 MiB discard
(nr_sector=32768) previously decoded to NrBytes=0.

Fixes: disk_io.c:55 __u32 overflow on large merged/discard requests
@github-actions github-actions Bot added level:critical Touches BPF, security, or release surfaces (auto-applied) testing Tests and test coverage area/bpf eBPF programs and loaders and removed level:critical Touches BPF, security, or release surfaces (auto-applied) labels Jun 7, 2026
@github-actions

github-actions Bot commented Jun 7, 2026

Copy link
Copy Markdown

🚀 First PR — welcome aboard!

A few things to expect:

  1. CI: every PR runs build + race tests + lint + (eventually) the kernel matrix. If something fails, the log will tell you exactly which gate.
  2. DCO: every commit needs Signed-off-by:git commit -s adds it automatically.
  3. Conventional Commits: PR titles like feat(doctor): add new rule or fix(bpf): handle X. We squash-merge by default.
  4. Review: a maintainer will review within 72 hours. Suggestions are conversations, not orders — push back if something doesn't fit your context.

If you get stuck, reply here or jump to Discussions. We want this PR to land.

@github-actions github-actions Bot added the level:critical Touches BPF, security, or release surfaces (auto-applied) label Jun 7, 2026
@Kokila-chandrakar Kokila-chandrakar changed the title fix(bpf/disk_io): widen nr_bytes to __u64 to prevent overflow on large I/O fix(bpf): widen nr_bytes to __u64 to prevent overflow on large I/O Jun 7, 2026
@github-actions github-actions Bot added the area/doctor Diagnostic engine and rules label Jun 7, 2026
@btwshivam

Copy link
Copy Markdown
Member

lint failing

@btwshivam btwshivam left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the u64 widening and the go-side struct mirror are correct (offsets line up at 64 bytes), but events.go isn't gofmt-clean so lint is red, and the fix drops a load-bearing verifier comment in disk_io.c. fix those two and it's an approve.

Comment thread internal/bpf/events.go Outdated
Comment thread internal/bpf/c/disk_io.c
@Kokila-chandrakar

Copy link
Copy Markdown
Contributor Author

All review comments addressed:

  • Widened nr_bytes to __u64 in C struct and restored verifier comment in disk_io.c
  • Mirrored layout in Go DiskEvent struct with gofmt-clean spacing
  • Updated all callsites (cli, collector, metrics) and removed unnecessary uint64 conversions
  • Added TestDecodeDiskEventLargeIO regression test

@Kokila-chandrakar

Copy link
Copy Markdown
Contributor Author

Hi @btwshivam ,

This PR has been waiting for review for a few days now. All conflicts have been resolved and the changes are ready for maintainer review. I'd be grateful if you could take a look when available.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/bpf eBPF programs and loaders area/doctor Diagnostic engine and rules level:critical Touches BPF, security, or release surfaces (auto-applied) testing Tests and test coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix(bpf): disk_io nr_bytes overflows u32 on large requests

2 participants