Skip to content

metricshandler: stop busy-looping and honor ctx in collector#428

Open
tamalsaha wants to merge 1 commit into
masterfrom
metrics-collector-loop
Open

metricshandler: stop busy-looping and honor ctx in collector#428
tamalsaha wants to merge 1 commit into
masterfrom
metrics-collector-loop

Conversation

@tamalsaha

Copy link
Copy Markdown
Contributor

Problem

StartMetricsCollector ran an unbounded for { ... } loop with two issues (pkg/metricshandler/handler.go):

err := collector.collectMetrics()
if err != nil {
    klog.Errorf(...)
    continue          // skips the time.Sleep below
}
...
time.Sleep(MetricsRefreshPeriod)
  1. Busy-loop on error: the continue jumps back to the top without sleeping, so any persistent collection error turns this into a hot loop hammering the API server.
  2. Ignores ctx: the loop never checks the passed ctx, so the goroutine keeps running after the manager shuts down.

Fix

  • Replace the loop with wait.UntilWithContext, which always waits a full MetricsRefreshPeriod between runs (success or failure) and returns when ctx is cancelled.
  • Defensively bound MetricsStore.WriteAll so a partially populated store (fewer families than headers) can't panic the /metrics handler with an index-out-of-range. In normal operation headers and families are built behind the same install flags and stay in sync, so this is hardening rather than a live bug.

Testing

  • go build ./... passes.
  • go vet and gofmt clean on the changed files.

StartMetricsCollector ran an unbounded for-loop that:
  - called "continue" on a collection error, skipping the time.Sleep and
    busy-looping (hammering the API server) whenever collection kept failing;
  - never observed the passed ctx, so the goroutine kept running after the
    manager shut down.

Replace the loop with wait.UntilWithContext, which waits a full
MetricsRefreshPeriod between runs regardless of success/failure and returns
when ctx is cancelled.

Also harden MetricsStore.WriteAll with a bounds check so a partially
populated store (fewer families than headers) cannot panic the /metrics
handler with an index-out-of-range.

Signed-off-by: Tamal Saha <tamal@appscode.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant