
OBSINTA-1219: add e2e tests for UIPlugin incident detection#1038

Open
DavidRajnoha wants to merge 1 commit into rhobs:main from DavidRajnoha:test/e2e-uiplugin-cluster-health-analyzer

Conversation

@DavidRajnoha

Add end-to-end tests that validate the monitoring UIPlugin with cluster-health-analyzer: a deployment readiness check and a functional test that triggers a CrashLoopBackOff alert and verifies the cluster_health_components_map incident metric is produced.

Also introduce AssertPromQLResultWithOptions to allow callers to override the default poll interval and timeout, and generalize waitForDBUIPluginDeletion to waitForUIPluginDeletion.

@openshift-ci

openshift-ci bot commented Mar 16, 2026

Hi @DavidRajnoha. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch 2 times, most recently from 89ac678 to 0768ae2 on March 16, 2026 at 14:36
@tremes
Contributor

tremes commented Mar 17, 2026

/ok-to-test

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha changed the title from "test: add e2e tests for UIPlugin incident detection" to "OBSINTA-1219: add e2e tests for UIPlugin incident detection" on Mar 18, 2026
@openshift-ci-robot
Collaborator

@DavidRajnoha: This pull request references OBSINTA-1219 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.


In response to this:

Add end-to-end tests that validate the monitoring UIPlugin with cluster-health-analyzer: a deployment readiness check and a functional test that triggers a CrashLoopBackOff alert and verifies the cluster_health_components_map incident metric is produced.

Also introduce AssertPromQLResultWithOptions to allow callers to override the default poll interval and timeout, and generalize waitForDBUIPluginDeletion to waitForUIPluginDeletion.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch 3 times, most recently from 970ac8c to 90b891c on March 18, 2026 at 11:25
deployName := "e2e-crasher-" + suffix

rule := newCrashLoopRule(t, ruleName, alertName, deployName)
err = f.K8sClient.Create(context.Background(), rule)
Contributor


nit: you can use t.Context()

@DavidRajnoha
Author

/hold

@DavidRajnoha
Author

/test observability-operator-e2e

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 90b891c to 23c1e1d on March 25, 2026 at 09:58
@DavidRajnoha
Author

/unhold

@tremes
Contributor

tremes commented Mar 25, 2026

Thank you!
/approve
/lgtm

@openshift-ci

openshift-ci bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha, tremes
Once this PR has been reviewed and has the lgtm label, please assign jan--f for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@simonpasquier simonpasquier left a comment


I've checked the test output and it adds 3m30s to the total duration, which is not negligible. Do we really need to deploy a crashlooping pod? Can't we write a "dummy" alerting rule with a static expression (vector(1)) and hardcoded labels?
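A static rule along the lines suggested here could look as follows; this is a sketch, and the resource name, namespace, and alert name are illustrative rather than taken from the PR:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: e2e-health-analyzer-static   # illustrative name
  namespace: e2e-tests               # illustrative namespace
spec:
  groups:
    - name: e2e-health-analyzer
      rules:
        - alert: E2EHealthAnalyzerAlwaysFiring   # illustrative alert name
          # vector(1) always evaluates to 1, so the alert fires on the first
          # evaluation without deploying any crashlooping workload.
          expr: vector(1)
          labels:
            severity: warning
```

Because the expression is constant and there is no `for` clause, the alert enters the firing state on the first rule evaluation, avoiding the multi-minute wait for kube-state-metrics to report a CrashLoopBackOff.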


t.Log("Waiting for health-analyzer deployment to become ready...")
haDeployment := appsv1.Deployment{}
f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
Contributor


(nit) do we really need to get the resource since we check that it's ready just after?

Author


GetResourceWithRetry has a shorter timeout and provides an explicit error message in case the deployment is never created.

Expr: intstr.FromString(fmt.Sprintf(
`max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="%s", pod=~"%s.*", job="kube-state-metrics"}[5m]) >= 1`,
e2eTestNamespace, podPrefix)),
For: ptr.To(monv1.Duration("1m")),
Contributor


removing the for clause would speed up the test.

return *p
}

func skipIfClusterVersionBelow(t *testing.T, minVersion string) {
Contributor


(nit) could it be a framework function?

t.Log("=== END DEBUG DUMP ===")
}

func ptrInt32(p *int32) int32 {
Contributor


(nit) ptr.To() already exists

}
}

func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
Contributor


eventually this should be generalized for reuse across different tests.

assert.NilError(t, err, "alert %s never fired", alertName)

t.Log("Waiting for cluster-health-analyzer to expose incident metric...")
incidentQuery := fmt.Sprintf(`cluster_health_components_map{src_alertname="%s"}`, alertName)
Contributor


apart from checking the metric generated by the cluster health analyzer, is there any other outcome that we should verify?

return fmt.Errorf("expected incident metric, got: %v", v)
}
for _, sample := range vec {
if string(sample.Metric["src_alertname"]) != alertName {
Contributor


already checked by the PromQL expression

return fmt.Errorf("expected src_alertname=%s, got %s", alertName, sample.Metric["src_alertname"])
}
if string(sample.Metric["src_severity"]) != "warning" {
return fmt.Errorf("expected src_severity=warning, got %s", sample.Metric["src_severity"])
Contributor


could also be checked in the promql expression

@openshift-ci openshift-ci bot removed the lgtm label Mar 31, 2026
@openshift-ci

openshift-ci bot commented Mar 31, 2026

New changes are detected. LGTM label has been removed.

@DavidRajnoha
Author

/hold

@coderabbitai

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough


Adds a configurable PromQL assertion helper, framework utilities to skip tests by cluster version and dump namespace diagnostics, and a new E2E test that deploys an always-firing PrometheusRule to verify the cluster health analyzer produces an incident metric.

Changes

Cohort / File(s) Summary
Assertion polling refactor
test/e2e/framework/assertions.go
Extracted polling into AssertPromQLResultWithOptions(t, expr, callback, fns...); AssertPromQLResult now delegates to it with default PollInterval/WaitTimeout.
Framework helpers
test/e2e/framework/framework.go
Added (*Framework).SkipIfClusterVersionBelow(t, minVersion) to skip tests on older clusters and (*Framework).DumpNamespaceDebug(t, namespace) to log Deployments, Pods, and Events for diagnostics.
Cluster health analyzer E2E test
test/e2e/uiplugin_cluster_health_analyzer_test.go
New test clusterHealthAnalyzer that ensures the monitoring UIPlugin is enabled, waits for health-analyzer Deployment, creates an always-firing PrometheusRule, polls Prometheus for the alert and for cluster_health_components_map incident metric, and registers cleanup and debug dumps on failure.
Test suite wiring & rename
test/e2e/uiplugin_test.go
Added "Cluster health analyzer" case to TestUIPlugin; renamed waitForDBUIPluginDeletion → waitForUIPluginDeletion and updated the cleanup call site.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner as Test Runner
    participant Framework as E2E Framework
    participant K8s as Kubernetes API
    participant Prom as Prometheus
    participant Analyzer as Health Analyzer

    TestRunner->>Framework: clusterHealthAnalyzer()
    Framework->>K8s: Get ClusterVersion (skip if < minVersion)
    Framework->>K8s: Ensure health-analyzer Deployment ready
    TestRunner->>K8s: Create PrometheusRule (always-firing)
    loop Poll until alert fires
        TestRunner->>Prom: Query ALERTS for rule
        Prom-->>TestRunner: alert state
    end
    loop Poll until incident metric appears
        TestRunner->>Prom: Query cluster_health_components_map by alert
        Prom-->>TestRunner: incident metric
    end
    TestRunner->>K8s: Cleanup PrometheusRule & resources
    alt on failure
        TestRunner->>Framework: DumpNamespaceDebug(...) and other diagnostics
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding end-to-end tests for UIPlugin incident detection, matching the primary purpose of the changeset.
Description check ✅ Passed The description is directly related to the changeset, detailing the e2e tests for monitoring UIPlugin with cluster-health-analyzer, new PromQL assertion options, and the generalized deletion helper function.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@DavidRajnoha
Copy link
Copy Markdown
Author

/test observability-operator-e2e


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
test/e2e/uiplugin_cluster_health_analyzer_test.go (2)

101-109: Consider checking the Delete error for pre-existing UIPlugin.

While rare, the Delete call on line 105 could fail for reasons other than the resource being already gone. Logging or checking this error would improve debuggability.

Proposed fix
 	if err == nil {
 		t.Log("UIPlugin 'monitoring' already exists, deleting before recreation...")
-		f.K8sClient.Delete(t.Context(), existing)
+		if err := f.K8sClient.Delete(t.Context(), existing); err != nil && !errors.IsNotFound(err) {
+			t.Logf("Warning: failed to delete existing UIPlugin: %v", err)
+		}
 		waitForUIPluginDeletion(existing)

43-45: GetResourceWithRetry is redundant before AssertDeploymentReady.

AssertDeploymentReady already polls until the deployment exists and is ready, so the prior GetResourceWithRetry call adds unnecessary overhead without additional guarantees.

Proposed simplification
 	t.Log("Waiting for health-analyzer deployment to become ready...")
-	haDeployment := appsv1.Deployment{}
-	f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
 	f.AssertDeploymentReady(healthAnalyzerDeploymentName, uiPluginInstallNS, framework.WithTimeout(5*time.Minute))(t)

Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Line 30: monv1.AddToScheme currently ignores its returned error; update the
call to capture and handle the error (e.g., err :=
monv1.AddToScheme(f.K8sClient.Scheme()); if err != nil { t.Fatalf("failed to add
monv1 to scheme: %v", err) }) so the test fails loudly on scheme registration
problems; refer to the monv1.AddToScheme call and f.K8sClient.Scheme() to locate
where to add the error check and use the test context (t) to report the failure.



📥 Commits

Reviewing files that changed from the base of the PR and between cbd6ba3 and bfb1cf6.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from bfb1cf6 to 7dc043f on April 1, 2026 at 10:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e/uiplugin_cluster_health_analyzer_test.go (1)

48-50: Use a collision-resistant suffix for test resource names.

UnixNano()%100000 gives only 100k possibilities; parallel/retried CI runs can collide and fail with AlreadyExists. Prefer UUID/full nanoseconds.

Proposed fix
-	suffix := strconv.FormatInt(time.Now().UnixNano()%100000, 10)
+	suffix := strconv.FormatInt(time.Now().UnixNano(), 10)

Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 106-107: The Delete calls like f.K8sClient.Delete(t.Context(),
existing) currently ignore returned errors which can mask API failures; update
each deletion (the calls that precede waitForUIPluginDeletion(existing) and the
similar calls later) to capture the error (e.g., err :=
f.K8sClient.Delete(t.Context(), existing)) and fail the test immediately on
error (use t.Fatalf or your test helper like require.NoError(t, err)) before
calling waitForUIPluginDeletion(existing) so the real delete failure is reported
instead of timing out waiting for deletion.
- Around line 102-116: Currently the test permanently deletes a pre-existing
UIPlugin named "monitoring" and only cleans up the recreated plugin; instead,
detect the existing resource (existing from f.K8sClient.Get), make a DeepCopy or
clone of it into a variable like originalPlugin, then proceed to delete it to
allow recreation but register a cleanup that will restore originalPlugin (using
f.K8sClient.Create or Update) after the test finishes; also keep the existing
cleanup for the newly created plugin (plugin) and ensure waitForUIPluginDeletion
is called as needed and errors from restore/create are handled and logged.



📥 Commits

Reviewing files that changed from the base of the PR and between bfb1cf6 and 7dc043f.

📒 Files selected for processing (2)
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/framework/framework.go

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 7dc043f to 25f4b52 on April 1, 2026 at 12:05
Validate that the Monitoring UIPlugin with ClusterHealthAnalyzer enabled
deploys the health-analyzer and correctly processes alerts into incident
metrics. The test creates a crashing pod, waits for the corresponding
PrometheusRule alert to fire, then verifies the cluster_health_components_map
metric is exposed with the expected labels.

Also extends the framework's AssertPromQLResult with configurable timeout
and poll interval options (AssertPromQLResultWithOptions).

Made-with: Cursor
@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 25f4b52 to a0245db on April 1, 2026 at 12:07
@DavidRajnoha
Author

@simonpasquier I updated the test to use a static rule. For reference, here is the current timing breakdown of the test:

| Step | Duration | Elapsed |
| --- | --- | --- |
| UIPlugin creation | 824ms | 0.8s |
| Deployment ready | 5.4s | 6.2s |
| PrometheusRule creation | 239ms | 6.4s |
| Alert firing (PromQL poll) | 32.2s | 38.6s |
| Incident metric (PromQL poll) | 32.0s | 1m 10.6s |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2



📥 Commits

Reviewing files that changed from the base of the PR and between 7dc043f and 25f4b52.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e/uiplugin_test.go
  • test/e2e/framework/assertions.go

Comment on lines +297 to +302
// DumpNamespaceDebug logs deployments (with conditions), pods (with container
// statuses), and events for the given namespace. Useful as a t.Cleanup or
// on-failure diagnostic helper.
func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
t.Helper()
ctx := t.Context()


⚠️ Potential issue | 🟠 Major



Use a non-cancelled context in DumpNamespaceDebug.

This helper is documented for t.Cleanup, but testing.T.Context() is cancelled before cleanup callbacks run (Go 1.24+). When called from cleanup, the List calls degrade to context canceled, so the namespace dump fails exactly when diagnostic output is needed most.

Suggested fix
func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())

Comment on lines +42 to +46
t.Cleanup(func() {
if t.Failed() {
dumpClusterHealthAnalyzerDebug(t, plugin.Name)
}
})


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the test file mentioned in the review
head -250 test/e2e/uiplugin_cluster_health_analyzer_test.go

Repository: rhobs/observability-operator

Length of output: 6648


🏁 Script executed:

# Check the framework implementation to understand how f.CleanUp works
cat -n test/e2e/framework/framework.go | sed -n '250,270p'



🏁 Script executed:

# Check if there are other usages of t.Context() in cleanup-related code
rg "t\.Context\(\)" test/e2e/uiplugin_cluster_health_analyzer_test.go -B 2 -A 2



Use context.WithoutCancel() to avoid passing canceled context to cleanup-time API calls.

During cleanup, t.Context() is already canceled since t.Cleanup() callbacks execute after the test completes. This causes delete/get/list operations to fail with "context canceled", resulting in resource leaks and incomplete debug output.

Apply the fix at lines 42-46, 129-134, 164-168, and 172-175 by wrapping t.Context() with context.WithoutCancel():

Fix details
import (
+	"context"
 	"fmt"
 	"strconv"
 	"testing"
 	"time"

In cleanup closures (lines 129-134, 164-168):

 	f.CleanUp(t, func() {
-		if err := f.K8sClient.Delete(t.Context(), plugin); err != nil && !errors.IsNotFound(err) {
+		ctx := context.WithoutCancel(t.Context())
+		if err := f.K8sClient.Delete(ctx, plugin); err != nil && !errors.IsNotFound(err) {

In dumpClusterHealthAnalyzerDebug (line 172):

 func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
 	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 42 - 46, The
cleanup callbacks and the debug helper are using t.Context() which is already
canceled when t.Cleanup runs; replace usages of t.Context() in the t.Cleanup
closures and inside dumpClusterHealthAnalyzerDebug by creating an uncancelable
context first (ctx := context.WithoutCancel(t.Context())) and pass that ctx into
any API calls (delete/get/list) and into dumpClusterHealthAnalyzerDebug instead
of t.Context(); update the anonymous t.Cleanup funcs and the
dumpClusterHealthAnalyzerDebug function to accept/use the new ctx so
cleanup-time API calls are not given a canceled context.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
test/e2e/framework/framework.go (1)

300-303: ⚠️ Potential issue | 🟠 Major

Use a non-cancelled context in cleanup diagnostics (already reported).

Line 302 uses t.Context(). When this helper is invoked from t.Cleanup, list calls can fail with context canceled, which defeats failure-time diagnostics.

Suggested fix
 func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
 	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())
Note: in Go 1.24+, the context returned by testing.T.Context is canceled just before T.Cleanup-registered callbacks run.

As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/framework/framework.go` around lines 300 - 303, The helper
DumpNamespaceDebug currently grabs ctx from t.Context() which can be canceled
when run inside t.Cleanup; change it to use a non-cancelled context (e.g., ctx
:= context.Background() or context.TODO()) so diagnostic list calls in
DumpNamespaceDebug won’t fail with “context canceled”; update the function
(DumpNamespaceDebug) to import context if needed and replace the t.Context()
usage with the background context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 566b5429-753e-4378-bcfa-22e74ddb0c0d

📥 Commits

Reviewing files that changed from the base of the PR and between 25f4b52 and a0245db.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/e2e/uiplugin_test.go
  • test/e2e/framework/assertions.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
