
OBSINTA-1219: add e2e tests for UIPlugin incident detection#1038

Open
DavidRajnoha wants to merge 1 commit into rhobs:main from DavidRajnoha:test/e2e-uiplugin-cluster-health-analyzer

Conversation

@DavidRajnoha

Add end-to-end tests that validate the monitoring UIPlugin with cluster-health-analyzer: a deployment readiness check and a functional test that triggers a CrashLoopBackOff alert and verifies the cluster_health_components_map incident metric is produced.

Also introduce AssertPromQLResultWithOptions to allow callers to override the default poll interval and timeout, and generalize waitForDBUIPluginDeletion to waitForUIPluginDeletion.

@openshift-ci

openshift-ci bot commented Mar 16, 2026

Hi @DavidRajnoha. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch 2 times, most recently from 89ac678 to 0768ae2 on March 16, 2026 at 14:36
@tremes
Contributor

tremes commented Mar 17, 2026

/ok-to-test

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha changed the title from "test: add e2e tests for UIPlugin incident detection" to "OBSINTA-1219: add e2e tests for UIPlugin incident detection" on Mar 18, 2026
@openshift-ci-robot
Collaborator

@DavidRajnoha: This pull request references OBSINTA-1219 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.


In response to this:

Add end-to-end tests that validate the monitoring UIPlugin with cluster-health-analyzer: a deployment readiness check and a functional test that triggers a CrashLoopBackOff alert and verifies the cluster_health_components_map incident metric is produced.

Also introduce AssertPromQLResultWithOptions to allow callers to override the default poll interval and timeout, and generalize waitForDBUIPluginDeletion to waitForUIPluginDeletion.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch 3 times, most recently from 970ac8c to 90b891c on March 18, 2026 at 11:25
deployName := "e2e-crasher-" + suffix

rule := newCrashLoopRule(t, ruleName, alertName, deployName)
err = f.K8sClient.Create(context.Background(), rule)
Contributor


nit: you can use t.Context()

@DavidRajnoha
Author

/hold

@DavidRajnoha
Author

/test observability-operator-e2e

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 90b891c to 23c1e1d on March 25, 2026 at 09:58
@DavidRajnoha
Author

/unhold

@tremes
Contributor

tremes commented Mar 25, 2026

Thank you!
/approve
/lgtm

@openshift-ci

openshift-ci bot commented Mar 25, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: DavidRajnoha, tremes
Once this PR has been reviewed and has the lgtm label, please assign jan--f for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@simonpasquier simonpasquier left a comment


I've checked the test output and it adds 3m30s to the total duration, which is not negligible. Do we really need to deploy a crashlooping pod? Can't we write a "dummy" alerting rule with a static expression (vector(1)) and hardcoded labels?
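A static rule along the lines suggested here could look as follows; this is a sketch, and the resource name, namespace, and alert name are illustrative rather than taken from the PR:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: e2e-health-analyzer-static   # illustrative name
  namespace: e2e-tests               # illustrative namespace
spec:
  groups:
    - name: e2e-health-analyzer
      rules:
        - alert: E2EHealthAnalyzerAlwaysFiring   # illustrative alert name
          # vector(1) always evaluates to 1, so the alert fires on the first
          # evaluation without deploying any crashlooping workload.
          expr: vector(1)
          labels:
            severity: warning
```

Because the expression is constant and there is no `for` clause, the alert enters the firing state on the first rule evaluation, avoiding the multi-minute wait for kube-state-metrics to report a CrashLoopBackOff.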


t.Log("Waiting for health-analyzer deployment to become ready...")
haDeployment := appsv1.Deployment{}
f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
Contributor


(nit) do we really need to get the resource since we check that it's ready just after?

Author


GetResourceWithRetry has a shorter timeout and provides an explicit error message in case the deployment is never created.

Expr: intstr.FromString(fmt.Sprintf(
`max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="%s", pod=~"%s.*", job="kube-state-metrics"}[5m]) >= 1`,
e2eTestNamespace, podPrefix)),
For: ptr.To(monv1.Duration("1m")),
Contributor


removing the for clause would speed up the test.

return *p
}

func skipIfClusterVersionBelow(t *testing.T, minVersion string) {
Contributor


(nit) could it be a framework function?

t.Log("=== END DEBUG DUMP ===")
}

func ptrInt32(p *int32) int32 {
Contributor


(nit) ptr.To() already exists

}
}

func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
Contributor


eventually this should be generalized for reuse across different tests.

assert.NilError(t, err, "alert %s never fired", alertName)

t.Log("Waiting for cluster-health-analyzer to expose incident metric...")
incidentQuery := fmt.Sprintf(`cluster_health_components_map{src_alertname="%s"}`, alertName)
Contributor


apart from checking the metric generated by the cluster health analyzer, is there any other outcome that we should verify?

return fmt.Errorf("expected incident metric, got: %v", v)
}
for _, sample := range vec {
if string(sample.Metric["src_alertname"]) != alertName {
Contributor


already checked by the PromQL expression

return fmt.Errorf("expected src_alertname=%s, got %s", alertName, sample.Metric["src_alertname"])
}
if string(sample.Metric["src_severity"]) != "warning" {
return fmt.Errorf("expected src_severity=warning, got %s", sample.Metric["src_severity"])
Contributor


could also be checked in the promql expression

@openshift-ci openshift-ci bot removed the lgtm label Mar 31, 2026
@openshift-ci

openshift-ci bot commented Mar 31, 2026

New changes are detected. LGTM label has been removed.

@DavidRajnoha
Author

/hold

@coderabbitai

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough


Adds a configurable PromQL assertion helper, framework utilities to skip tests by cluster version and dump namespace diagnostics, and a new E2E test that deploys an always-firing PrometheusRule to verify the cluster health analyzer produces an incident metric.

Changes

Cohort / File(s) Summary
Assertion polling refactor
test/e2e/framework/assertions.go
Extracted polling into AssertPromQLResultWithOptions(t, expr, callback, fns...); AssertPromQLResult now delegates to it with default PollInterval/WaitTimeout.
Framework helpers
test/e2e/framework/framework.go
Added (*Framework).SkipIfClusterVersionBelow(t, minVersion) to skip tests on older clusters and (*Framework).DumpNamespaceDebug(t, namespace) to log Deployments, Pods, and Events for diagnostics.
Cluster health analyzer E2E test
test/e2e/uiplugin_cluster_health_analyzer_test.go
New test clusterHealthAnalyzer that ensures the monitoring UIPlugin is enabled, waits for health-analyzer Deployment, creates an always-firing PrometheusRule, polls Prometheus for the alert and for cluster_health_components_map incident metric, and registers cleanup and debug dumps on failure.
Test suite wiring & rename
test/e2e/uiplugin_test.go
Added "Cluster health analyzer" case to TestUIPlugin; renamed waitForDBUIPluginDeletion → waitForUIPluginDeletion and updated the cleanup call site.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner as Test Runner
    participant Framework as E2E Framework
    participant K8s as Kubernetes API
    participant Prom as Prometheus
    participant Analyzer as Health Analyzer

    TestRunner->>Framework: clusterHealthAnalyzer()
    Framework->>K8s: Get ClusterVersion (skip if < minVersion)
    Framework->>K8s: Ensure health-analyzer Deployment ready
    TestRunner->>K8s: Create PrometheusRule (always-firing)
    loop Poll until alert fires
        TestRunner->>Prom: Query ALERTS for rule
        Prom-->>TestRunner: alert state
    end
    loop Poll until incident metric appears
        TestRunner->>Prom: Query cluster_health_components_map by alert
        Prom-->>TestRunner: incident metric
    end
    TestRunner->>K8s: Cleanup PrometheusRule & resources
    alt on failure
        TestRunner->>Framework: DumpNamespaceDebug(...) and other diagnostics
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding end-to-end tests for UIPlugin incident detection, matching the primary purpose of the changeset.
Description check ✅ Passed The description is directly related to the changeset, detailing the e2e tests for monitoring UIPlugin with cluster-health-analyzer, new PromQL assertion options, and the generalized deletion helper function.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.4)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@DavidRajnoha
Copy link
Copy Markdown
Author

/test observability-operator-e2e


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
test/e2e/uiplugin_cluster_health_analyzer_test.go (2)

101-109: Consider checking the Delete error for pre-existing UIPlugin.

While rare, the Delete call on line 105 could fail for reasons other than the resource being already gone. Logging or checking this error would improve debuggability.

Proposed fix
 	if err == nil {
 		t.Log("UIPlugin 'monitoring' already exists, deleting before recreation...")
-		f.K8sClient.Delete(t.Context(), existing)
+		if err := f.K8sClient.Delete(t.Context(), existing); err != nil && !errors.IsNotFound(err) {
+			t.Logf("Warning: failed to delete existing UIPlugin: %v", err)
+		}
 		waitForUIPluginDeletion(existing)

43-45: GetResourceWithRetry is redundant before AssertDeploymentReady.

AssertDeploymentReady already polls until the deployment exists and is ready, so the prior GetResourceWithRetry call adds unnecessary overhead without additional guarantees.

Proposed simplification
 	t.Log("Waiting for health-analyzer deployment to become ready...")
-	haDeployment := appsv1.Deployment{}
-	f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
 	f.AssertDeploymentReady(healthAnalyzerDeploymentName, uiPluginInstallNS, framework.WithTimeout(5*time.Minute))(t)

Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Line 30: monv1.AddToScheme currently ignores its returned error; update the
call to capture and handle the error (e.g., err :=
monv1.AddToScheme(f.K8sClient.Scheme()); if err != nil { t.Fatalf("failed to add
monv1 to scheme: %v", err) }) so the test fails loudly on scheme registration
problems; refer to the monv1.AddToScheme call and f.K8sClient.Scheme() to locate
where to add the error check and use the test context (t) to report the failure.



📥 Commits

Reviewing files that changed from the base of the PR and between cbd6ba3 and bfb1cf6.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from bfb1cf6 to 7dc043f on April 1, 2026 at 10:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e/uiplugin_cluster_health_analyzer_test.go (1)

48-50: Use a collision-resistant suffix for test resource names.

UnixNano()%100000 gives only 100k possibilities; parallel/retried CI runs can collide and fail with AlreadyExists. Prefer UUID/full nanoseconds.

Proposed fix
-	suffix := strconv.FormatInt(time.Now().UnixNano()%100000, 10)
+	suffix := strconv.FormatInt(time.Now().UnixNano(), 10)

Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 106-107: The Delete calls like f.K8sClient.Delete(t.Context(),
existing) currently ignore returned errors which can mask API failures; update
each deletion (the calls that precede waitForUIPluginDeletion(existing) and the
similar calls later) to capture the error (e.g., err :=
f.K8sClient.Delete(t.Context(), existing)) and fail the test immediately on
error (use t.Fatalf or your test helper like require.NoError(t, err)) before
calling waitForUIPluginDeletion(existing) so the real delete failure is reported
instead of timing out waiting for deletion.
- Around line 102-116: Currently the test permanently deletes a pre-existing
UIPlugin named "monitoring" and only cleans up the recreated plugin; instead,
detect the existing resource (existing from f.K8sClient.Get), make a DeepCopy or
clone of it into a variable like originalPlugin, then proceed to delete it to
allow recreation but register a cleanup that will restore originalPlugin (using
f.K8sClient.Create or Update) after the test finishes; also keep the existing
cleanup for the newly created plugin (plugin) and ensure waitForUIPluginDeletion
is called as needed and errors from restore/create are handled and logged.



📥 Commits

Reviewing files that changed from the base of the PR and between bfb1cf6 and 7dc043f.

📒 Files selected for processing (2)
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
✅ Files skipped from review due to trivial changes (1)
  • test/e2e/framework/framework.go

@DavidRajnoha
Author

/retest

@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 7dc043f to 25f4b52 on April 1, 2026 at 12:05
Validate that the Monitoring UIPlugin with ClusterHealthAnalyzer enabled
deploys the health-analyzer and correctly processes alerts into incident
metrics. The test creates a crashing pod, waits for the corresponding
PrometheusRule alert to fire, then verifies the cluster_health_components_map
metric is exposed with the expected labels.

Also extends the framework's AssertPromQLResult with configurable timeout
and poll interval options (AssertPromQLResultWithOptions).

Made-with: Cursor
@DavidRajnoha DavidRajnoha force-pushed the test/e2e-uiplugin-cluster-health-analyzer branch from 25f4b52 to a0245db on April 1, 2026 at 12:07
@DavidRajnoha
Author

@simonpasquier I updated the test to use a static rule. For reference, here is the current timing breakdown of the test:

| Step | Duration | Elapsed |
| --- | --- | --- |
| UIPlugin creation | 824ms | 0.8s |
| Deployment ready | 5.4s | 6.2s |
| PrometheusRule creation | 239ms | 6.4s |
| Alert firing (PromQL poll) | 32.2s | 38.6s |
| Incident metric (PromQL poll) | 32.0s | 1m 10.6s |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2



📥 Commits

Reviewing files that changed from the base of the PR and between 7dc043f and 25f4b52.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e/uiplugin_test.go
  • test/e2e/framework/assertions.go

Comment on lines +297 to +302
// DumpNamespaceDebug logs deployments (with conditions), pods (with container
// statuses), and events for the given namespace. Useful as a t.Cleanup or
// on-failure diagnostic helper.
func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
t.Helper()
ctx := t.Context()


⚠️ Potential issue | 🟠 Major



Use a non-cancelled context in DumpNamespaceDebug.

This helper is documented for t.Cleanup, but testing.T.Context() is cancelled before cleanup callbacks run (Go 1.24+). When called from cleanup, the List calls degrade to context canceled, so the namespace dump fails exactly when diagnostic output is needed most.

Suggested fix
func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())

Comment on lines +42 to +46
t.Cleanup(func() {
if t.Failed() {
dumpClusterHealthAnalyzerDebug(t, plugin.Name)
}
})


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the test file mentioned in the review
head -250 test/e2e/uiplugin_cluster_health_analyzer_test.go

Repository: rhobs/observability-operator

Length of output: 6648


🏁 Script executed:

# Check the framework implementation to understand how f.CleanUp works
cat -n test/e2e/framework/framework.go | sed -n '250,270p'



🏁 Script executed:

# Check if there are other usages of t.Context() in cleanup-related code
rg "t\.Context\(\)" test/e2e/uiplugin_cluster_health_analyzer_test.go -B 2 -A 2



Use context.WithoutCancel() to avoid passing canceled context to cleanup-time API calls.

During cleanup, t.Context() is already canceled since t.Cleanup() callbacks execute after the test completes. This causes delete/get/list operations to fail with "context canceled", resulting in resource leaks and incomplete debug output.

Apply the fix at lines 42-46, 129-134, 164-168, and 172-175 by wrapping t.Context() with context.WithoutCancel():

Fix details
import (
+	"context"
 	"fmt"
 	"strconv"
 	"testing"
 	"time"

In cleanup closures (lines 129-134, 164-168):

 	f.CleanUp(t, func() {
-		if err := f.K8sClient.Delete(t.Context(), plugin); err != nil && !errors.IsNotFound(err) {
+		ctx := context.WithoutCancel(t.Context())
+		if err := f.K8sClient.Delete(ctx, plugin); err != nil && !errors.IsNotFound(err) {

In dumpClusterHealthAnalyzerDebug (line 172):

 func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
 	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 42 - 46, The
cleanup callbacks and the debug helper are using t.Context() which is already
canceled when t.Cleanup runs; replace usages of t.Context() in the t.Cleanup
closures and inside dumpClusterHealthAnalyzerDebug by creating an uncancelable
context first (ctx := context.WithoutCancel(t.Context())) and pass that ctx into
any API calls (delete/get/list) and into dumpClusterHealthAnalyzerDebug instead
of t.Context(); update the anonymous t.Cleanup funcs and the
dumpClusterHealthAnalyzerDebug function to accept/use the new ctx so
cleanup-time API calls are not given a canceled context.

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (1)
test/e2e/framework/framework.go (1)

300-303: ⚠️ Potential issue | 🟠 Major

Use a non-cancelled context in cleanup diagnostics (already reported).

Line 302 uses t.Context(). When this helper is invoked from t.Cleanup, list calls can fail with context canceled, which defeats failure-time diagnostics.

Suggested fix
 func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
 	t.Helper()
-	ctx := t.Context()
+	ctx := context.WithoutCancel(t.Context())
Note: in Go 1.24+, the context returned by testing.T.Context is canceled just before T.Cleanup-registered callbacks run.

As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/framework/framework.go` around lines 300 - 303, The helper
DumpNamespaceDebug currently grabs ctx from t.Context() which can be canceled
when run inside t.Cleanup; change it to use a non-cancelled context (e.g., ctx
:= context.Background() or context.TODO()) so diagnostic list calls in
DumpNamespaceDebug won’t fail with “context canceled”; update the function
(DumpNamespaceDebug) to import context if needed and replace the t.Context()
usage with the background context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 566b5429-753e-4378-bcfa-22e74ddb0c0d

📥 Commits

Reviewing files that changed from the base of the PR and between 25f4b52 and a0245db.

📒 Files selected for processing (4)
  • test/e2e/framework/assertions.go
  • test/e2e/framework/framework.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
  • test/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/e2e/uiplugin_test.go
  • test/e2e/framework/assertions.go
  • test/e2e/uiplugin_cluster_health_analyzer_test.go
