OBSINTA-1219: add e2e tests for UIPlugin incident detection #1038
DavidRajnoha wants to merge 1 commit into rhobs:main
Conversation
Hi @DavidRajnoha. Thanks for your PR. I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Force-pushed 89ac678 to 0768ae2 (compare)
/ok-to-test

/retest
@DavidRajnoha: This pull request references OBSINTA-1219, which is a valid jira issue. Warning: the referenced jira issue has an invalid target version for the target branch this PR targets: the epic was expected to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed 970ac8c to 90b891c (compare)
    deployName := "e2e-crasher-" + suffix

    rule := newCrashLoopRule(t, ruleName, alertName, deployName)
    err = f.K8sClient.Create(context.Background(), rule)
nit: you can use t.Context()
/hold

/test observability-operator-e2e

/retest
Force-pushed 90b891c to 23c1e1d (compare)
/unhold

Thank you!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: DavidRajnoha, tremes. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing the approval command in a comment.
simonpasquier
left a comment
I've checked the test output and it adds 3m30s to the total duration, which is not negligible. Do we really need to deploy a crashlooping pod? Can't we write a "dummy" alerting rule with a static expression (vector(1)) and hardcoded labels?
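Such a static rule could look like the following sketch (the resource and alert names here are illustrative, not taken from the PR):

```yaml
# Hypothetical always-firing rule: vector(1) needs no workload,
# so the alert fires as soon as rule evaluation runs.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: e2e-health-analyzer-dummy
spec:
  groups:
    - name: e2e-dummy
      rules:
        - alert: E2EHealthAnalyzerDummy
          expr: vector(1)
          labels:
            severity: warning
```

Because the expression is constant, there is no pod to schedule or crash, which removes most of the workload-related wait time.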
    t.Log("Waiting for health-analyzer deployment to become ready...")
    haDeployment := appsv1.Deployment{}
    f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
(nit) do we really need to get the resource, since we check that it's ready just after?
GetResourceWithRetry has a shorter timeout and provides an explicit error message in case the deployment is never created.
    Expr: intstr.FromString(fmt.Sprintf(
        `max_over_time(kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff", namespace="%s", pod=~"%s.*", job="kube-state-metrics"}[5m]) >= 1`,
        e2eTestNamespace, podPrefix)),
    For: ptr.To(monv1.Duration("1m")),
Removing the for clause would speed up the test.
        return *p
    }

    func skipIfClusterVersionBelow(t *testing.T, minVersion string) {
(nit) could it be a framework function?
        t.Log("=== END DEBUG DUMP ===")
    }

    func ptrInt32(p *int32) int32 {
(nit) ptr.To() already exists
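For reference, the existing helper from k8s.io/utils/ptr is a one-line generic; a minimal sketch of the equivalent behavior (ptrTo here is a stand-in name, not the real package):

```go
package main

import "fmt"

// ptrTo mirrors what ptr.To from k8s.io/utils/ptr does:
// it returns a pointer to a (copied) value of any type.
func ptrTo[T any](v T) *T { return &v }

func main() {
	replicas := ptrTo(int32(3)) // *int32 without a named temporary variable
	fmt.Println(*replicas)
}
```

Using the shared generic helper avoids per-type wrappers such as ptrInt32.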
        }
    }

    func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
Eventually this should be generalized for reuse across different tests.
    assert.NilError(t, err, "alert %s never fired", alertName)

    t.Log("Waiting for cluster-health-analyzer to expose incident metric...")
    incidentQuery := fmt.Sprintf(`cluster_health_components_map{src_alertname="%s"}`, alertName)
Apart from checking the metric generated by the cluster health analyzer, is there any other outcome that we should verify?
        return fmt.Errorf("expected incident metric, got: %v", v)
    }
    for _, sample := range vec {
        if string(sample.Metric["src_alertname"]) != alertName {
Already checked by the PromQL expression.
            return fmt.Errorf("expected src_alertname=%s, got %s", alertName, sample.Metric["src_alertname"])
        }
        if string(sample.Metric["src_severity"]) != "warning" {
            return fmt.Errorf("expected src_severity=warning, got %s", sample.Metric["src_severity"])
Could also be checked in the PromQL expression.
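For example, the severity check could be folded into the query itself (the alert name is a placeholder for the generated one):

```promql
cluster_health_components_map{src_alertname="E2EHealthAnalyzer<suffix>", src_severity="warning"}
```

An empty result then indicates either a missing incident or a wrong severity, making the per-sample label comparisons in Go redundant.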
New changes are detected. LGTM label has been removed.

/hold
📝 Walkthrough
Adds a configurable PromQL assertion helper, framework utilities to skip tests by cluster version and dump namespace diagnostics, and a new E2E test that deploys an always-firing PrometheusRule to verify the cluster health analyzer produces an incident metric.

Changes

Sequence Diagram(s):
    sequenceDiagram
        participant TestRunner as Test Runner
        participant Framework as E2E Framework
        participant K8s as Kubernetes API
        participant Prom as Prometheus
        participant Analyzer as Health Analyzer
        TestRunner->>Framework: clusterHealthAnalyzer()
        Framework->>K8s: Get ClusterVersion (skip if < minVersion)
        Framework->>K8s: Ensure health-analyzer Deployment ready
        TestRunner->>K8s: Create PrometheusRule (always-firing)
        loop Poll until alert fires
            TestRunner->>Prom: Query ALERTS for rule
            Prom-->>TestRunner: alert state
        end
        loop Poll until incident metric appears
            TestRunner->>Prom: Query cluster_health_components_map by alert
            Prom-->>TestRunner: incident metric
        end
        TestRunner->>K8s: Cleanup PrometheusRule & resources
        alt on failure
            TestRunner->>Framework: DumpNamespaceDebug(...) and other diagnostics
        end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Warning: there were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.
🔧 golangci-lint (2.11.4): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
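A likely cause, hedged: golangci-lint v2 refuses configs that do not declare an explicit schema version. If that is the issue here, the fix is a one-line header in the existing config (the remaining settings are migrated per the linked guide):

```yaml
# .golangci.yml — golangci-lint v2 requires the schema version to be declared
version: "2"
```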
/test observability-operator-e2e
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/e2e/uiplugin_cluster_health_analyzer_test.go (2)
101-109: Consider checking the Delete error for the pre-existing UIPlugin. While rare, the Delete call on line 105 could fail for reasons other than the resource already being gone. Logging or checking this error would improve debuggability.
Proposed fix:
    if err == nil {
        t.Log("UIPlugin 'monitoring' already exists, deleting before recreation...")
    -   f.K8sClient.Delete(t.Context(), existing)
    +   if err := f.K8sClient.Delete(t.Context(), existing); err != nil && !errors.IsNotFound(err) {
    +       t.Logf("Warning: failed to delete existing UIPlugin: %v", err)
    +   }
        waitForUIPluginDeletion(existing)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 101 - 109, The Delete call on the pre-existing UIPlugin is ignored; update the block that finds an existing UIPlugin (variable existing) to capture and handle the error returned from f.K8sClient.Delete(t.Context(), existing) — after calling Delete check the returned error and either t.Logf the failure with the error details or t.Fatalf if deletion must succeed; keep waitForUIPluginDeletion(existing) only after a successful delete, and reference the existing variable and f.K8sClient.Delete to locate the change.
43-45: GetResourceWithRetry is redundant before AssertDeploymentReady. AssertDeploymentReady already polls until the deployment exists and is ready, so the prior GetResourceWithRetry call adds unnecessary overhead without additional guarantees.
Proposed simplification:
    t.Log("Waiting for health-analyzer deployment to become ready...")
    - haDeployment := appsv1.Deployment{}
    - f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment)
    f.AssertDeploymentReady(healthAnalyzerDeploymentName, uiPluginInstallNS, framework.WithTimeout(5*time.Minute))(t)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 43 - 45, Remove the redundant call to GetResourceWithRetry: the test currently calls f.GetResourceWithRetry(t, healthAnalyzerDeploymentName, uiPluginInstallNS, &haDeployment) before f.AssertDeploymentReady(...), but AssertDeploymentReady already waits for the deployment to exist and be ready; delete the GetResourceWithRetry invocation and keep AssertDeploymentReady(healthAnalyzerDeploymentName, uiPluginInstallNS, framework.WithTimeout(5*time.Minute))(t); if you actually need the deployment object (haDeployment) after readiness, call a single non-retrying getter (e.g., GetResource) or populate haDeployment after AssertDeploymentReady returns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Line 30: monv1.AddToScheme currently ignores its returned error; update the
call to capture and handle the error (e.g., err :=
monv1.AddToScheme(f.K8sClient.Scheme()); if err != nil { t.Fatalf("failed to add
monv1 to scheme: %v", err) }) so the test fails loudly on scheme registration
problems; refer to the monv1.AddToScheme call and f.K8sClient.Scheme() to locate
where to add the error check and use the test context (t) to report the failure.
---
Nitpick comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 101-109: The Delete call on the pre-existing UIPlugin is ignored;
update the block that finds an existing UIPlugin (variable existing) to capture
and handle the error returned from f.K8sClient.Delete(t.Context(), existing) —
after calling Delete check the returned error and either t.Logf the failure with
the error details or t.Fatalf if deletion must succeed; keep
waitForUIPluginDeletion(existing) only after a successful delete, and reference
the existing variable and f.K8sClient.Delete to locate the change.
- Around line 43-45: Remove the redundant call to GetResourceWithRetry: the test
currently calls f.GetResourceWithRetry(t, healthAnalyzerDeploymentName,
uiPluginInstallNS, &haDeployment) before f.AssertDeploymentReady(...), but
AssertDeploymentReady already waits for the deployment to exist and be ready;
delete the GetResourceWithRetry invocation and keep
AssertDeploymentReady(healthAnalyzerDeploymentName, uiPluginInstallNS,
framework.WithTimeout(5*time.Minute))(t); if you actually need the deployment
object (haDeployment) after readiness, call a single non-retrying getter (e.g.,
GetResource) or populate haDeployment after AssertDeploymentReady returns.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 47e2c224-67bf-4d35-b07e-a68060e8765c
📒 Files selected for processing (4)
test/e2e/framework/assertions.go, test/e2e/framework/framework.go, test/e2e/uiplugin_cluster_health_analyzer_test.go, test/e2e/uiplugin_test.go
Force-pushed bfb1cf6 to 7dc043f (compare)
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/e2e/uiplugin_cluster_health_analyzer_test.go (1)
48-50: Use a collision-resistant suffix for test resource names. UnixNano()%100000 gives only 100k possibilities; parallel/retried CI runs can collide and fail with AlreadyExists. Prefer a UUID or full nanoseconds.
Proposed fix:
    - suffix := strconv.FormatInt(time.Now().UnixNano()%100000, 10)
    + suffix := strconv.FormatInt(time.Now().UnixNano(), 10)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 48 - 50, The test uses a short, collision-prone suffix (suffix := strconv.FormatInt(time.Now().UnixNano()%100000, 10)) to build ruleName and alertName; replace that suffix with a collision-resistant value (e.g., uuid.New().String() or the full time.Now().UnixNano() without modulus) and update imports accordingly so ruleName ("e2e-health-analyzer-"+suffix) and alertName ("E2EHealthAnalyzer"+suffix) use the new suffix generation to avoid AlreadyExists failures in parallel CI.
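A stdlib-only alternative to adding a UUID dependency, sketched below (randomSuffix is an illustrative helper, not part of the PR):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomSuffix returns 8 hex characters from a CSPRNG, giving ~4x10^9
// possibilities instead of the 100k offered by UnixNano()%100000.
func randomSuffix() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand failing is unrecoverable in a test helper
	}
	return hex.EncodeToString(b)
}

func main() {
	fmt.Println("e2e-health-analyzer-" + randomSuffix())
}
```

Unlike timestamps, random bytes stay collision-resistant even when parallel runs start in the same instant.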
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 106-107: The Delete calls like f.K8sClient.Delete(t.Context(),
existing) currently ignore returned errors which can mask API failures; update
each deletion (the calls that precede waitForUIPluginDeletion(existing) and the
similar calls later) to capture the error (e.g., err :=
f.K8sClient.Delete(t.Context(), existing)) and fail the test immediately on
error (use t.Fatalf or your test helper like require.NoError(t, err)) before
calling waitForUIPluginDeletion(existing) so the real delete failure is reported
instead of timing out waiting for deletion.
- Around line 102-116: Currently the test permanently deletes a pre-existing
UIPlugin named "monitoring" and only cleans up the recreated plugin; instead,
detect the existing resource (existing from f.K8sClient.Get), make a DeepCopy or
clone of it into a variable like originalPlugin, then proceed to delete it to
allow recreation but register a cleanup that will restore originalPlugin (using
f.K8sClient.Create or Update) after the test finishes; also keep the existing
cleanup for the newly created plugin (plugin) and ensure waitForUIPluginDeletion
is called as needed and errors from restore/create are handled and logged.
---
Nitpick comments:
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 48-50: The test uses a short, collision-prone suffix (suffix :=
strconv.FormatInt(time.Now().UnixNano()%100000, 10)) to build ruleName and
alertName; replace that suffix with a collision-resistant value (e.g.,
uuid.New().String() or the full time.Now().UnixNano() without modulus) and
update imports accordingly so ruleName ("e2e-health-analyzer-"+suffix) and
alertName ("E2EHealthAnalyzer"+suffix) use the new suffix generation to avoid
AlreadyExists failures in parallel CI.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 370d48ba-2d12-4849-b8bd-1d792776f68f
📒 Files selected for processing (2)
test/e2e/framework/framework.go, test/e2e/uiplugin_cluster_health_analyzer_test.go
✅ Files skipped from review due to trivial changes (1)
- test/e2e/framework/framework.go
/retest

Force-pushed 7dc043f to 25f4b52 (compare)
Validate that the Monitoring UIPlugin with ClusterHealthAnalyzer enabled deploys the health-analyzer and correctly processes alerts into incident metrics. The test creates a crashing pod, waits for the corresponding PrometheusRule alert to fire, then verifies the cluster_health_components_map metric is exposed with the expected labels.

Also extends the framework's AssertPromQLResult with configurable timeout and poll interval options (AssertPromQLResultWithOptions).

Made-with: Cursor
Force-pushed 25f4b52 to a0245db (compare)
@simonpasquier I updated the test to use a static rule. For reference, here is the current timing breakdown of the test:
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/framework/framework.go`:
- Around line 297-302: The DumpNamespaceDebug method uses t.Context() which is
cancelled before t.Cleanup runs; replace the cancelled context with a
non-cancelled one (e.g. use context.Background() or a background context with a
timeout) so the List calls inside DumpNamespaceDebug succeed during cleanup.
Update the ctx variable in Framework.DumpNamespaceDebug (replace ctx :=
t.Context()) to use a persistent background context and ensure the context
package is imported; keep all List calls (deployments, pods, events) using this
new ctx.
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go`:
- Around line 42-46: The cleanup callbacks and the debug helper are using
t.Context() which is already canceled when t.Cleanup runs; replace usages of
t.Context() in the t.Cleanup closures and inside dumpClusterHealthAnalyzerDebug
by creating an uncancelable context first (ctx :=
context.WithoutCancel(t.Context())) and pass that ctx into any API calls
(delete/get/list) and into dumpClusterHealthAnalyzerDebug instead of
t.Context(); update the anonymous t.Cleanup funcs and the
dumpClusterHealthAnalyzerDebug function to accept/use the new ctx so
cleanup-time API calls are not given a canceled context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99ba088d-c938-4352-9e06-056a58ac5d82
📒 Files selected for processing (4)
test/e2e/framework/assertions.gotest/e2e/framework/framework.gotest/e2e/uiplugin_cluster_health_analyzer_test.gotest/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- test/e2e/uiplugin_test.go
- test/e2e/framework/assertions.go
    // DumpNamespaceDebug logs deployments (with conditions), pods (with container
    // statuses), and events for the given namespace. Useful as a t.Cleanup or
    // on-failure diagnostic helper.
    func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
        t.Helper()
        ctx := t.Context()
🧩 Analysis chain
🏁 Script executed:
    cat -n test/e2e/framework/framework.go | sed -n '297,354p'
Repository: rhobs/observability-operator
Length of output: 2987
🏁 Script executed:
    head -20 test/e2e/framework/framework.go
Repository: rhobs/observability-operator
Length of output: 391
🌐 Web query:
    Go testing.T.Context() cleanup callbacks behavior when context is canceled
💡 Result:
In Go 1.24 and later, testing.T.Context returns a context.Context that is canceled just before any T.Cleanup-registered functions are called. This allows cleanup functions to wait for resources (e.g., goroutines) that shut down on ctx.Done() before the test or subtest completes. When the context is canceled (ctx.Done() is closed):
- Cleanup callbacks registered via T.Cleanup are executed after the cancellation.
- The context error (ctx.Err()) is context.Canceled (not DeadlineExceeded, unless a test timeout interacts).
- Goroutines or resources watching <-ctx.Done() can perform shutdown before cleanups run, ensuring proper ordering.
Example usage:
    func TestFoo(t *testing.T) {
        ctx := t.Context()
        var wg sync.WaitGroup
        wg.Add(1)
        go func() {
            <-ctx.Done() // detects cancellation
            // cleanup resources
            wg.Done()
        }()
        t.Cleanup(wg.Wait) // waits after ctx is canceled
    }
This behavior was finalized in proposal golang/go#36532 (accepted Aug 2024, Go 1.24). Note: do not use t.Context inside Cleanup functions, as it may lead to unexpected cancellation of the cleanup logic itself (observed in practice). Use context.Background() or manual contexts for cleanup internals.
Citations:
- 1: testing: reconsider adding Context method to testing.T golang/go#36532
- 2: https://pkg.go.dev/testing
- 3: https://pkg.go.dev/testing@go1.25.6
- 4: https://pkg.go.dev/testing/
- 5: https://pkg.go.dev/testing@go1.24.3
- 6: https://pkg.go.dev/testing@go1.24.2
🏁 Script executed:
    rg "DumpNamespaceDebug" --type go -B 2 -A 5
Repository: rhobs/observability-operator
Length of output: 1156
🏁 Script executed:
    # Check Go version requirement in go.mod
    cat go.mod | head -5
Repository: rhobs/observability-operator
Length of output: 140
🏁 Script executed:
    rg "t\.Cleanup" test/e2e/ -B 3 -A 3 --type go
Repository: rhobs/observability-operator
Length of output: 1524
🏁 Script executed:
    cat -n test/e2e/uiplugin_cluster_health_analyzer_test.go | sed -n '1,50p'
Repository: rhobs/observability-operator
Length of output: 1983
🏁 Script executed:
    # Check if context.WithoutCancel is available in Go 1.25
    python3 <<'EOF'
    # context.WithoutCancel was introduced in Go 1.21
    # The project uses Go 1.25.5, so it's definitely available
    print("context.WithoutCancel is available in Go 1.21+")
    print("Project requires Go 1.25.5, so the API is available.")
    EOF
Repository: rhobs/observability-operator
Length of output: 171
🏁 Script executed:
    # Verify if the imports include context
    rg "import.*context" test/e2e/framework/framework.go
Repository: rhobs/observability-operator
Length of output: 54
🏁 Script executed:
    # Check the imports in framework.go to see if context is imported
    head -30 test/e2e/framework/framework.go | grep -E "^import|context"
Repository: rhobs/observability-operator
Length of output: 91
Use a non-cancelled context in DumpNamespaceDebug.
This helper is documented for t.Cleanup, but testing.T.Context() is cancelled before cleanup callbacks run (Go 1.24+). When called from cleanup, the List calls degrade to context canceled, so the namespace dump fails exactly when diagnostic output is needed most.
Suggested fix:
    func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
        t.Helper()
    -   ctx := t.Context()
    +   ctx := context.WithoutCancel(t.Context())

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    // DumpNamespaceDebug logs deployments (with conditions), pods (with container
    // statuses), and events for the given namespace. Useful as a t.Cleanup or
    // on-failure diagnostic helper.
    func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
        t.Helper()
        ctx := context.WithoutCancel(t.Context())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/framework/framework.go` around lines 297 - 302, The
DumpNamespaceDebug method uses t.Context() which is cancelled before t.Cleanup
runs; replace the cancelled context with a non-cancelled one (e.g. use
context.Background() or a background context with a timeout) so the List calls
inside DumpNamespaceDebug succeed during cleanup. Update the ctx variable in
Framework.DumpNamespaceDebug (replace ctx := t.Context()) to use a persistent
background context and ensure the context package is imported; keep all List
calls (deployments, pods, events) using this new ctx.
    t.Cleanup(func() {
        if t.Failed() {
            dumpClusterHealthAnalyzerDebug(t, plugin.Name)
        }
    })
🧩 Analysis chain
🏁 Script executed:
    # First, check the test file mentioned in the review
    head -250 test/e2e/uiplugin_cluster_health_analyzer_test.go
Repository: rhobs/observability-operator
Length of output: 6648
🏁 Script executed:
    # Check the framework implementation to understand how f.CleanUp works
    cat -n test/e2e/framework/framework.go | sed -n '250,270p'
Repository: rhobs/observability-operator
Length of output: 885
🏁 Script executed:
    # Check if there are other usages of t.Context() in cleanup-related code
    rg "t\.Context\(\)" test/e2e/uiplugin_cluster_health_analyzer_test.go -B 2 -A 2
Repository: rhobs/observability-operator
Length of output: 1268
Use context.WithoutCancel() to avoid passing canceled context to cleanup-time API calls.
During cleanup, t.Context() is already canceled since t.Cleanup() callbacks execute after the test completes. This causes delete/get/list operations to fail with "context canceled", resulting in resource leaks and incomplete debug output.
Apply the fix at lines 42-46, 129-134, 164-168, and 172-175 by wrapping t.Context() with context.WithoutCancel():
Fix details:
    import (
    +   "context"
        "fmt"
        "strconv"
        "testing"
        "time"
    )

In cleanup closures (lines 129-134, 164-168):
    f.CleanUp(t, func() {
    -   if err := f.K8sClient.Delete(t.Context(), plugin); err != nil && !errors.IsNotFound(err) {
    +   ctx := context.WithoutCancel(t.Context())
    +   if err := f.K8sClient.Delete(ctx, plugin); err != nil && !errors.IsNotFound(err) {

In dumpClusterHealthAnalyzerDebug (line 172):
    func dumpClusterHealthAnalyzerDebug(t *testing.T, pluginName string) {
        t.Helper()
    -   ctx := t.Context()
    +   ctx := context.WithoutCancel(t.Context())

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/uiplugin_cluster_health_analyzer_test.go` around lines 42 - 46, The
cleanup callbacks and the debug helper are using t.Context() which is already
canceled when t.Cleanup runs; replace usages of t.Context() in the t.Cleanup
closures and inside dumpClusterHealthAnalyzerDebug by creating an uncancelable
context first (ctx := context.WithoutCancel(t.Context())) and pass that ctx into
any API calls (delete/get/list) and into dumpClusterHealthAnalyzerDebug instead
of t.Context(); update the anonymous t.Cleanup funcs and the
dumpClusterHealthAnalyzerDebug function to accept/use the new ctx so
cleanup-time API calls are not given a canceled context.
♻️ Duplicate comments (1)
test/e2e/framework/framework.go (1)
300-303: ⚠️ Potential issue | 🟠 Major: Use a non-cancelled context in cleanup diagnostics (already reported).
Line 302 uses t.Context(). When this helper is invoked from t.Cleanup, list calls can fail with context canceled, which defeats failure-time diagnostics.
Suggested fix:
    func (f *Framework) DumpNamespaceDebug(t *testing.T, namespace string) {
        t.Helper()
    -   ctx := t.Context()
    +   ctx := context.WithoutCancel(t.Context())
(Verified: in Go 1.24+, testing.T.Context is canceled before T.Cleanup callbacks run. As per coding guidelines: "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/framework/framework.go` around lines 300 - 303, The helper DumpNamespaceDebug currently grabs ctx from t.Context() which can be canceled when run inside t.Cleanup; change it to use a non-cancelled context (e.g., ctx := context.Background() or context.TODO()) so diagnostic list calls in DumpNamespaceDebug won’t fail with “context canceled”; update the function (DumpNamespaceDebug) to import context if needed and replace the t.Context() usage with the background context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/framework/framework.go`:
- Around line 300-303: The helper DumpNamespaceDebug currently grabs ctx from
t.Context() which can be canceled when run inside t.Cleanup; change it to use a
non-cancelled context (e.g., ctx := context.Background() or context.TODO()) so
diagnostic list calls in DumpNamespaceDebug won’t fail with “context canceled”;
update the function (DumpNamespaceDebug) to import context if needed and replace
the t.Context() usage with the background context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 566b5429-753e-4378-bcfa-22e74ddb0c0d
📒 Files selected for processing (4)
test/e2e/framework/assertions.go, test/e2e/framework/framework.go, test/e2e/uiplugin_cluster_health_analyzer_test.go, test/e2e/uiplugin_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- test/e2e/uiplugin_test.go
- test/e2e/framework/assertions.go
- test/e2e/uiplugin_cluster_health_analyzer_test.go
Add end-to-end tests that validate the monitoring UIPlugin with cluster-health-analyzer: a deployment readiness check and a functional test that triggers a CrashLoopBackOff alert and verifies the cluster_health_components_map incident metric is produced.
Also introduce AssertPromQLResultWithOptions to allow callers to override the default poll interval and timeout, and generalize waitForDBUIPluginDeletion to waitForUIPluginDeletion.