OCPBUGS-81568: fix(storage): wait for CSI driver daemonset rollout to prevent pathological event failures #30974
…ogical event failures

The vSphere CSI Driver Configuration tests modify clusterCSIDriver which triggers a daemonset rollout across all nodes. Previously, the tests only waited for the storage operator to report Progressing=False, but this only means the operator updated the daemonset spec. The daemonset controller then asynchronously recreates pods on all nodes over 5-12 minutes.

These trailing SuccessfulCreate events were emitted after the test interval ended, causing client-go to aggregate them with lastTimestamp outside the test window. The pathological event matcher's ±10min padding couldn't reliably cover this nondeterministic gap, leading to CI failures like:

"event happened 27 times, something is wrong: namespace/openshift-cluster-csi-drivers daemonset/vmware-vsphere-csi-driver-node - reason/SuccessfulCreate"

The fix adds waitForCSIDriverDaemonSetRollout() which polls the daemonset status until NumberReady == DesiredNumberScheduled && NumberUnavailable == 0. This is called in both the test body and AfterEach after Progressing=False, ensuring all pod creation events fall within the test's time interval.

This follows the same pattern used by networking tests (test/extended/networking/util.go:isDaemonSetRunningOnGeneration) and DRA tests (test/extended/node/dra/nvidia/prerequisites_installer.go:waitForDaemonSet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@RomanBednar: This pull request references Jira Issue OCPBUGS-81568, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Caution: Review failed. An error occurred during the review process. Please try again later.
Estimated code review effort: 2 (Simple), ~12 minutes
@RomanBednar: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: RomanBednar
@coderabbitai review
✅ Actions performed: Review triggered.
/payload-job periodic-ci-openshift-release-main-nightly-4.22-e2e-vsphere-ovn-serial |
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/46d25800-3287-11f1-92a5-166f459ebe91-0
@RomanBednar: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
Scheduling required tests:
@RomanBednar: The following tests failed, say