OCPNODE-4381: Migrate OCP-38271 from openshift-tests-private#30960
BhargaviGudi wants to merge 1 commit into openshift:main
Conversation
|
Pipeline controller notification — this repository is configured in automatic mode. |
Walkthrough
A new Ginkgo test case (OCP-38271) was added to the node E2E test suite. It verifies init container restart behavior when the exited init container is forcibly removed from the host via crictl, ensuring the container does not restart after deletion.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
|
@BhargaviGudi: This pull request references OCPNODE-4381, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Force-pushed cba718d to b05cf8a (compare)
|
/test verify |
Force-pushed f9f4ebe to 44944db (compare)
|
|
/verified by @BhargaviGudi |
|
@BhargaviGudi: This PR has been marked as verified.
|
@BhargaviGudi: This pull request references OCPNODE-4381, which is a valid Jira issue.
Add test to verify init containers do not restart when removed from node.
- Add pod-initContainer.yaml template
- Add helper functions in node_utils.go
- Add OCP-38271 test in node_e2e/node.go

Author: minmli@redhat.com (original)
Migrated-by: bgudi@redhat.com

Move OCP-38271 test to separate Describe block
Refactor OCP-38271 to use standard origin patterns instead of compat_otp
Force-pushed 44944db to 3c8a2c2 (compare)
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: BhargaviGudi. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files.
Actionable comments posted: 1
🧹 Nitpick comments (2)
test/extended/node/node_e2e/node.go (2)
151-151: Use absolute path /bin/sh instead of relative bin/sh.
The command uses a relative path bin/sh, which only works because the container's default working directory is /. Using the absolute path /bin/sh is the standard convention and avoids potential issues if WorkingDir is ever specified.
♻️ Suggested fix

```diff
-Command: []string{"bin/sh", "-ec", "echo running >> /mnt/data/test"},
+Command: []string{"/bin/sh", "-ec", "echo running >> /mnt/data/test"},
-Command: []string{"bin/sh", "-c", "sleep 3600"},
+Command: []string{"/bin/sh", "-c", "sleep 3600"},
```

Also applies to: 164-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` at line 151, Replace the relative shell path in the container Command arrays with the absolute path; specifically update the Command entry in the container spec(s) in node.go (the Command: []string{"bin/sh", "-ec", ...} occurrences) to use "/bin/sh" instead of "bin/sh" (apply the same change to the other occurrence mentioned).
116-116: Remove unnecessary defer g.GinkgoRecover().
GinkgoRecover() is designed for recovering panics in goroutines spawned within tests. At the Describe level, this defer executes when the Describe function returns (before any tests run), serving no purpose. Ginkgo already has built-in panic recovery for test blocks.
♻️ Suggested fix

```diff
 var _ = g.Describe("[sig-node] [Jira:Node/Kubelet] NODE initContainer policy,volume,readines,quota", func() {
-	defer g.GinkgoRecover()
 	var (
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/node/node_e2e/node.go` at line 116, remove the unnecessary defer g.GinkgoRecover() call from the Describe-level scope: the defer is only useful for recovering panics in goroutines spawned inside test blocks, and at the Describe level it runs too early and is redundant since Ginkgo already handles test panic recovery. Locate the defer g.GinkgoRecover() invocation (inside the Describe/registration setup in node.go) and delete that single line so no external panic recovery is deferred at describe initialization.
📒 Files selected for processing (1)
test/extended/node/node_e2e/node.go
```go
g.By("Check init container not restart again")
err = wait.Poll(5*time.Second, 1*time.Minute, func() (bool, error) {
	pod, err := oc.KubeClient().CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	for _, status := range pod.Status.InitContainerStatuses {
		if status.Name == "inittest" {
			if status.RestartCount > 0 {
				e2e.Logf("Init container restarted, restart count: %d", status.RestartCount)
				return false, fmt.Errorf("init container restarted")
			}
		}
	}
	e2e.Logf("Init container has not restarted")
	return true, nil
})
o.Expect(err).NotTo(o.HaveOccurred(), "init container restart")
```
Test logic doesn't properly verify that the container does not restart.
The current polling logic returns success immediately on the first check where RestartCount == 0. Since this check runs right after the crictl rm, the kubelet may not have had time to detect the container removal. The test could pass even if the container would restart a moment later.
To properly verify the container does NOT restart, observe for a minimum duration while checking that RestartCount remains 0 throughout:
🐛 Suggested fix: observe for full duration

```diff
 g.By("Check init container not restart again")
-err = wait.Poll(5*time.Second, 1*time.Minute, func() (bool, error) {
+// Observe for 30 seconds to ensure no restart occurs
+observationDuration := 30 * time.Second
+pollInterval := 5 * time.Second
+startTime := time.Now()
+err = wait.Poll(pollInterval, observationDuration+pollInterval, func() (bool, error) {
 	pod, err := oc.KubeClient().CoreV1().Pods(namespace).Get(ctx, podName, metav1.GetOptions{})
 	if err != nil {
 		return false, err
 	}
 	for _, status := range pod.Status.InitContainerStatuses {
 		if status.Name == "inittest" {
 			if status.RestartCount > 0 {
 				e2e.Logf("Init container restarted, restart count: %d", status.RestartCount)
 				return false, fmt.Errorf("init container restarted")
 			}
 		}
 	}
 	e2e.Logf("Init container has not restarted")
-	return true, nil
+	// Continue observing until the duration has elapsed
+	if time.Since(startTime) >= observationDuration {
+		return true, nil
+	}
+	return false, nil
 })
 o.Expect(err).NotTo(o.HaveOccurred(), "init container restart")
```

📝 Committable suggestion
‼️ IMPORTANT: carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test and benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/extended/node/node_e2e/node.go` around lines 241 - 258, The test
currently returns success as soon as RestartCount==0, which allows a race;
change the wait.Poll predicate so it does NOT return success when
RestartCount==0 but instead keeps polling for the full duration: inside the func
passed to wait.Poll iterate pod.Status.InitContainerStatuses for the "inittest"
entry and if RestartCount>0 immediately return true, fmt.Errorf("init container
restarted"); otherwise return false, nil so polling continues; after wait.Poll,
assert that the error is a timeout (wait.ErrWaitTimeout) and fail if err is nil
or a different error (i.e. a restart was detected) — update the expectation that
currently uses o.Expect(err).NotTo(o.HaveOccurred()) accordingly.
|
Scheduling required tests: |
```go
// Skip all tests on MicroShift clusters as MachineConfig resources are not available
g.BeforeEach(func() {
	isMicroShift, err := exutil.IsMicroShiftCluster(oc.AdminKubeClient())
	o.Expect(err).NotTo(o.HaveOccurred())
```
Can you change this expect() to just log err if it exists?
I just realized that if the check does fail, Ginkgo won't stop. Maybe it needs an explicit Fail(). The goal is to make setup failures clearly different from test failures.
A loop around the check might make it more robust; OpenShift should eventually respond.
|
@BhargaviGudi: The following tests failed:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 3c8a2c2
New tests seen in this PR at sha: 3c8a2c2
|
|
/jira refresh |
|
@cpmeadors: This pull request references OCPNODE-4381, which is a valid Jira issue.
Summary
Adds a test to verify init containers do not restart when the exited init container is removed from the node.

Changes
- test/extended/testdata/node/node_e2e/pod-initContainer.yaml — pod template with init container
- test/extended/node/node_utils.go — PodInitConDescription struct with methods Create(), Delete(), ContainerExit(), DeleteInitContainer(), InitContainerNotRestart(); PodStatus() function to check pod readiness
- test/extended/node/node_e2e/node.go — [OTP] Init containers should not restart when the exited init container is removed from node [OCP-38271]

Original author: minmli@redhat.com
Migrated by: bgudi@redhat.com
Tested manually on 4.22.0-0.nightly-2026-04-06-051707