do not merge - vsphere poweron bug alt#1466

jcpowermac wants to merge 1 commit into openshift:main from
Conversation
Skipping CI for Draft Pull Request.
Walkthrough
This pull request adds six new AGENTS.md documentation files across various controllers (Machine, MachineSet, vSphere, Operator, and Webhooks) to provide guidance on architecture, conventions, and anti-patterns. Additionally, a new helper method is added.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)
Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
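As an aside, this particular golangci-lint error typically means the repository's config predates the v2 schema, which requires an explicit `version` key. A minimal sketch of the migrated config header, assuming the repo uses a `.golangci.yml` (the linter selection below is illustrative, not this repo's actual config):

```yaml
# .golangci.yml — golangci-lint v2 rejects configs without an explicit
# version key ("unsupported version of the configuration").
version: "2"

linters:
  enable:
    - govet
```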
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
/test ?

/test e2e-vsphere-ovn
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/controller/vsphere/reconciler.go`:
- Around line 301-304: The code calls vm.powerOnVM() directly which bypasses the
shared DRS-aware helper; replace this call with the existing
powerOn(r.machineScope) helper so the same datacenter/DRS override behavior is
used (ensure you capture and handle the returned task and err the same way as
before and keep the existing log message and error handling around task/err).
Target symbols: replace vm.powerOnVM() with powerOn(r.machineScope) and preserve
surrounding logic that inspects task and err.
- Around line 301-327: The code currently only handles
types.VirtualMachinePowerStatePoweredOff and falls through for Suspended,
clearing r.providerStatus.TaskRef as a success; change this by adding an
explicit branch for types.VirtualMachinePowerStateSuspended that mirrors the
PoweredOff handling: call vm.powerOnVM() (or the appropriate resume API),
register failure metrics and set conditionFailed on error, set
r.providerStatus.TaskRef and log on success, and call setProviderStatus(...)
with conditionSuccess(); ensure you do not clear TaskRef for Suspended without
initiating the resume/power-on workflow. Use the existing symbols
vm.powerOnVM(), powerState, types.VirtualMachinePowerStateSuspended,
r.providerStatus.TaskRef, conditionFailed(), conditionSuccess(),
setProviderStatus() and metrics.RegisterFailedInstanceCreate to implement
consistent handling.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (6)
- pkg/controller/machine/AGENTS.md
- pkg/controller/machineset/AGENTS.md
- pkg/controller/vsphere/AGENTS.md
- pkg/controller/vsphere/reconciler.go
- pkg/operator/AGENTS.md
- pkg/webhooks/AGENTS.md
```go
if powerState == types.VirtualMachinePowerStatePoweredOff {
	klog.Infof("%v: VM exists but is powered off after task cleanup, powering on", r.machine.GetName())
	task, err := vm.powerOnVM()
	if err != nil {
```
Use the existing powerOn() helper here to preserve DRS override behavior.
Line 303 calls vm.powerOnVM() directly, bypassing the existing powerOn(r.machineScope) path used elsewhere in create flow. That can diverge behavior for environments relying on the datacenter power-on path.
⚙️ Proposed fix

```diff
-	task, err := vm.powerOnVM()
+	task, err := powerOn(r.machineScope)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 if powerState == types.VirtualMachinePowerStatePoweredOff {
 	klog.Infof("%v: VM exists but is powered off after task cleanup, powering on", r.machine.GetName())
-	task, err := vm.powerOnVM()
+	task, err := powerOn(r.machineScope)
 	if err != nil {
```
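To illustrate why routing every call site through one shared helper matters, here is a minimal, self-contained Go sketch. The `machineScope` fields, function bodies, and task strings below are hypothetical stand-ins, not the reconciler's real types or API: the point is only that a direct call skips the DRS-aware path, while the shared helper applies it consistently.

```go
package main

import "fmt"

// machineScope is a hypothetical stand-in for the reconciler's scope,
// which in the real code carries the session and provider spec.
type machineScope struct {
	drsOverride bool // whether a datacenter/DRS override applies
}

// powerOnDirect mimics calling vm.powerOnVM() directly: it never
// consults the DRS override.
func powerOnDirect(s *machineScope) string {
	return "task-direct"
}

// powerOn mimics the shared helper: it checks the override first, so
// every call site gets identical power-on behavior.
func powerOn(s *machineScope) string {
	if s.drsOverride {
		return "task-drs-aware" // datacenter/DRS power-on path
	}
	return "task-direct"
}

func main() {
	scope := &machineScope{drsOverride: true}
	fmt.Println(powerOnDirect(scope)) // prints "task-direct": override ignored
	fmt.Println(powerOn(scope))       // prints "task-drs-aware": override honored
}
```

Under these assumptions, an environment that relies on the DRS path silently diverges whenever a caller uses the direct method, which is exactly the risk the review comment describes.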
```go
if powerState == types.VirtualMachinePowerStatePoweredOff {
	klog.Infof("%v: VM exists but is powered off after task cleanup, powering on", r.machine.GetName())
	task, err := vm.powerOnVM()
	if err != nil {
		metrics.RegisterFailedInstanceCreate(&metrics.MachineLabels{
			Name:      r.machine.Name,
			Namespace: r.machine.Namespace,
			Reason:    "PowerOn task finished with error",
		})
		conditionFailed := conditionFailed()
		conditionFailed.Message = err.Error()
		statusError := setProviderStatus(task, conditionFailed, r.machineScope, vm)
		if statusError != nil {
			return fmt.Errorf("failed to set provider status: %w", statusError)
		}
		return fmt.Errorf("%v: failed to power on machine: %w", r.machine.GetName(), err)
	}

	// Clear the old TaskRef and set the new power-on task
	r.providerStatus.TaskRef = task
	klog.Infof("%v: Successfully initiated power-on, tracking new task %s", r.machine.GetName(), task)
	return setProviderStatus(task, conditionSuccess(), r.machineScope, vm)
}

// VM is already powered on (or in another state), clear the old TaskRef
// and let the next reconcile proceed to update
klog.V(2).Infof("%v: VM is already powered on (%s), clearing old TaskRef", r.machine.GetName(), powerState)
```
Handle suspended VMs explicitly instead of clearing TaskRef as success.
Current logic only powers on when state is PoweredOff. A Suspended VM falls through to Line 328 and clears TaskRef, which can advance reconciliation without actually restoring runtime state.
🛠️ Proposed fix

```diff
-	if powerState == types.VirtualMachinePowerStatePoweredOff {
+	if powerState == types.VirtualMachinePowerStatePoweredOff ||
+		powerState == types.VirtualMachinePowerStateSuspended {
```

📝 Committable suggestion
```diff
-if powerState == types.VirtualMachinePowerStatePoweredOff {
+if powerState == types.VirtualMachinePowerStatePoweredOff ||
+	powerState == types.VirtualMachinePowerStateSuspended {
 	klog.Infof("%v: VM exists but is powered off after task cleanup, powering on", r.machine.GetName())
 	task, err := vm.powerOnVM()
 	if err != nil {
 		metrics.RegisterFailedInstanceCreate(&metrics.MachineLabels{
 			Name:      r.machine.Name,
 			Namespace: r.machine.Namespace,
 			Reason:    "PowerOn task finished with error",
 		})
 		conditionFailed := conditionFailed()
 		conditionFailed.Message = err.Error()
 		statusError := setProviderStatus(task, conditionFailed, r.machineScope, vm)
 		if statusError != nil {
 			return fmt.Errorf("failed to set provider status: %w", statusError)
 		}
 		return fmt.Errorf("%v: failed to power on machine: %w", r.machine.GetName(), err)
 	}
 	// Clear the old TaskRef and set the new power-on task
 	r.providerStatus.TaskRef = task
 	klog.Infof("%v: Successfully initiated power-on, tracking new task %s", r.machine.GetName(), task)
 	return setProviderStatus(task, conditionSuccess(), r.machineScope, vm)
 }
 // VM is already powered on (or in another state), clear the old TaskRef
 // and let the next reconcile proceed to update
 klog.V(2).Infof("%v: VM is already powered on (%s), clearing old TaskRef", r.machine.GetName(), powerState)
```
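The proposed condition boils down to a small predicate over the power state. The self-contained Go sketch below is illustrative only (`needsPowerOn` and the string constants are hypothetical names, not symbols from the reconciler or govmomi): it shows that both powered-off and suspended VMs should trigger the power-on workflow rather than falling through to the clear-TaskRef branch.

```go
package main

import "fmt"

// powerState mirrors the idea of vSphere's VirtualMachinePowerState;
// the constants here are illustrative, not the govmomi types.
type powerState string

const (
	poweredOff powerState = "poweredOff"
	suspended  powerState = "suspended"
	poweredOn  powerState = "poweredOn"
)

// needsPowerOn reports whether reconciliation should start a power-on
// task instead of clearing the stale TaskRef. Per the proposed fix,
// suspended VMs need the same workflow as powered-off ones.
func needsPowerOn(ps powerState) bool {
	return ps == poweredOff || ps == suspended
}

func main() {
	for _, ps := range []powerState{poweredOff, suspended, poweredOn} {
		fmt.Printf("%s -> needsPowerOn=%v\n", ps, needsPowerOn(ps))
	}
}
```

With only the original `poweredOff` check, a suspended VM would take the "already powered on" path and have its TaskRef cleared without its runtime state ever being restored, which is the bug the second review comment flags.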
@jcpowermac: The following test failed:

Full PR test history. Your PR dashboard.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary by CodeRabbit

- Documentation
- Bug Fixes