CNF-23402: Add context.Context to Execute and Ops interfaces#6187
CNF-23402: Add context.Context to Execute and Ops interfaces#6187sebrandon1 wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughPropagates context.Context broadly across the codebase: interfaces, helpers, executors, controllers, CLI commands, and mocks. Call sites updated to pass ctx into command execution, ostree/rpm-ostree, reboot, and cleanup flows; tests and generated mocks adapted to new context-first signatures. ChangesContext propagation DAG
Sequence diagramsequenceDiagram
participant Controller
participant OpsExecutor
participant RPMOstree
participant Reboot
Controller->>OpsExecutor: RunInHostNamespace(ctx, nmstatectl/podman/bash)
OpsExecutor->>OpsExecutor: exec.CommandContext(ctx, ...)
Controller->>RPMOstree: IsStaterootBooted(ctx, name)
RPMOstree-->>Controller: bool,error
Controller->>Reboot: InitiateRollback(ctx,msg) / RebootToNewStateRoot(ctx,rationale)
Reboot->>RPMOstree: GetUnbootedStaterootName(ctx) / SetDefaultDeployment(ctx, idx)
Reboot->>OpsExecutor: RunSystemdAction(ctx, ...)
OpsExecutor-->>Reboot: result
Reboot-->>Controller: rollback/reboot result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
CLAUDE.md (1)
1-118:⚠️ Potential issue | 🔴 CriticalCorrect the Operator SDK version: v1.40.0, not v2.
The line claiming "Operator SDK v2" is incorrect. The Makefile specifies
OPERATOR_SDK_VERSION ?= 1.40.0, which is v1.x. Operator SDK does not have a v2 major release (only v1.x releases exist).The Go 1.24 claim is correct (go.mod specifies 1.24.6). Kubebuilder v4 is consistent with the use of controller-gen v0.19.0 and controller-runtime v0.22.5.
Update the overview to read: "Built with Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" or similar.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CLAUDE.md` around lines 1 - 118, Replace the incorrect "Operator SDK v2" claim in the project overview with the actual version from the Makefile (OPERATOR_SDK_VERSION ?= 1.40.0); specifically update the sentence that currently reads "Built with Kubebuilder v4 and Operator SDK v2 in Go 1.24" to something like "Built with Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" so the CLAUDE.md text matches the Makefile's OPERATOR_SDK_VERSION.lca-cli/cmd/ipconfig/postpivot.go (1)
78-90:⚠️ Potential issue | 🟠 MajorThread
cmd.Context()through to preserve cancellation propagation from the Cobra command lifecycle.At Line 78,
context.Background()prevents SIGTERM/SIGINT cancellation from reaching the operation. ModifyrunIPConfigPostPivot()to accept acontext.Contextparameter, then passcmd.Context()from the Run callback (Line 62) and reuse that context for bothpostPivotHandler.Run(ctx)andAutoRollbackIfEnabled(ctx).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/cmd/ipconfig/postpivot.go` around lines 78 - 90, Change runIPConfigPostPivot to accept a context.Context parameter and use that context instead of context.Background(); specifically, update the Run callback to call runIPConfigPostPivot(cmd.Context(), ...) and inside runIPConfigPostPivot replace the local ctx := context.Background() with the passed ctx and use that ctx when invoking postPivotHandler.Run(ctx) and rbClient.AutoRollbackIfEnabled(ctx, reboot.IPConfigRunComponent, ...). Ensure all occurrences of postPivotHandler.Run and AutoRollbackIfEnabled referenced in runIPConfigPostPivot use the passed context so cancellation from the Cobra command (cmd.Context()) propagates.controllers/upgrade_handlers.go (1)
344-356:⚠️ Potential issue | 🟠 MajorUse
context.Background()forInitiateRollbackto ensure rollback completes independently of reconciliation context cancellation.Auto-rollback is a critical recovery mechanism that should not be aborted by a canceled reconciliation context. Currently, line 355 passes
ctxtoInitiateRollback, which could fail silently if the context has been canceled during error handling. This pattern is already established inlca-cli/initmonitor/initmonitor.go, which correctly usescontext.Background()for rollback operations.Proposed fix
- if err := u.RebootClient.InitiateRollback(ctx, msg); err != nil { + rollbackCtx := context.Background() + if err := u.RebootClient.InitiateRollback(rollbackCtx, msg); err != nil { u.Log.Info(fmt.Sprintf("Unable to auto rollback: %s", err)) return }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/upgrade_handlers.go` around lines 344 - 356, In autoRollbackIfEnabled, the call to u.RebootClient.InitiateRollback(ctx, msg) should use a root context so rollback isn't canceled with the reconciliation context; replace the passed-in ctx with context.Background() (i.e., call u.RebootClient.InitiateRollback(context.Background(), msg)) inside UpgHandler.autoRollbackIfEnabled and ensure the function references InitiateRollback and UpgHandler correctly.lca-cli/ipconfig/prepivot.go (1)
185-193:⚠️ Potential issue | 🟠 MajorUse a non-cancelable context for the deferred service recovery.
Line 187 reuses the request context in a critical deferred recovery path. If that context is canceled during the subsequent operations (lines 197-232),
EnableClusterServiceswill fail immediately becauseexec.CommandContextrespects context cancellation. This leaves the cluster in a broken state: services are stopped but kubelet is not re-enabled.Proposed fix
+ cleanupCtx := context.Background() defer func() { p.log.Info("Enabling cluster services in old stateroot") - if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil { + if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil { if err == nil { err = internalErr } else { err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr) } } }()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ipconfig/prepivot.go` around lines 185 - 193, The deferred recovery block currently reuses the incoming request context (ctx) when calling p.ops.EnableClusterServices inside the deferred func, which can be canceled and cause EnableClusterServices to fail; change the deferred call to use a non-cancelable context (e.g., ctx := context.Background() or context.TODO()) when invoking p.ops.EnableClusterServices so the recovery always runs regardless of request cancellation, keeping the existing error-aggregation logic (the deferred func that updates err) intact and referencing p.ops.EnableClusterServices and the deferred func in prepivot.go.internal/imagemgmt/imagemgmt.go (1)
240-261:⚠️ Potential issue | 🟠 MajorPreserve cancellation in the image-deletion loop.
The
podman rmifailures are intentionally best-effort here, but ignoring errors with_at line 260 also discards context cancellation signals. When the context is cancelled,Executereturns acontext.Canceledorcontext.DeadlineExceedederror which is swallowed, allowing the loop to continue deleting images instead of stopping. Check for context errors before treating other failures as ignorable.Suggested fix
- // Ignore errors, as images may be in use by transient containers or other images - output, _ := c.Executor.Execute(ctx, "podman", args...) - c.Log.Info("Deleted unused images", "output", output) + // Ignore non-context errors, as images may be in use by transient containers or other images + output, err := c.Executor.Execute(ctx, "podman", args...) + if err != nil { + if ctxErr := ctx.Err(); ctxErr != nil { + return ctxErr + } + c.Log.Error(err, "Ignoring podman rmi failure during cleanup", "args", args) + } else { + c.Log.Info("Deleted unused images", "output", output) + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/imagemgmt/imagemgmt.go` around lines 240 - 261, In CleanupUnusedImages, don't swallow Execute errors blindly; preserve cancellation by checking the returned error from c.Executor.Execute (called in the image deletion loop) and if it's context.Canceled or context.DeadlineExceeded (or ctx.Err() != nil) return that error immediately; only treat non-context errors as ignorable (log them and continue). Update the loop around the Execute call in CleanupUnusedImages to capture the error, check ctx.Err() or the specific context errors, and return on cancellation before continuing to the next batch.lca-cli/ops/ops.go (1)
252-276:⚠️ Potential issue | 🟠 Major
waitForEtcdignores the context parameter and blocks HTTP requests indefinitely.The
ctxparameter is declared but never used. The function uses a hardcodedtime.Aftertimeout instead of observingctx.Done(), andhttp.Getis not bound to the context, so a hung HTTP request can block the caller regardless of context cancellation.Use
context.WithTimeoutandhttp.NewRequestWithContextto properly bind both the timeout and the HTTP request to the provided context:Suggested fix
func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error { - timeout := time.After(1 * time.Minute) + ctx, cancel := context.WithTimeout(ctx, time.Minute) + defer cancel() ticker := time.NewTicker(1 * time.Second) defer ticker.Stop() for { select { - case <-timeout: - return fmt.Errorf("timeout waiting for etcd") + case <-ctx.Done(): + return ctx.Err() case <-ticker.C: - resp, err := http.Get(healthzEndpoint) //nolint:gosec + req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil) + if err != nil { + return fmt.Errorf("failed to create etcd health request: %w", err) + } + resp, err := http.DefaultClient.Do(req) //nolint:gosec if err != nil { o.log.Infof("Waiting for etcd: %s", err) continue } - defer resp.Body.Close() + resp.Body.Close() if resp.StatusCode != http.StatusOK { o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode) continue } return nil } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ops/ops.go` around lines 252 - 276, The waitForEtcd function ignores the passed ctx and uses a hard timeout and http.Get which can hang; update waitForEtcd to derive a cancellable context (use context.WithTimeout(ctx, 1*time.Minute) or respect ctx directly), replace http.Get with an http.NewRequestWithContext using that context (or an http.Client with the context-bound request), and in the polling loop select on ctx.Done() (or the derived timeout context) in addition to the ticker so cancellation/unblock works; ensure resp.Body.Close() is deferred only after a successful non-nil resp and propagate/return the context error when ctx is cancelled.lca-cli/ibi-preparation/ibipreparation.go (1)
54-86:⚠️ Potential issue | 🟠 MajorPropagate context through precacheFlow to enable cancellation during image pulling.
The
Run(ctx)method now passes context todiskPreparation,postDeployment,cleanupRhcosSysroot, andshutdownNode, but the call toi.precacheFlow()at line ~75 bypasses context entirely. TheprecacheFlowmethod invokesworkload.Precache(), which spawns concurrent image-pull goroutines viaPullImages()without any cancellation control. Image pulling can be a long-running phase; missing context here leaves it outside the new cancellation model introduced by the recent context propagation effort.Update
precacheFlow(imageListFile, seedInfoFile, registriesConfFile string)to accept and forwardctx context.Context, and updateworkload.Precache()andPullImages()signatures accordingly to respect context cancellation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ibi-preparation/ibipreparation.go` around lines 54 - 86, The Run method currently calls i.precacheFlow(...) without a context which prevents cancellation during long image pulls; change precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) to precacheFlow(ctx context.Context, imageListFile, seedInfoFile, registriesConfFile string), update its internal call to workload.Precache(...) to accept and forward ctx, and modify workload.Precache(...) and PullImages(...) signatures to take ctx context.Context and to use ctx for cancellation inside goroutines; finally update the call site in Run to call i.precacheFlow(ctx, common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile) so context is propagated end-to-end.internal/reboot/reboot.go (1)
208-270:⚠️ Potential issue | 🟠 MajorFix context reuse and error handling in rollback sequences.
Two issues in
InitiateRollbackat lines 208-270 (IBURebootClient) and 382-440 (IPCRebootClient):
Context reuse: Both implementations pass the caller's context through the entire rollback sequence (
SetDefaultDeployment,RebootToNewStateRoot). Controllers pass reconcile contexts that may timeout or be cancelled. Once rollback is committed, derive a fresh unbounded context for the sequence instead of reusing the request-scoped one.Error handling bug (IBURebootClient only): Lines 268-270 unconditionally return an error via
fmt.Errorf(...)even whenRebootToNewStateRootsucceeds. The error message also incorrectly says "unable to get set deployment" instead of a reboot failure. Match the IPCRebootClient pattern: directly return the result fromRebootToNewStateRoot.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/reboot/reboot.go` around lines 208 - 270, InitiateRollback currently reuses the caller's context for the post-commit rollback steps and also has a bug that always returns an error after calling RebootToNewStateRoot; fix by deriving a fresh context (e.g., ctx := context.Background() or context.WithCancel(context.Background())) for the sequence of SetDefaultDeployment and RebootToNewStateRoot instead of passing the incoming reconcile context, and change the final error handling in IBURebootClient.InitiateRollback to return the result of RebootToNewStateRoot directly (like IPCRebootClient) rather than unconditionally wrapping/returning a misleading fmt.Errorf; apply the same context-derivation pattern to the IPCRebootClient path that performs SetDefaultDeployment and RebootToNewStateRoot.lca-cli/postpivot/postpivot.go (1)
272-295:⚠️ Potential issue | 🟠 MajorStop the recert flow when the image pull times out or is canceled.
The
wait.PollUntilContextCancelerror is ignored here. If the 10-minute pull window expires, the code silently falls through toRecertFullFlow, which then proceeds with no guarantee the image was pulled successfully.Suggested fix
if _, err := p.ops.RunInHostNamespace(ctx, "podman", "image", "exists", seedClusterInfo.RecertImagePullSpec); err != nil { ctxWithTimeout, cancel := context.WithTimeout(ctx, 10*time.Minute) defer cancel() - _ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { + if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { p.log.Info("pulling recert image") command := "podman" if seedReconfiguration.Proxy != nil { command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s", seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy, seedReconfiguration.Proxy.NoProxy, command) } if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil { p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error()) return false, nil } return true, nil - }) + }); err != nil { + return fmt.Errorf("failed to pull recert image: %w", err) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/postpivot/postpivot.go` around lines 272 - 295, The pull loop ignores the error returned by wait.PollUntilContextCancel so a timeout/cancel still calls RecertFullFlow; capture the error (e.g. pullErr := wait.PollUntilContextCancel(...)) after calling wait.PollUntilContextCancel and if pullErr != nil return or propagate an error and stop the recert flow (handle context.DeadlineExceeded/context.Canceled specially to log and return early). Update the block around RunInHostNamespace/RunBashInHostNamespace and the subsequent RecertFullFlow call to check pullErr and only call p.ops.RecertFullFlow when the pull succeeded; include informative logging with the pullErr when aborting.
🧹 Nitpick comments (1)
lca-cli/cmd/ipconfig/rollback.go (1)
70-79: Context handling in CLI commands is a codebase-wide pattern; this suggestion aligns with broader refactoring goals.At line 70,
context.Background()is created and used for handler and reboot operations. While passingcmd.Context()would be architecturally cleaner and enable proper signal propagation, this pattern is consistent across all CLI commands in the codebase (restore, prepivot, postpivot, etc.). The reboot operation itself does not respond to context cancellation. If the codebase adopts command-context propagation as a standard pattern, this should be refactored along with other CLI commands rather than in isolation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/cmd/ipconfig/rollback.go` around lines 70 - 79, Replace the locally created context.Background() with the command's context so cancellation/signal propagation works across CLI commands: use cmd.Context() instead of context.Background() when constructing ctx passed into reboot.NewIPCRebootClient, ipconfig.NewRollbackHandler and when calling exec.Run and rb.RebootToNewStateRoot; keep in mind rb.RebootToNewStateRoot may not observe cancellation (so callers should still pass cmd.Context() for consistency and future-proofing).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@controllers/seedgen_controller.go`:
- Around line 420-423: The helper methods (e.g., rmPreviousImagerContainer)
create a new context via context.Background(), which ignores reconcile
cancellation/timeouts; change each helper to accept a context parameter (ctx
context.Context) and use that ctx when calling r.Executor.Execute for commands
like "podman rm", "systemd-run", and "podman inspect"; then update all callers
(setupWorkspace, generateSeedImage, finishSeedgen and Reconcile) to pass the
incoming reconcile ctx through to these helpers so command execution respects
cancellation and deadlines.
In `@internal/ostreeclient/ostreeclient.go`:
- Around line 105-113: Change IsOstreeAdminSetDefaultFeatureEnabled to return
(bool, error) instead of just bool: call c.executor.Execute(ctx, "ostree",
"admin", "--help"), if err != nil return false, err; otherwise return
strings.Contains(output, "set-default"), nil. Update callers (e.g.,
InitiateRollback) to handle the returned error and distinguish execution
failures (context cancellation/deadlines) from a legitimate "feature not
present" false result so they don't misreport support when the probe itself
failed.
In `@internal/precache/workload/pullImages.go`:
- Around line 51-53: The helpers create new background contexts and drop caller
cancellation; thread a context.Context through the call chain instead: change
Precache to Precache(ctx, precacheSpec, authFile, bestEffort), PullImages to
PullImages(ctx, precacheSpec, authFile), pullImage to pullImage(ctx, image,
authFile, progress), podmanImgPull to podmanImgPull(ctx, image, authFile),
podmanImgExists to podmanImgExists(ctx, image) and CheckPodman to
CheckPodman(ctx); remove any ctx := context.Background() occurrences in
pullImages.go (the noted lines 51, 60–66, 72–74), update all call sites (e.g.,
ibipreparation.go and ibuPrecacheWorkload.go callers) to pass the incoming ctx,
and ensure Executor.ExecuteWithLiveLogger and other podman invocations use the
passed ctx so cancellation/timeouts propagate.
In `@internal/prep/prep.go`:
- Around line 98-99: The deferred cleanup currently calls
ops.UnmountAndRemoveImage with the request-scoped ctx which may be cancelled
before the defer runs; change the defer to use a non-cancellable context (e.g.,
context.Background() or a ctxNoCancel created via context.WithoutDeadline
semantics) when invoking UnmountAndRemoveImage so seedImage is always
unmounted/removed even if the original ctx is cancelled. Update the defer that
references ops.UnmountAndRemoveImage(ctx, seedImage) to pass the new
non-cancellable context instead of the request-scoped ctx.
In `@lca-cli/cmd/postpivotcmd.go`:
- Around line 68-71: The code creates a detached context with
context.Background() which prevents cancellation/timeouts from cmd.Context()
from reaching PostPivotConfiguration and the rollback; replace
context.Background() with the Cobra command's context (use cmd.Context()) and
pass that ctx into postPivotRunner.PostPivotConfiguration(ctx) and
rebootClient.AutoRollbackIfEnabled(ctx, ...) so both PostPivotConfiguration and
AutoRollbackIfEnabled observe cancellation/timeouts from the command.
In `@lca-cli/postpivot/postpivot.go`:
- Around line 951-954: The deferred Umount uses the request context which may be
canceled before cleanup; change the defer to call p.ops.Umount with
context.Background() (or a new non-cancelable context) instead of ctx so unmount
runs even if the original context is canceled — update the defer at the
Mount/Umount pair (p.ops.Mount(..., deviceName, mountFolder) and defer
p.ops.Umount(...)) to use context.Background().
In `@lca-cli/seedcreator/seedcreator.go`:
- Around line 69-73: CreateSeedImage currently roots the workflow in
context.Background(), preventing cancellation; change CreateSeedImage to accept
a caller-owned context (e.g., func (s *SeedCreator) CreateSeedImage(ctx
context.Context) error) and thread that ctx into downstream calls (replace local
ctx := context.Background() and pass ctx into s.copyConfigurationFiles, and any
other helper calls such as podman/rpm-ostree/tar invocations) so cancellation
and deadlines propagate from the caller through SeedCreator.CreateSeedImage and
related methods.
---
Outside diff comments:
In `@CLAUDE.md`:
- Around line 1-118: Replace the incorrect "Operator SDK v2" claim in the
project overview with the actual version from the Makefile (OPERATOR_SDK_VERSION
?= 1.40.0); specifically update the sentence that currently reads "Built with
Kubebuilder v4 and Operator SDK v2 in Go 1.24" to something like "Built with
Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" so the CLAUDE.md text matches
the Makefile's OPERATOR_SDK_VERSION.
In `@controllers/upgrade_handlers.go`:
- Around line 344-356: In autoRollbackIfEnabled, the call to
u.RebootClient.InitiateRollback(ctx, msg) should use a root context so rollback
isn't canceled with the reconciliation context; replace the passed-in ctx with
context.Background() (i.e., call
u.RebootClient.InitiateRollback(context.Background(), msg)) inside
UpgHandler.autoRollbackIfEnabled and ensure the function references
InitiateRollback and UpgHandler correctly.
In `@internal/imagemgmt/imagemgmt.go`:
- Around line 240-261: In CleanupUnusedImages, don't swallow Execute errors
blindly; preserve cancellation by checking the returned error from
c.Executor.Execute (called in the image deletion loop) and if it's
context.Canceled or context.DeadlineExceeded (or ctx.Err() != nil) return that
error immediately; only treat non-context errors as ignorable (log them and
continue). Update the loop around the Execute call in CleanupUnusedImages to
capture the error, check ctx.Err() or the specific context errors, and return on
cancellation before continuing to the next batch.
In `@internal/reboot/reboot.go`:
- Around line 208-270: InitiateRollback currently reuses the caller's context
for the post-commit rollback steps and also has a bug that always returns an
error after calling RebootToNewStateRoot; fix by deriving a fresh context (e.g.,
ctx := context.Background() or context.WithCancel(context.Background())) for the
sequence of SetDefaultDeployment and RebootToNewStateRoot instead of passing the
incoming reconcile context, and change the final error handling in
IBURebootClient.InitiateRollback to return the result of RebootToNewStateRoot
directly (like IPCRebootClient) rather than unconditionally wrapping/returning a
misleading fmt.Errorf; apply the same context-derivation pattern to the
IPCRebootClient path that performs SetDefaultDeployment and
RebootToNewStateRoot.
In `@lca-cli/cmd/ipconfig/postpivot.go`:
- Around line 78-90: Change runIPConfigPostPivot to accept a context.Context
parameter and use that context instead of context.Background(); specifically,
update the Run callback to call runIPConfigPostPivot(cmd.Context(), ...) and
inside runIPConfigPostPivot replace the local ctx := context.Background() with
the passed ctx and use that ctx when invoking postPivotHandler.Run(ctx) and
rbClient.AutoRollbackIfEnabled(ctx, reboot.IPConfigRunComponent, ...). Ensure
all occurrences of postPivotHandler.Run and AutoRollbackIfEnabled referenced in
runIPConfigPostPivot use the passed context so cancellation from the Cobra
command (cmd.Context()) propagates.
In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Around line 54-86: The Run method currently calls i.precacheFlow(...) without
a context which prevents cancellation during long image pulls; change
precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) to
precacheFlow(ctx context.Context, imageListFile, seedInfoFile,
registriesConfFile string), update its internal call to workload.Precache(...)
to accept and forward ctx, and modify workload.Precache(...) and PullImages(...)
signatures to take ctx context.Context and to use ctx for cancellation inside
goroutines; finally update the call site in Run to call i.precacheFlow(ctx,
common.ContainersListFilePath, common.IBISeedInfoFilePath,
defaultRegistriesConfFile) so context is propagated end-to-end.
In `@lca-cli/ipconfig/prepivot.go`:
- Around line 185-193: The deferred recovery block currently reuses the incoming
request context (ctx) when calling p.ops.EnableClusterServices inside the
deferred func, which can be canceled and cause EnableClusterServices to fail;
change the deferred call to use a non-cancelable context (e.g., ctx :=
context.Background() or context.TODO()) when invoking
p.ops.EnableClusterServices so the recovery always runs regardless of request
cancellation, keeping the existing error-aggregation logic (the deferred func
that updates err) intact and referencing p.ops.EnableClusterServices and the
deferred func in prepivot.go.
In `@lca-cli/ops/ops.go`:
- Around line 252-276: The waitForEtcd function ignores the passed ctx and uses
a hard timeout and http.Get which can hang; update waitForEtcd to derive a
cancellable context (use context.WithTimeout(ctx, 1*time.Minute) or respect ctx
directly), replace http.Get with an http.NewRequestWithContext using that
context (or an http.Client with the context-bound request), and in the polling
loop select on ctx.Done() (or the derived timeout context) in addition to the
ticker so cancellation/unblock works; ensure resp.Body.Close() is deferred only
after a successful non-nil resp and propagate/return the context error when ctx
is cancelled.
In `@lca-cli/postpivot/postpivot.go`:
- Around line 272-295: The pull loop ignores the error returned by
wait.PollUntilContextCancel so a timeout/cancel still calls RecertFullFlow;
capture the error (e.g. pullErr := wait.PollUntilContextCancel(...)) after
calling wait.PollUntilContextCancel and if pullErr != nil return or propagate an
error and stop the recert flow (handle context.DeadlineExceeded/context.Canceled
specially to log and return early). Update the block around
RunInHostNamespace/RunBashInHostNamespace and the subsequent RecertFullFlow call
to check pullErr and only call p.ops.RecertFullFlow when the pull succeeded;
include informative logging with the pullErr when aborting.
---
Nitpick comments:
In `@lca-cli/cmd/ipconfig/rollback.go`:
- Around line 70-79: Replace the locally created context.Background() with the
command's context so cancellation/signal propagation works across CLI commands:
use cmd.Context() instead of context.Background() when constructing ctx passed
into reboot.NewIPCRebootClient, ipconfig.NewRollbackHandler and when calling
exec.Run and rb.RebootToNewStateRoot; keep in mind rb.RebootToNewStateRoot may
not observe cancellation (so callers should still pass cmd.Context() for
consistency and future-proofing).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: d7240f78-76d7-46d4-a368-eb23f07f0963
📒 Files selected for processing (56)
CLAUDE.mdcontrollers/ibu_controller.gocontrollers/ibu_controller_test.gocontrollers/idle_handlers.gocontrollers/idle_handlers_test.gocontrollers/ipc_config_handlers.gocontrollers/ipc_config_handlers_test.gocontrollers/ipc_controller.gocontrollers/ipc_controller_test.gocontrollers/ipc_idle_handlers.gocontrollers/ipc_idle_handlers_test.gocontrollers/ipc_rollback_handlers.gocontrollers/ipc_rollback_handlers_test.gocontrollers/prep_handlers.gocontrollers/rollback_handlers.gocontrollers/seedgen_controller.gocontrollers/upgrade_handlers.gocontrollers/upgrade_handlers_test.gointernal/imagemgmt/imagemgmt.gointernal/imagemgmt/imagemgmt_test.gointernal/imagemgmt/mock_imagemgmt.gointernal/ostreeclient/mock_ostreeclient.gointernal/ostreeclient/ostreeclient.gointernal/precache/workload/pullImages.gointernal/prep/prep.gointernal/reboot/mock_reboot.gointernal/reboot/reboot.gointernal/reboot/reboot_test.golca-cli/cmd/create.golca-cli/cmd/ibi.golca-cli/cmd/ibuStaterootSetup.golca-cli/cmd/ipconfig/postpivot.golca-cli/cmd/ipconfig/prepivot.golca-cli/cmd/ipconfig/rollback.golca-cli/cmd/postpivotcmd.golca-cli/cmd/restore.golca-cli/ibi-preparation/ibipreparation.golca-cli/ibi-preparation/ibipreparation_test.golca-cli/initmonitor/initmonitor.golca-cli/ipconfig/postpivot.golca-cli/ipconfig/postpivot_test.golca-cli/ipconfig/prepivot.golca-cli/ipconfig/prepivot_test.golca-cli/ipconfig/rollback.golca-cli/ipconfig/rollback_test.golca-cli/ops/execute.golca-cli/ops/mock_execute.golca-cli/ops/mock_ops.golca-cli/ops/ops.golca-cli/ostreeclient/mock_rpmostreeclient.golca-cli/ostreeclient/rpmostreeclient.golca-cli/postpivot/postpivot.golca-cli/postpivot/postpivot_test.golca-cli/seedcreator/seedcreator.golca-cli/seedrestoration/seedrestoration.gomain/main.go
efce46e to
70adfe3
Compare
|
/retest-required |
70adfe3 to
0f3c830
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
lca-cli/ipconfig/prepivot.go (1)
185-193:⚠️ Potential issue | 🟠 MajorUse a fresh context for the deferred service re-enable.
This cleanup now reuses the main operation
ctx. If that context is canceled or times out afterStopClusterServices(ctx)succeeds,EnableClusterServices(ctx)can fail for the same reason and leave kubelet disabled after a partial pre-pivot. Cleanup should run with its own bounded context instead of inheriting cancellation from the work it is trying to unwind.Suggested fix
defer func() { p.log.Info("Enabling cluster services in old stateroot") - if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil { + cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil { if err == nil { err = internalErr } else { err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr) } } }()This needs a
timeimport as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ipconfig/prepivot.go` around lines 185 - 193, The deferred cleanup currently reuses the operation's ctx which may be canceled; change the defer to create a fresh bounded context (e.g., ctx2, cancel := context.WithTimeout(context.Background(), <sensibleDuration>); defer cancel()) and call p.ops.EnableClusterServices(ctx2) so re-enabling cluster services is not affected by the original context's cancellation; update the defer block to use ctx2 and ensure you add the time import for the timeout value.controllers/ipc_config_handlers.go (1)
801-817:⚠️ Potential issue | 🟠 MajorDon't tie
systemd-run --waitto the reconcile context.Once the transient unit is created, canceling
ctxaborts the waiting client viaexec.CommandContext, not necessarily the unit that is now mutating the host. This path will report pre-pivot failure even though the systemd unit may still be running, which can lead to duplicate/conflicting attempts on the next reconcile. Use a detached context here, or explicitly stop/query the spawned unit when cancellation happens. The same risk exists incontrollers/ipc_rollback_handlers.goaroundscheduleIPConfigRollback.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/ipc_config_handlers.go` around lines 801 - 817, The call to RunSystemdAction in RunLcaCliIPConfigPrePivot ties systemd-run --wait to the reconcile ctx so cancelling ctx can kill the client while the transient systemd unit continues running; change this to use a detached context (e.g., create a new context.Background()-based ctx or a derived context not cancelled by the reconcile) when invoking h.ChrootOps.RunSystemdAction so the systemd unit is created and allowed to run independently, or alternatively implement explicit unit follow-up logic on ctx cancellation (query/stop the unit). Make the same change in the corresponding scheduleIPConfigRollback call in controllers/ipc_rollback_handlers.go.lca-cli/ops/ops.go (2)
737-745:⚠️ Potential issue | 🟠 MajorDon't drop the container-stop polling error.
The result of
wait.PollUntilContextCancelis ignored. Ifctxtimes out orcrictl stopkeeps failing, this method proceeds as if containers were stopped and then shuts down CRI-O anyway.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ops/ops.go` around lines 737 - 745, The polling call to wait.PollUntilContextCancel is currently ignored so failures or timeouts during container stop are swallowed; capture its returned error and handle it (e.g., assign err := wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("failed waiting for containers to stop: %w", err) } ) instead of proceeding to shut down CRI-O; update the surrounding function to propagate or log the error appropriately so RunBashInHostNamespace/crictl failures abort the shutdown flow.
252-275:⚠️ Potential issue | 🟠 Major
waitForEtcdstill ignores the passed context.The loop never checks
ctx.Done(), andhttp.Getis not bound toctx, so cancellation can still block here until the hardcoded timeout or a hung GET returns. Usectxin the select and issue the request with a context-aware HTTP call.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ops/ops.go` around lines 252 - 275, The waitForEtcd function ignores the passed ctx and uses http.Get; modify it to listen for ctx cancellation and issue a context-aware HTTP request: inside the select add a case <-ctx.Done() that returns ctx.Err(), replace http.Get(healthzEndpoint) with creating an http.Request via http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and use http.DefaultClient.Do(req) (or a client variable) so the request is bound to ctx, and avoid deferring resp.Body.Close() inside the loop by closing resp.Body immediately after you're done checking StatusCode; keep the existing timeout/ticker logic but ensure ctx cancellation short-circuits the loop.internal/precache/workload/pullImages.go (1)
82-95:⚠️ Potential issue | 🟠 MajorStop retrying once the context is canceled.
pullImageretries up toMaxRetrieseven forcontext.Canceled/DeadlineExceeded. On a canceled precache, that turns each remaining image into repeated failed execs instead of aborting promptly. Treat cancellation as terminal and return immediately.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/precache/workload/pullImages.go` around lines 82 - 95, The loop that calls podmanImgPull (inside the pullImage/pullImages.go retry loop bounded by MaxRetries) currently treats context cancellation like any other error and continues retrying; update the error handling so that if the context has been canceled or its deadline exceeded (check ctx.Err() != nil or use errors.Is(err, context.Canceled)/errors.Is(err, context.DeadlineExceeded)) you log and return immediately instead of retrying. Modify the branch after podmanImgPull fails to detect context cancellation and exit the function (no further retries), keeping the existing "manifest unknown" short-circuit behavior intact.lca-cli/postpivot/postpivot.go (1)
276-291:⚠️ Potential issue | 🟠 MajorSurface recert image pull timeout/cancellation.
This poll result is discarded. If the pull never succeeds within 10 minutes or
ctxis canceled, the code still falls through intoRecertFullFlow, which hides the real failure and can start recert without a pulled image.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/postpivot/postpivot.go` around lines 276 - 291, The PollUntilContextCancel call using ctxWithTimeout currently discards its result so a timeout/cancel is ignored and RecertFullFlow proceeds without the image; change the code around wait.PollUntilContextCancel to capture its (bool, error) result and handle both cases: if error != nil or the bool is false after the timeout/cancellation, log a clear error including the context deadline/cancel reason and return that error (or stop the post-pivot flow) instead of falling through to RecertFullFlow; reference the ctxWithTimeout, wait.PollUntilContextCancel call and the p.ops.RunBashInHostNamespace pull attempt to locate where to add the result check and early return.
♻️ Duplicate comments (2)
lca-cli/ibi-preparation/ibipreparation.go (1)
133-133:⚠️ Potential issue | 🟠 MajorPreserve cancellation through the precache step.
Runnow receives a caller-ownedctx, but this call resets it tocontext.Background(). A timed-out or canceled IBI preparation will still keep precaching images. ThreadctxintoprecacheFlowand pass it toworkload.Precache.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ibi-preparation/ibipreparation.go` at line 133, The precache step currently ignores the caller's cancellation by calling workload.Precache with context.Background(); modify the code to thread the caller-owned ctx through precacheFlow and invoke workload.Precache(ctx, imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort) instead so timeouts/cancellations from Run are preserved; update the precacheFlow signature/uses and any call sites that start the precache to accept and forward ctx (referencing Run, precacheFlow, workload.Precache, imageList, common.IBIPullSecretFilePath, and i.config.PrecacheBestEffort).lca-cli/ops/ops.go (1)
459-463:⚠️ Potential issue | 🟠 MajorUse a non-cancelable context for deferred etcd cleanup.
If
ctxis canceled while recert is unwinding, the deferredStopEtcdServer(ctx, ...)will short-circuit and leave the temporary etcd container behind. Cleanup should run with a non-cancelable context and log failures.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ops/ops.go` around lines 459 - 463, When deferring StopEtcdServer after RunUnauthenticatedEtcdServer, don't reuse the potentially cancelable ctx; create a non-cancelable context (e.g. cleanupCtx := context.Background() or context.WithoutCancel equivalent) and call o.StopEtcdServer(cleanupCtx, authFile, common.EtcdContainerName) inside the defer, capturing and logging any error returned (e.g. if err := o.StopEtcdServer(...); err != nil { /* log error via o.Logger/errorf or fmt */ }). Update the defer that references StopEtcdServer to use cleanupCtx and ensure failures are logged.
🧹 Nitpick comments (2)
internal/reboot/reboot_test.go (1)
63-64: Assert the forwarded context, not just any context.Using
gomock.Any()on Line 63 won’t catch regressions whereIsOrigStaterootBootedstops propagating the caller’s context.Proposed test tightening
+ ctx := context.Background() - mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Any()).Return(tt.currentStateRoot, nil).Times(1) - got, err := rebootClient.IsOrigStaterootBooted(context.Background(), tt.args.ibu.Spec.SeedImageRef.Version) + mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(tt.currentStateRoot, nil).Times(1) + got, err := rebootClient.IsOrigStaterootBooted(ctx, tt.args.ibu.Spec.SeedImageRef.Version)As per coding guidelines "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/reboot/reboot_test.go` around lines 63 - 64, The test currently uses gomock.Any() for the context which won't catch regressions in context propagation; modify the test to create a concrete ctx variable (e.g., ctx := context.Background()), pass that ctx into rebootClient.IsOrigStaterootBooted, and change the mock expectation on mockRpmostreeclient.GetCurrentStaterootName to expect that exact context (e.g., mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(...)). This ensures the test asserts the forwarded context from IsOrigStaterootBooted rather than accepting any context.lca-cli/ipconfig/postpivot_test.go (1)
94-106: These expectations no longer prove the caller's context is propagated.Using
gomock.Any()for every newcontext.Contextarg means this still passes ifRun()regresses tocontext.Background()internally. Since context threading is the behavior under test here, move thectxsetup before the expectations and match that value (or a tighter matcher derived from it) on the mocked ops calls.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lca-cli/ipconfig/postpivot_test.go` around lines 94 - 106, The test currently uses gomock.Any() for every context argument which masks regressions in context propagation; before setting up gomock expectations create the test context (e.g. ctx := context.WithTimeout(...) or context.Background()) and use that exact ctx in the mocked calls (replace the first gomock.Any() argument in RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile, etc. with gomock.Eq(ctx) or the ctx variable itself) and then call the code under test with that same ctx so the expectations assert the real context is threaded through (refer to RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile and handler.Run/context usage to locate spots to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@controllers/ipc_controller.go`:
- Around line 327-334: The function isUnbootedStaterootAvailable should
propagate context cancellation/timeouts instead of treating all errors as "not
available": after calling rpmOstreeClient.GetUnbootedStaterootName(ctx) check
err and if errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded) return nil, err; if err != nil return nil, err; then
if unbootedStaterootName == "" return lo.ToPtr(false), nil; otherwise return
lo.ToPtr(true), nil; also add the errors import.
In `@lca-cli/cmd/ibuStaterootSetup.go`:
- Line 120: The SIGTERM handler uses context.Background() when calling
opsClient.ImageExists which can block shutdown; change that call to use a short
timeout-bound context (e.g., ctx, cancel :=
context.WithTimeout(context.Background(), 3-10*time.Second); defer cancel()) and
pass ctx to ImageExists so the probe is bounded; ensure you call cancel() after
the check and adjust imports to include time if needed, targeting the
ImageExists call site where seedImage and seedImageExists are used inside the
SIGTERM handler.
In `@main/main.go`:
- Line 270: The call to chrootOp.GetContainerStorageTarget uses
context.Background(), which ignores cancellation and can hang startup;
create/use the signal-derived startup context (the same one you build for
manager startup, e.g. the context passed into mgr.Start or the new “startup”
context created from signals) before these init calls and pass that context into
GetContainerStorageTarget instead of context.Background(), ensuring any
cancellation or SIG handling will abort the lookup; update the call site where
containerStorageMountpointTarget is obtained and any other early init calls that
currently use context.Background() to use the startup context.
---
Outside diff comments:
In `@controllers/ipc_config_handlers.go`:
- Around line 801-817: The call to RunSystemdAction in RunLcaCliIPConfigPrePivot
ties systemd-run --wait to the reconcile ctx so cancelling ctx can kill the
client while the transient systemd unit continues running; change this to use a
detached context (e.g., create a new context.Background()-based ctx or a derived
context not cancelled by the reconcile) when invoking
h.ChrootOps.RunSystemdAction so the systemd unit is created and allowed to run
independently, or alternatively implement explicit unit follow-up logic on ctx
cancellation (query/stop the unit). Make the same change in the corresponding
scheduleIPConfigRollback call in controllers/ipc_rollback_handlers.go.
In `@internal/precache/workload/pullImages.go`:
- Around line 82-95: The loop that calls podmanImgPull (inside the
pullImage/pullImages.go retry loop bounded by MaxRetries) currently treats
context cancellation like any other error and continues retrying; update the
error handling so that if the context has been canceled or its deadline exceeded
(check ctx.Err() != nil or use errors.Is(err, context.Canceled)/errors.Is(err,
context.DeadlineExceeded)) you log and return immediately instead of retrying.
Modify the branch after podmanImgPull fails to detect context cancellation and
exit the function (no further retries), keeping the existing "manifest unknown"
short-circuit behavior intact.
In `@lca-cli/ipconfig/prepivot.go`:
- Around line 185-193: The deferred cleanup currently reuses the operation's ctx
which may be canceled; change the defer to create a fresh bounded context (e.g.,
ctx2, cancel := context.WithTimeout(context.Background(), <sensibleDuration>);
defer cancel()) and call p.ops.EnableClusterServices(ctx2) so re-enabling
cluster services is not affected by the original context's cancellation; update
the defer block to use ctx2 and ensure you add the time import for the timeout
value.
In `@lca-cli/ops/ops.go`:
- Around line 737-745: The polling call to wait.PollUntilContextCancel is
currently ignored so failures or timeouts during container stop are swallowed;
capture its returned error and handle it (e.g., assign err :=
wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("failed
waiting for containers to stop: %w", err) } ) instead of proceeding to shut down
CRI-O; update the surrounding function to propagate or log the error
appropriately so RunBashInHostNamespace/crictl failures abort the shutdown flow.
- Around line 252-275: The waitForEtcd function ignores the passed ctx and uses
http.Get; modify it to listen for ctx cancellation and issue a context-aware
HTTP request: inside the select add a case <-ctx.Done() that returns ctx.Err(),
replace http.Get(healthzEndpoint) with creating an http.Request via
http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and use
http.DefaultClient.Do(req) (or a client variable) so the request is bound to
ctx, and avoid deferring resp.Body.Close() inside the loop by closing resp.Body
immediately after you're done checking StatusCode; keep the existing
timeout/ticker logic but ensure ctx cancellation short-circuits the loop.
In `@lca-cli/postpivot/postpivot.go`:
- Around line 276-291: The PollUntilContextCancel call using ctxWithTimeout
currently discards its result so a timeout/cancel is ignored and RecertFullFlow
proceeds without the image; change the code around wait.PollUntilContextCancel
to capture its (bool, error) result and handle both cases: if error != nil or
the bool is false after the timeout/cancellation, log a clear error including
the context deadline/cancel reason and return that error (or stop the post-pivot
flow) instead of falling through to RecertFullFlow; reference the
ctxWithTimeout, wait.PollUntilContextCancel call and the
p.ops.RunBashInHostNamespace pull attempt to locate where to add the result
check and early return.
---
Duplicate comments:
In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Line 133: The precache step currently ignores the caller's cancellation by
calling workload.Precache with context.Background(); modify the code to thread
the caller-owned ctx through precacheFlow and invoke workload.Precache(ctx,
imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort) instead so
timeouts/cancellations from Run are preserved; update the precacheFlow
signature/uses and any call sites that start the precache to accept and forward
ctx (referencing Run, precacheFlow, workload.Precache, imageList,
common.IBIPullSecretFilePath, and i.config.PrecacheBestEffort).
In `@lca-cli/ops/ops.go`:
- Around line 459-463: When deferring StopEtcdServer after
RunUnauthenticatedEtcdServer, don't reuse the potentially cancelable ctx; create
a non-cancelable context (e.g. cleanupCtx := context.Background() or
context.WithoutCancel equivalent) and call o.StopEtcdServer(cleanupCtx,
authFile, common.EtcdContainerName) inside the defer, capturing and logging any
error returned (e.g. if err := o.StopEtcdServer(...); err != nil { /* log error
via o.Logger/errorf or fmt */ }). Update the defer that references
StopEtcdServer to use cleanupCtx and ensure failures are logged.
---
Nitpick comments:
In `@internal/reboot/reboot_test.go`:
- Around line 63-64: The test currently uses gomock.Any() for the context which
won't catch regressions in context propagation; modify the test to create a
concrete ctx variable (e.g., ctx := context.Background()), pass that ctx into
rebootClient.IsOrigStaterootBooted, and change the mock expectation on
mockRpmostreeclient.GetCurrentStaterootName to expect that exact context (e.g.,
mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(...)).
This ensures the test asserts the forwarded context from IsOrigStaterootBooted
rather than accepting any context.
In `@lca-cli/ipconfig/postpivot_test.go`:
- Around line 94-106: The test currently uses gomock.Any() for every context
argument which masks regressions in context propagation; before setting up
gomock expectations create the test context (e.g. ctx :=
context.WithTimeout(...) or context.Background()) and use that exact ctx in the
mocked calls (replace the first gomock.Any() argument in RunInHostNamespace,
RecertFullFlow, SystemctlAction, ReadFile, etc. with gomock.Eq(ctx) or the ctx
variable itself) and then call the code under test with that same ctx so the
expectations assert the real context is threaded through (refer to
RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile and
handler.Run/context usage to locate spots to change).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: cbc175aa-17e9-487c-bdd2-458a20f210c1
📒 Files selected for processing (56)
controllers/ibu_controller.gocontrollers/ibu_controller_test.gocontrollers/idle_handlers.gocontrollers/idle_handlers_test.gocontrollers/ipc_config_handlers.gocontrollers/ipc_config_handlers_test.gocontrollers/ipc_controller.gocontrollers/ipc_controller_test.gocontrollers/ipc_idle_handlers.gocontrollers/ipc_idle_handlers_test.gocontrollers/ipc_rollback_handlers.gocontrollers/ipc_rollback_handlers_test.gocontrollers/prep_handlers.gocontrollers/rollback_handlers.gocontrollers/seedgen_controller.gocontrollers/upgrade_handlers.gocontrollers/upgrade_handlers_test.gointernal/imagemgmt/imagemgmt.gointernal/imagemgmt/imagemgmt_test.gointernal/imagemgmt/mock_imagemgmt.gointernal/ostreeclient/mock_ostreeclient.gointernal/ostreeclient/ostreeclient.gointernal/precache/workload/pullImages.gointernal/prep/prep.gointernal/reboot/mock_reboot.gointernal/reboot/reboot.gointernal/reboot/reboot_test.golca-cli/cmd/create.golca-cli/cmd/ibi.golca-cli/cmd/ibuPrecacheWorkload.golca-cli/cmd/ibuStaterootSetup.golca-cli/cmd/ipconfig/postpivot.golca-cli/cmd/ipconfig/prepivot.golca-cli/cmd/ipconfig/rollback.golca-cli/cmd/postpivotcmd.golca-cli/cmd/restore.golca-cli/ibi-preparation/ibipreparation.golca-cli/ibi-preparation/ibipreparation_test.golca-cli/initmonitor/initmonitor.golca-cli/ipconfig/postpivot.golca-cli/ipconfig/postpivot_test.golca-cli/ipconfig/prepivot.golca-cli/ipconfig/prepivot_test.golca-cli/ipconfig/rollback.golca-cli/ipconfig/rollback_test.golca-cli/ops/execute.golca-cli/ops/mock_execute.golca-cli/ops/mock_ops.golca-cli/ops/ops.golca-cli/ostreeclient/mock_rpmostreeclient.golca-cli/ostreeclient/rpmostreeclient.golca-cli/postpivot/postpivot.golca-cli/postpivot/postpivot_test.golca-cli/seedcreator/seedcreator.golca-cli/seedrestoration/seedrestoration.gomain/main.go
✅ Files skipped from review due to trivial changes (8)
- controllers/ibu_controller_test.go
- lca-cli/cmd/ipconfig/postpivot.go
- controllers/prep_handlers.go
- lca-cli/ipconfig/postpivot.go
- lca-cli/ops/mock_execute.go
- controllers/ipc_idle_handlers_test.go
- internal/imagemgmt/mock_imagemgmt.go
- internal/reboot/reboot.go
🚧 Files skipped from review as they are similar to previous changes (24)
- lca-cli/cmd/ipconfig/rollback.go
- lca-cli/cmd/create.go
- lca-cli/cmd/postpivotcmd.go
- lca-cli/cmd/restore.go
- lca-cli/ipconfig/rollback.go
- lca-cli/cmd/ibi.go
- controllers/upgrade_handlers_test.go
- internal/imagemgmt/imagemgmt_test.go
- lca-cli/seedrestoration/seedrestoration.go
- controllers/idle_handlers.go
- lca-cli/ipconfig/rollback_test.go
- lca-cli/ibi-preparation/ibipreparation_test.go
- controllers/ipc_idle_handlers.go
- internal/prep/prep.go
- controllers/seedgen_controller.go
- lca-cli/cmd/ipconfig/prepivot.go
- controllers/upgrade_handlers.go
- lca-cli/ostreeclient/rpmostreeclient.go
- internal/ostreeclient/ostreeclient.go
- internal/imagemgmt/imagemgmt.go
- internal/reboot/mock_reboot.go
- controllers/ipc_rollback_handlers_test.go
- lca-cli/ostreeclient/mock_rpmostreeclient.go
- controllers/ipc_controller_test.go
| func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) { | ||
| if rpmOstreeClient == nil { | ||
| return nil, fmt.Errorf("rpmOstreeClient is nil") | ||
| } | ||
|
|
||
| unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName() | ||
| unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx) | ||
| if err != nil || unbootedStaterootName == "" { | ||
| return lo.ToPtr(false), nil |
There was a problem hiding this comment.
Propagate cancellation instead of reporting “no unbooted stateroot”.
After this change, GetUnbootedStaterootName(ctx) can fail because the caller canceled or timed out the reconcile. Collapsing every error into false, nil turns that control signal into a real state decision: config can re-enter pre-pivot, and rollback validation can be marked failed, even though nothing was actually checked. Only the genuine “not available” case should map to false.
Suggested fix
func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) {
if rpmOstreeClient == nil {
return nil, fmt.Errorf("rpmOstreeClient is nil")
}
unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx)
- if err != nil || unbootedStaterootName == "" {
+ if err != nil {
+ if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+ return nil, err
+ }
+ return lo.ToPtr(false), nil
+ }
+ if unbootedStaterootName == "" {
return lo.ToPtr(false), nil
}
return lo.ToPtr(true), nil
}You'll also need to add the errors import.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@controllers/ipc_controller.go` around lines 327 - 334, The function
isUnbootedStaterootAvailable should propagate context cancellation/timeouts
instead of treating all errors as "not available": after calling
rpmOstreeClient.GetUnbootedStaterootName(ctx) check err and if errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded) return nil, err;
if err != nil return nil, err; then if unbootedStaterootName == "" return
lo.ToPtr(false), nil; otherwise return lo.ToPtr(true), nil; also add the errors
import.
|
@sebrandon1: This pull request references CNF-23402 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
0f3c830 to
3a47e05
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
internal/reboot/reboot.go (1)
180-191:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake the post-reboot wait respect
ctx.Done().Both
Rebootimplementations now accept a context, but aftersystemd-runthey still block intime.Sleep(defaultRebootTimeout). If reboot does not terminate the process immediately, canceled or timed-out callers still hang for 60 minutes, which defeats the new cancellation behavior.Suggested fix
c.log.Info(fmt.Sprintf("Wait for %s to be killed via SIGTERM", defaultRebootTimeout.String())) - time.Sleep(defaultRebootTimeout) + timer := time.NewTimer(defaultRebootTimeout) + defer timer.Stop() + select { + case <-ctx.Done(): + return fmt.Errorf("reboot initiated but context ended before process exit: %w", ctx.Err()) + case <-timer.C: + } return fmt.Errorf("failed to reboot. This should never happen! Please check the system")Also applies to: 467-478
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/reboot/reboot.go` around lines 180 - 191, The Reboot method in IBURebootClient currently uses a blocking time.Sleep(defaultRebootTimeout) after invoking systemd-run which ignores ctx cancellation; replace the sleep with a select that waits on time.After(defaultRebootTimeout) and ctx.Done() so callers can be canceled or time out - e.g., in Reboot (and the other Reboot implementation around 467-478) use select { case <-time.After(defaultRebootTimeout): /* proceed */; case <-ctx.Done(): return ctx.Err() } and keep the existing log behavior when the wait completes normally.controllers/prep_handlers.go (1)
318-343:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winMissed context propagation:
validateSeedOcpVersionusescontext.Background()on a reconcile path.
validateSeedOcpVersionis called fromvalidateIBUSpec(ctx, ibu)(line 431) which is in turn invoked fromhandlePrep(ctx, ibu)(line 562). The reconcilectxis available all the way down, but this method discards it and creates a freshcontext.Background()for ther.Get(...)call against the API server. That defeats reconcile cancellation/timeout for this lookup and is inconsistent with the rest of this PR.♻️ Suggested change
-func (r *ImageBasedUpgradeReconciler) validateSeedOcpVersion(seedOcpVersion string) error { +func (r *ImageBasedUpgradeReconciler) validateSeedOcpVersion(ctx context.Context, seedOcpVersion string) error { // get target OCP version targetClusterVersion := &configv1.ClusterVersion{} - if err := r.Get(context.Background(), types.NamespacedName{Name: "version"}, targetClusterVersion); err != nil { + if err := r.Get(ctx, types.NamespacedName{Name: "version"}, targetClusterVersion); err != nil { return fmt.Errorf("failed to get ClusterVersion for target: %w", err) }And update the caller:
- if err := r.validateSeedOcpVersion(ibu.Spec.SeedImageRef.Version); err != nil { + if err := r.validateSeedOcpVersion(ctx, ibu.Spec.SeedImageRef.Version); err != nil { return fmt.Errorf("failed to validate seed image OCP version: %w", err) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/prep_handlers.go` around lines 318 - 343, The function validateSeedOcpVersion currently uses context.Background() for the r.Get call which drops reconcile cancellation/timeout; change validateSeedOcpVersion to accept a context.Context parameter and use that context when calling r.Get (replace context.Background() with the passed ctx), update callers (validateIBUSpec and ultimately handlePrep) to pass the reconcile ctx through to validateIBUSpec and then to validateSeedOcpVersion so the API lookup honors the reconcile context and timeouts.controllers/upgrade_handlers.go (1)
354-366:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t let reconcile cancellation suppress automatic rollback.
This is the recovery path for fatal post-pivot failures. Reusing the reconcile
ctxmeans a shutting-down or canceled controller can abortInitiateRollback(...)immediately and leave the node in the failed upgrade state without the rollback this branch promises.🛟 Suggested fix
func (u *UpgHandler) autoRollbackIfEnabled(ctx context.Context, ibu *ibuv1.ImageBasedUpgrade, msg string) { // Check whether auto-rollback is disabled using annotation if val, exists := ibu.GetAnnotations()[common.AutoRollbackOnFailureUpgradeCompletionAnnotation]; exists { if val == common.AutoRollbackDisableValue { u.Log.Info("Auto-rollback upgrade completion is disabled") return } } u.Log.Info("Automatically rolling back due to failure") - if err := u.RebootClient.InitiateRollback(ctx, msg); err != nil { + rollbackCtx, cancel := context.WithTimeout(context.Background(), 5*time.Minute) + defer cancel() + if err := u.RebootClient.InitiateRollback(rollbackCtx, msg); err != nil { u.Log.Info(fmt.Sprintf("Unable to auto rollback: %s", err)) return }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/upgrade_handlers.go` around lines 354 - 366, The autoRollbackIfEnabled method currently reuses the reconcile ctx which can be cancelled and abort InitiateRollback; change it so UpgHandler.autoRollbackIfEnabled creates and uses a non-cancellable context (e.g., context.Background() or context.WithTimeout/WithDeadline) instead of the incoming ctx when calling u.RebootClient.InitiateRollback(ctx, msg), ensuring InitiateRollback runs to completion even if the reconcile context is cancelled; update the call site in autoRollbackIfEnabled to pass the new context and keep existing logging of failures.lca-cli/ipconfig/prepivot.go (1)
180-194:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUse a non-cancellable context for the deferred
EnableClusterServicesrecovery.
StopClusterServices(ctx)succeeds with the request-scoped ctx, then the deferredEnableClusterServices(ctx)uses the same ctx. IfRun's ctx is cancelled (deadline expiry or parent cancellation) before the defer executes,exec.CommandContextwill refuse to start the recovery command and the node will be left with cluster services stopped. This is the same pattern previously flagged oninternal/prep/prep.go(deferredUnmountAndRemoveImage) that was fixed withcontext.Background().🔧 Proposed fix
p.log.Info("Stopping cluster services") if err = p.ops.StopClusterServices(ctx); err != nil { err = fmt.Errorf("failed to stop cluster services: %w", err) return } defer func() { p.log.Info("Enabling cluster services in old stateroot") - if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil { + // Use a non-cancellable context so recovery still runs if the request + // ctx was cancelled or its deadline elapsed. + cleanupCtx := context.Background() + if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil { if err == nil { err = internalErr } else { err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr) } } }()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ipconfig/prepivot.go` around lines 180 - 194, The deferred recovery uses the request-scoped ctx which may be cancelled before the defer runs; change the defer to call p.ops.EnableClusterServices with a non-cancellable context (e.g., context.Background()) instead of the original ctx so the recovery command can run even if Run's ctx is cancelled; update the defer closure around p.ops.EnableClusterServices(ctx) to create and use a background context and preserve error handling logic that merges internalErr into the outer err.lca-cli/ibi-preparation/ibipreparation.go (1)
74-74:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrecache flow drops the propagated context.
Run(ctx)callsi.precacheFlow(...)(line 74) butprecacheFlowdoes not take a context, and on line 133 it callsworkload.Precache(context.Background(), ...). Since precaching is one of the long-running operations the PR aims to make cancellable (per the PR description), usingcontext.Background()here means precache cannot be cancelled or time-limited even though the rest ofRunis wired up. ThreadctxthroughprecacheFlowand pass it toworkload.Precache.Proposed fix
- if err := i.precacheFlow(common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil { + if err := i.precacheFlow(ctx, common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil { return fmt.Errorf("failed to precache: %w", err) } @@ -func (i *IBIPrepare) precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) error { +func (i *IBIPrepare) precacheFlow(ctx context.Context, imageListFile, seedInfoFile, registriesConfFile string) error { @@ - if err := workload.Precache(context.Background(), imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil { + if err := workload.Precache(ctx, imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil { return fmt.Errorf("failed to start precache: %w", err) }Also applies to: 89-89, 133-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ibi-preparation/ibipreparation.go` at line 74, The precache flow drops the propagated context: change precacheFlow to accept a context.Context (e.g., precacheFlow(ctx context.Context, ...)) and update its callers (the Run(ctx) invocation and any other calls) to pass the incoming ctx instead of using context.Background(); inside precacheFlow replace workload.Precache(context.Background(), ...) with workload.Precache(ctx, ...) so the long-running precache operation is cancellable and honors timeouts/cancellation from Run(ctx).lca-cli/ops/ops.go (2)
737-745:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate failures from the container-stop retry loop.
The poll result is discarded. If
crictl stopkeeps failing orctxis canceled, this still falls through tosystemctl stop crioand can return success after an incomplete shutdown.Suggested fix
- _ = wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) { + if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) { o.log.Info("Stop running containers") args := []string{"ps", "-q", "|", "xargs", "--no-run-if-empty", "--max-args", "1", "--max-procs", "10", "crictl", "stop", "--timeout", "5"} _, err = o.RunBashInHostNamespace(ctx, "crictl", args...) if err != nil { return false, fmt.Errorf("failed to stop running containers: %w", err) } return true, nil - }) + }); err != nil { + return fmt.Errorf("failed to stop running containers: %w", err) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ops/ops.go` around lines 737 - 745, The poll call's returned error is ignored so failures or context cancellation during the crictl stop loop are swallowed; assign the result of wait.PollUntilContextCancel(...) to a variable, check it, and propagate or return a wrapped error instead of continuing to systemctl stop crio. Locate the anonymous func passed to wait.PollUntilContextCancel in the same block (which calls o.RunBashInHostNamespace with args and logs "Stop running containers") and change the caller to: err := wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("stopping containers failed: %w", err) } (or propagate appropriately from the surrounding function).
252-275:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake
waitForEtcdactually honorctx.The new signature takes
ctx, but this loop still waits on its own ticker/timeout and probes withhttp.Getwithout a request context. A canceled caller can still block here for up to a minute, which defeats the PR's cancellation goal.Suggested fix
func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error { - timeout := time.After(1 * time.Minute) - ticker := time.NewTicker(1 * time.Second) - defer ticker.Stop() - - for { - select { - case <-timeout: - return fmt.Errorf("timeout waiting for etcd") - case <-ticker.C: - resp, err := http.Get(healthzEndpoint) //nolint:gosec - if err != nil { - o.log.Infof("Waiting for etcd: %s", err) - continue - } - defer func() { _ = resp.Body.Close() }() - - if resp.StatusCode != http.StatusOK { - o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode) - continue - } - - return nil - } - } + client := &http.Client{} + return wait.PollUntilContextTimeout(ctx, time.Second, time.Minute, true, func(ctx context.Context) (bool, error) { + req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil) + if err != nil { + return false, err + } + resp, err := client.Do(req) //nolint:gosec + if err != nil { + o.log.Infof("Waiting for etcd: %s", err) + return false, nil + } + defer func() { _ = resp.Body.Close() }() + + if resp.StatusCode != http.StatusOK { + o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode) + return false, nil + } + return true, nil + }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ops/ops.go` around lines 252 - 275, waitForEtcd does not honor the passed ctx: replace the ad-hoc timeout/ticker with a ctx-aware loop by deriving a cancellable timeout context (e.g., ctx, cancel := context.WithTimeout(ctx, 1*time.Minute); defer cancel()), keep a ticker (time.NewTicker(1*time.Second)) and in the select listen for ctx.Done() instead of the timeout channel, and when probing use http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) followed by http.DefaultClient.Do(req) so the HTTP request is cancelable; ensure resp.Body is closed and log ctx.Err() (using o.log.Infof) when exiting due to context cancellation.lca-cli/postpivot/postpivot.go (1)
275-291: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winReturn the recert image pull timeout/cancellation.
wait.PollUntilContextCancelis ignored here. If the 10-minute timeout expires orctxis canceled, this block still falls through intoRecertFullFlow, so the caller sees a later recert failure instead of the real pull failure.Suggested fix
- _ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { + if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { p.log.Info("pulling recert image") command := "podman" if seedReconfiguration.Proxy != nil { command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s", seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy, seedReconfiguration.Proxy.NoProxy, command) } if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil { p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error()) return false, nil } return true, nil - }) + }); err != nil { + return fmt.Errorf("failed to pull recert image %s: %w", seedClusterInfo.RecertImagePullSpec, err) + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/postpivot/postpivot.go` around lines 275 - 291, The PollUntilContextCancel call's result is ignored so a timeout/cancel on ctxWithTimeout is not propagated; change the block around wait.PollUntilContextCancel (used with ctxWithTimeout and cancel) to capture its returned error and, if non-nil, return or wrap that error from the surrounding function instead of falling through into RecertFullFlow; ensure you propagate cancellation/timeout from ctxWithTimeout and preserve context when invoking p.ops.RunInHostNamespace / p.ops.RunBashInHostNamespace so the real pull failure is surfaced to callers.
♻️ Duplicate comments (3)
main/main.go (1)
270-274:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftStartup mount-target lookup still uses
context.Background().This is the same concern raised previously:
chrootOp.GetContainerStorageTarget(context.Background())runs on a detached context, so a signal received beforemgr.Start(ctx)cannot abort this host command. The signal-derived context is constructed later at line 407 (ctrl.SetupSignalHandler()); consider creating it earlier and threading it into the init calls (this lookup, and thelcautils.InitIBU/initSeedGen/lcautils.InitIPConfigcalls on lines 244/249/254 that still passcontext.TODO()).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@main/main.go` around lines 270 - 274, Replace the detached contexts with a signal-derived context created before the init calls and passed into them: create the manager/signal context (the same one used by mgr.Start(ctx) and created by ctrl.SetupSignalHandler()) earlier, then call chrootOp.GetContainerStorageTarget(ctx) instead of context.Background(), and call lcautils.InitIBU(ctx,...), initSeedGen(ctx,...), and lcautils.InitIPConfig(ctx,...) instead of context.TODO(); ensure mgr.Start continues to receive the same ctx so startup commands abort on signals.lca-cli/cmd/ibuStaterootSetup.go (1)
116-124:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSIGTERM handler still uses an unbounded
context.Background()for the seed-image probe.This is the same concern raised previously: if
opsClient.ImageExistsblocks against a wedged container store, the SIGTERM goroutine cannot reach the timedtime.Sleep/os.Exitpath and shutdown is left to SIGKILL. Bound the probe with a shortcontext.WithTimeoutso the handler always makes forward progress.🛡️ Suggested fix
- if seedImageExists, _ := opsClient.ImageExists(context.Background(), seedImage); seedImageExists { + checkCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second) + seedImageExists, _ := opsClient.ImageExists(checkCtx, seedImage) + cancel() + if seedImageExists {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/cmd/ibuStaterootSetup.go` around lines 116 - 124, The SIGTERM handler uses context.Background() when calling opsClient.ImageExists, which can block; replace that call to use a short bounded context via context.WithTimeout (e.g., a few seconds) and defer cancel so the probe can't hang the handler; update the logic around opsClient.ImageExists(seedImage) to accept the (exists, err) return, treat errors as non-fatal (log them with logger.Error) and proceed to the time.Sleep/exit path using prep.StaterootSetupTerminationGracePeriodSeconds as before so the handler always makes forward progress even if the container store is wedged.controllers/ipc_controller.go (1)
327-334:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate canceled rpm-ostree lookups instead of treating them as “not available”.
GetUnbootedStaterootName(ctx)can now fail because the reconcile was canceled or timed out. Returningfalse, nilfor every error makesvalidNextStages(...)treat an interrupted check as real state, which can incorrectly reopen config paths or hide rollback eligibility.🐛 Suggested fix
func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) { if rpmOstreeClient == nil { return nil, fmt.Errorf("rpmOstreeClient is nil") } unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx) - if err != nil || unbootedStaterootName == "" { + if err != nil { + return nil, err + } + if unbootedStaterootName == "" { return lo.ToPtr(false), nil } return lo.ToPtr(true), nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/ipc_controller.go` around lines 327 - 334, The function isUnbootedStaterootAvailable currently swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns false, which hides canceled/timeouts; change the logic so that if GetUnbootedStaterootName returns a non-nil error you propagate that error (return nil, err) instead of returning false, and only return lo.ToPtr(false), nil when err == nil and unbootedStaterootName == "" — update isUnbootedStaterootAvailable to call rpmOstreeClient.GetUnbootedStaterootName(ctx) and return nil, err on any error so canceled/deadline errors bubble up to callers like validNextStages.
🧹 Nitpick comments (1)
lca-cli/cmd/ibi.go (1)
86-98: ⚡ Quick win
context.Background()in adapter silently drops cancellation forcleanupDeviceoperationsBecause
preinstallUtilsdoesn't accept acontext.Context, the adapter hard-codescontext.Background(). As a result, ifcmd.Context()is cancelled (e.g., SIGTERM or a timeout), disk-cleanup commands dispatched throughcleanupDevicewill not be aborted and will run to completion (or hang) regardless of the outer cancellation signal.This is an inherent limitation of the external library interface, but it's worth adding a short inline comment making the cancellation-loss explicit so future maintainers don't assume
ctxpropagates end-to-end:📝 Suggested comment addition
func (a *preinstallExecutorAdapter) Execute(command string, args ...string) (string, error) { - out, err := a.exec.Execute(context.Background(), command, args...) + // context.Background() is intentional: preinstall-utils Execute interface is + // context-unaware, so the caller's cancellation signal cannot be forwarded here. + out, err := a.exec.Execute(context.Background(), command, args...)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/cmd/ibi.go` around lines 86 - 98, The adapter preinstallExecutorAdapter.Execute currently calls a.exec.Execute(context.Background(), ...) which intentionally drops any caller cancellation; add a concise inline comment above the Execute method (referencing preinstallExecutorAdapter, Execute and ops.Execute) explaining that context.Background() is used because preinstall-utils' Execute signature lacks a context, that this will not propagate cancellation (e.g., cmd.Context()/SIGTERM/timeouts) and therefore cleanupDevice invocations may not be aborted—make the limitation explicit so future maintainers won't assume context propagation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controllers/seedgen_controller.go`:
- Around line 588-596: Change validateSystem to return (string, error) instead
of only msg: have validateSystem call
ostreeclient.NewClient(...).IsOstreeAdminSetDefaultFeatureEnabled(ctx) and on
probe error return ("", err) while preserving the current rejection string only
for definitive false results (return msg, nil). Update the Reconcile caller that
currently treats any non-empty rejection as permanent failure (uses
setSeedGenStatusFailed/getPhase) to check the error return first and requeue
transient probe errors (do not call setSeedGenStatusFailed), only treating a
non-empty msg with nil error as the permanent rejection path.
In `@internal/imagemgmt/imagemgmt.go`:
- Around line 260-261: The call to c.Executor.Execute(ctx, "podman", args...)
currently ignores errors and thus swallows context cancellation; change the line
in imagemgmt.go (the podman rmi invocation where output is captured) to capture
the error (output, err := c.Executor.Execute(...)), then if errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded) return that error
immediately; otherwise, keep the old behavior of logging the output and ignoring
non-context errors (log the output with c.Log.Info and continue). Ensure you
import or reference errors and use the same ctx/c.Executor.Execute and podman
rmi call sites (i.e., the same function/statement) so cancellation is
propagated.
In `@internal/ostreeclient/ostreeclient.go`:
- Around line 70-72: The call to IsOstreeAdminSetDefaultFeatureEnabled in Deploy
is discarding its error, so probe failures are treated as "feature disabled" and
can change deploy semantics; update Deploy (the block that builds args and calls
Execute) to capture the (bool, error) result, handle the error by returning or
propagating it (instead of using `_`), and only use the boolean when err == nil;
reference the IsOstreeAdminSetDefaultFeatureEnabled result variables and ensure
Deploy returns the error so Execute is not run under a failed probe.
In `@lca-cli/cmd/ipconfig/prepivot.go`:
- Line 127: The code replaces the command context with context.Background(),
which drops cancellation/deadline propagation; instead thread the Cobra command
context into runIPConfigPrePivot by passing cmd.Context() (or a derived context)
rather than creating context.Background(), so all ctx-aware calls (the
OSTree/systemd/reboot invocations invoked inside runIPConfigPrePivot) honor
cancellation and deadlines; update the runIPConfigPrePivot signature and all
call sites to accept the incoming ctx and use that ctx for those operations.
In `@lca-cli/cmd/ipconfig/rollback.go`:
- Line 70: Replace the unconditional root context with the Cobra command's
context so rollback honors cancellations/timeouts: change the local ctx :=
context.Background() to use cmd.Context() (or directly pass cmd.Context() into
runIPConfigRollback), and ensure runIPConfigRollback is invoked with that
cmd.Context() instead of a background context so subsequent ctx-aware calls
(rollback/reboot) respect upstream aborts.
In `@lca-cli/cmd/restore.go`:
- Line 33: The CLI entry uses cmd.Execute() so restore(cmd.Context()) gets a
non-cancellable Background context; change the root command invocation to
ExecuteContext using a signal-aware context (e.g., create ctx :=
signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM) and
defer cancel()) so cmd.Context() inside restore() is cancelled on
SIGINT/SIGTERM; update imports to include os, os/signal and syscall and call
rootCmd.ExecuteContext(ctx) instead of rootCmd.Execute().
In `@lca-cli/ops/execute.go`:
- Around line 67-68: The pipe writer returned by e.log.Writer() is not closed,
leaking the goroutine backing it; change the call sites (e.g.,
regularExecutor.ExecuteWithLiveLogger and the two other methods that pass
e.log.Writer() into execute) to first obtain the writer into a variable (w :=
e.log.Writer()), defer w.Close(), then pass w to e.execute(...) so the writer is
closed after execute returns and no goroutine is leaked.
In `@lca-cli/ops/ops.go`:
- Around line 457-463: The deferred etcd teardown is using the incoming ctx
which may be canceled when RecertFullFlow returns; change the defer to call
StopEtcdServer with a non-cancelable context (e.g. context.Background() or a
fresh context) so the temporary etcd container is always stopped. Locate the
RecertFullFlow function and replace the defer that currently calls
o.StopEtcdServer(ctx, authFile, common.EtcdContainerName) with a call that uses
a new non-cancelable context while keeping the same authFile and
common.EtcdContainerName arguments and error swallowing behavior.
---
Outside diff comments:
In `@controllers/prep_handlers.go`:
- Around line 318-343: The function validateSeedOcpVersion currently uses
context.Background() for the r.Get call which drops reconcile
cancellation/timeout; change validateSeedOcpVersion to accept a context.Context
parameter and use that context when calling r.Get (replace context.Background()
with the passed ctx), update callers (validateIBUSpec and ultimately handlePrep)
to pass the reconcile ctx through to validateIBUSpec and then to
validateSeedOcpVersion so the API lookup honors the reconcile context and
timeouts.
In `@controllers/upgrade_handlers.go`:
- Around line 354-366: The autoRollbackIfEnabled method currently reuses the
reconcile ctx which can be cancelled and abort InitiateRollback; change it so
UpgHandler.autoRollbackIfEnabled creates and uses a non-cancellable context
(e.g., context.Background() or context.WithTimeout/WithDeadline) instead of the
incoming ctx when calling u.RebootClient.InitiateRollback(ctx, msg), ensuring
InitiateRollback runs to completion even if the reconcile context is cancelled;
update the call site in autoRollbackIfEnabled to pass the new context and keep
existing logging of failures.
In `@internal/reboot/reboot.go`:
- Around line 180-191: The Reboot method in IBURebootClient currently uses a
blocking time.Sleep(defaultRebootTimeout) after invoking systemd-run which
ignores ctx cancellation; replace the sleep with a select that waits on
time.After(defaultRebootTimeout) and ctx.Done() so callers can be canceled or
time out - e.g., in Reboot (and the other Reboot implementation around 467-478)
use select { case <-time.After(defaultRebootTimeout): /* proceed */; case
<-ctx.Done(): return ctx.Err() } and keep the existing log behavior when the
wait completes normally.
In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Line 74: The precache flow drops the propagated context: change precacheFlow
to accept a context.Context (e.g., precacheFlow(ctx context.Context, ...)) and
update its callers (the Run(ctx) invocation and any other calls) to pass the
incoming ctx instead of using context.Background(); inside precacheFlow replace
workload.Precache(context.Background(), ...) with workload.Precache(ctx, ...) so
the long-running precache operation is cancellable and honors
timeouts/cancellation from Run(ctx).
In `@lca-cli/ipconfig/prepivot.go`:
- Around line 180-194: The deferred recovery uses the request-scoped ctx which
may be cancelled before the defer runs; change the defer to call
p.ops.EnableClusterServices with a non-cancellable context (e.g.,
context.Background()) instead of the original ctx so the recovery command can
run even if Run's ctx is cancelled; update the defer closure around
p.ops.EnableClusterServices(ctx) to create and use a background context and
preserve error handling logic that merges internalErr into the outer err.
In `@lca-cli/ops/ops.go`:
- Around line 737-745: The poll call's returned error is ignored so failures or
context cancellation during the crictl stop loop are swallowed; assign the
result of wait.PollUntilContextCancel(...) to a variable, check it, and
propagate or return a wrapped error instead of continuing to systemctl stop
crio. Locate the anonymous func passed to wait.PollUntilContextCancel in the
same block (which calls o.RunBashInHostNamespace with args and logs "Stop
running containers") and change the caller to: err :=
wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("stopping
containers failed: %w", err) } (or propagate appropriately from the surrounding
function).
- Around line 252-275: waitForEtcd does not honor the passed ctx: replace the
ad-hoc timeout/ticker with a ctx-aware loop by deriving a cancellable timeout
context (e.g., ctx, cancel := context.WithTimeout(ctx, 1*time.Minute); defer
cancel()), keep a ticker (time.NewTicker(1*time.Second)) and in the select
listen for ctx.Done() instead of the timeout channel, and when probing use
http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) followed by
http.DefaultClient.Do(req) so the HTTP request is cancelable; ensure resp.Body
is closed and log ctx.Err() (using o.log.Infof) when exiting due to context
cancellation.
In `@lca-cli/postpivot/postpivot.go`:
- Around line 275-291: The PollUntilContextCancel call's result is ignored so a
timeout/cancel on ctxWithTimeout is not propagated; change the block around
wait.PollUntilContextCancel (used with ctxWithTimeout and cancel) to capture its
returned error and, if non-nil, return or wrap that error from the surrounding
function instead of falling through into RecertFullFlow; ensure you propagate
cancellation/timeout from ctxWithTimeout and preserve context when invoking
p.ops.RunInHostNamespace / p.ops.RunBashInHostNamespace so the real pull failure
is surfaced to callers.
---
Duplicate comments:
In `@controllers/ipc_controller.go`:
- Around line 327-334: The function isUnbootedStaterootAvailable currently
swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns
false, which hides canceled/timeouts; change the logic so that if
GetUnbootedStaterootName returns a non-nil error you propagate that error
(return nil, err) instead of returning false, and only return lo.ToPtr(false),
nil when err == nil and unbootedStaterootName == "" — update
isUnbootedStaterootAvailable to call
rpmOstreeClient.GetUnbootedStaterootName(ctx) and return nil, err on any error
so canceled/deadline errors bubble up to callers like validNextStages.
In `@lca-cli/cmd/ibuStaterootSetup.go`:
- Around line 116-124: The SIGTERM handler uses context.Background() when
calling opsClient.ImageExists, which can block; replace that call to use a short
bounded context via context.WithTimeout (e.g., a few seconds) and defer cancel
so the probe can't hang the handler; update the logic around
opsClient.ImageExists(seedImage) to accept the (exists, err) return, treat
errors as non-fatal (log them with logger.Error) and proceed to the
time.Sleep/exit path using prep.StaterootSetupTerminationGracePeriodSeconds as
before so the handler always makes forward progress even if the container store
is wedged.
In `@main/main.go`:
- Around line 270-274: Replace the detached contexts with a signal-derived
context created before the init calls and passed into them: create the
manager/signal context (the same one used by mgr.Start(ctx) and created by
ctrl.SetupSignalHandler()) earlier, then call
chrootOp.GetContainerStorageTarget(ctx) instead of context.Background(), and
call lcautils.InitIBU(ctx,...), initSeedGen(ctx,...), and
lcautils.InitIPConfig(ctx,...) instead of context.TODO(); ensure mgr.Start
continues to receive the same ctx so startup commands abort on signals.
---
Nitpick comments:
In `@lca-cli/cmd/ibi.go`:
- Around line 86-98: The adapter preinstallExecutorAdapter.Execute currently
calls a.exec.Execute(context.Background(), ...) which intentionally drops any
caller cancellation; add a concise inline comment above the Execute method
(referencing preinstallExecutorAdapter, Execute and ops.Execute) explaining that
context.Background() is used because preinstall-utils' Execute signature lacks a
context, that this will not propagate cancellation (e.g.,
cmd.Context()/SIGTERM/timeouts) and therefore cleanupDevice invocations may not
be aborted—make the limitation explicit so future maintainers won't assume
context propagation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: e9132831-b28e-4ebb-83db-cd05c0882981
📒 Files selected for processing (56)
controllers/ibu_controller.gocontrollers/ibu_controller_test.gocontrollers/idle_handlers.gocontrollers/idle_handlers_test.gocontrollers/ipc_config_handlers.gocontrollers/ipc_config_handlers_test.gocontrollers/ipc_controller.gocontrollers/ipc_controller_test.gocontrollers/ipc_idle_handlers.gocontrollers/ipc_idle_handlers_test.gocontrollers/ipc_rollback_handlers.gocontrollers/ipc_rollback_handlers_test.gocontrollers/prep_handlers.gocontrollers/rollback_handlers.gocontrollers/seedgen_controller.gocontrollers/upgrade_handlers.gocontrollers/upgrade_handlers_test.gointernal/imagemgmt/imagemgmt.gointernal/imagemgmt/imagemgmt_test.gointernal/imagemgmt/mock_imagemgmt.gointernal/ostreeclient/mock_ostreeclient.gointernal/ostreeclient/ostreeclient.gointernal/precache/workload/pullImages.gointernal/prep/prep.gointernal/reboot/mock_reboot.gointernal/reboot/reboot.gointernal/reboot/reboot_test.golca-cli/cmd/create.golca-cli/cmd/ibi.golca-cli/cmd/ibuPrecacheWorkload.golca-cli/cmd/ibuStaterootSetup.golca-cli/cmd/ipconfig/postpivot.golca-cli/cmd/ipconfig/prepivot.golca-cli/cmd/ipconfig/rollback.golca-cli/cmd/postpivotcmd.golca-cli/cmd/restore.golca-cli/ibi-preparation/ibipreparation.golca-cli/ibi-preparation/ibipreparation_test.golca-cli/initmonitor/initmonitor.golca-cli/ipconfig/postpivot.golca-cli/ipconfig/postpivot_test.golca-cli/ipconfig/prepivot.golca-cli/ipconfig/prepivot_test.golca-cli/ipconfig/rollback.golca-cli/ipconfig/rollback_test.golca-cli/ops/execute.golca-cli/ops/mock_execute.golca-cli/ops/mock_ops.golca-cli/ops/ops.golca-cli/ostreeclient/mock_rpmostreeclient.golca-cli/ostreeclient/rpmostreeclient.golca-cli/postpivot/postpivot.golca-cli/postpivot/postpivot_test.golca-cli/seedcreator/seedcreator.golca-cli/seedrestoration/seedrestoration.gomain/main.go
✅ Files skipped from review due to trivial changes (1)
- lca-cli/ipconfig/postpivot.go
🚧 Files skipped from review as they are similar to previous changes (3)
- lca-cli/cmd/create.go
- controllers/ibu_controller_test.go
- lca-cli/cmd/ipconfig/postpivot.go
| func (r *SeedGeneratorReconciler) validateSystem(ctx context.Context) (msg string) { | ||
| // Check that the "ostree admin set-default" feature is available | ||
| if !ostreeclient.NewClient(r.Executor, false).IsOstreeAdminSetDefaultFeatureEnabled() { | ||
| if enabled, err := ostreeclient.NewClient(r.Executor, false).IsOstreeAdminSetDefaultFeatureEnabled(ctx); err != nil { | ||
| msg = fmt.Sprintf("Failure occurred during ostree admin set-default feature check: %v", err) | ||
| return | ||
| } else if !enabled { | ||
| msg = "Rejected: Installed release does not support \"ostree admin set-default\" feature" | ||
| return | ||
| } |
There was a problem hiding this comment.
Transient ostree probe errors flip seedgen into terminal Failed state.
When IsOstreeAdminSetDefaultFeatureEnabled(ctx) returns an error (e.g., context cancellation during reconcile, transient ostree admin --help failure), validateSystem sets msg to a non-empty rejection string. The caller in Reconcile (around line 1017) treats any non-empty rejection as a permanent failure via setSeedGenStatusFailed, after which getPhase returns phaseFailed and the user must delete and recreate the CR to retry. Now that ctx cancellations can produce errors here, transient cancellations during reconcile teardown could permanently fail seedgen.
Consider returning the error separately from validateSystem so Reconcile can distinguish "rejected by validation" from "transient probe failure" and requeue the latter instead of marking the CR failed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@controllers/seedgen_controller.go` around lines 588 - 596, Change
validateSystem to return (string, error) instead of only msg: have
validateSystem call
ostreeclient.NewClient(...).IsOstreeAdminSetDefaultFeatureEnabled(ctx) and on
probe error return ("", err) while preserving the current rejection string only
for definitive false results (return msg, nil). Update the Reconcile caller that
currently treats any non-empty rejection as permanent failure (uses
setSeedGenStatusFailed/getPhase) to check the error return first and requeue
transient probe errors (do not call setSeedGenStatusFailed), only treating a
non-empty msg with nil error as the permanent rejection path.
| output, _ := c.Executor.Execute(ctx, "podman", args...) | ||
| c.Log.Info("Deleted unused images", "output", output) |
There was a problem hiding this comment.
Don't swallow cancellation from podman rmi.
Execute(ctx, ...) now returns context.Canceled / context.DeadlineExceeded too, but this line drops the error unconditionally and keeps cleaning images. That breaks the new cancellation contract and can make callers see a normal cleanup result after they already aborted the operation. Keep ignoring ordinary podman rmi failures, but return when the context is done.
Suggested fix
- output, _ := c.Executor.Execute(ctx, "podman", args...)
- c.Log.Info("Deleted unused images", "output", output)
+ output, err := c.Executor.Execute(ctx, "podman", args...)
+ if err != nil {
+ if ctx.Err() != nil {
+ return ctx.Err()
+ }
+ c.Log.Info("Ignoring podman rmi failure", "error", err, "targets", toRemove)
+ } else {
+ c.Log.Info("Deleted unused images", "output", output)
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| output, _ := c.Executor.Execute(ctx, "podman", args...) | |
| c.Log.Info("Deleted unused images", "output", output) | |
| output, err := c.Executor.Execute(ctx, "podman", args...) | |
| if err != nil { | |
| if ctx.Err() != nil { | |
| return ctx.Err() | |
| } | |
| c.Log.Info("Ignoring podman rmi failure", "error", err, "targets", toRemove) | |
| } else { | |
| c.Log.Info("Deleted unused images", "output", output) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/imagemgmt/imagemgmt.go` around lines 260 - 261, The call to
c.Executor.Execute(ctx, "podman", args...) currently ignores errors and thus
swallows context cancellation; change the line in imagemgmt.go (the podman rmi
invocation where output is captured) to capture the error (output, err :=
c.Executor.Execute(...)), then if errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) return that error immediately;
otherwise, keep the old behavior of logging the output and ignoring non-context
errors (log the output with c.Log.Info and continue). Ensure you import or
reference errors and use the same ctx/c.Executor.Execute and podman rmi call
sites (i.e., the same function/statement) so cancellation is propagated.
| Short: "Restore seed cluster configurations", | ||
| Run: func(cmd *cobra.Command, args []string) { | ||
| restore() | ||
| restore(cmd.Context()) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if the CLI main entry point uses ExecuteContext with signal handling
rg -n --type=go -C5 "ExecuteContext\|ExecuteWithContext\|signal\.NotifyContext\|rootCmd\.Execute" lca-cli/Repository: openshift-kni/lifecycle-agent
Length of output: 55
🏁 Script executed:
#!/bin/bash
# Find main.go file(s)
find . -name "main.go" -type f | head -20Repository: openshift-kni/lifecycle-agent
Length of output: 685
🏁 Script executed:
#!/bin/bash
# Search for rootCmd initialization and usage across the codebase
rg -n "rootCmd\s*:=|rootCmd\s*=|var rootCmd" --type=go | head -20Repository: openshift-kni/lifecycle-agent
Length of output: 194
🏁 Script executed:
#!/bin/bash
# Check the structure of lca-cli directory
ls -la lca-cli/ 2>/dev/null || echo "lca-cli not found at root level"
fd -t f -e go lca-cli/ | head -30Repository: openshift-kni/lifecycle-agent
Length of output: 1431
🏁 Script executed:
#!/bin/bash
# Check the CLI main entry point
cat -n ./main/lca-cli/main.goRepository: openshift-kni/lifecycle-agent
Length of output: 894
🏁 Script executed:
#!/bin/bash
# Check the root command definition
cat -n lca-cli/cmd/root.goRepository: openshift-kni/lifecycle-agent
Length of output: 3775
🏁 Script executed:
#!/bin/bash
# Verify the context threading in restore.go (mentioned in the original review)
sed -n '20,60p' lca-cli/cmd/restore.goRepository: openshift-kni/lifecycle-agent
Length of output: 1125
Wire signal-aware context at the CLI entry point to make context cancellation effective.
The context threading in restore.go is correct, but it receives context.Background() from the CLI entry point since cmd.Execute() at lca-cli/cmd/root.go:91 does not pass a cancellable context. This means SIGINT/SIGTERM will not propagate to long-running child processes like podman pull or rpm-ostree, undermining the PR's goal of enabling cancellation support.
Update lca-cli/cmd/root.go:91 to use ExecuteContext with signal handling:
Fix for cmd/root.go
+import (
+ "context"
+ "os"
+ "os/signal"
+ "syscall"
+)
- if err := rootCmd.Execute(); err != nil {
+ ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
+ defer stop()
+ if err := rootCmd.ExecuteContext(ctx); err != nil {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lca-cli/cmd/restore.go` at line 33, The CLI entry uses cmd.Execute() so
restore(cmd.Context()) gets a non-cancellable Background context; change the
root command invocation to ExecuteContext using a signal-aware context (e.g.,
create ctx := signal.NotifyContext(context.Background(), os.Interrupt,
syscall.SIGTERM) and defer cancel()) so cmd.Context() inside restore() is
cancelled on SIGINT/SIGTERM; update imports to include os, os/signal and syscall
and call rootCmd.ExecuteContext(ctx) instead of rootCmd.Execute().
| func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) { | ||
| return e.execute(ctx, e.log.Writer(), "", command, args...) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd lca-cli && git ls-files | grep -E "execute\.go|go\.mod" | head -20Repository: openshift-kni/lifecycle-agent
Length of output: 108
🏁 Script executed:
cat -n lca-cli/ops/execute.goRepository: openshift-kni/lifecycle-agent
Length of output: 5790
🌐 Web query:
logrus Logger Writer() method pipe close documentation
💡 Result:
In logrus, Logger.Writer returns an *io.PipeWriter, created by io.Pipe. The doc/behavior explicitly says callers are responsible for closing the returned writer when done. [1][2] What to call / expect - logger.Writer is a convenience that calls WriterLevel(InfoLevel) and returns the PipeWriter. [1] - Each line written to the PipeWriter is scanned and printed to the logger using the configured formatters/hooks. [2] - You should close it yourself (typical pattern is defer w.Close). [2] Does w.Close “pipe close” the reader and flush synchronously? - Closing the PipeWriter is non-deterministic in the sense that logs are produced by a goroutine scanning the pipe reader; depending on timing (e.g., tests/process exit), some writes may not appear even after Close. This is discussed in logrus issues. [3] Does closing the PipeWriter close some underlying file? - io.PipeWriter has no “underlying file”; it only closes the writer end of the pipe. Any actual file (if any) is on the logger’s Logger.Out side, and logrus can’t close that for you because Logger.Out is just an io.Writer. [4][5] - Therefore, closing w (the PipeWriter you got from Writer) does not automatically close whatever writer the logger is outputting to; you must close that writer separately if it needs closing (e.g., an *os.File you set as logger.Out via SetOutput). [4][5] Practical answer (code) - Use: w:= logger.Writer defer w.Close (then write to w; e.g., hook it to stdlib log.New). [2] - If your logger ultimately writes to a file you opened, close that file separately (closing w only closes the pipe writer end). [4][5] If you tell me your exact logrus version and how you wire stdlib log / http server / file output, I can suggest the safest shutdown/flush ordering for your case.
Citations:
- 1: https://github.com/sirupsen/logrus/blob/master/writer.go
- 2: https://github.com/sirupsen/logrus
- 3: PipeWriter closing is non-deterministic sirupsen/logrus#1111
- 4: https://stackoverflow.com/questions/49871231/does-closing-io-pipewriter-close-the-underlying-file
- 5: https://github.com/sirupsen/logrus/blob/master/logger.go
🏁 Script executed:
# Verify all ExecuteWithLiveLogger implementations are covered
rg "ExecuteWithLiveLogger" lca-cli/ops/execute.goRepository: openshift-kni/lifecycle-agent
Length of output: 521
🏁 Script executed:
# Confirm no other executor types with ExecuteWithLiveLogger exist
rg "type.*Executor.*struct" lca-cli/ops/execute.goRepository: openshift-kni/lifecycle-agent
Length of output: 162
Close the pipe writer returned by logrus.Logger.Writer() to prevent goroutine leaks.
These three methods pass e.log.Writer() directly to execute functions without closing it. The pipe writer is backed by a goroutine that remains alive if not closed, accumulating leaked resources in long-lived controller processes.
♻️ Suggested fix
func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
- return e.execute(ctx, e.log.Writer(), "", command, args...)
+ writer := e.log.Writer()
+ defer writer.Close()
+ return e.execute(ctx, writer, "", command, args...)
}
func (e *nsenterExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
- return e.baseExecute(ctx, e.log.Writer(), command, args...)
+ writer := e.log.Writer()
+ defer writer.Close()
+ return e.baseExecute(ctx, writer, command, args...)
}
func (e *chrootExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
- return e.baseExecute(ctx, e.log.Writer(), command, args...)
+ writer := e.log.Writer()
+ defer writer.Close()
+ return e.baseExecute(ctx, writer, command, args...)
}Applies to lines: 67–68, 79–80, 117–118.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) { | |
| return e.execute(ctx, e.log.Writer(), "", command, args...) | |
| func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) { | |
| writer := e.log.Writer() | |
| defer writer.Close() | |
| return e.execute(ctx, writer, "", command, args...) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lca-cli/ops/execute.go` around lines 67 - 68, The pipe writer returned by
e.log.Writer() is not closed, leaking the goroutine backing it; change the call
sites (e.g., regularExecutor.ExecuteWithLiveLogger and the two other methods
that pass e.log.Writer() into execute) to first obtain the writer into a
variable (w := e.log.Writer()), defer w.Close(), then pass w to e.execute(...)
so the writer is closed after execute returns and no goroutine is leaked.
|
/retest |
3a47e05 to
aaae2ba
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
lca-cli/seedrestoration/seedrestoration.go (1)
119-123:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
filepath.Walkcallback error before dereferencinginfo.Line 122 can panic when
Walkcalls the callback witherr != nilandinfo == nil.Proposed fix
func (s *SeedRestoration) cleanupServiceUnits(ctx context.Context) error { dir := filepath.Join(common.InstallationConfigurationFilesDir, "services") err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error { + if err != nil { + return err + } + if info == nil { + return fmt.Errorf("walk returned nil file info for %s", path) + } if info.IsDir() { return nil }As per coding guidelines, “Focus on major issues impacting performance, readability, maintainability and security.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/seedrestoration/seedrestoration.go` around lines 119 - 123, The filepath.Walk callback in function cleanupServiceUnits dereferences info without checking the callback error; guard the callback by checking if err != nil at the top of the anonymous func and return that error (or skip) before referencing info to avoid nil dereference/panic. Update the callback used in filepath.Walk to first handle the incoming err (and treat nil info defensively) and then continue with the existing logic that checks info.IsDir() and paths.lca-cli/ibi-preparation/ibipreparation.go (1)
74-75:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
precacheFlowdrops cancellation by usingcontext.Background().Line 133 bypasses the
Run(ctx)context, so precache cannot be cancelled/time-limited by the caller.🔧 Proposed fix
- if err := i.precacheFlow(common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil { + if err := i.precacheFlow(ctx, common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil { return fmt.Errorf("failed to precache: %w", err) }-func (i *IBIPrepare) precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) error { +func (i *IBIPrepare) precacheFlow(ctx context.Context, imageListFile, seedInfoFile, registriesConfFile string) error {- if err := workload.Precache(context.Background(), imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil { + if err := workload.Precache(ctx, imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil { return fmt.Errorf("failed to start precache: %w", err) }Also applies to: 89-90, 133-134
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ibi-preparation/ibipreparation.go` around lines 74 - 75, precacheFlow currently ignores caller cancellation by creating its own context (using context.Background()); change precacheFlow to accept a context.Context parameter and propagate that context through all internal operations instead of calling context.Background(), then update callers (the Run(ctx) method and any i.precacheFlow(...) invocations) to pass the incoming ctx so cancellation/timeouts are honored; ensure any downstream calls inside precacheFlow (HTTP/IO/subprocess functions) also accept and use the passed ctx.controllers/upgrade_handlers.go (1)
79-84:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t mark upgrade as permanently failed on context cancellation/deadline.
Line 79-Line 84 treats all errors from
IsOrigStaterootBooted(ctx, ...)as terminal failures. With context-aware downstream calls, transientcontext.Canceled/context.DeadlineExceededcan now incorrectly fail the CR.💡 Suggested fix
import ( "context" + "errors" "fmt" "os" "path/filepath" "strings" ) @@ origStaterootBooted, err := r.RebootClient.IsOrigStaterootBooted(ctx, ibu.Spec.SeedImageRef.Version) if err != nil { + if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { + return requeueWithShortInterval(), err + } utils.SetUpgradeStatusFailed(ibu, err.Error()) return doNotRequeue(), nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/upgrade_handlers.go` around lines 79 - 84, The current code treats any error from r.RebootClient.IsOrigStaterootBooted(ctx, ...) as a terminal failure and calls utils.SetUpgradeStatusFailed; instead detect context cancellation/timeouts (use errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)) after the call and in that case do NOT call utils.SetUpgradeStatusFailed — return early so the controller can retry (e.g., return doNotRequeue(), err) or propagate the context error appropriately; only call utils.SetUpgradeStatusFailed for non-context-related errors.internal/reboot/reboot.go (1)
180-191:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRespect context cancellation while waiting for reboot completion.
Both
IBURebootClient.Reboot(lines 180–191) andIPCRebootClient.Reboot(lines 467–478) ignorectx.Done()and can block for a full 60 minutes even after the caller cancels the context. This violates the context contract and prevents graceful cancellation.💡 Suggested fix
c.log.Info(fmt.Sprintf("Wait for %s to be killed via SIGTERM", defaultRebootTimeout.String())) - time.Sleep(defaultRebootTimeout) - - return fmt.Errorf("failed to reboot. This should never happen! Please check the system") + select { + case <-ctx.Done(): + return fmt.Errorf("reboot wait aborted: %w", ctx.Err()) + case <-time.After(defaultRebootTimeout): + return fmt.Errorf("failed to reboot. This should never happen! Please check the system") + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/reboot/reboot.go` around lines 180 - 191, The Reboot methods (IBURebootClient.Reboot and IPCRebootClient.Reboot) block using time.Sleep(defaultRebootTimeout) and ignore ctx cancellation; replace the sleep with a context-aware wait (e.g. create a timer with time.NewTimer(defaultRebootTimeout) and use a select between <-timer.C and <-ctx.Done()), cancel/stop the timer appropriately, and when ctx is done return ctx.Err() (or wrap it) instead of continuing to the final "should never happen" error; keep the existing hostCommandsExecutor.Execute call and logging (use defaultRebootTimeout, IBURebootClient.Reboot and IPCRebootClient.Reboot identifiers to locate the changes).lca-cli/postpivot/postpivot.go (1)
275-291:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate poll cancellation instead of falling through.
Both waits discard
wait.PollUntilContextCancel(...)'s return value. That means a canceled/timed-out context still lets post-pivot continue: the recert path can drop intoRecertFullFlowafter the 10-minute image pull timeout, and the node-IP path can keep configuring networking from incomplete/stale kubelet state. Please return the polling error in both places.Suggested fix
- _ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { + if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) { p.log.Info("pulling recert image") command := "podman" if seedReconfiguration.Proxy != nil { command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s", seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy, seedReconfiguration.Proxy.NoProxy, command) } if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil { p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error()) return false, nil } return true, nil - }) + }); err != nil { + return fmt.Errorf("failed to pull recert image: %w", err) + } }- _ = wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) { + if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) { if _, err := os.Stat(nodePrimaryIPFile); err != nil { return false, nil } return true, nil - }) + }); err != nil { + return fmt.Errorf("failed waiting for nodeip-configuration: %w", err) + } }Also applies to: 750-761
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/postpivot/postpivot.go` around lines 275 - 291, The polling calls to wait.PollUntilContextCancel are currently ignored which lets execution continue after timeouts/cancellations; update the blocks around p.ops.RunInHostNamespace/RunBashInHostNamespace so you capture the error returned by wait.PollUntilContextCancel and propagate it (return it) instead of discarding it; specifically, after calling wait.PollUntilContextCancel in the recert image pull loop and the node-IP/kubelet polling loop, check the returned error and return it up the call stack (so callers like RecertFullFlow or the networking configuration path stop on timeout/cancel) rather than falling through.lca-cli/ops/ops.go (1)
252-276:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake
waitForEtcdactually honor the passed context.This method now takes
ctx, but the loop never checksctx.Done()and the probe request is still made with plainhttp.Get. A canceled reconcile/CLI call can therefore stay blocked here for the full minute, which defeats the main goal of this PR.Suggested fix
func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error { - timeout := time.After(1 * time.Minute) + ctx, cancel := context.WithTimeout(ctx, time.Minute) + defer cancel() ticker := time.NewTicker(1 * time.Second) defer ticker.Stop() for { select { - case <-timeout: - return fmt.Errorf("timeout waiting for etcd") + case <-ctx.Done(): + return ctx.Err() case <-ticker.C: - resp, err := http.Get(healthzEndpoint) //nolint:gosec + req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil) + if err != nil { + return fmt.Errorf("failed to build etcd health request: %w", err) + } + resp, err := http.DefaultClient.Do(req) //nolint:gosec if err != nil { o.log.Infof("Waiting for etcd: %s", err) continue } - defer func() { _ = resp.Body.Close() }() + _ = resp.Body.Close() if resp.StatusCode != http.StatusOK { o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode) continue }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lca-cli/ops/ops.go` around lines 252 - 276, waitForEtcd currently ignores ctx and uses http.Get; update it to respect cancellation by adding a select case for <-ctx.Done() that returns ctx.Err(), and replace http.Get(healthzEndpoint) with an HTTP request built with http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and executed via http.DefaultClient.Do(req) so the request is cancelable; also stop deferring resp.Body.Close() inside the loop—call resp.Body.Close() immediately after you're done with the response (before continue) to avoid leaking closers across iterations.
♻️ Duplicate comments (1)
internal/imagemgmt/imagemgmt.go (1)
259-264:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate cancellation from
podman rmifailures.Line 259-263 still logs-and-continues for all errors, including canceled/deadline-expired contexts. That breaks the new cancellation contract and can keep deleting images after abort.
Suggested fix
- output, err := c.Executor.Execute(ctx, "podman", args...) - if err != nil { - c.Log.Info("Some images could not be removed (may be in use)", "error", err) - } - c.Log.Info("Deleted unused images", "output", output) + output, err := c.Executor.Execute(ctx, "podman", args...) + if err != nil { + if ctx.Err() != nil { + return ctx.Err() + } + c.Log.Info("Some images could not be removed (may be in use)", "error", err) + } else { + c.Log.Info("Deleted unused images", "output", output) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/imagemgmt/imagemgmt.go` around lines 259 - 264, The current handling of errors from c.Executor.Execute in the image deletion loop logs-and-continues even when the context was canceled/expired; change the error handling so that after Execute returns an error you detect cancellation (check ctx.Err()!=nil and/or errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)) and immediately return that error to propagate cancellation instead of logging and continuing; for non-cancellation errors keep the existing info log ("Some images could not be removed (may be in use)") and continue, and ensure you reference the Execute call (c.Executor.Execute) and logging calls (c.Log.Info) when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@controllers/idle_handlers.go`:
- Around line 257-267: The cleanup paths treat context cancellation as routine
failures; change both the cleanupUnbootedStateroot call and the
rpmOstreeClient.RpmOstreeCleanup call to detect context cancellation (use
errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) or
errors.Is with ctx.Err()) and immediately return that error (or propagate/abort)
instead of logging and incrementing failures; update the logic around
cleanupUnbootedStateroot, rpmOstreeClient.RpmOstreeCleanup, ctx, log and
failures so context-cancellation/timeout results short-circuit the function
without mutating failures or continuing other cleanup steps.
In `@lca-cli/cmd/ibi.go`:
- Around line 73-74: preinstallExecutorAdapter currently forces
context.Background() inside its execution methods, causing commands invoked via
runIBI(cmd.Context()) to ignore cancellation/timeouts; update
preinstallExecutorAdapter so it uses the caller-supplied context instead of
context.Background() (either by adding a ctx parameter to its Exec/Run/Execute
methods or by storing a context field set from cmd.Context() when constructing
the adapter) and propagate cmd.Context() from runIBI into
preinstallUtils.NewCleanupDevice / preinstallUtils.NewDiskOps (or any call sites
that create the adapter) so all commands respect CLI cancellation and timeouts.
In `@lca-cli/ops/ops.go`:
- Around line 691-694: The current check for o.SystemctlAction(ctx, "is-active",
containerStorageMountUnit) treats any error as "no active mount" and returns
nil; change it to only swallow the specific error that indicates the unit is
inactive (the expected systemctl exit/status for inactive) and propagate all
other errors (including context cancellation/timeouts) so callers don't proceed
on bogus state. Update both occurrences (the call using o.SystemctlAction with
"is-active" and containerStorageMountUnit around lines referenced) to inspect
the error from SystemctlAction and return it unless it matches the known
inactive result; keep the original behavior for the true inactive case.
---
Outside diff comments:
In `@controllers/upgrade_handlers.go`:
- Around line 79-84: The current code treats any error from
r.RebootClient.IsOrigStaterootBooted(ctx, ...) as a terminal failure and calls
utils.SetUpgradeStatusFailed; instead detect context cancellation/timeouts (use
errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded))
after the call and in that case do NOT call utils.SetUpgradeStatusFailed —
return early so the controller can retry (e.g., return doNotRequeue(), err) or
propagate the context error appropriately; only call
utils.SetUpgradeStatusFailed for non-context-related errors.
In `@internal/reboot/reboot.go`:
- Around line 180-191: The Reboot methods (IBURebootClient.Reboot and
IPCRebootClient.Reboot) block using time.Sleep(defaultRebootTimeout) and ignore
ctx cancellation; replace the sleep with a context-aware wait (e.g. create a
timer with time.NewTimer(defaultRebootTimeout) and use a select between
<-timer.C and <-ctx.Done()), cancel/stop the timer appropriately, and when ctx
is done return ctx.Err() (or wrap it) instead of continuing to the final "should
never happen" error; keep the existing hostCommandsExecutor.Execute call and
logging (use defaultRebootTimeout, IBURebootClient.Reboot and
IPCRebootClient.Reboot identifiers to locate the changes).
In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Around line 74-75: precacheFlow currently ignores caller cancellation by
creating its own context (using context.Background()); change precacheFlow to
accept a context.Context parameter and propagate that context through all
internal operations instead of calling context.Background(), then update callers
(the Run(ctx) method and any i.precacheFlow(...) invocations) to pass the
incoming ctx so cancellation/timeouts are honored; ensure any downstream calls
inside precacheFlow (HTTP/IO/subprocess functions) also accept and use the
passed ctx.
In `@lca-cli/ops/ops.go`:
- Around line 252-276: waitForEtcd currently ignores ctx and uses http.Get;
update it to respect cancellation by adding a select case for <-ctx.Done() that
returns ctx.Err(), and replace http.Get(healthzEndpoint) with an HTTP request
built with http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and
executed via http.DefaultClient.Do(req) so the request is cancelable; also stop
deferring resp.Body.Close() inside the loop—call resp.Body.Close() immediately
after you're done with the response (before continue) to avoid leaking closers
across iterations.
In `@lca-cli/postpivot/postpivot.go`:
- Around line 275-291: The polling calls to wait.PollUntilContextCancel are
currently ignored which lets execution continue after timeouts/cancellations;
update the blocks around p.ops.RunInHostNamespace/RunBashInHostNamespace so you
capture the error returned by wait.PollUntilContextCancel and propagate it
(return it) instead of discarding it; specifically, after calling
wait.PollUntilContextCancel in the recert image pull loop and the
node-IP/kubelet polling loop, check the returned error and return it up the call
stack (so callers like RecertFullFlow or the networking configuration path stop
on timeout/cancel) rather than falling through.
In `@lca-cli/seedrestoration/seedrestoration.go`:
- Around line 119-123: The filepath.Walk callback in function
cleanupServiceUnits dereferences info without checking the callback error; guard
the callback by checking if err != nil at the top of the anonymous func and
return that error (or skip) before referencing info to avoid nil
dereference/panic. Update the callback used in filepath.Walk to first handle the
incoming err (and treat nil info defensively) and then continue with the
existing logic that checks info.IsDir() and paths.
---
Duplicate comments:
In `@internal/imagemgmt/imagemgmt.go`:
- Around line 259-264: The current handling of errors from c.Executor.Execute in
the image deletion loop logs-and-continues even when the context was
canceled/expired; change the error handling so that after Execute returns an
error you detect cancellation (check ctx.Err()!=nil and/or errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded)) and immediately
return that error to propagate cancellation instead of logging and continuing;
for non-cancellation errors keep the existing info log ("Some images could not
be removed (may be in use)") and continue, and ensure you reference the Execute
call (c.Executor.Execute) and logging calls (c.Log.Info) when making this
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: c26b3938-951c-4361-82b3-7d4c04de035e
📒 Files selected for processing (56)
controllers/ibu_controller.gocontrollers/ibu_controller_test.gocontrollers/idle_handlers.gocontrollers/idle_handlers_test.gocontrollers/ipc_config_handlers.gocontrollers/ipc_config_handlers_test.gocontrollers/ipc_controller.gocontrollers/ipc_controller_test.gocontrollers/ipc_idle_handlers.gocontrollers/ipc_idle_handlers_test.gocontrollers/ipc_rollback_handlers.gocontrollers/ipc_rollback_handlers_test.gocontrollers/prep_handlers.gocontrollers/rollback_handlers.gocontrollers/seedgen_controller.gocontrollers/upgrade_handlers.gocontrollers/upgrade_handlers_test.gointernal/imagemgmt/imagemgmt.gointernal/imagemgmt/imagemgmt_test.gointernal/imagemgmt/mock_imagemgmt.gointernal/ostreeclient/mock_ostreeclient.gointernal/ostreeclient/ostreeclient.gointernal/precache/workload/pullImages.gointernal/prep/prep.gointernal/reboot/mock_reboot.gointernal/reboot/reboot.gointernal/reboot/reboot_test.golca-cli/cmd/create.golca-cli/cmd/ibi.golca-cli/cmd/ibuPrecacheWorkload.golca-cli/cmd/ibuStaterootSetup.golca-cli/cmd/ipconfig/postpivot.golca-cli/cmd/ipconfig/prepivot.golca-cli/cmd/ipconfig/rollback.golca-cli/cmd/postpivotcmd.golca-cli/cmd/restore.golca-cli/ibi-preparation/ibipreparation.golca-cli/ibi-preparation/ibipreparation_test.golca-cli/initmonitor/initmonitor.golca-cli/ipconfig/postpivot.golca-cli/ipconfig/postpivot_test.golca-cli/ipconfig/prepivot.golca-cli/ipconfig/prepivot_test.golca-cli/ipconfig/rollback.golca-cli/ipconfig/rollback_test.golca-cli/ops/execute.golca-cli/ops/mock_execute.golca-cli/ops/mock_ops.golca-cli/ops/ops.golca-cli/ostreeclient/mock_rpmostreeclient.golca-cli/ostreeclient/rpmostreeclient.golca-cli/postpivot/postpivot.golca-cli/postpivot/postpivot_test.golca-cli/seedcreator/seedcreator.golca-cli/seedrestoration/seedrestoration.gomain/main.go
✅ Files skipped from review due to trivial changes (6)
- lca-cli/ops/mock_execute.go
- internal/imagemgmt/mock_imagemgmt.go
- internal/reboot/mock_reboot.go
- lca-cli/ostreeclient/mock_rpmostreeclient.go
- internal/ostreeclient/mock_ostreeclient.go
- lca-cli/ops/mock_ops.go
| if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil { | ||
| log.Error(err, "failed to remove stateroot", "stateroot", stateroot) | ||
| failures += 1 | ||
| } | ||
| } | ||
|
|
||
| // clear temporary files and reclaim disk space | ||
| if err := rpmOstreeClient.RpmOstreeCleanup(); err != nil { | ||
| if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil { | ||
| log.Error(err, "failed rpm-ostree cleanup -b") | ||
| failures += 1 | ||
| } |
There was a problem hiding this comment.
Return immediately on context cancellation during cleanup.
Line 257 and Line 264 currently treat context.Canceled/context.DeadlineExceeded as normal cleanup failures and continue mutating state. That breaks cancellation semantics and can prolong shutdown/reconcile aborts.
Proposed fix
import (
"context"
+ "errors"
"fmt"
"os"
@@
for _, stateroot := range staterootsToRemove {
@@
if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil {
+ if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+ return err
+ }
log.Error(err, "failed to remove stateroot", "stateroot", stateroot)
failures += 1
}
}
@@
if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil {
+ if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+ return err
+ }
log.Error(err, "failed rpm-ostree cleanup -b")
failures += 1
}As per coding guidelines, “Focus on major issues impacting performance, readability, maintainability and security.”
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil { | |
| log.Error(err, "failed to remove stateroot", "stateroot", stateroot) | |
| failures += 1 | |
| } | |
| } | |
| // clear temporary files and reclaim disk space | |
| if err := rpmOstreeClient.RpmOstreeCleanup(); err != nil { | |
| if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil { | |
| log.Error(err, "failed rpm-ostree cleanup -b") | |
| failures += 1 | |
| } | |
| if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil { | |
| if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | |
| return err | |
| } | |
| log.Error(err, "failed to remove stateroot", "stateroot", stateroot) | |
| failures += 1 | |
| } | |
| } | |
| // clear temporary files and reclaim disk space | |
| if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil { | |
| if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { | |
| return err | |
| } | |
| log.Error(err, "failed rpm-ostree cleanup -b") | |
| failures += 1 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@controllers/idle_handlers.go` around lines 257 - 267, The cleanup paths treat
context cancellation as routine failures; change both the
cleanupUnbootedStateroot call and the rpmOstreeClient.RpmOstreeCleanup call to
detect context cancellation (use errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) or errors.Is with ctx.Err()) and
immediately return that error (or propagate/abort) instead of logging and
incrementing failures; update the logic around cleanupUnbootedStateroot,
rpmOstreeClient.RpmOstreeCleanup, ctx, log and failures so
context-cancellation/timeout results short-circuit the function without mutating
failures or continuing other cleanup steps.
| cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{exec: hostCommandsExecutor})) | ||
| rpmOstreeClient := ostree.NewClient("lca-cli", hostCommandsExecutor) |
There was a problem hiding this comment.
Preserve CLI cancellation in preinstallExecutorAdapter instead of forcing context.Background().
Line 93 always uses context.Background(), so commands invoked through preinstall-utils ignore cancellation/timeouts from runIBI(cmd.Context()).
🔧 Proposed fix
type preinstallExecutorAdapter struct {
+ ctx context.Context
exec ops.Execute
}
func (a *preinstallExecutorAdapter) Execute(command string, args ...string) (string, error) {
- out, err := a.exec.Execute(context.Background(), command, args...)
+ execCtx := a.ctx
+ if execCtx == nil {
+ execCtx = context.Background()
+ }
+ out, err := a.exec.Execute(execCtx, command, args...)
if err != nil {
return out, fmt.Errorf("execute %s: %w", command, err)
}
return out, nil
}-cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{exec: hostCommandsExecutor}))
+cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{
+ ctx: ctx,
+ exec: hostCommandsExecutor,
+}))Also applies to: 88-96
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lca-cli/cmd/ibi.go` around lines 73 - 74, preinstallExecutorAdapter currently
forces context.Background() inside its execution methods, causing commands
invoked via runIBI(cmd.Context()) to ignore cancellation/timeouts; update
preinstallExecutorAdapter so it uses the caller-supplied context instead of
context.Background() (either by adding a ctx parameter to its Exec/Run/Execute
methods or by storing a context field set from cmd.Context() when constructing
the adapter) and propagate cmd.Context() from runIBI into
preinstallUtils.NewCleanupDevice / preinstallUtils.NewDiskOps (or any call sites
that create the adapter) so all commands respect CLI cancellation and timeouts.
| if _, err := o.SystemctlAction(ctx, "is-active", containerStorageMountUnit); err != nil { | ||
| // No active mount, return nil | ||
| return "", nil | ||
| } |
There was a problem hiding this comment.
Don't collapse canceled systemctl is-active calls into the inactive path.
These branches now receive context-driven failures too, but they still treat broad classes of errors as "unit is inactive/already stopped". If ctx times out or the command fails for another reason, callers can proceed with the wrong storage/service state instead of aborting. Only special-case the expected inactive exit code and propagate everything else.
Suggested fix
if _, err := o.SystemctlAction(ctx, "is-active", containerStorageMountUnit); err != nil {
- // No active mount, return nil
- return "", nil
+ var exitErr *exec.ExitError
+ if errors.As(err, &exitErr) && exitErr.ExitCode() == 3 {
+ return "", nil
+ }
+ return "", fmt.Errorf("unable to determine whether %s is active: %w", containerStorageMountUnit, err)
} crioSystemdStatus, err := o.SystemctlAction(ctx, "is-active", "crio")
var exitErr *exec.ExitError
// If ExitCode is 3, the command succeeded and told us that crio is down
- if err != nil && errors.As(err, &exitErr) && exitErr.ExitCode() != 3 {
- return fmt.Errorf("failed to checking crio status: %w", err)
+ if err != nil {
+ if !(errors.As(err, &exitErr) && exitErr.ExitCode() == 3) {
+ return fmt.Errorf("failed checking crio status: %w", err)
+ }
}Also applies to: 728-733
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lca-cli/ops/ops.go` around lines 691 - 694, The current check for
o.SystemctlAction(ctx, "is-active", containerStorageMountUnit) treats any error
as "no active mount" and returns nil; change it to only swallow the specific
error that indicates the unit is inactive (the expected systemctl exit/status
for inactive) and propagate all other errors (including context
cancellation/timeouts) so callers don't proceed on bogus state. Update both
occurrences (the call using o.SystemctlAction with "is-active" and
containerStorageMountUnit around lines referenced) to inspect the error from
SystemctlAction and return it unless it matches the known inactive result; keep
the original behavior for the true inactive case.
aaae2ba to
70fcc6e
Compare
Replace exec.Command with exec.CommandContext so external commands respect context cancellation and timeouts. Propagate ctx through the Ops interface, downstream consumers, controllers, and CLI commands. - Add ctx as first parameter to Execute, ExecuteWithLiveLogger, and all executor implementations - Thread ctx through ~25 Ops methods and all callers - Use cmd.Context() in ipconfig CLI commands instead of context.Background() - Add timeouts for startup and signal-handler probes - Use context.WithoutCancel for deferred cleanup operations - Handle errors from IsOstreeAdminSetDefaultFeatureEnabled - Log podman rmi errors instead of silently discarding - Regenerate all mocks
70fcc6e to
e88e675
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
controllers/ipc_controller.go (1)
327-338:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate cancellation instead of collapsing all errors to
false.When
GetUnbootedStaterootName(ctx)fails due to context cancellation or timeout, line 333 returnsfalse, nil, converting the control signal into a state decision. Downstream logic invalidNextStagesmay incorrectly allow config re-entry or mark rollback validation as failed when nothing was actually checked. Only the genuine "not available" case (empty name) should map tofalse.🔧 Suggested fix
func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) { if rpmOstreeClient == nil { return nil, fmt.Errorf("rpmOstreeClient is nil") } unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx) - if err != nil || unbootedStaterootName == "" { + if err != nil { + return nil, err + } + if unbootedStaterootName == "" { return lo.ToPtr(false), nil } return lo.ToPtr(true), nil }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@controllers/ipc_controller.go` around lines 327 - 338, The function isUnbootedStaterootAvailable currently swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns (false, nil), which converts cancellations/timeouts into a state; change it to propagate errors instead: call GetUnbootedStaterootName(ctx) and if err != nil return nil, err (so context.Canceled/DeadlineExceeded propagate), only treat an empty unbootedStaterootName as lo.ToPtr(false), nil and otherwise return lo.ToPtr(true), nil; update references to isUnbootedStaterootAvailable (used by validNextStages) to handle the returned error accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@controllers/ipc_controller.go`:
- Around line 327-338: The function isUnbootedStaterootAvailable currently
swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns
(false, nil), which converts cancellations/timeouts into a state; change it to
propagate errors instead: call GetUnbootedStaterootName(ctx) and if err != nil
return nil, err (so context.Canceled/DeadlineExceeded propagate), only treat an
empty unbootedStaterootName as lo.ToPtr(false), nil and otherwise return
lo.ToPtr(true), nil; update references to isUnbootedStaterootAvailable (used by
validNextStages) to handle the returned error accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Enterprise
Run ID: d54da558-e48d-422b-9c64-74552246cba8
📒 Files selected for processing (57)
.konflux/overlay/release.in.yamlcontrollers/ibu_controller.gocontrollers/ibu_controller_test.gocontrollers/idle_handlers.gocontrollers/idle_handlers_test.gocontrollers/ipc_config_handlers.gocontrollers/ipc_config_handlers_test.gocontrollers/ipc_controller.gocontrollers/ipc_controller_test.gocontrollers/ipc_idle_handlers.gocontrollers/ipc_idle_handlers_test.gocontrollers/ipc_rollback_handlers.gocontrollers/ipc_rollback_handlers_test.gocontrollers/prep_handlers.gocontrollers/rollback_handlers.gocontrollers/seedgen_controller.gocontrollers/upgrade_handlers.gocontrollers/upgrade_handlers_test.gointernal/imagemgmt/imagemgmt.gointernal/imagemgmt/imagemgmt_test.gointernal/imagemgmt/mock_imagemgmt.gointernal/ostreeclient/mock_ostreeclient.gointernal/ostreeclient/ostreeclient.gointernal/precache/workload/pullImages.gointernal/prep/prep.gointernal/reboot/mock_reboot.gointernal/reboot/reboot.gointernal/reboot/reboot_test.golca-cli/cmd/create.golca-cli/cmd/ibi.golca-cli/cmd/ibuPrecacheWorkload.golca-cli/cmd/ibuStaterootSetup.golca-cli/cmd/ipconfig/postpivot.golca-cli/cmd/ipconfig/prepivot.golca-cli/cmd/ipconfig/rollback.golca-cli/cmd/postpivotcmd.golca-cli/cmd/restore.golca-cli/ibi-preparation/ibipreparation.golca-cli/ibi-preparation/ibipreparation_test.golca-cli/initmonitor/initmonitor.golca-cli/ipconfig/postpivot.golca-cli/ipconfig/postpivot_test.golca-cli/ipconfig/prepivot.golca-cli/ipconfig/prepivot_test.golca-cli/ipconfig/rollback.golca-cli/ipconfig/rollback_test.golca-cli/ops/execute.golca-cli/ops/mock_execute.golca-cli/ops/mock_ops.golca-cli/ops/ops.golca-cli/ostreeclient/mock_rpmostreeclient.golca-cli/ostreeclient/rpmostreeclient.golca-cli/postpivot/postpivot.golca-cli/postpivot/postpivot_test.golca-cli/seedcreator/seedcreator.golca-cli/seedrestoration/seedrestoration.gomain/main.go
✅ Files skipped from review due to trivial changes (5)
- .konflux/overlay/release.in.yaml
- internal/imagemgmt/mock_imagemgmt.go
- internal/ostreeclient/mock_ostreeclient.go
- lca-cli/ostreeclient/mock_rpmostreeclient.go
- lca-cli/ops/mock_execute.go
🚧 Files skipped from review as they are similar to previous changes (44)
- controllers/ibu_controller.go
- lca-cli/cmd/postpivotcmd.go
- lca-cli/cmd/create.go
- lca-cli/cmd/restore.go
- controllers/ibu_controller_test.go
- main/main.go
- lca-cli/cmd/ipconfig/rollback.go
- lca-cli/ipconfig/rollback_test.go
- controllers/ipc_idle_handlers.go
- lca-cli/cmd/ibuStaterootSetup.go
- internal/reboot/reboot_test.go
- lca-cli/cmd/ibi.go
- lca-cli/postpivot/postpivot_test.go
- lca-cli/cmd/ipconfig/postpivot.go
- controllers/prep_handlers.go
- controllers/ipc_idle_handlers_test.go
- lca-cli/seedrestoration/seedrestoration.go
- lca-cli/ops/execute.go
- internal/imagemgmt/imagemgmt.go
- internal/precache/workload/pullImages.go
- lca-cli/ibi-preparation/ibipreparation.go
- controllers/ipc_rollback_handlers_test.go
- lca-cli/ipconfig/postpivot.go
- lca-cli/initmonitor/initmonitor.go
- internal/reboot/mock_reboot.go
- controllers/idle_handlers_test.go
- lca-cli/ipconfig/prepivot_test.go
- lca-cli/ostreeclient/rpmostreeclient.go
- internal/imagemgmt/imagemgmt_test.go
- controllers/upgrade_handlers.go
- controllers/idle_handlers.go
- lca-cli/ipconfig/prepivot.go
- controllers/rollback_handlers.go
- controllers/ipc_controller_test.go
- lca-cli/ipconfig/postpivot_test.go
- lca-cli/ibi-preparation/ibipreparation_test.go
- controllers/seedgen_controller.go
- lca-cli/seedcreator/seedcreator.go
- internal/prep/prep.go
- lca-cli/postpivot/postpivot.go
- lca-cli/ops/mock_ops.go
- lca-cli/cmd/ipconfig/prepivot.go
- internal/reboot/reboot.go
- lca-cli/ops/ops.go
|
@sebrandon1: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
exec.Commandwithexec.CommandContextso external commands can be aborted via context cancellationctx context.Contextas first parameter to theExecuteinterface (Execute,ExecuteWithLiveLogger) and all 3 implementations (regularExecutor,nsenterExecutor,chrootExecutor)ctxthrough theOpsinterface (~25 command-executing methods), downstream consumers (ostreeclient,rpmostreeclient,reboot,imagemgmt,precache/workload), controllers (using existing reconcilectx), and CLI commands (usingcontext.Background())context.TODO()with proper context usage throughoutThis enables proper timeout and cancellation support for long-running operations like
podman pull,rpm-ostree, andrecert.Closes #200
Test plan
make build— operator binary compilesmake cli-build— CLI binary compilesmake golangci-lint— all linters passmake unittest— all unit tests pass (0 failures)make generate— mocks regenerated cleanly