Skip to content

CNF-23402: Add context.Context to Execute and Ops interfaces#6187

Open
sebrandon1 wants to merge 1 commit into
openshift-kni:mainfrom
sebrandon1:add-context-to-execute-interface
Open

CNF-23402: Add context.Context to Execute and Ops interfaces#6187
sebrandon1 wants to merge 1 commit into
openshift-kni:mainfrom
sebrandon1:add-context-to-execute-interface

Conversation

@sebrandon1
Copy link
Copy Markdown
Contributor

Summary

  • Replace exec.Command with exec.CommandContext so external commands can be aborted via context cancellation
  • Add ctx context.Context as first parameter to the Execute interface (Execute, ExecuteWithLiveLogger) and all 3 implementations (regularExecutor, nsenterExecutor, chrootExecutor)
  • Propagate ctx through the Ops interface (~25 command-executing methods), downstream consumers (ostreeclient, rpmostreeclient, reboot, imagemgmt, precache/workload), controllers (using existing reconcile ctx), and CLI commands (using context.Background())
  • Replace context.TODO() with proper context usage throughout
  • Regenerate all mocks

This enables proper timeout and cancellation support for long-running operations like podman pull, rpm-ostree, and recert.

Closes #200

Test plan

  • make build — operator binary compiles
  • make cli-build — CLI binary compiles
  • make golangci-lint — all linters pass
  • make unittest — all unit tests pass (0 failures)
  • make generate — mocks regenerated cleanly

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign omertuc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from eranco74 and leo8a April 27, 2026 17:34
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Propagates context.Context broadly across the codebase: interfaces, helpers, executors, controllers, CLI commands, and mocks. Call sites updated to pass ctx into command execution, ostree/rpm-ostree, reboot, and cleanup flows; tests and generated mocks adapted to new context-first signatures.

Changes

Context propagation DAG

Layer / File(s) Summary
Interfaces & mocks
internal/*/mock_*.go, lca-cli/ops/mock_ops.go, lca-cli/ostreeclient/mock_rpmostreeclient.go, internal/ostreeclient/mock_ostreeclient.go, internal/reboot/mock_reboot.go, internal/imagemgmt/mock_imagemgmt.go, lca-cli/ops/mock_execute.go
Many interfaces and generated mock methods now accept ctx context.Context as the first parameter; gomock recorder methods updated.
Executor (low-level) & ops mocks
lca-cli/ops/execute.go, lca-cli/ops/mock_execute.go
Executors use exec.CommandContext, accept ctx; nsenter/chroot wrappers forward ctx; mocks updated accordingly.
Ops implementation
lca-cli/ops/ops.go, related helpers
Host-exec helpers, SystemctlAction, RunInHostNamespace, image mount/unmount, recert, and cluster service flows now accept and forward ctx.
Ostree / rpm-ostree clients
lca-cli/ostreeclient/*, internal/ostreeclient/*
rpmostree and internal ostree client APIs accept ctx; IsOstreeAdminSetDefaultFeatureEnabled now returns (bool, error) and probes with ctx.
Image management & precache
internal/imagemgmt/*, internal/precache/workload/*
Image management functions and precache helpers accept ctx and run podman/crictl calls with ctx; removal errors are captured and logged.
Controllers & handlers
controllers/*
Reconciler flows and stage handlers thread ctx into validNextStages, stateroot checks, remounts, systemd actions, cleanup, and seedgen steps.
Reboot & rollback flows
internal/reboot/*, controllers/*_handlers.go
RebootIntf and implementations accept ctx; rollback/init-monitor flows use ctx for ostree/rpm and ops calls and to trigger reboots.
CLI entrypoints & wiring
lca-cli/cmd/*, lca-cli/ibi-preparation/*, main/main.go
Cobra commands use cmd.Context() and pass ctx into handlers; some adapters added where needed.
Tests & gomock expectations
**/*_test.go, **/mock_*.go
Unit tests updated to supply context (often context.Background() or gomock.Any()); DoAndReturn callbacks accept ctx where used.

Sequence diagram

sequenceDiagram
  participant Controller
  participant OpsExecutor
  participant RPMOstree
  participant Reboot
  Controller->>OpsExecutor: RunInHostNamespace(ctx, nmstatectl/podman/bash)
  OpsExecutor->>OpsExecutor: exec.CommandContext(ctx, ...)
  Controller->>RPMOstree: IsStaterootBooted(ctx, name)
  RPMOstree-->>Controller: bool,error
  Controller->>Reboot: InitiateRollback(ctx,msg) / RebootToNewStateRoot(ctx,rationale)
  Reboot->>RPMOstree: GetUnbootedStaterootName(ctx) / SetDefaultDeployment(ctx, idx)
  Reboot->>OpsExecutor: RunSystemdAction(ctx, ...)
  OpsExecutor-->>Reboot: result
  Reboot-->>Controller: rollback/reboot result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
CLAUDE.md (1)

1-118: ⚠️ Potential issue | 🔴 Critical

Correct the Operator SDK version: v1.40.0, not v2.

The line claiming "Operator SDK v2" is incorrect. The Makefile specifies OPERATOR_SDK_VERSION ?= 1.40.0, which is v1.x. Operator SDK does not have a v2 major release (only v1.x releases exist).

The Go 1.24 claim is correct (go.mod specifies 1.24.6). Kubebuilder v4 is consistent with the use of controller-gen v0.19.0 and controller-runtime v0.22.5.

Update the overview to read: "Built with Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" or similar.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` around lines 1 - 118, Replace the incorrect "Operator SDK v2"
claim in the project overview with the actual version from the Makefile
(OPERATOR_SDK_VERSION ?= 1.40.0); specifically update the sentence that
currently reads "Built with Kubebuilder v4 and Operator SDK v2 in Go 1.24" to
something like "Built with Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" so
the CLAUDE.md text matches the Makefile's OPERATOR_SDK_VERSION.
lca-cli/cmd/ipconfig/postpivot.go (1)

78-90: ⚠️ Potential issue | 🟠 Major

Thread cmd.Context() through to preserve cancellation propagation from the Cobra command lifecycle.

At Line 78, context.Background() prevents SIGTERM/SIGINT cancellation from reaching the operation. Modify runIPConfigPostPivot() to accept a context.Context parameter, then pass cmd.Context() from the Run callback (Line 62) and reuse that context for both postPivotHandler.Run(ctx) and AutoRollbackIfEnabled(ctx).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/cmd/ipconfig/postpivot.go` around lines 78 - 90, Change
runIPConfigPostPivot to accept a context.Context parameter and use that context
instead of context.Background(); specifically, update the Run callback to call
runIPConfigPostPivot(cmd.Context(), ...) and inside runIPConfigPostPivot replace
the local ctx := context.Background() with the passed ctx and use that ctx when
invoking postPivotHandler.Run(ctx) and rbClient.AutoRollbackIfEnabled(ctx,
reboot.IPConfigRunComponent, ...). Ensure all occurrences of
postPivotHandler.Run and AutoRollbackIfEnabled referenced in
runIPConfigPostPivot use the passed context so cancellation from the Cobra
command (cmd.Context()) propagates.
controllers/upgrade_handlers.go (1)

344-356: ⚠️ Potential issue | 🟠 Major

Use context.Background() for InitiateRollback to ensure rollback completes independently of reconciliation context cancellation.

Auto-rollback is a critical recovery mechanism that should not be aborted by a canceled reconciliation context. Currently, line 355 passes ctx to InitiateRollback, which could fail silently if the context has been canceled during error handling. This pattern is already established in lca-cli/initmonitor/initmonitor.go, which correctly uses context.Background() for rollback operations.

Proposed fix
-	if err := u.RebootClient.InitiateRollback(ctx, msg); err != nil {
+	rollbackCtx := context.Background()
+	if err := u.RebootClient.InitiateRollback(rollbackCtx, msg); err != nil {
 		u.Log.Info(fmt.Sprintf("Unable to auto rollback: %s", err))
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/upgrade_handlers.go` around lines 344 - 356, In
autoRollbackIfEnabled, the call to u.RebootClient.InitiateRollback(ctx, msg)
should use a root context so rollback isn't canceled with the reconciliation
context; replace the passed-in ctx with context.Background() (i.e., call
u.RebootClient.InitiateRollback(context.Background(), msg)) inside
UpgHandler.autoRollbackIfEnabled and ensure the function references
InitiateRollback and UpgHandler correctly.
lca-cli/ipconfig/prepivot.go (1)

185-193: ⚠️ Potential issue | 🟠 Major

Use a non-cancelable context for the deferred service recovery.

Line 187 reuses the request context in a critical deferred recovery path. If that context is canceled during the subsequent operations (lines 197-232), EnableClusterServices will fail immediately because exec.CommandContext respects context cancellation. This leaves the cluster in a broken state: services are stopped but kubelet is not re-enabled.

Proposed fix
+	cleanupCtx := context.Background()
 	defer func() {
 		p.log.Info("Enabling cluster services in old stateroot")
-		if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil {
+		if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil {
 			if err == nil {
 				err = internalErr
 			} else {
 				err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr)
 			}
 		}
 	}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ipconfig/prepivot.go` around lines 185 - 193, The deferred recovery
block currently reuses the incoming request context (ctx) when calling
p.ops.EnableClusterServices inside the deferred func, which can be canceled and
cause EnableClusterServices to fail; change the deferred call to use a
non-cancelable context (e.g., ctx := context.Background() or context.TODO())
when invoking p.ops.EnableClusterServices so the recovery always runs regardless
of request cancellation, keeping the existing error-aggregation logic (the
deferred func that updates err) intact and referencing
p.ops.EnableClusterServices and the deferred func in prepivot.go.
internal/imagemgmt/imagemgmt.go (1)

240-261: ⚠️ Potential issue | 🟠 Major

Preserve cancellation in the image-deletion loop.

The podman rmi failures are intentionally best-effort here, but ignoring errors with _ at line 260 also discards context cancellation signals. When the context is cancelled, Execute returns a context.Canceled or context.DeadlineExceeded error which is swallowed, allowing the loop to continue deleting images instead of stopping. Check for context errors before treating other failures as ignorable.

Suggested fix
-			// Ignore errors, as images may be in use by transient containers or other images
-			output, _ := c.Executor.Execute(ctx, "podman", args...)
-			c.Log.Info("Deleted unused images", "output", output)
+			// Ignore non-context errors, as images may be in use by transient containers or other images
+			output, err := c.Executor.Execute(ctx, "podman", args...)
+			if err != nil {
+				if ctxErr := ctx.Err(); ctxErr != nil {
+					return ctxErr
+				}
+				c.Log.Error(err, "Ignoring podman rmi failure during cleanup", "args", args)
+			} else {
+				c.Log.Info("Deleted unused images", "output", output)
+			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/imagemgmt/imagemgmt.go` around lines 240 - 261, In
CleanupUnusedImages, don't swallow Execute errors blindly; preserve cancellation
by checking the returned error from c.Executor.Execute (called in the image
deletion loop) and if it's context.Canceled or context.DeadlineExceeded (or
ctx.Err() != nil) return that error immediately; only treat non-context errors
as ignorable (log them and continue). Update the loop around the Execute call in
CleanupUnusedImages to capture the error, check ctx.Err() or the specific
context errors, and return on cancellation before continuing to the next batch.
lca-cli/ops/ops.go (1)

252-276: ⚠️ Potential issue | 🟠 Major

waitForEtcd ignores the context parameter and blocks HTTP requests indefinitely.

The ctx parameter is declared but never used. The function uses a hardcoded time.After timeout instead of observing ctx.Done(), and http.Get is not bound to the context, so a hung HTTP request can block the caller regardless of context cancellation.

Use context.WithTimeout and http.NewRequestWithContext to properly bind both the timeout and the HTTP request to the provided context:

Suggested fix
 func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error {
-	timeout := time.After(1 * time.Minute)
+	ctx, cancel := context.WithTimeout(ctx, time.Minute)
+	defer cancel()
 	ticker := time.NewTicker(1 * time.Second)
 	defer ticker.Stop()
 
 	for {
 		select {
-		case <-timeout:
-			return fmt.Errorf("timeout waiting for etcd")
+		case <-ctx.Done():
+			return ctx.Err()
 		case <-ticker.C:
-			resp, err := http.Get(healthzEndpoint) //nolint:gosec
+			req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil)
+			if err != nil {
+				return fmt.Errorf("failed to create etcd health request: %w", err)
+			}
+			resp, err := http.DefaultClient.Do(req) //nolint:gosec
 			if err != nil {
 				o.log.Infof("Waiting for etcd: %s", err)
 				continue
 			}
-			defer resp.Body.Close()
+			resp.Body.Close()
 
 			if resp.StatusCode != http.StatusOK {
 				o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode)
 				continue
 			}

 			return nil
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ops/ops.go` around lines 252 - 276, The waitForEtcd function ignores
the passed ctx and uses a hard timeout and http.Get which can hang; update
waitForEtcd to derive a cancellable context (use context.WithTimeout(ctx,
1*time.Minute) or respect ctx directly), replace http.Get with an
http.NewRequestWithContext using that context (or an http.Client with the
context-bound request), and in the polling loop select on ctx.Done() (or the
derived timeout context) in addition to the ticker so cancellation/unblock
works; ensure resp.Body.Close() is deferred only after a successful non-nil resp
and propagate/return the context error when ctx is cancelled.
lca-cli/ibi-preparation/ibipreparation.go (1)

54-86: ⚠️ Potential issue | 🟠 Major

Propagate context through precacheFlow to enable cancellation during image pulling.

The Run(ctx) method now passes context to diskPreparation, postDeployment, cleanupRhcosSysroot, and shutdownNode, but the call to i.precacheFlow() at line ~75 bypasses context entirely. The precacheFlow method invokes workload.Precache(), which spawns concurrent image-pull goroutines via PullImages() without any cancellation control. Image pulling can be a long-running phase; missing context here leaves it outside the new cancellation model introduced by the recent context propagation effort.

Update precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) to accept and forward ctx context.Context, and update workload.Precache() and PullImages() signatures accordingly to respect context cancellation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ibi-preparation/ibipreparation.go` around lines 54 - 86, The Run
method currently calls i.precacheFlow(...) without a context which prevents
cancellation during long image pulls; change precacheFlow(imageListFile,
seedInfoFile, registriesConfFile string) to precacheFlow(ctx context.Context,
imageListFile, seedInfoFile, registriesConfFile string), update its internal
call to workload.Precache(...) to accept and forward ctx, and modify
workload.Precache(...) and PullImages(...) signatures to take ctx
context.Context and to use ctx for cancellation inside goroutines; finally
update the call site in Run to call i.precacheFlow(ctx,
common.ContainersListFilePath, common.IBISeedInfoFilePath,
defaultRegistriesConfFile) so context is propagated end-to-end.
internal/reboot/reboot.go (1)

208-270: ⚠️ Potential issue | 🟠 Major

Fix context reuse and error handling in rollback sequences.

Two issues in InitiateRollback at lines 208-270 (IBURebootClient) and 382-440 (IPCRebootClient):

  1. Context reuse: Both implementations pass the caller's context through the entire rollback sequence (SetDefaultDeployment, RebootToNewStateRoot). Controllers pass reconcile contexts that may timeout or be cancelled. Once rollback is committed, derive a fresh unbounded context for the sequence instead of reusing the request-scoped one.

  2. Error handling bug (IBURebootClient only): Lines 268-270 unconditionally return an error via fmt.Errorf(...) even when RebootToNewStateRoot succeeds. The error message also incorrectly says "unable to get set deployment" instead of a reboot failure. Match the IPCRebootClient pattern: directly return the result from RebootToNewStateRoot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/reboot/reboot.go` around lines 208 - 270, InitiateRollback currently
reuses the caller's context for the post-commit rollback steps and also has a
bug that always returns an error after calling RebootToNewStateRoot; fix by
deriving a fresh context (e.g., ctx := context.Background() or
context.WithCancel(context.Background())) for the sequence of
SetDefaultDeployment and RebootToNewStateRoot instead of passing the incoming
reconcile context, and change the final error handling in
IBURebootClient.InitiateRollback to return the result of RebootToNewStateRoot
directly (like IPCRebootClient) rather than unconditionally wrapping/returning a
misleading fmt.Errorf; apply the same context-derivation pattern to the
IPCRebootClient path that performs SetDefaultDeployment and
RebootToNewStateRoot.
lca-cli/postpivot/postpivot.go (1)

272-295: ⚠️ Potential issue | 🟠 Major

Stop the recert flow when the image pull times out or is canceled.

The wait.PollUntilContextCancel error is ignored here. If the 10-minute pull window expires, the code silently falls through to RecertFullFlow, which then proceeds with no guarantee the image was pulled successfully.

Suggested fix
 	if _, err := p.ops.RunInHostNamespace(ctx, "podman", "image", "exists", seedClusterInfo.RecertImagePullSpec); err != nil {
 		ctxWithTimeout, cancel := context.WithTimeout(ctx, 10*time.Minute)
 		defer cancel()
-		_ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
+		if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
 			p.log.Info("pulling recert image")
 			command := "podman"
 			if seedReconfiguration.Proxy != nil {
 				command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s",
 					seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy,
 					seedReconfiguration.Proxy.NoProxy, command)
 			}
 			if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil {
 				p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error())
 				return false, nil
 			}
 			return true, nil
-		})
+		}); err != nil {
+			return fmt.Errorf("failed to pull recert image: %w", err)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/postpivot/postpivot.go` around lines 272 - 295, The pull loop ignores
the error returned by wait.PollUntilContextCancel so a timeout/cancel still
calls RecertFullFlow; capture the error (e.g. pullErr :=
wait.PollUntilContextCancel(...)) after calling wait.PollUntilContextCancel and
if pullErr != nil return or propagate an error and stop the recert flow (handle
context.DeadlineExceeded/context.Canceled specially to log and return early).
Update the block around RunInHostNamespace/RunBashInHostNamespace and the
subsequent RecertFullFlow call to check pullErr and only call
p.ops.RecertFullFlow when the pull succeeded; include informative logging with
the pullErr when aborting.
🧹 Nitpick comments (1)
lca-cli/cmd/ipconfig/rollback.go (1)

70-79: Context handling in CLI commands is a codebase-wide pattern; this suggestion aligns with broader refactoring goals.

At line 70, context.Background() is created and used for handler and reboot operations. While passing cmd.Context() would be architecturally cleaner and enable proper signal propagation, this pattern is consistent across all CLI commands in the codebase (restore, prepivot, postpivot, etc.). The reboot operation itself does not respond to context cancellation. If the codebase adopts command-context propagation as a standard pattern, this should be refactored along with other CLI commands rather than in isolation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/cmd/ipconfig/rollback.go` around lines 70 - 79, Replace the locally
created context.Background() with the command's context so cancellation/signal
propagation works across CLI commands: use cmd.Context() instead of
context.Background() when constructing ctx passed into
reboot.NewIPCRebootClient, ipconfig.NewRollbackHandler and when calling exec.Run
and rb.RebootToNewStateRoot; keep in mind rb.RebootToNewStateRoot may not
observe cancellation (so callers should still pass cmd.Context() for consistency
and future-proofing).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/seedgen_controller.go`:
- Around line 420-423: The helper methods (e.g., rmPreviousImagerContainer)
create a new context via context.Background(), which ignores reconcile
cancellation/timeouts; change each helper to accept a context parameter (ctx
context.Context) and use that ctx when calling r.Executor.Execute for commands
like "podman rm", "systemd-run", and "podman inspect"; then update all callers
(setupWorkspace, generateSeedImage, finishSeedgen and Reconcile) to pass the
incoming reconcile ctx through to these helpers so command execution respects
cancellation and deadlines.

In `@internal/ostreeclient/ostreeclient.go`:
- Around line 105-113: Change IsOstreeAdminSetDefaultFeatureEnabled to return
(bool, error) instead of just bool: call c.executor.Execute(ctx, "ostree",
"admin", "--help"), if err != nil return false, err; otherwise return
strings.Contains(output, "set-default"), nil. Update callers (e.g.,
InitiateRollback) to handle the returned error and distinguish execution
failures (context cancellation/deadlines) from a legitimate "feature not
present" false result so they don't misreport support when the probe itself
failed.

In `@internal/precache/workload/pullImages.go`:
- Around line 51-53: The helpers create new background contexts and drop caller
cancellation; thread a context.Context through the call chain instead: change
Precache to Precache(ctx, precacheSpec, authFile, bestEffort), PullImages to
PullImages(ctx, precacheSpec, authFile), pullImage to pullImage(ctx, image,
authFile, progress), podmanImgPull to podmanImgPull(ctx, image, authFile),
podmanImgExists to podmanImgExists(ctx, image) and CheckPodman to
CheckPodman(ctx); remove any ctx := context.Background() occurrences in
pullImages.go (the noted lines 51, 60–66, 72–74), update all call sites (e.g.,
ibipreparation.go and ibuPrecacheWorkload.go callers) to pass the incoming ctx,
and ensure Executor.ExecuteWithLiveLogger and other podman invocations use the
passed ctx so cancellation/timeouts propagate.

In `@internal/prep/prep.go`:
- Around line 98-99: The deferred cleanup currently calls
ops.UnmountAndRemoveImage with the request-scoped ctx which may be cancelled
before the defer runs; change the defer to use a non-cancellable context (e.g.,
context.Background() or a ctxNoCancel created via context.WithoutDeadline
semantics) when invoking UnmountAndRemoveImage so seedImage is always
unmounted/removed even if the original ctx is cancelled. Update the defer that
references ops.UnmountAndRemoveImage(ctx, seedImage) to pass the new
non-cancellable context instead of the request-scoped ctx.

In `@lca-cli/cmd/postpivotcmd.go`:
- Around line 68-71: The code creates a detached context with
context.Background() which prevents cancellation/timeouts from cmd.Context()
from reaching PostPivotConfiguration and the rollback; replace
context.Background() with the Cobra command's context (use cmd.Context()) and
pass that ctx into postPivotRunner.PostPivotConfiguration(ctx) and
rebootClient.AutoRollbackIfEnabled(ctx, ...) so both PostPivotConfiguration and
AutoRollbackIfEnabled observe cancellation/timeouts from the command.

In `@lca-cli/postpivot/postpivot.go`:
- Around line 951-954: The deferred Umount uses the request context which may be
canceled before cleanup; change the defer to call p.ops.Umount with
context.Background() (or a new non-cancelable context) instead of ctx so unmount
runs even if the original context is canceled — update the defer at the
Mount/Umount pair (p.ops.Mount(..., deviceName, mountFolder) and defer
p.ops.Umount(...)) to use context.Background().

In `@lca-cli/seedcreator/seedcreator.go`:
- Around line 69-73: CreateSeedImage currently roots the workflow in
context.Background(), preventing cancellation; change CreateSeedImage to accept
a caller-owned context (e.g., func (s *SeedCreator) CreateSeedImage(ctx
context.Context) error) and thread that ctx into downstream calls (replace local
ctx := context.Background() and pass ctx into s.copyConfigurationFiles, and any
other helper calls such as podman/rpm-ostree/tar invocations) so cancellation
and deadlines propagate from the caller through SeedCreator.CreateSeedImage and
related methods.

---

Outside diff comments:
In `@CLAUDE.md`:
- Around line 1-118: Replace the incorrect "Operator SDK v2" claim in the
project overview with the actual version from the Makefile (OPERATOR_SDK_VERSION
?= 1.40.0); specifically update the sentence that currently reads "Built with
Kubebuilder v4 and Operator SDK v2 in Go 1.24" to something like "Built with
Kubebuilder v4, Operator SDK v1.40.0 in Go 1.24" so the CLAUDE.md text matches
the Makefile's OPERATOR_SDK_VERSION.

In `@controllers/upgrade_handlers.go`:
- Around line 344-356: In autoRollbackIfEnabled, the call to
u.RebootClient.InitiateRollback(ctx, msg) should use a root context so rollback
isn't canceled with the reconciliation context; replace the passed-in ctx with
context.Background() (i.e., call
u.RebootClient.InitiateRollback(context.Background(), msg)) inside
UpgHandler.autoRollbackIfEnabled and ensure the function references
InitiateRollback and UpgHandler correctly.

In `@internal/imagemgmt/imagemgmt.go`:
- Around line 240-261: In CleanupUnusedImages, don't swallow Execute errors
blindly; preserve cancellation by checking the returned error from
c.Executor.Execute (called in the image deletion loop) and if it's
context.Canceled or context.DeadlineExceeded (or ctx.Err() != nil) return that
error immediately; only treat non-context errors as ignorable (log them and
continue). Update the loop around the Execute call in CleanupUnusedImages to
capture the error, check ctx.Err() or the specific context errors, and return on
cancellation before continuing to the next batch.

In `@internal/reboot/reboot.go`:
- Around line 208-270: InitiateRollback currently reuses the caller's context
for the post-commit rollback steps and also has a bug that always returns an
error after calling RebootToNewStateRoot; fix by deriving a fresh context (e.g.,
ctx := context.Background() or context.WithCancel(context.Background())) for the
sequence of SetDefaultDeployment and RebootToNewStateRoot instead of passing the
incoming reconcile context, and change the final error handling in
IBURebootClient.InitiateRollback to return the result of RebootToNewStateRoot
directly (like IPCRebootClient) rather than unconditionally wrapping/returning a
misleading fmt.Errorf; apply the same context-derivation pattern to the
IPCRebootClient path that performs SetDefaultDeployment and
RebootToNewStateRoot.

In `@lca-cli/cmd/ipconfig/postpivot.go`:
- Around line 78-90: Change runIPConfigPostPivot to accept a context.Context
parameter and use that context instead of context.Background(); specifically,
update the Run callback to call runIPConfigPostPivot(cmd.Context(), ...) and
inside runIPConfigPostPivot replace the local ctx := context.Background() with
the passed ctx and use that ctx when invoking postPivotHandler.Run(ctx) and
rbClient.AutoRollbackIfEnabled(ctx, reboot.IPConfigRunComponent, ...). Ensure
all occurrences of postPivotHandler.Run and AutoRollbackIfEnabled referenced in
runIPConfigPostPivot use the passed context so cancellation from the Cobra
command (cmd.Context()) propagates.

In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Around line 54-86: The Run method currently calls i.precacheFlow(...) without
a context which prevents cancellation during long image pulls; change
precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) to
precacheFlow(ctx context.Context, imageListFile, seedInfoFile,
registriesConfFile string), update its internal call to workload.Precache(...)
to accept and forward ctx, and modify workload.Precache(...) and PullImages(...)
signatures to take ctx context.Context and to use ctx for cancellation inside
goroutines; finally update the call site in Run to call i.precacheFlow(ctx,
common.ContainersListFilePath, common.IBISeedInfoFilePath,
defaultRegistriesConfFile) so context is propagated end-to-end.

In `@lca-cli/ipconfig/prepivot.go`:
- Around line 185-193: The deferred recovery block currently reuses the incoming
request context (ctx) when calling p.ops.EnableClusterServices inside the
deferred func, which can be canceled and cause EnableClusterServices to fail;
change the deferred call to use a non-cancelable context (e.g., ctx :=
context.Background() or context.TODO()) when invoking
p.ops.EnableClusterServices so the recovery always runs regardless of request
cancellation, keeping the existing error-aggregation logic (the deferred func
that updates err) intact and referencing p.ops.EnableClusterServices and the
deferred func in prepivot.go.

In `@lca-cli/ops/ops.go`:
- Around line 252-276: The waitForEtcd function ignores the passed ctx and uses
a hard timeout and http.Get which can hang; update waitForEtcd to derive a
cancellable context (use context.WithTimeout(ctx, 1*time.Minute) or respect ctx
directly), replace http.Get with an http.NewRequestWithContext using that
context (or an http.Client with the context-bound request), and in the polling
loop select on ctx.Done() (or the derived timeout context) in addition to the
ticker so cancellation/unblock works; ensure resp.Body.Close() is deferred only
after a successful non-nil resp and propagate/return the context error when ctx
is cancelled.

In `@lca-cli/postpivot/postpivot.go`:
- Around line 272-295: The pull loop ignores the error returned by
wait.PollUntilContextCancel so a timeout/cancel still calls RecertFullFlow;
capture the error (e.g. pullErr := wait.PollUntilContextCancel(...)) after
calling wait.PollUntilContextCancel and if pullErr != nil return or propagate an
error and stop the recert flow (handle context.DeadlineExceeded/context.Canceled
specially to log and return early). Update the block around
RunInHostNamespace/RunBashInHostNamespace and the subsequent RecertFullFlow call
to check pullErr and only call p.ops.RecertFullFlow when the pull succeeded;
include informative logging with the pullErr when aborting.

---

Nitpick comments:
In `@lca-cli/cmd/ipconfig/rollback.go`:
- Around line 70-79: Replace the locally created context.Background() with the
command's context so cancellation/signal propagation works across CLI commands:
use cmd.Context() instead of context.Background() when constructing ctx passed
into reboot.NewIPCRebootClient, ipconfig.NewRollbackHandler and when calling
exec.Run and rb.RebootToNewStateRoot; keep in mind rb.RebootToNewStateRoot may
not observe cancellation (so callers should still pass cmd.Context() for
consistency and future-proofing).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: d7240f78-76d7-46d4-a368-eb23f07f0963

📥 Commits

Reviewing files that changed from the base of the PR and between b6a88cc and efce46e.

📒 Files selected for processing (56)
  • CLAUDE.md
  • controllers/ibu_controller.go
  • controllers/ibu_controller_test.go
  • controllers/idle_handlers.go
  • controllers/idle_handlers_test.go
  • controllers/ipc_config_handlers.go
  • controllers/ipc_config_handlers_test.go
  • controllers/ipc_controller.go
  • controllers/ipc_controller_test.go
  • controllers/ipc_idle_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • controllers/ipc_rollback_handlers.go
  • controllers/ipc_rollback_handlers_test.go
  • controllers/prep_handlers.go
  • controllers/rollback_handlers.go
  • controllers/seedgen_controller.go
  • controllers/upgrade_handlers.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt.go
  • internal/imagemgmt/imagemgmt_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/precache/workload/pullImages.go
  • internal/prep/prep.go
  • internal/reboot/mock_reboot.go
  • internal/reboot/reboot.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/ibi.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • lca-cli/initmonitor/initmonitor.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ipconfig/prepivot.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ops/execute.go
  • lca-cli/ops/mock_execute.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/ops/ops.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/seedcreator/seedcreator.go
  • lca-cli/seedrestoration/seedrestoration.go
  • main/main.go

Comment thread controllers/seedgen_controller.go Outdated
Comment thread internal/ostreeclient/ostreeclient.go Outdated
Comment thread internal/precache/workload/pullImages.go Outdated
Comment thread internal/prep/prep.go Outdated
Comment thread lca-cli/cmd/postpivotcmd.go Outdated
Comment thread lca-cli/postpivot/postpivot.go Outdated
Comment thread lca-cli/seedcreator/seedcreator.go Outdated
@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from efce46e to 70adfe3 Compare April 27, 2026 18:02
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 27, 2026
@sebrandon1
Copy link
Copy Markdown
Contributor Author

/retest-required

@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from 70adfe3 to 0f3c830 Compare April 27, 2026 20:22
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
lca-cli/ipconfig/prepivot.go (1)

185-193: ⚠️ Potential issue | 🟠 Major

Use a fresh context for the deferred service re-enable.

This cleanup now reuses the main operation ctx. If that context is canceled or times out after StopClusterServices(ctx) succeeds, EnableClusterServices(ctx) can fail for the same reason and leave kubelet disabled after a partial pre-pivot. Cleanup should run with its own bounded context instead of inheriting cancellation from the work it is trying to unwind.

Suggested fix
defer func() {
	p.log.Info("Enabling cluster services in old stateroot")
-	if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil {
+	cleanupCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+	if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil {
		if err == nil {
			err = internalErr
		} else {
			err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr)
		}
	}
}()

This needs a time import as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ipconfig/prepivot.go` around lines 185 - 193, The deferred cleanup
currently reuses the operation's ctx which may be canceled; change the defer to
create a fresh bounded context (e.g., ctx2, cancel :=
context.WithTimeout(context.Background(), <sensibleDuration>); defer cancel())
and call p.ops.EnableClusterServices(ctx2) so re-enabling cluster services is
not affected by the original context's cancellation; update the defer block to
use ctx2 and ensure you add the time import for the timeout value.
controllers/ipc_config_handlers.go (1)

801-817: ⚠️ Potential issue | 🟠 Major

Don't tie systemd-run --wait to the reconcile context.

Once the transient unit is created, canceling ctx aborts the waiting client via exec.CommandContext, not necessarily the unit that is now mutating the host. This path will report pre-pivot failure even though the systemd unit may still be running, which can lead to duplicate/conflicting attempts on the next reconcile. Use a detached context here, or explicitly stop/query the spawned unit when cancellation happens. The same risk exists in controllers/ipc_rollback_handlers.go around scheduleIPConfigRollback.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/ipc_config_handlers.go` around lines 801 - 817, The call to
RunSystemdAction in RunLcaCliIPConfigPrePivot ties systemd-run --wait to the
reconcile ctx so cancelling ctx can kill the client while the transient systemd
unit continues running; change this to use a detached context (e.g., create a
new context.Background()-based ctx or a derived context not cancelled by the
reconcile) when invoking h.ChrootOps.RunSystemdAction so the systemd unit is
created and allowed to run independently, or alternatively implement explicit
unit follow-up logic on ctx cancellation (query/stop the unit). Make the same
change in the corresponding scheduleIPConfigRollback call in
controllers/ipc_rollback_handlers.go.
lca-cli/ops/ops.go (2)

737-745: ⚠️ Potential issue | 🟠 Major

Don't drop the container-stop polling error.

The result of wait.PollUntilContextCancel is ignored. If ctx times out or crictl stop keeps failing, this method proceeds as if containers were stopped and then shuts down CRI-O anyway.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ops/ops.go` around lines 737 - 745, The polling call to
wait.PollUntilContextCancel is currently ignored so failures or timeouts during
container stop are swallowed; capture its returned error and handle it (e.g.,
assign err := wait.PollUntilContextCancel(...); if err != nil { return
fmt.Errorf("failed waiting for containers to stop: %w", err) } ) instead of
proceeding to shut down CRI-O; update the surrounding function to propagate or
log the error appropriately so RunBashInHostNamespace/crictl failures abort the
shutdown flow.

252-275: ⚠️ Potential issue | 🟠 Major

waitForEtcd still ignores the passed context.

The loop never checks ctx.Done(), and http.Get is not bound to ctx, so cancellation can still block here until the hardcoded timeout or a hung GET returns. Use ctx in the select and issue the request with a context-aware HTTP call.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ops/ops.go` around lines 252 - 275, The waitForEtcd function ignores
the passed ctx and uses http.Get; modify it to listen for ctx cancellation and
issue a context-aware HTTP request: inside the select add a case <-ctx.Done()
that returns ctx.Err(), replace http.Get(healthzEndpoint) with creating an
http.Request via http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil)
and use http.DefaultClient.Do(req) (or a client variable) so the request is
bound to ctx, and avoid deferring resp.Body.Close() inside the loop by closing
resp.Body immediately after you're done checking StatusCode; keep the existing
timeout/ticker logic but ensure ctx cancellation short-circuits the loop.
internal/precache/workload/pullImages.go (1)

82-95: ⚠️ Potential issue | 🟠 Major

Stop retrying once the context is canceled.

pullImage retries up to MaxRetries even for context.Canceled/DeadlineExceeded. On a canceled precache, that turns each remaining image into repeated failed execs instead of aborting promptly. Treat cancellation as terminal and return immediately.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/precache/workload/pullImages.go` around lines 82 - 95, The loop that
calls podmanImgPull (inside the pullImage/pullImages.go retry loop bounded by
MaxRetries) currently treats context cancellation like any other error and
continues retrying; update the error handling so that if the context has been
canceled or its deadline exceeded (check ctx.Err() != nil or use errors.Is(err,
context.Canceled)/errors.Is(err, context.DeadlineExceeded)) you log and return
immediately instead of retrying. Modify the branch after podmanImgPull fails to
detect context cancellation and exit the function (no further retries), keeping
the existing "manifest unknown" short-circuit behavior intact.
lca-cli/postpivot/postpivot.go (1)

276-291: ⚠️ Potential issue | 🟠 Major

Surface recert image pull timeout/cancellation.

This poll result is discarded. If the pull never succeeds within 10 minutes or ctx is canceled, the code still falls through into RecertFullFlow, which hides the real failure and can start recert without a pulled image.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/postpivot/postpivot.go` around lines 276 - 291, The
PollUntilContextCancel call using ctxWithTimeout currently discards its result
so a timeout/cancel is ignored and RecertFullFlow proceeds without the image;
change the code around wait.PollUntilContextCancel to capture its (bool, error)
result and handle both cases: if error != nil or the bool is false after the
timeout/cancellation, log a clear error including the context deadline/cancel
reason and return that error (or stop the post-pivot flow) instead of falling
through to RecertFullFlow; reference the ctxWithTimeout,
wait.PollUntilContextCancel call and the p.ops.RunBashInHostNamespace pull
attempt to locate where to add the result check and early return.
♻️ Duplicate comments (2)
lca-cli/ibi-preparation/ibipreparation.go (1)

133-133: ⚠️ Potential issue | 🟠 Major

Preserve cancellation through the precache step.

Run now receives a caller-owned ctx, but this call resets it to context.Background(). A timed-out or canceled IBI preparation will still keep precaching images. Thread ctx into precacheFlow and pass it to workload.Precache.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ibi-preparation/ibipreparation.go` at line 133, The precache step
currently ignores the caller's cancellation by calling workload.Precache with
context.Background(); modify the code to thread the caller-owned ctx through
precacheFlow and invoke workload.Precache(ctx, imageList,
common.IBIPullSecretFilePath, i.config.PrecacheBestEffort) instead so
timeouts/cancellations from Run are preserved; update the precacheFlow
signature/uses and any call sites that start the precache to accept and forward
ctx (referencing Run, precacheFlow, workload.Precache, imageList,
common.IBIPullSecretFilePath, and i.config.PrecacheBestEffort).
lca-cli/ops/ops.go (1)

459-463: ⚠️ Potential issue | 🟠 Major

Use a non-cancelable context for deferred etcd cleanup.

If ctx is canceled while recert is unwinding, the deferred StopEtcdServer(ctx, ...) will short-circuit and leave the temporary etcd container behind. Cleanup should run with a non-cancelable context and log failures.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ops/ops.go` around lines 459 - 463, When deferring StopEtcdServer
after RunUnauthenticatedEtcdServer, don't reuse the potentially cancelable ctx;
create a non-cancelable context (e.g. cleanupCtx := context.Background() or
context.WithoutCancel equivalent) and call o.StopEtcdServer(cleanupCtx,
authFile, common.EtcdContainerName) inside the defer, capturing and logging any
error returned (e.g. if err := o.StopEtcdServer(...); err != nil { /* log error
via o.Logger/errorf or fmt */ }). Update the defer that references
StopEtcdServer to use cleanupCtx and ensure failures are logged.
🧹 Nitpick comments (2)
internal/reboot/reboot_test.go (1)

63-64: Assert the forwarded context, not just any context.

Using gomock.Any() on Line 63 won’t catch regressions where IsOrigStaterootBooted stops propagating the caller’s context.

Proposed test tightening
+			ctx := context.Background()
-			mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Any()).Return(tt.currentStateRoot, nil).Times(1)
-			got, err := rebootClient.IsOrigStaterootBooted(context.Background(), tt.args.ibu.Spec.SeedImageRef.Version)
+			mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(tt.currentStateRoot, nil).Times(1)
+			got, err := rebootClient.IsOrigStaterootBooted(ctx, tt.args.ibu.Spec.SeedImageRef.Version)

As per coding guidelines "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/reboot/reboot_test.go` around lines 63 - 64, The test currently uses
gomock.Any() for the context which won't catch regressions in context
propagation; modify the test to create a concrete ctx variable (e.g., ctx :=
context.Background()), pass that ctx into rebootClient.IsOrigStaterootBooted,
and change the mock expectation on mockRpmostreeclient.GetCurrentStaterootName
to expect that exact context (e.g.,
mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(...)).
This ensures the test asserts the forwarded context from IsOrigStaterootBooted
rather than accepting any context.
lca-cli/ipconfig/postpivot_test.go (1)

94-106: These expectations no longer prove the caller's context is propagated.

Using gomock.Any() for every new context.Context arg means this still passes if Run() regresses to context.Background() internally. Since context threading is the behavior under test here, move the ctx setup before the expectations and match that value (or a tighter matcher derived from it) on the mocked ops calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lca-cli/ipconfig/postpivot_test.go` around lines 94 - 106, The test currently
uses gomock.Any() for every context argument which masks regressions in context
propagation; before setting up gomock expectations create the test context (e.g.
ctx := context.WithTimeout(...) or context.Background()) and use that exact ctx
in the mocked calls (replace the first gomock.Any() argument in
RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile, etc. with
gomock.Eq(ctx) or the ctx variable itself) and then call the code under test
with that same ctx so the expectations assert the real context is threaded
through (refer to RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile
and handler.Run/context usage to locate spots to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@controllers/ipc_controller.go`:
- Around line 327-334: The function isUnbootedStaterootAvailable should
propagate context cancellation/timeouts instead of treating all errors as "not
available": after calling rpmOstreeClient.GetUnbootedStaterootName(ctx) check
err and if errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded) return nil, err; if err != nil return nil, err; then
if unbootedStaterootName == "" return lo.ToPtr(false), nil; otherwise return
lo.ToPtr(true), nil; also add the errors import.

In `@lca-cli/cmd/ibuStaterootSetup.go`:
- Line 120: The SIGTERM handler uses context.Background() when calling
opsClient.ImageExists which can block shutdown; change that call to use a short
timeout-bound context (e.g., ctx, cancel :=
context.WithTimeout(context.Background(), 3-10*time.Second); defer cancel()) and
pass ctx to ImageExists so the probe is bounded; ensure you call cancel() after
the check and adjust imports to include time if needed, targeting the
ImageExists call site where seedImage and seedImageExists are used inside the
SIGTERM handler.

In `@main/main.go`:
- Line 270: The call to chrootOp.GetContainerStorageTarget uses
context.Background(), which ignores cancellation and can hang startup;
create/use the signal-derived startup context (the same one you build for
manager startup, e.g. the context passed into mgr.Start or the new “startup”
context created from signals) before these init calls and pass that context into
GetContainerStorageTarget instead of context.Background(), ensuring any
cancellation or SIG handling will abort the lookup; update the call site where
containerStorageMountpointTarget is obtained and any other early init calls that
currently use context.Background() to use the startup context.

---

Outside diff comments:
In `@controllers/ipc_config_handlers.go`:
- Around line 801-817: The call to RunSystemdAction in RunLcaCliIPConfigPrePivot
ties systemd-run --wait to the reconcile ctx so cancelling ctx can kill the
client while the transient systemd unit continues running; change this to use a
detached context (e.g., create a new context.Background()-based ctx or a derived
context not cancelled by the reconcile) when invoking
h.ChrootOps.RunSystemdAction so the systemd unit is created and allowed to run
independently, or alternatively implement explicit unit follow-up logic on ctx
cancellation (query/stop the unit). Make the same change in the corresponding
scheduleIPConfigRollback call in controllers/ipc_rollback_handlers.go.

In `@internal/precache/workload/pullImages.go`:
- Around line 82-95: The loop that calls podmanImgPull (inside the
pullImage/pullImages.go retry loop bounded by MaxRetries) currently treats
context cancellation like any other error and continues retrying; update the
error handling so that if the context has been canceled or its deadline exceeded
(check ctx.Err() != nil or use errors.Is(err, context.Canceled)/errors.Is(err,
context.DeadlineExceeded)) you log and return immediately instead of retrying.
Modify the branch after podmanImgPull fails to detect context cancellation and
exit the function (no further retries), keeping the existing "manifest unknown"
short-circuit behavior intact.

In `@lca-cli/ipconfig/prepivot.go`:
- Around line 185-193: The deferred cleanup currently reuses the operation's ctx
which may be canceled; change the defer to create a fresh bounded context (e.g.,
ctx2, cancel := context.WithTimeout(context.Background(), <sensibleDuration>);
defer cancel()) and call p.ops.EnableClusterServices(ctx2) so re-enabling
cluster services is not affected by the original context's cancellation; update
the defer block to use ctx2 and ensure you add the time import for the timeout
value.

In `@lca-cli/ops/ops.go`:
- Around line 737-745: The polling call to wait.PollUntilContextCancel is
currently ignored so failures or timeouts during container stop are swallowed;
capture its returned error and handle it (e.g., assign err :=
wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("failed
waiting for containers to stop: %w", err) } ) instead of proceeding to shut down
CRI-O; update the surrounding function to propagate or log the error
appropriately so RunBashInHostNamespace/crictl failures abort the shutdown flow.
- Around line 252-275: The waitForEtcd function ignores the passed ctx and uses
http.Get; modify it to listen for ctx cancellation and issue a context-aware
HTTP request: inside the select add a case <-ctx.Done() that returns ctx.Err(),
replace http.Get(healthzEndpoint) with creating an http.Request via
http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and use
http.DefaultClient.Do(req) (or a client variable) so the request is bound to
ctx, and avoid deferring resp.Body.Close() inside the loop by closing resp.Body
immediately after you're done checking StatusCode; keep the existing
timeout/ticker logic but ensure ctx cancellation short-circuits the loop.

In `@lca-cli/postpivot/postpivot.go`:
- Around line 276-291: The PollUntilContextCancel call using ctxWithTimeout
currently discards its result so a timeout/cancel is ignored and RecertFullFlow
proceeds without the image; change the code around wait.PollUntilContextCancel
to capture its (bool, error) result and handle both cases: if error != nil or
the bool is false after the timeout/cancellation, log a clear error including
the context deadline/cancel reason and return that error (or stop the post-pivot
flow) instead of falling through to RecertFullFlow; reference the
ctxWithTimeout, wait.PollUntilContextCancel call and the
p.ops.RunBashInHostNamespace pull attempt to locate where to add the result
check and early return.

---

Duplicate comments:
In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Line 133: The precache step currently ignores the caller's cancellation by
calling workload.Precache with context.Background(); modify the code to thread
the caller-owned ctx through precacheFlow and invoke workload.Precache(ctx,
imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort) instead so
timeouts/cancellations from Run are preserved; update the precacheFlow
signature/uses and any call sites that start the precache to accept and forward
ctx (referencing Run, precacheFlow, workload.Precache, imageList,
common.IBIPullSecretFilePath, and i.config.PrecacheBestEffort).

In `@lca-cli/ops/ops.go`:
- Around line 459-463: When deferring StopEtcdServer after
RunUnauthenticatedEtcdServer, don't reuse the potentially cancelable ctx; create
a non-cancelable context (e.g. cleanupCtx := context.Background() or
context.WithoutCancel equivalent) and call o.StopEtcdServer(cleanupCtx,
authFile, common.EtcdContainerName) inside the defer, capturing and logging any
error returned (e.g. if err := o.StopEtcdServer(...); err != nil { /* log error
via o.Logger/errorf or fmt */ }). Update the defer that references
StopEtcdServer to use cleanupCtx and ensure failures are logged.

---

Nitpick comments:
In `@internal/reboot/reboot_test.go`:
- Around line 63-64: The test currently uses gomock.Any() for the context which
won't catch regressions in context propagation; modify the test to create a
concrete ctx variable (e.g., ctx := context.Background()), pass that ctx into
rebootClient.IsOrigStaterootBooted, and change the mock expectation on
mockRpmostreeclient.GetCurrentStaterootName to expect that exact context (e.g.,
mockRpmostreeclient.EXPECT().GetCurrentStaterootName(gomock.Eq(ctx)).Return(...)).
This ensures the test asserts the forwarded context from IsOrigStaterootBooted
rather than accepting any context.

In `@lca-cli/ipconfig/postpivot_test.go`:
- Around line 94-106: The test currently uses gomock.Any() for every context
argument which masks regressions in context propagation; before setting up
gomock expectations create the test context (e.g. ctx :=
context.WithTimeout(...) or context.Background()) and use that exact ctx in the
mocked calls (replace the first gomock.Any() argument in RunInHostNamespace,
RecertFullFlow, SystemctlAction, ReadFile, etc. with gomock.Eq(ctx) or the ctx
variable itself) and then call the code under test with that same ctx so the
expectations assert the real context is threaded through (refer to
RunInHostNamespace, RecertFullFlow, SystemctlAction, ReadFile and
handler.Run/context usage to locate spots to change).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: cbc175aa-17e9-487c-bdd2-458a20f210c1

📥 Commits

Reviewing files that changed from the base of the PR and between efce46e and 0f3c830.

📒 Files selected for processing (56)
  • controllers/ibu_controller.go
  • controllers/ibu_controller_test.go
  • controllers/idle_handlers.go
  • controllers/idle_handlers_test.go
  • controllers/ipc_config_handlers.go
  • controllers/ipc_config_handlers_test.go
  • controllers/ipc_controller.go
  • controllers/ipc_controller_test.go
  • controllers/ipc_idle_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • controllers/ipc_rollback_handlers.go
  • controllers/ipc_rollback_handlers_test.go
  • controllers/prep_handlers.go
  • controllers/rollback_handlers.go
  • controllers/seedgen_controller.go
  • controllers/upgrade_handlers.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt.go
  • internal/imagemgmt/imagemgmt_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/precache/workload/pullImages.go
  • internal/prep/prep.go
  • internal/reboot/mock_reboot.go
  • internal/reboot/reboot.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/ibi.go
  • lca-cli/cmd/ibuPrecacheWorkload.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • lca-cli/initmonitor/initmonitor.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ipconfig/prepivot.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ops/execute.go
  • lca-cli/ops/mock_execute.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/ops/ops.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/seedcreator/seedcreator.go
  • lca-cli/seedrestoration/seedrestoration.go
  • main/main.go
✅ Files skipped from review due to trivial changes (8)
  • controllers/ibu_controller_test.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • controllers/prep_handlers.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ops/mock_execute.go
  • controllers/ipc_idle_handlers_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/reboot/reboot.go
🚧 Files skipped from review as they are similar to previous changes (24)
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/cmd/ibi.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt_test.go
  • lca-cli/seedrestoration/seedrestoration.go
  • controllers/idle_handlers.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • controllers/ipc_idle_handlers.go
  • internal/prep/prep.go
  • controllers/seedgen_controller.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • controllers/upgrade_handlers.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/imagemgmt/imagemgmt.go
  • internal/reboot/mock_reboot.go
  • controllers/ipc_rollback_handlers_test.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • controllers/ipc_controller_test.go

Comment on lines +327 to 334
func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) {
if rpmOstreeClient == nil {
return nil, fmt.Errorf("rpmOstreeClient is nil")
}

unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName()
unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx)
if err != nil || unbootedStaterootName == "" {
return lo.ToPtr(false), nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate cancellation instead of reporting “no unbooted stateroot”.

After this change, GetUnbootedStaterootName(ctx) can fail because the caller canceled or timed out the reconcile. Collapsing every error into false, nil turns that control signal into a real state decision: config can re-enter pre-pivot, and rollback validation can be marked failed, even though nothing was actually checked. Only the genuine “not available” case should map to false.

Suggested fix
func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) {
	if rpmOstreeClient == nil {
		return nil, fmt.Errorf("rpmOstreeClient is nil")
	}

	unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx)
-	if err != nil || unbootedStaterootName == "" {
+	if err != nil {
+		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+			return nil, err
+		}
+		return lo.ToPtr(false), nil
+	}
+	if unbootedStaterootName == "" {
		return lo.ToPtr(false), nil
	}

	return lo.ToPtr(true), nil
}

You'll also need to add the errors import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@controllers/ipc_controller.go` around lines 327 - 334, The function
isUnbootedStaterootAvailable should propagate context cancellation/timeouts
instead of treating all errors as "not available": after calling
rpmOstreeClient.GetUnbootedStaterootName(ctx) check err and if errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded) return nil, err;
if err != nil return nil, err; then if unbootedStaterootName == "" return
lo.ToPtr(false), nil; otherwise return lo.ToPtr(true), nil; also add the errors
import.

Comment thread lca-cli/cmd/ibuStaterootSetup.go Outdated
Comment thread main/main.go Outdated
@sebrandon1 sebrandon1 changed the title feat: add context.Context to Execute and Ops interfaces CNF-23402: Add context.Context to Execute and Ops interfaces May 1, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 1, 2026

@sebrandon1: This pull request references CNF-23402 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

  • Replace exec.Command with exec.CommandContext so external commands can be aborted via context cancellation
  • Add ctx context.Context as first parameter to the Execute interface (Execute, ExecuteWithLiveLogger) and all 3 implementations (regularExecutor, nsenterExecutor, chrootExecutor)
  • Propagate ctx through the Ops interface (~25 command-executing methods), downstream consumers (ostreeclient, rpmostreeclient, reboot, imagemgmt, precache/workload), controllers (using existing reconcile ctx), and CLI commands (using context.Background())
  • Replace context.TODO() with proper context usage throughout
  • Regenerate all mocks

This enables proper timeout and cancellation support for long-running operations like podman pull, rpm-ostree, and recert.

Closes #200

Test plan

  • make build — operator binary compiles
  • make cli-build — CLI binary compiles
  • make golangci-lint — all linters pass
  • make unittest — all unit tests pass (0 failures)
  • make generate — mocks regenerated cleanly

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from 0f3c830 to 3a47e05 Compare May 5, 2026 22:37
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
internal/reboot/reboot.go (1)

180-191: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make the post-reboot wait respect ctx.Done().

Both Reboot implementations now accept a context, but after systemd-run they still block in time.Sleep(defaultRebootTimeout). If reboot does not terminate the process immediately, canceled or timed-out callers still hang for 60 minutes, which defeats the new cancellation behavior.

Suggested fix
 	c.log.Info(fmt.Sprintf("Wait for %s to be killed via SIGTERM", defaultRebootTimeout.String()))
-	time.Sleep(defaultRebootTimeout)
+	timer := time.NewTimer(defaultRebootTimeout)
+	defer timer.Stop()
+	select {
+	case <-ctx.Done():
+		return fmt.Errorf("reboot initiated but context ended before process exit: %w", ctx.Err())
+	case <-timer.C:
+	}
 
 	return fmt.Errorf("failed to reboot. This should never happen! Please check the system")

Also applies to: 467-478

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/reboot/reboot.go` around lines 180 - 191, The Reboot method in
IBURebootClient currently uses a blocking time.Sleep(defaultRebootTimeout) after
invoking systemd-run which ignores ctx cancellation; replace the sleep with a
select that waits on time.After(defaultRebootTimeout) and ctx.Done() so callers
can be canceled or time out - e.g., in Reboot (and the other Reboot
implementation around 467-478) use select { case
<-time.After(defaultRebootTimeout): /* proceed */; case <-ctx.Done(): return
ctx.Err() } and keep the existing log behavior when the wait completes normally.
controllers/prep_handlers.go (1)

318-343: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missed context propagation: validateSeedOcpVersion uses context.Background() on a reconcile path.

validateSeedOcpVersion is called from validateIBUSpec(ctx, ibu) (line 431) which is in turn invoked from handlePrep(ctx, ibu) (line 562). The reconcile ctx is available all the way down, but this method discards it and creates a fresh context.Background() for the r.Get(...) call against the API server. That defeats reconcile cancellation/timeout for this lookup and is inconsistent with the rest of this PR.

♻️ Suggested change
-func (r *ImageBasedUpgradeReconciler) validateSeedOcpVersion(seedOcpVersion string) error {
+func (r *ImageBasedUpgradeReconciler) validateSeedOcpVersion(ctx context.Context, seedOcpVersion string) error {
 	// get target OCP version
 	targetClusterVersion := &configv1.ClusterVersion{}
-	if err := r.Get(context.Background(), types.NamespacedName{Name: "version"}, targetClusterVersion); err != nil {
+	if err := r.Get(ctx, types.NamespacedName{Name: "version"}, targetClusterVersion); err != nil {
 		return fmt.Errorf("failed to get ClusterVersion for target: %w", err)
 	}

And update the caller:

-	if err := r.validateSeedOcpVersion(ibu.Spec.SeedImageRef.Version); err != nil {
+	if err := r.validateSeedOcpVersion(ctx, ibu.Spec.SeedImageRef.Version); err != nil {
 		return fmt.Errorf("failed to validate seed image OCP version: %w", err)
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/prep_handlers.go` around lines 318 - 343, The function
validateSeedOcpVersion currently uses context.Background() for the r.Get call
which drops reconcile cancellation/timeout; change validateSeedOcpVersion to
accept a context.Context parameter and use that context when calling r.Get
(replace context.Background() with the passed ctx), update callers
(validateIBUSpec and ultimately handlePrep) to pass the reconcile ctx through to
validateIBUSpec and then to validateSeedOcpVersion so the API lookup honors the
reconcile context and timeouts.
controllers/upgrade_handlers.go (1)

354-366: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t let reconcile cancellation suppress automatic rollback.

This is the recovery path for fatal post-pivot failures. Reusing the reconcile ctx means a shutting-down or canceled controller can abort InitiateRollback(...) immediately and leave the node in the failed upgrade state without the rollback this branch promises.

🛟 Suggested fix
 func (u *UpgHandler) autoRollbackIfEnabled(ctx context.Context, ibu *ibuv1.ImageBasedUpgrade, msg string) {
 	// Check whether auto-rollback is disabled using annotation
 	if val, exists := ibu.GetAnnotations()[common.AutoRollbackOnFailureUpgradeCompletionAnnotation]; exists {
 		if val == common.AutoRollbackDisableValue {
 			u.Log.Info("Auto-rollback upgrade completion is disabled")
 			return
 		}
 	}
 
 	u.Log.Info("Automatically rolling back due to failure")
 
-	if err := u.RebootClient.InitiateRollback(ctx, msg); err != nil {
+	rollbackCtx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
+	defer cancel()
+	if err := u.RebootClient.InitiateRollback(rollbackCtx, msg); err != nil {
 		u.Log.Info(fmt.Sprintf("Unable to auto rollback: %s", err))
 		return
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/upgrade_handlers.go` around lines 354 - 366, The
autoRollbackIfEnabled method currently reuses the reconcile ctx which can be
cancelled and abort InitiateRollback; change it so
UpgHandler.autoRollbackIfEnabled creates and uses a non-cancellable context
(e.g., context.Background() or context.WithTimeout/WithDeadline) instead of the
incoming ctx when calling u.RebootClient.InitiateRollback(ctx, msg), ensuring
InitiateRollback runs to completion even if the reconcile context is cancelled;
update the call site in autoRollbackIfEnabled to pass the new context and keep
existing logging of failures.
lca-cli/ipconfig/prepivot.go (1)

180-194: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Use a non-cancellable context for the deferred EnableClusterServices recovery.

StopClusterServices(ctx) succeeds with the request-scoped ctx, then the deferred EnableClusterServices(ctx) uses the same ctx. If Run's ctx is cancelled (deadline expiry or parent cancellation) before the defer executes, exec.CommandContext will refuse to start the recovery command and the node will be left with cluster services stopped. This is the same pattern previously flagged on internal/prep/prep.go (deferred UnmountAndRemoveImage) that was fixed with context.Background().

🔧 Proposed fix
 	p.log.Info("Stopping cluster services")
 	if err = p.ops.StopClusterServices(ctx); err != nil {
 		err = fmt.Errorf("failed to stop cluster services: %w", err)
 		return
 	}
 
 	defer func() {
 		p.log.Info("Enabling cluster services in old stateroot")
-		if internalErr := p.ops.EnableClusterServices(ctx); internalErr != nil {
+		// Use a non-cancellable context so recovery still runs if the request
+		// ctx was cancelled or its deadline elapsed.
+		cleanupCtx := context.Background()
+		if internalErr := p.ops.EnableClusterServices(cleanupCtx); internalErr != nil {
 			if err == nil {
 				err = internalErr
 			} else {
 				err = fmt.Errorf("%s; also failed to enable cluster services: %w", err.Error(), internalErr)
 			}
 		}
 	}()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ipconfig/prepivot.go` around lines 180 - 194, The deferred recovery
uses the request-scoped ctx which may be cancelled before the defer runs; change
the defer to call p.ops.EnableClusterServices with a non-cancellable context
(e.g., context.Background()) instead of the original ctx so the recovery command
can run even if Run's ctx is cancelled; update the defer closure around
p.ops.EnableClusterServices(ctx) to create and use a background context and
preserve error handling logic that merges internalErr into the outer err.
lca-cli/ibi-preparation/ibipreparation.go (1)

74-74: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Precache flow drops the propagated context.

Run(ctx) calls i.precacheFlow(...) (line 74) but precacheFlow does not take a context, and on line 133 it calls workload.Precache(context.Background(), ...). Since precaching is one of the long-running operations the PR aims to make cancellable (per the PR description), using context.Background() here means precache cannot be cancelled or time-limited even though the rest of Run is wired up. Thread ctx through precacheFlow and pass it to workload.Precache.

Proposed fix
-	if err := i.precacheFlow(common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil {
+	if err := i.precacheFlow(ctx, common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil {
 		return fmt.Errorf("failed to precache: %w", err)
 	}
@@
-func (i *IBIPrepare) precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) error {
+func (i *IBIPrepare) precacheFlow(ctx context.Context, imageListFile, seedInfoFile, registriesConfFile string) error {
@@
-	if err := workload.Precache(context.Background(), imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil {
+	if err := workload.Precache(ctx, imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil {
 		return fmt.Errorf("failed to start precache: %w", err)
 	}

Also applies to: 89-89, 133-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ibi-preparation/ibipreparation.go` at line 74, The precache flow
drops the propagated context: change precacheFlow to accept a context.Context
(e.g., precacheFlow(ctx context.Context, ...)) and update its callers (the
Run(ctx) invocation and any other calls) to pass the incoming ctx instead of
using context.Background(); inside precacheFlow replace
workload.Precache(context.Background(), ...) with workload.Precache(ctx, ...) so
the long-running precache operation is cancellable and honors
timeouts/cancellation from Run(ctx).
lca-cli/ops/ops.go (2)

737-745: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate failures from the container-stop retry loop.

The poll result is discarded. If crictl stop keeps failing or ctx is canceled, this still falls through to systemctl stop crio and can return success after an incomplete shutdown.

Suggested fix
-		_ = wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) {
+		if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) {
 			o.log.Info("Stop running containers")
 			args := []string{"ps", "-q", "|", "xargs", "--no-run-if-empty", "--max-args", "1", "--max-procs", "10", "crictl", "stop", "--timeout", "5"}
 			_, err = o.RunBashInHostNamespace(ctx, "crictl", args...)
 			if err != nil {
 				return false, fmt.Errorf("failed to stop running containers: %w", err)
 			}
 			return true, nil
-		})
+		}); err != nil {
+			return fmt.Errorf("failed to stop running containers: %w", err)
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ops/ops.go` around lines 737 - 745, The poll call's returned error is
ignored so failures or context cancellation during the crictl stop loop are
swallowed; assign the result of wait.PollUntilContextCancel(...) to a variable,
check it, and propagate or return a wrapped error instead of continuing to
systemctl stop crio. Locate the anonymous func passed to
wait.PollUntilContextCancel in the same block (which calls
o.RunBashInHostNamespace with args and logs "Stop running containers") and
change the caller to: err := wait.PollUntilContextCancel(...); if err != nil {
return fmt.Errorf("stopping containers failed: %w", err) } (or propagate
appropriately from the surrounding function).

252-275: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make waitForEtcd actually honor ctx.

The new signature takes ctx, but this loop still waits on its own ticker/timeout and probes with http.Get without a request context. A canceled caller can still block here for up to a minute, which defeats the PR's cancellation goal.

Suggested fix
 func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error {
-	timeout := time.After(1 * time.Minute)
-	ticker := time.NewTicker(1 * time.Second)
-	defer ticker.Stop()
-
-	for {
-		select {
-		case <-timeout:
-			return fmt.Errorf("timeout waiting for etcd")
-		case <-ticker.C:
-			resp, err := http.Get(healthzEndpoint) //nolint:gosec
-			if err != nil {
-				o.log.Infof("Waiting for etcd: %s", err)
-				continue
-			}
-			defer func() { _ = resp.Body.Close() }()
-
-			if resp.StatusCode != http.StatusOK {
-				o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode)
-				continue
-			}
-
-			return nil
-		}
-	}
+	client := &http.Client{}
+	return wait.PollUntilContextTimeout(ctx, time.Second, time.Minute, true, func(ctx context.Context) (bool, error) {
+		req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil)
+		if err != nil {
+			return false, err
+		}
+		resp, err := client.Do(req) //nolint:gosec
+		if err != nil {
+			o.log.Infof("Waiting for etcd: %s", err)
+			return false, nil
+		}
+		defer func() { _ = resp.Body.Close() }()
+
+		if resp.StatusCode != http.StatusOK {
+			o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode)
+			return false, nil
+		}
+		return true, nil
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ops/ops.go` around lines 252 - 275, waitForEtcd does not honor the
passed ctx: replace the ad-hoc timeout/ticker with a ctx-aware loop by deriving
a cancellable timeout context (e.g., ctx, cancel := context.WithTimeout(ctx,
1*time.Minute); defer cancel()), keep a ticker (time.NewTicker(1*time.Second))
and in the select listen for ctx.Done() instead of the timeout channel, and when
probing use http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil)
followed by http.DefaultClient.Do(req) so the HTTP request is cancelable; ensure
resp.Body is closed and log ctx.Err() (using o.log.Infof) when exiting due to
context cancellation.
lca-cli/postpivot/postpivot.go (1)

275-291: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Return the recert image pull timeout/cancellation.

wait.PollUntilContextCancel is ignored here. If the 10-minute timeout expires or ctx is canceled, this block still falls through into RecertFullFlow, so the caller sees a later recert failure instead of the real pull failure.

Suggested fix
-		_ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
+		if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
 			p.log.Info("pulling recert image")
 			command := "podman"
 			if seedReconfiguration.Proxy != nil {
 				command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s",
 					seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy,
 					seedReconfiguration.Proxy.NoProxy, command)
 			}
 			if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil {
 				p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error())
 				return false, nil
 			}
 			return true, nil
-		})
+		}); err != nil {
+			return fmt.Errorf("failed to pull recert image %s: %w", seedClusterInfo.RecertImagePullSpec, err)
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/postpivot/postpivot.go` around lines 275 - 291, The
PollUntilContextCancel call's result is ignored so a timeout/cancel on
ctxWithTimeout is not propagated; change the block around
wait.PollUntilContextCancel (used with ctxWithTimeout and cancel) to capture its
returned error and, if non-nil, return or wrap that error from the surrounding
function instead of falling through into RecertFullFlow; ensure you propagate
cancellation/timeout from ctxWithTimeout and preserve context when invoking
p.ops.RunInHostNamespace / p.ops.RunBashInHostNamespace so the real pull failure
is surfaced to callers.
♻️ Duplicate comments (3)
main/main.go (1)

270-274: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Startup mount-target lookup still uses context.Background().

This is the same concern raised previously: chrootOp.GetContainerStorageTarget(context.Background()) runs on a detached context, so a signal received before mgr.Start(ctx) cannot abort this host command. The signal-derived context is constructed later at line 407 (ctrl.SetupSignalHandler()); consider creating it earlier and threading it into the init calls (this lookup, and the lcautils.InitIBU / initSeedGen / lcautils.InitIPConfig calls on lines 244/249/254 that still pass context.TODO()).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main/main.go` around lines 270 - 274, Replace the detached contexts with a
signal-derived context created before the init calls and passed into them:
create the manager/signal context (the same one used by mgr.Start(ctx) and
created by ctrl.SetupSignalHandler()) earlier, then call
chrootOp.GetContainerStorageTarget(ctx) instead of context.Background(), and
call lcautils.InitIBU(ctx,...), initSeedGen(ctx,...), and
lcautils.InitIPConfig(ctx,...) instead of context.TODO(); ensure mgr.Start
continues to receive the same ctx so startup commands abort on signals.
lca-cli/cmd/ibuStaterootSetup.go (1)

116-124: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

SIGTERM handler still uses an unbounded context.Background() for the seed-image probe.

This is the same concern raised previously: if opsClient.ImageExists blocks against a wedged container store, the SIGTERM goroutine cannot reach the timed time.Sleep / os.Exit path and shutdown is left to SIGKILL. Bound the probe with a short context.WithTimeout so the handler always makes forward progress.

🛡️ Suggested fix
-			if seedImageExists, _ := opsClient.ImageExists(context.Background(), seedImage); seedImageExists {
+			checkCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+			seedImageExists, _ := opsClient.ImageExists(checkCtx, seedImage)
+			cancel()
+			if seedImageExists {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/cmd/ibuStaterootSetup.go` around lines 116 - 124, The SIGTERM handler
uses context.Background() when calling opsClient.ImageExists, which can block;
replace that call to use a short bounded context via context.WithTimeout (e.g.,
a few seconds) and defer cancel so the probe can't hang the handler; update the
logic around opsClient.ImageExists(seedImage) to accept the (exists, err)
return, treat errors as non-fatal (log them with logger.Error) and proceed to
the time.Sleep/exit path using prep.StaterootSetupTerminationGracePeriodSeconds
as before so the handler always makes forward progress even if the container
store is wedged.
controllers/ipc_controller.go (1)

327-334: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate canceled rpm-ostree lookups instead of treating them as “not available”.

GetUnbootedStaterootName(ctx) can now fail because the reconcile was canceled or timed out. Returning false, nil for every error makes validNextStages(...) treat an interrupted check as real state, which can incorrectly reopen config paths or hide rollback eligibility.

🐛 Suggested fix
 func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) {
 	if rpmOstreeClient == nil {
 		return nil, fmt.Errorf("rpmOstreeClient is nil")
 	}
 
 	unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx)
-	if err != nil || unbootedStaterootName == "" {
+	if err != nil {
+		return nil, err
+	}
+	if unbootedStaterootName == "" {
 		return lo.ToPtr(false), nil
 	}
 
 	return lo.ToPtr(true), nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/ipc_controller.go` around lines 327 - 334, The function
isUnbootedStaterootAvailable currently swallows all errors from
rpmOstreeClient.GetUnbootedStaterootName and returns false, which hides
canceled/timeouts; change the logic so that if GetUnbootedStaterootName returns
a non-nil error you propagate that error (return nil, err) instead of returning
false, and only return lo.ToPtr(false), nil when err == nil and
unbootedStaterootName == "" — update isUnbootedStaterootAvailable to call
rpmOstreeClient.GetUnbootedStaterootName(ctx) and return nil, err on any error
so canceled/deadline errors bubble up to callers like validNextStages.
🧹 Nitpick comments (1)
lca-cli/cmd/ibi.go (1)

86-98: ⚡ Quick win

context.Background() in adapter silently drops cancellation for cleanupDevice operations

Because preinstallUtils doesn't accept a context.Context, the adapter hard-codes context.Background(). As a result, if cmd.Context() is cancelled (e.g., SIGTERM or a timeout), disk-cleanup commands dispatched through cleanupDevice will not be aborted and will run to completion (or hang) regardless of the outer cancellation signal.

This is an inherent limitation of the external library interface, but it's worth adding a short inline comment making the cancellation-loss explicit so future maintainers don't assume ctx propagates end-to-end:

📝 Suggested comment addition
 func (a *preinstallExecutorAdapter) Execute(command string, args ...string) (string, error) {
-	out, err := a.exec.Execute(context.Background(), command, args...)
+	// context.Background() is intentional: preinstall-utils Execute interface is
+	// context-unaware, so the caller's cancellation signal cannot be forwarded here.
+	out, err := a.exec.Execute(context.Background(), command, args...)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/cmd/ibi.go` around lines 86 - 98, The adapter
preinstallExecutorAdapter.Execute currently calls
a.exec.Execute(context.Background(), ...) which intentionally drops any caller
cancellation; add a concise inline comment above the Execute method (referencing
preinstallExecutorAdapter, Execute and ops.Execute) explaining that
context.Background() is used because preinstall-utils' Execute signature lacks a
context, that this will not propagate cancellation (e.g.,
cmd.Context()/SIGTERM/timeouts) and therefore cleanupDevice invocations may not
be aborted—make the limitation explicit so future maintainers won't assume
context propagation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/seedgen_controller.go`:
- Around line 588-596: Change validateSystem to return (string, error) instead
of only msg: have validateSystem call
ostreeclient.NewClient(...).IsOstreeAdminSetDefaultFeatureEnabled(ctx) and on
probe error return ("", err) while preserving the current rejection string only
for definitive false results (return msg, nil). Update the Reconcile caller that
currently treats any non-empty rejection as permanent failure (uses
setSeedGenStatusFailed/getPhase) to check the error return first and requeue
transient probe errors (do not call setSeedGenStatusFailed), only treating a
non-empty msg with nil error as the permanent rejection path.

In `@internal/imagemgmt/imagemgmt.go`:
- Around line 260-261: The call to c.Executor.Execute(ctx, "podman", args...)
currently ignores errors and thus swallows context cancellation; change the line
in imagemgmt.go (the podman rmi invocation where output is captured) to capture
the error (output, err := c.Executor.Execute(...)), then if errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded) return that error
immediately; otherwise, keep the old behavior of logging the output and ignoring
non-context errors (log the output with c.Log.Info and continue). Ensure you
import or reference errors and use the same ctx/c.Executor.Execute and podman
rmi call sites (i.e., the same function/statement) so cancellation is
propagated.

In `@internal/ostreeclient/ostreeclient.go`:
- Around line 70-72: The call to IsOstreeAdminSetDefaultFeatureEnabled in Deploy
is discarding its error, so probe failures are treated as "feature disabled" and
can change deploy semantics; update Deploy (the block that builds args and calls
Execute) to capture the (bool, error) result, handle the error by returning or
propagating it (instead of using `_`), and only use the boolean when err == nil;
reference the IsOstreeAdminSetDefaultFeatureEnabled result variables and ensure
Deploy returns the error so Execute is not run under a failed probe.

In `@lca-cli/cmd/ipconfig/prepivot.go`:
- Line 127: The code replaces the command context with context.Background(),
which drops cancellation/deadline propagation; instead thread the Cobra command
context into runIPConfigPrePivot by passing cmd.Context() (or a derived context)
rather than creating context.Background(), so all ctx-aware calls (the
OSTree/systemd/reboot invocations invoked inside runIPConfigPrePivot) honor
cancellation and deadlines; update the runIPConfigPrePivot signature and all
call sites to accept the incoming ctx and use that ctx for those operations.

In `@lca-cli/cmd/ipconfig/rollback.go`:
- Line 70: Replace the unconditional root context with the Cobra command's
context so rollback honors cancellations/timeouts: change the local ctx :=
context.Background() to use cmd.Context() (or directly pass cmd.Context() into
runIPConfigRollback), and ensure runIPConfigRollback is invoked with that
cmd.Context() instead of a background context so subsequent ctx-aware calls
(rollback/reboot) respect upstream aborts.

In `@lca-cli/cmd/restore.go`:
- Line 33: The CLI entry uses cmd.Execute() so restore(cmd.Context()) gets a
non-cancellable Background context; change the root command invocation to
ExecuteContext using a signal-aware context (e.g., create ctx :=
signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM) and
defer cancel()) so cmd.Context() inside restore() is cancelled on
SIGINT/SIGTERM; update imports to include os, os/signal and syscall and call
rootCmd.ExecuteContext(ctx) instead of rootCmd.Execute().

In `@lca-cli/ops/execute.go`:
- Around line 67-68: The pipe writer returned by e.log.Writer() is not closed,
leaking the goroutine backing it; change the call sites (e.g.,
regularExecutor.ExecuteWithLiveLogger and the two other methods that pass
e.log.Writer() into execute) to first obtain the writer into a variable (w :=
e.log.Writer()), defer w.Close(), then pass w to e.execute(...) so the writer is
closed after execute returns and no goroutine is leaked.

In `@lca-cli/ops/ops.go`:
- Around line 457-463: The deferred etcd teardown is using the incoming ctx
which may be canceled when RecertFullFlow returns; change the defer to call
StopEtcdServer with a non-cancelable context (e.g. context.Background() or a
fresh context) so the temporary etcd container is always stopped. Locate the
RecertFullFlow function and replace the defer that currently calls
o.StopEtcdServer(ctx, authFile, common.EtcdContainerName) with a call that uses
a new non-cancelable context while keeping the same authFile and
common.EtcdContainerName arguments and error swallowing behavior.

---

Outside diff comments:
In `@controllers/prep_handlers.go`:
- Around line 318-343: The function validateSeedOcpVersion currently uses
context.Background() for the r.Get call which drops reconcile
cancellation/timeout; change validateSeedOcpVersion to accept a context.Context
parameter and use that context when calling r.Get (replace context.Background()
with the passed ctx), update callers (validateIBUSpec and ultimately handlePrep)
to pass the reconcile ctx through to validateIBUSpec and then to
validateSeedOcpVersion so the API lookup honors the reconcile context and
timeouts.

In `@controllers/upgrade_handlers.go`:
- Around line 354-366: The autoRollbackIfEnabled method currently reuses the
reconcile ctx which can be cancelled and abort InitiateRollback; change it so
UpgHandler.autoRollbackIfEnabled creates and uses a non-cancellable context
(e.g., context.Background() or context.WithTimeout/WithDeadline) instead of the
incoming ctx when calling u.RebootClient.InitiateRollback(ctx, msg), ensuring
InitiateRollback runs to completion even if the reconcile context is cancelled;
update the call site in autoRollbackIfEnabled to pass the new context and keep
existing logging of failures.

In `@internal/reboot/reboot.go`:
- Around line 180-191: The Reboot method in IBURebootClient currently uses a
blocking time.Sleep(defaultRebootTimeout) after invoking systemd-run which
ignores ctx cancellation; replace the sleep with a select that waits on
time.After(defaultRebootTimeout) and ctx.Done() so callers can be canceled or
time out - e.g., in Reboot (and the other Reboot implementation around 467-478)
use select { case <-time.After(defaultRebootTimeout): /* proceed */; case
<-ctx.Done(): return ctx.Err() } and keep the existing log behavior when the
wait completes normally.

In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Line 74: The precache flow drops the propagated context: change precacheFlow
to accept a context.Context (e.g., precacheFlow(ctx context.Context, ...)) and
update its callers (the Run(ctx) invocation and any other calls) to pass the
incoming ctx instead of using context.Background(); inside precacheFlow replace
workload.Precache(context.Background(), ...) with workload.Precache(ctx, ...) so
the long-running precache operation is cancellable and honors
timeouts/cancellation from Run(ctx).

In `@lca-cli/ipconfig/prepivot.go`:
- Around line 180-194: The deferred recovery uses the request-scoped ctx which
may be cancelled before the defer runs; change the defer to call
p.ops.EnableClusterServices with a non-cancellable context (e.g.,
context.Background()) instead of the original ctx so the recovery command can
run even if Run's ctx is cancelled; update the defer closure around
p.ops.EnableClusterServices(ctx) to create and use a background context and
preserve error handling logic that merges internalErr into the outer err.

In `@lca-cli/ops/ops.go`:
- Around line 737-745: The poll call's returned error is ignored so failures or
context cancellation during the crictl stop loop are swallowed; assign the
result of wait.PollUntilContextCancel(...) to a variable, check it, and
propagate or return a wrapped error instead of continuing to systemctl stop
crio. Locate the anonymous func passed to wait.PollUntilContextCancel in the
same block (which calls o.RunBashInHostNamespace with args and logs "Stop
running containers") and change the caller to: err :=
wait.PollUntilContextCancel(...); if err != nil { return fmt.Errorf("stopping
containers failed: %w", err) } (or propagate appropriately from the surrounding
function).
- Around line 252-275: waitForEtcd does not honor the passed ctx: replace the
ad-hoc timeout/ticker with a ctx-aware loop by deriving a cancellable timeout
context (e.g., ctx, cancel := context.WithTimeout(ctx, 1*time.Minute); defer
cancel()), keep a ticker (time.NewTicker(1*time.Second)) and in the select
listen for ctx.Done() instead of the timeout channel, and when probing use
http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) followed by
http.DefaultClient.Do(req) so the HTTP request is cancelable; ensure resp.Body
is closed and log ctx.Err() (using o.log.Infof) when exiting due to context
cancellation.

In `@lca-cli/postpivot/postpivot.go`:
- Around line 275-291: The PollUntilContextCancel call's result is ignored so a
timeout/cancel on ctxWithTimeout is not propagated; change the block around
wait.PollUntilContextCancel (used with ctxWithTimeout and cancel) to capture its
returned error and, if non-nil, return or wrap that error from the surrounding
function instead of falling through into RecertFullFlow; ensure you propagate
cancellation/timeout from ctxWithTimeout and preserve context when invoking
p.ops.RunInHostNamespace / p.ops.RunBashInHostNamespace so the real pull failure
is surfaced to callers.

---

Duplicate comments:
In `@controllers/ipc_controller.go`:
- Around line 327-334: The function isUnbootedStaterootAvailable currently
swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns
false, which hides canceled/timeouts; change the logic so that if
GetUnbootedStaterootName returns a non-nil error you propagate that error
(return nil, err) instead of returning false, and only return lo.ToPtr(false),
nil when err == nil and unbootedStaterootName == "" — update
isUnbootedStaterootAvailable to call
rpmOstreeClient.GetUnbootedStaterootName(ctx) and return nil, err on any error
so canceled/deadline errors bubble up to callers like validNextStages.

In `@lca-cli/cmd/ibuStaterootSetup.go`:
- Around line 116-124: The SIGTERM handler uses context.Background() when
calling opsClient.ImageExists, which can block; replace that call to use a short
bounded context via context.WithTimeout (e.g., a few seconds) and defer cancel
so the probe can't hang the handler; update the logic around
opsClient.ImageExists(seedImage) to accept the (exists, err) return, treat
errors as non-fatal (log them with logger.Error) and proceed to the
time.Sleep/exit path using prep.StaterootSetupTerminationGracePeriodSeconds as
before so the handler always makes forward progress even if the container store
is wedged.

In `@main/main.go`:
- Around line 270-274: Replace the detached contexts with a signal-derived
context created before the init calls and passed into them: create the
manager/signal context (the same one used by mgr.Start(ctx) and created by
ctrl.SetupSignalHandler()) earlier, then call
chrootOp.GetContainerStorageTarget(ctx) instead of context.Background(), and
call lcautils.InitIBU(ctx,...), initSeedGen(ctx,...), and
lcautils.InitIPConfig(ctx,...) instead of context.TODO(); ensure mgr.Start
continues to receive the same ctx so startup commands abort on signals.

---

Nitpick comments:
In `@lca-cli/cmd/ibi.go`:
- Around line 86-98: The adapter preinstallExecutorAdapter.Execute currently
calls a.exec.Execute(context.Background(), ...) which intentionally drops any
caller cancellation; add a concise inline comment above the Execute method
(referencing preinstallExecutorAdapter, Execute and ops.Execute) explaining that
context.Background() is used because preinstall-utils' Execute signature lacks a
context, that this will not propagate cancellation (e.g.,
cmd.Context()/SIGTERM/timeouts) and therefore cleanupDevice invocations may not
be aborted—make the limitation explicit so future maintainers won't assume
context propagation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: e9132831-b28e-4ebb-83db-cd05c0882981

📥 Commits

Reviewing files that changed from the base of the PR and between 0f3c830 and 3a47e05.

📒 Files selected for processing (56)
  • controllers/ibu_controller.go
  • controllers/ibu_controller_test.go
  • controllers/idle_handlers.go
  • controllers/idle_handlers_test.go
  • controllers/ipc_config_handlers.go
  • controllers/ipc_config_handlers_test.go
  • controllers/ipc_controller.go
  • controllers/ipc_controller_test.go
  • controllers/ipc_idle_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • controllers/ipc_rollback_handlers.go
  • controllers/ipc_rollback_handlers_test.go
  • controllers/prep_handlers.go
  • controllers/rollback_handlers.go
  • controllers/seedgen_controller.go
  • controllers/upgrade_handlers.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt.go
  • internal/imagemgmt/imagemgmt_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/precache/workload/pullImages.go
  • internal/prep/prep.go
  • internal/reboot/mock_reboot.go
  • internal/reboot/reboot.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/ibi.go
  • lca-cli/cmd/ibuPrecacheWorkload.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • lca-cli/initmonitor/initmonitor.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ipconfig/prepivot.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ops/execute.go
  • lca-cli/ops/mock_execute.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/ops/ops.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/seedcreator/seedcreator.go
  • lca-cli/seedrestoration/seedrestoration.go
  • main/main.go
✅ Files skipped from review due to trivial changes (1)
  • lca-cli/ipconfig/postpivot.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • lca-cli/cmd/create.go
  • controllers/ibu_controller_test.go
  • lca-cli/cmd/ipconfig/postpivot.go

Comment on lines 588 to 596
func (r *SeedGeneratorReconciler) validateSystem(ctx context.Context) (msg string) {
// Check that the "ostree admin set-default" feature is available
if !ostreeclient.NewClient(r.Executor, false).IsOstreeAdminSetDefaultFeatureEnabled() {
if enabled, err := ostreeclient.NewClient(r.Executor, false).IsOstreeAdminSetDefaultFeatureEnabled(ctx); err != nil {
msg = fmt.Sprintf("Failure occurred during ostree admin set-default feature check: %v", err)
return
} else if !enabled {
msg = "Rejected: Installed release does not support \"ostree admin set-default\" feature"
return
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Transient ostree probe errors flip seedgen into terminal Failed state.

When IsOstreeAdminSetDefaultFeatureEnabled(ctx) returns an error (e.g., context cancellation during reconcile, transient ostree admin --help failure), validateSystem sets msg to a non-empty rejection string. The caller in Reconcile (around line 1017) treats any non-empty rejection as a permanent failure via setSeedGenStatusFailed, after which getPhase returns phaseFailed and the user must delete and recreate the CR to retry. Now that ctx cancellations can produce errors here, transient cancellations during reconcile teardown could permanently fail seedgen.

Consider returning the error separately from validateSystem so Reconcile can distinguish "rejected by validation" from "transient probe failure" and requeue the latter instead of marking the CR failed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/seedgen_controller.go` around lines 588 - 596, Change
validateSystem to return (string, error) instead of only msg: have
validateSystem call
ostreeclient.NewClient(...).IsOstreeAdminSetDefaultFeatureEnabled(ctx) and on
probe error return ("", err) while preserving the current rejection string only
for definitive false results (return msg, nil). Update the Reconcile caller that
currently treats any non-empty rejection as permanent failure (uses
setSeedGenStatusFailed/getPhase) to check the error return first and requeue
transient probe errors (do not call setSeedGenStatusFailed), only treating a
non-empty msg with nil error as the permanent rejection path.

Comment thread internal/imagemgmt/imagemgmt.go Outdated
Comment on lines 260 to 261
output, _ := c.Executor.Execute(ctx, "podman", args...)
c.Log.Info("Deleted unused images", "output", output)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't swallow cancellation from podman rmi.

Execute(ctx, ...) now returns context.Canceled / context.DeadlineExceeded too, but this line drops the error unconditionally and keeps cleaning images. That breaks the new cancellation contract and can make callers see a normal cleanup result after they already aborted the operation. Keep ignoring ordinary podman rmi failures, but return when the context is done.

Suggested fix
-			output, _ := c.Executor.Execute(ctx, "podman", args...)
-			c.Log.Info("Deleted unused images", "output", output)
+			output, err := c.Executor.Execute(ctx, "podman", args...)
+			if err != nil {
+				if ctx.Err() != nil {
+					return ctx.Err()
+				}
+				c.Log.Info("Ignoring podman rmi failure", "error", err, "targets", toRemove)
+			} else {
+				c.Log.Info("Deleted unused images", "output", output)
+			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
output, _ := c.Executor.Execute(ctx, "podman", args...)
c.Log.Info("Deleted unused images", "output", output)
output, err := c.Executor.Execute(ctx, "podman", args...)
if err != nil {
if ctx.Err() != nil {
return ctx.Err()
}
c.Log.Info("Ignoring podman rmi failure", "error", err, "targets", toRemove)
} else {
c.Log.Info("Deleted unused images", "output", output)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/imagemgmt/imagemgmt.go` around lines 260 - 261, The call to
c.Executor.Execute(ctx, "podman", args...) currently ignores errors and thus
swallows context cancellation; change the line in imagemgmt.go (the podman rmi
invocation where output is captured) to capture the error (output, err :=
c.Executor.Execute(...)), then if errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) return that error immediately;
otherwise, keep the old behavior of logging the output and ignoring non-context
errors (log the output with c.Log.Info and continue). Ensure you import or
reference errors and use the same ctx/c.Executor.Execute and podman rmi call
sites (i.e., the same function/statement) so cancellation is propagated.

Comment thread internal/ostreeclient/ostreeclient.go Outdated
Comment thread lca-cli/cmd/ipconfig/prepivot.go Outdated
Comment thread lca-cli/cmd/ipconfig/rollback.go Outdated
Comment thread lca-cli/cmd/restore.go
Short: "Restore seed cluster configurations",
Run: func(cmd *cobra.Command, args []string) {
restore()
restore(cmd.Context())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if the CLI main entry point uses ExecuteContext with signal handling
rg -n --type=go -C5 "ExecuteContext\|ExecuteWithContext\|signal\.NotifyContext\|rootCmd\.Execute" lca-cli/

Repository: openshift-kni/lifecycle-agent

Length of output: 55


🏁 Script executed:

#!/bin/bash
# Find main.go file(s)
find . -name "main.go" -type f | head -20

Repository: openshift-kni/lifecycle-agent

Length of output: 685


🏁 Script executed:

#!/bin/bash
# Search for rootCmd initialization and usage across the codebase
rg -n "rootCmd\s*:=|rootCmd\s*=|var rootCmd" --type=go | head -20

Repository: openshift-kni/lifecycle-agent

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Check the structure of lca-cli directory
ls -la lca-cli/ 2>/dev/null || echo "lca-cli not found at root level"
fd -t f -e go lca-cli/ | head -30

Repository: openshift-kni/lifecycle-agent

Length of output: 1431


🏁 Script executed:

#!/bin/bash
# Check the CLI main entry point
cat -n ./main/lca-cli/main.go

Repository: openshift-kni/lifecycle-agent

Length of output: 894


🏁 Script executed:

#!/bin/bash
# Check the root command definition
cat -n lca-cli/cmd/root.go

Repository: openshift-kni/lifecycle-agent

Length of output: 3775


🏁 Script executed:

#!/bin/bash
# Verify the context threading in restore.go (mentioned in the original review)
sed -n '20,60p' lca-cli/cmd/restore.go

Repository: openshift-kni/lifecycle-agent

Length of output: 1125


Wire signal-aware context at the CLI entry point to make context cancellation effective.

The context threading in restore.go is correct, but it receives context.Background() from the CLI entry point since cmd.Execute() at lca-cli/cmd/root.go:91 does not pass a cancellable context. This means SIGINT/SIGTERM will not propagate to long-running child processes like podman pull or rpm-ostree, undermining the PR's goal of enabling cancellation support.

Update lca-cli/cmd/root.go:91 to use ExecuteContext with signal handling:

Fix for cmd/root.go
+import (
+	"context"
+	"os"
+	"os/signal"
+	"syscall"
+)

-	if err := rootCmd.Execute(); err != nil {
+	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
+	defer stop()
+	if err := rootCmd.ExecuteContext(ctx); err != nil {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/cmd/restore.go` at line 33, The CLI entry uses cmd.Execute() so
restore(cmd.Context()) gets a non-cancellable Background context; change the
root command invocation to ExecuteContext using a signal-aware context (e.g.,
create ctx := signal.NotifyContext(context.Background(), os.Interrupt,
syscall.SIGTERM) and defer cancel()) so cmd.Context() inside restore() is
cancelled on SIGINT/SIGTERM; update imports to include os, os/signal and syscall
and call rootCmd.ExecuteContext(ctx) instead of rootCmd.Execute().

Comment thread lca-cli/ops/execute.go
Comment on lines +67 to +68
func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
return e.execute(ctx, e.log.Writer(), "", command, args...)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

cd lca-cli && git ls-files | grep -E "execute\.go|go\.mod" | head -20

Repository: openshift-kni/lifecycle-agent

Length of output: 108


🏁 Script executed:

cat -n lca-cli/ops/execute.go

Repository: openshift-kni/lifecycle-agent

Length of output: 5790


🌐 Web query:

logrus Logger Writer() method pipe close documentation

💡 Result:

In logrus, Logger.Writer returns an *io.PipeWriter, created by io.Pipe. The doc/behavior explicitly says callers are responsible for closing the returned writer when done. [1][2] What to call / expect - logger.Writer is a convenience that calls WriterLevel(InfoLevel) and returns the PipeWriter. [1] - Each line written to the PipeWriter is scanned and printed to the logger using the configured formatters/hooks. [2] - You should close it yourself (typical pattern is defer w.Close). [2] Does w.Close “pipe close” the reader and flush synchronously? - Closing the PipeWriter is non-deterministic in the sense that logs are produced by a goroutine scanning the pipe reader; depending on timing (e.g., tests/process exit), some writes may not appear even after Close. This is discussed in logrus issues. [3] Does closing the PipeWriter close some underlying file? - io.PipeWriter has no “underlying file”; it only closes the writer end of the pipe. Any actual file (if any) is on the logger’s Logger.Out side, and logrus can’t close that for you because Logger.Out is just an io.Writer. [4][5] - Therefore, closing w (the PipeWriter you got from Writer) does not automatically close whatever writer the logger is outputting to; you must close that writer separately if it needs closing (e.g., an *os.File you set as logger.Out via SetOutput). [4][5] Practical answer (code) - Use: w:= logger.Writer defer w.Close (then write to w; e.g., hook it to stdlib log.New). [2] - If your logger ultimately writes to a file you opened, close that file separately (closing w only closes the pipe writer end). [4][5] If you tell me your exact logrus version and how you wire stdlib log / http server / file output, I can suggest the safest shutdown/flush ordering for your case.

Citations:


🏁 Script executed:

# Verify all ExecuteWithLiveLogger implementations are covered
rg "ExecuteWithLiveLogger" lca-cli/ops/execute.go

Repository: openshift-kni/lifecycle-agent

Length of output: 521


🏁 Script executed:

# Confirm no other executor types with ExecuteWithLiveLogger exist
rg "type.*Executor.*struct" lca-cli/ops/execute.go

Repository: openshift-kni/lifecycle-agent

Length of output: 162


Close the pipe writer returned by logrus.Logger.Writer() to prevent goroutine leaks.

These three methods pass e.log.Writer() directly to execute functions without closing it. The pipe writer is backed by a goroutine that remains alive if not closed, accumulating leaked resources in long-lived controller processes.

♻️ Suggested fix
 func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
-	return e.execute(ctx, e.log.Writer(), "", command, args...)
+	writer := e.log.Writer()
+	defer writer.Close()
+	return e.execute(ctx, writer, "", command, args...)
 }
 
 func (e *nsenterExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
-	return e.baseExecute(ctx, e.log.Writer(), command, args...)
+	writer := e.log.Writer()
+	defer writer.Close()
+	return e.baseExecute(ctx, writer, command, args...)
 }
 
 func (e *chrootExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
-	return e.baseExecute(ctx, e.log.Writer(), command, args...)
+	writer := e.log.Writer()
+	defer writer.Close()
+	return e.baseExecute(ctx, writer, command, args...)
 }

Applies to lines: 67–68, 79–80, 117–118.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
return e.execute(ctx, e.log.Writer(), "", command, args...)
func (e *regularExecutor) ExecuteWithLiveLogger(ctx context.Context, command string, args ...string) (string, error) {
writer := e.log.Writer()
defer writer.Close()
return e.execute(ctx, writer, "", command, args...)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ops/execute.go` around lines 67 - 68, The pipe writer returned by
e.log.Writer() is not closed, leaking the goroutine backing it; change the call
sites (e.g., regularExecutor.ExecuteWithLiveLogger and the two other methods
that pass e.log.Writer() into execute) to first obtain the writer into a
variable (w := e.log.Writer()), defer w.Close(), then pass w to e.execute(...)
so the writer is closed after execute returns and no goroutine is leaked.

Comment thread lca-cli/ops/ops.go Outdated
@sebrandon1
Copy link
Copy Markdown
Contributor Author

/retest

@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from 3a47e05 to aaae2ba Compare May 20, 2026 15:37
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
lca-cli/seedrestoration/seedrestoration.go (1)

119-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard filepath.Walk callback error before dereferencing info.

Line 122 can panic when Walk calls the callback with err != nil and info == nil.

Proposed fix
 func (s *SeedRestoration) cleanupServiceUnits(ctx context.Context) error {
 	dir := filepath.Join(common.InstallationConfigurationFilesDir, "services")
 	err := filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
+		if err != nil {
+			return err
+		}
+		if info == nil {
+			return fmt.Errorf("walk returned nil file info for %s", path)
+		}
 		if info.IsDir() {
 			return nil
 		}

As per coding guidelines, “Focus on major issues impacting performance, readability, maintainability and security.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/seedrestoration/seedrestoration.go` around lines 119 - 123, The
filepath.Walk callback in function cleanupServiceUnits dereferences info without
checking the callback error; guard the callback by checking if err != nil at the
top of the anonymous func and return that error (or skip) before referencing
info to avoid nil dereference/panic. Update the callback used in filepath.Walk
to first handle the incoming err (and treat nil info defensively) and then
continue with the existing logic that checks info.IsDir() and paths.
lca-cli/ibi-preparation/ibipreparation.go (1)

74-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

precacheFlow drops cancellation by using context.Background().

Line 133 bypasses the Run(ctx) context, so precache cannot be cancelled/time-limited by the caller.

🔧 Proposed fix
-	if err := i.precacheFlow(common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil {
+	if err := i.precacheFlow(ctx, common.ContainersListFilePath, common.IBISeedInfoFilePath, defaultRegistriesConfFile); err != nil {
 		return fmt.Errorf("failed to precache: %w", err)
 	}
-func (i *IBIPrepare) precacheFlow(imageListFile, seedInfoFile, registriesConfFile string) error {
+func (i *IBIPrepare) precacheFlow(ctx context.Context, imageListFile, seedInfoFile, registriesConfFile string) error {
-	if err := workload.Precache(context.Background(), imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil {
+	if err := workload.Precache(ctx, imageList, common.IBIPullSecretFilePath, i.config.PrecacheBestEffort); err != nil {
 		return fmt.Errorf("failed to start precache: %w", err)
 	}

Also applies to: 89-90, 133-134

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ibi-preparation/ibipreparation.go` around lines 74 - 75, precacheFlow
currently ignores caller cancellation by creating its own context (using
context.Background()); change precacheFlow to accept a context.Context parameter
and propagate that context through all internal operations instead of calling
context.Background(), then update callers (the Run(ctx) method and any
i.precacheFlow(...) invocations) to pass the incoming ctx so
cancellation/timeouts are honored; ensure any downstream calls inside
precacheFlow (HTTP/IO/subprocess functions) also accept and use the passed ctx.
controllers/upgrade_handlers.go (1)

79-84: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t mark upgrade as permanently failed on context cancellation/deadline.

Line 79-Line 84 treats all errors from IsOrigStaterootBooted(ctx, ...) as terminal failures. With context-aware downstream calls, transient context.Canceled/context.DeadlineExceeded can now incorrectly fail the CR.

💡 Suggested fix
 import (
 	"context"
+	"errors"
 	"fmt"
 	"os"
 	"path/filepath"
 	"strings"
 )
@@
 	origStaterootBooted, err := r.RebootClient.IsOrigStaterootBooted(ctx, ibu.Spec.SeedImageRef.Version)
 
 	if err != nil {
+		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+			return requeueWithShortInterval(), err
+		}
 		utils.SetUpgradeStatusFailed(ibu, err.Error())
 		return doNotRequeue(), nil
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/upgrade_handlers.go` around lines 79 - 84, The current code
treats any error from r.RebootClient.IsOrigStaterootBooted(ctx, ...) as a
terminal failure and calls utils.SetUpgradeStatusFailed; instead detect context
cancellation/timeouts (use errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded)) after the call and in that case do NOT call
utils.SetUpgradeStatusFailed — return early so the controller can retry (e.g.,
return doNotRequeue(), err) or propagate the context error appropriately; only
call utils.SetUpgradeStatusFailed for non-context-related errors.
internal/reboot/reboot.go (1)

180-191: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Respect context cancellation while waiting for reboot completion.

Both IBURebootClient.Reboot (lines 180–191) and IPCRebootClient.Reboot (lines 467–478) ignore ctx.Done() and can block for a full 60 minutes even after the caller cancels the context. This violates the context contract and prevents graceful cancellation.

💡 Suggested fix
 	c.log.Info(fmt.Sprintf("Wait for %s to be killed via SIGTERM", defaultRebootTimeout.String()))
-	time.Sleep(defaultRebootTimeout)
-
-	return fmt.Errorf("failed to reboot. This should never happen! Please check the system")
+	select {
+	case <-ctx.Done():
+		return fmt.Errorf("reboot wait aborted: %w", ctx.Err())
+	case <-time.After(defaultRebootTimeout):
+		return fmt.Errorf("failed to reboot. This should never happen! Please check the system")
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/reboot/reboot.go` around lines 180 - 191, The Reboot methods
(IBURebootClient.Reboot and IPCRebootClient.Reboot) block using
time.Sleep(defaultRebootTimeout) and ignore ctx cancellation; replace the sleep
with a context-aware wait (e.g. create a timer with
time.NewTimer(defaultRebootTimeout) and use a select between <-timer.C and
<-ctx.Done()), cancel/stop the timer appropriately, and when ctx is done return
ctx.Err() (or wrap it) instead of continuing to the final "should never happen"
error; keep the existing hostCommandsExecutor.Execute call and logging (use
defaultRebootTimeout, IBURebootClient.Reboot and IPCRebootClient.Reboot
identifiers to locate the changes).
lca-cli/postpivot/postpivot.go (1)

275-291: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate poll cancellation instead of falling through.

Both waits discard wait.PollUntilContextCancel(...)'s return value. That means a canceled/timed-out context still lets post-pivot continue: the recert path can drop into RecertFullFlow after the 10-minute image pull timeout, and the node-IP path can keep configuring networking from incomplete/stale kubelet state. Please return the polling error in both places.

Suggested fix
-		_ = wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
+		if err := wait.PollUntilContextCancel(ctxWithTimeout, time.Second, true, func(ctx context.Context) (bool, error) {
 			p.log.Info("pulling recert image")
 			command := "podman"
 			if seedReconfiguration.Proxy != nil {
 				command = fmt.Sprintf("HTTP_PROXY=%s HTTPS_PROXY=%s NO_PROXY=%s %s",
 					seedReconfiguration.Proxy.HTTPProxy, seedReconfiguration.Proxy.HTTPSProxy,
 					seedReconfiguration.Proxy.NoProxy, command)
 			}
 			if _, err := p.ops.RunBashInHostNamespace(ctx, command, "pull", "--authfile", common.ImageRegistryAuthFile, seedClusterInfo.RecertImagePullSpec); err != nil {
 				p.log.Warnf("failed to pull recert image, will retry, err: %s", err.Error())
 				return false, nil
 			}
 			return true, nil
-		})
+		}); err != nil {
+			return fmt.Errorf("failed to pull recert image: %w", err)
+		}
 	}
-		_ = wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) {
+		if err := wait.PollUntilContextCancel(ctx, time.Second, true, func(ctx context.Context) (done bool, err error) {
 			if _, err := os.Stat(nodePrimaryIPFile); err != nil {
 				return false, nil
 			}
 			return true, nil
-		})
+		}); err != nil {
+			return fmt.Errorf("failed waiting for nodeip-configuration: %w", err)
+		}
 	}

Also applies to: 750-761

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/postpivot/postpivot.go` around lines 275 - 291, The polling calls to
wait.PollUntilContextCancel are currently ignored which lets execution continue
after timeouts/cancellations; update the blocks around
p.ops.RunInHostNamespace/RunBashInHostNamespace so you capture the error
returned by wait.PollUntilContextCancel and propagate it (return it) instead of
discarding it; specifically, after calling wait.PollUntilContextCancel in the
recert image pull loop and the node-IP/kubelet polling loop, check the returned
error and return it up the call stack (so callers like RecertFullFlow or the
networking configuration path stop on timeout/cancel) rather than falling
through.
lca-cli/ops/ops.go (1)

252-276: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make waitForEtcd actually honor the passed context.

This method now takes ctx, but the loop never checks ctx.Done() and the probe request is still made with plain http.Get. A canceled reconcile/CLI call can therefore stay blocked here for the full minute, which defeats the main goal of this PR.

Suggested fix
 func (o *ops) waitForEtcd(ctx context.Context, healthzEndpoint string) error {
-	timeout := time.After(1 * time.Minute)
+	ctx, cancel := context.WithTimeout(ctx, time.Minute)
+	defer cancel()
 	ticker := time.NewTicker(1 * time.Second)
 	defer ticker.Stop()

 	for {
 		select {
-		case <-timeout:
-			return fmt.Errorf("timeout waiting for etcd")
+		case <-ctx.Done():
+			return ctx.Err()
 		case <-ticker.C:
-			resp, err := http.Get(healthzEndpoint) //nolint:gosec
+			req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthzEndpoint, nil)
+			if err != nil {
+				return fmt.Errorf("failed to build etcd health request: %w", err)
+			}
+			resp, err := http.DefaultClient.Do(req) //nolint:gosec
 			if err != nil {
 				o.log.Infof("Waiting for etcd: %s", err)
 				continue
 			}
-			defer func() { _ = resp.Body.Close() }()
+			_ = resp.Body.Close()

 			if resp.StatusCode != http.StatusOK {
 				o.log.Infof("Waiting for etcd, status: %d", resp.StatusCode)
 				continue
 			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ops/ops.go` around lines 252 - 276, waitForEtcd currently ignores ctx
and uses http.Get; update it to respect cancellation by adding a select case for
<-ctx.Done() that returns ctx.Err(), and replace http.Get(healthzEndpoint) with
an HTTP request built with http.NewRequestWithContext(ctx, "GET",
healthzEndpoint, nil) and executed via http.DefaultClient.Do(req) so the request
is cancelable; also stop deferring resp.Body.Close() inside the loop—call
resp.Body.Close() immediately after you're done with the response (before
continue) to avoid leaking closers across iterations.
♻️ Duplicate comments (1)
internal/imagemgmt/imagemgmt.go (1)

259-264: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate cancellation from podman rmi failures.

Line 259-263 still logs-and-continues for all errors, including canceled/deadline-expired contexts. That breaks the new cancellation contract and can keep deleting images after abort.

Suggested fix
-			output, err := c.Executor.Execute(ctx, "podman", args...)
-			if err != nil {
-				c.Log.Info("Some images could not be removed (may be in use)", "error", err)
-			}
-			c.Log.Info("Deleted unused images", "output", output)
+			output, err := c.Executor.Execute(ctx, "podman", args...)
+			if err != nil {
+				if ctx.Err() != nil {
+					return ctx.Err()
+				}
+				c.Log.Info("Some images could not be removed (may be in use)", "error", err)
+			} else {
+				c.Log.Info("Deleted unused images", "output", output)
+			}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/imagemgmt/imagemgmt.go` around lines 259 - 264, The current handling
of errors from c.Executor.Execute in the image deletion loop logs-and-continues
even when the context was canceled/expired; change the error handling so that
after Execute returns an error you detect cancellation (check ctx.Err()!=nil
and/or errors.Is(err, context.Canceled) || errors.Is(err,
context.DeadlineExceeded)) and immediately return that error to propagate
cancellation instead of logging and continuing; for non-cancellation errors keep
the existing info log ("Some images could not be removed (may be in use)") and
continue, and ensure you reference the Execute call (c.Executor.Execute) and
logging calls (c.Log.Info) when making this change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controllers/idle_handlers.go`:
- Around line 257-267: The cleanup paths treat context cancellation as routine
failures; change both the cleanupUnbootedStateroot call and the
rpmOstreeClient.RpmOstreeCleanup call to detect context cancellation (use
errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) or
errors.Is with ctx.Err()) and immediately return that error (or propagate/abort)
instead of logging and incrementing failures; update the logic around
cleanupUnbootedStateroot, rpmOstreeClient.RpmOstreeCleanup, ctx, log and
failures so context-cancellation/timeout results short-circuit the function
without mutating failures or continuing other cleanup steps.

In `@lca-cli/cmd/ibi.go`:
- Around line 73-74: preinstallExecutorAdapter currently forces
context.Background() inside its execution methods, causing commands invoked via
runIBI(cmd.Context()) to ignore cancellation/timeouts; update
preinstallExecutorAdapter so it uses the caller-supplied context instead of
context.Background() (either by adding a ctx parameter to its Exec/Run/Execute
methods or by storing a context field set from cmd.Context() when constructing
the adapter) and propagate cmd.Context() from runIBI into
preinstallUtils.NewCleanupDevice / preinstallUtils.NewDiskOps (or any call sites
that create the adapter) so all commands respect CLI cancellation and timeouts.

In `@lca-cli/ops/ops.go`:
- Around line 691-694: The current check for o.SystemctlAction(ctx, "is-active",
containerStorageMountUnit) treats any error as "no active mount" and returns
nil; change it to only swallow the specific error that indicates the unit is
inactive (the expected systemctl exit/status for inactive) and propagate all
other errors (including context cancellation/timeouts) so callers don't proceed
on bogus state. Update both occurrences (the call using o.SystemctlAction with
"is-active" and containerStorageMountUnit around lines referenced) to inspect
the error from SystemctlAction and return it unless it matches the known
inactive result; keep the original behavior for the true inactive case.

---

Outside diff comments:
In `@controllers/upgrade_handlers.go`:
- Around line 79-84: The current code treats any error from
r.RebootClient.IsOrigStaterootBooted(ctx, ...) as a terminal failure and calls
utils.SetUpgradeStatusFailed; instead detect context cancellation/timeouts (use
errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded))
after the call and in that case do NOT call utils.SetUpgradeStatusFailed —
return early so the controller can retry (e.g., return doNotRequeue(), err) or
propagate the context error appropriately; only call
utils.SetUpgradeStatusFailed for non-context-related errors.

In `@internal/reboot/reboot.go`:
- Around line 180-191: The Reboot methods (IBURebootClient.Reboot and
IPCRebootClient.Reboot) block using time.Sleep(defaultRebootTimeout) and ignore
ctx cancellation; replace the sleep with a context-aware wait (e.g. create a
timer with time.NewTimer(defaultRebootTimeout) and use a select between
<-timer.C and <-ctx.Done()), cancel/stop the timer appropriately, and when ctx
is done return ctx.Err() (or wrap it) instead of continuing to the final "should
never happen" error; keep the existing hostCommandsExecutor.Execute call and
logging (use defaultRebootTimeout, IBURebootClient.Reboot and
IPCRebootClient.Reboot identifiers to locate the changes).

In `@lca-cli/ibi-preparation/ibipreparation.go`:
- Around line 74-75: precacheFlow currently ignores caller cancellation by
creating its own context (using context.Background()); change precacheFlow to
accept a context.Context parameter and propagate that context through all
internal operations instead of calling context.Background(), then update callers
(the Run(ctx) method and any i.precacheFlow(...) invocations) to pass the
incoming ctx so cancellation/timeouts are honored; ensure any downstream calls
inside precacheFlow (HTTP/IO/subprocess functions) also accept and use the
passed ctx.

In `@lca-cli/ops/ops.go`:
- Around line 252-276: waitForEtcd currently ignores ctx and uses http.Get;
update it to respect cancellation by adding a select case for <-ctx.Done() that
returns ctx.Err(), and replace http.Get(healthzEndpoint) with an HTTP request
built with http.NewRequestWithContext(ctx, "GET", healthzEndpoint, nil) and
executed via http.DefaultClient.Do(req) so the request is cancelable; also stop
deferring resp.Body.Close() inside the loop—call resp.Body.Close() immediately
after you're done with the response (before continue) to avoid leaking closers
across iterations.

In `@lca-cli/postpivot/postpivot.go`:
- Around line 275-291: The polling calls to wait.PollUntilContextCancel are
currently ignored which lets execution continue after timeouts/cancellations;
update the blocks around p.ops.RunInHostNamespace/RunBashInHostNamespace so you
capture the error returned by wait.PollUntilContextCancel and propagate it
(return it) instead of discarding it; specifically, after calling
wait.PollUntilContextCancel in the recert image pull loop and the
node-IP/kubelet polling loop, check the returned error and return it up the call
stack (so callers like RecertFullFlow or the networking configuration path stop
on timeout/cancel) rather than falling through.

In `@lca-cli/seedrestoration/seedrestoration.go`:
- Around line 119-123: The filepath.Walk callback in function
cleanupServiceUnits dereferences info without checking the callback error; guard
the callback by checking if err != nil at the top of the anonymous func and
return that error (or skip) before referencing info to avoid nil
dereference/panic. Update the callback used in filepath.Walk to first handle the
incoming err (and treat nil info defensively) and then continue with the
existing logic that checks info.IsDir() and paths.

---

Duplicate comments:
In `@internal/imagemgmt/imagemgmt.go`:
- Around line 259-264: The current handling of errors from c.Executor.Execute in
the image deletion loop logs-and-continues even when the context was
canceled/expired; change the error handling so that after Execute returns an
error you detect cancellation (check ctx.Err()!=nil and/or errors.Is(err,
context.Canceled) || errors.Is(err, context.DeadlineExceeded)) and immediately
return that error to propagate cancellation instead of logging and continuing;
for non-cancellation errors keep the existing info log ("Some images could not
be removed (may be in use)") and continue, and ensure you reference the Execute
call (c.Executor.Execute) and logging calls (c.Log.Info) when making this
change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: c26b3938-951c-4361-82b3-7d4c04de035e

📥 Commits

Reviewing files that changed from the base of the PR and between 3a47e05 and aaae2ba.

📒 Files selected for processing (56)
  • controllers/ibu_controller.go
  • controllers/ibu_controller_test.go
  • controllers/idle_handlers.go
  • controllers/idle_handlers_test.go
  • controllers/ipc_config_handlers.go
  • controllers/ipc_config_handlers_test.go
  • controllers/ipc_controller.go
  • controllers/ipc_controller_test.go
  • controllers/ipc_idle_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • controllers/ipc_rollback_handlers.go
  • controllers/ipc_rollback_handlers_test.go
  • controllers/prep_handlers.go
  • controllers/rollback_handlers.go
  • controllers/seedgen_controller.go
  • controllers/upgrade_handlers.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt.go
  • internal/imagemgmt/imagemgmt_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/precache/workload/pullImages.go
  • internal/prep/prep.go
  • internal/reboot/mock_reboot.go
  • internal/reboot/reboot.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/ibi.go
  • lca-cli/cmd/ibuPrecacheWorkload.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • lca-cli/initmonitor/initmonitor.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ipconfig/prepivot.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ops/execute.go
  • lca-cli/ops/mock_execute.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/ops/ops.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/seedcreator/seedcreator.go
  • lca-cli/seedrestoration/seedrestoration.go
  • main/main.go
✅ Files skipped from review due to trivial changes (6)
  • lca-cli/ops/mock_execute.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/reboot/mock_reboot.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • internal/ostreeclient/mock_ostreeclient.go
  • lca-cli/ops/mock_ops.go

Comment on lines +257 to 267
if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil {
log.Error(err, "failed to remove stateroot", "stateroot", stateroot)
failures += 1
}
}

// clear temporary files and reclaim disk space
if err := rpmOstreeClient.RpmOstreeCleanup(); err != nil {
if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil {
log.Error(err, "failed rpm-ostree cleanup -b")
failures += 1
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return immediately on context cancellation during cleanup.

Line 257 and Line 264 currently treat context.Canceled/context.DeadlineExceeded as normal cleanup failures and continue mutating state. That breaks cancellation semantics and can prolong shutdown/reconcile aborts.

Proposed fix
 import (
 	"context"
+	"errors"
 	"fmt"
 	"os"
@@
 	for _, stateroot := range staterootsToRemove {
@@
 		if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil {
+			if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+				return err
+			}
 			log.Error(err, "failed to remove stateroot", "stateroot", stateroot)
 			failures += 1
 		}
 	}
@@
 	if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil {
+		if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+			return err
+		}
 		log.Error(err, "failed rpm-ostree cleanup -b")
 		failures += 1
 	}

As per coding guidelines, “Focus on major issues impacting performance, readability, maintainability and security.”

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil {
log.Error(err, "failed to remove stateroot", "stateroot", stateroot)
failures += 1
}
}
// clear temporary files and reclaim disk space
if err := rpmOstreeClient.RpmOstreeCleanup(); err != nil {
if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil {
log.Error(err, "failed rpm-ostree cleanup -b")
failures += 1
}
if err := cleanupUnbootedStateroot(ctx, stateroot, ops, ostreeClient, rpmOstreeClient); err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return err
}
log.Error(err, "failed to remove stateroot", "stateroot", stateroot)
failures += 1
}
}
// clear temporary files and reclaim disk space
if err := rpmOstreeClient.RpmOstreeCleanup(ctx); err != nil {
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return err
}
log.Error(err, "failed rpm-ostree cleanup -b")
failures += 1
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/idle_handlers.go` around lines 257 - 267, The cleanup paths treat
context cancellation as routine failures; change both the
cleanupUnbootedStateroot call and the rpmOstreeClient.RpmOstreeCleanup call to
detect context cancellation (use errors.Is(err, context.Canceled) ||
errors.Is(err, context.DeadlineExceeded) or errors.Is with ctx.Err()) and
immediately return that error (or propagate/abort) instead of logging and
incrementing failures; update the logic around cleanupUnbootedStateroot,
rpmOstreeClient.RpmOstreeCleanup, ctx, log and failures so
context-cancellation/timeout results short-circuit the function without mutating
failures or continuing other cleanup steps.

Comment thread lca-cli/cmd/ibi.go
Comment on lines +73 to 74
cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{exec: hostCommandsExecutor}))
rpmOstreeClient := ostree.NewClient("lca-cli", hostCommandsExecutor)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve CLI cancellation in preinstallExecutorAdapter instead of forcing context.Background().

Line 93 always uses context.Background(), so commands invoked through preinstall-utils ignore cancellation/timeouts from runIBI(cmd.Context()).

🔧 Proposed fix
 type preinstallExecutorAdapter struct {
+	ctx  context.Context
 	exec ops.Execute
 }
 
 func (a *preinstallExecutorAdapter) Execute(command string, args ...string) (string, error) {
-	out, err := a.exec.Execute(context.Background(), command, args...)
+	execCtx := a.ctx
+	if execCtx == nil {
+		execCtx = context.Background()
+	}
+	out, err := a.exec.Execute(execCtx, command, args...)
 	if err != nil {
 		return out, fmt.Errorf("execute %s: %w", command, err)
 	}
 	return out, nil
 }
-cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{exec: hostCommandsExecutor}))
+cleanupDevice := preinstallUtils.NewCleanupDevice(log, preinstallUtils.NewDiskOps(log, &preinstallExecutorAdapter{
+	ctx:  ctx,
+	exec: hostCommandsExecutor,
+}))

Also applies to: 88-96

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/cmd/ibi.go` around lines 73 - 74, preinstallExecutorAdapter currently
forces context.Background() inside its execution methods, causing commands
invoked via runIBI(cmd.Context()) to ignore cancellation/timeouts; update
preinstallExecutorAdapter so it uses the caller-supplied context instead of
context.Background() (either by adding a ctx parameter to its Exec/Run/Execute
methods or by storing a context field set from cmd.Context() when constructing
the adapter) and propagate cmd.Context() from runIBI into
preinstallUtils.NewCleanupDevice / preinstallUtils.NewDiskOps (or any call sites
that create the adapter) so all commands respect CLI cancellation and timeouts.

Comment thread lca-cli/ops/ops.go
Comment on lines +691 to 694
if _, err := o.SystemctlAction(ctx, "is-active", containerStorageMountUnit); err != nil {
// No active mount, return nil
return "", nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't collapse canceled systemctl is-active calls into the inactive path.

These branches now receive context-driven failures too, but they still treat broad classes of errors as "unit is inactive/already stopped". If ctx times out or the command fails for another reason, callers can proceed with the wrong storage/service state instead of aborting. Only special-case the expected inactive exit code and propagate everything else.

Suggested fix
 	if _, err := o.SystemctlAction(ctx, "is-active", containerStorageMountUnit); err != nil {
-		// No active mount, return nil
-		return "", nil
+		var exitErr *exec.ExitError
+		if errors.As(err, &exitErr) && exitErr.ExitCode() == 3 {
+			return "", nil
+		}
+		return "", fmt.Errorf("unable to determine whether %s is active: %w", containerStorageMountUnit, err)
 	}
 	crioSystemdStatus, err := o.SystemctlAction(ctx, "is-active", "crio")
 	var exitErr *exec.ExitError
 	// If ExitCode is 3, the command succeeded and told us that crio is down
-	if err != nil && errors.As(err, &exitErr) && exitErr.ExitCode() != 3 {
-		return fmt.Errorf("failed to checking crio status: %w", err)
+	if err != nil {
+		if !(errors.As(err, &exitErr) && exitErr.ExitCode() == 3) {
+			return fmt.Errorf("failed checking crio status: %w", err)
+		}
 	}

Also applies to: 728-733

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lca-cli/ops/ops.go` around lines 691 - 694, The current check for
o.SystemctlAction(ctx, "is-active", containerStorageMountUnit) treats any error
as "no active mount" and returns nil; change it to only swallow the specific
error that indicates the unit is inactive (the expected systemctl exit/status
for inactive) and propagate all other errors (including context
cancellation/timeouts) so callers don't proceed on bogus state. Update both
occurrences (the call using o.SystemctlAction with "is-active" and
containerStorageMountUnit around lines referenced) to inspect the error from
SystemctlAction and return it unless it matches the known inactive result; keep
the original behavior for the true inactive case.

@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from aaae2ba to 70fcc6e Compare May 28, 2026 18:12
Replace exec.Command with exec.CommandContext so external commands
respect context cancellation and timeouts. Propagate ctx through
the Ops interface, downstream consumers, controllers, and CLI
commands.

- Add ctx as first parameter to Execute, ExecuteWithLiveLogger,
  and all executor implementations
- Thread ctx through ~25 Ops methods and all callers
- Use cmd.Context() in ipconfig CLI commands instead of
  context.Background()
- Add timeouts for startup and signal-handler probes
- Use context.WithoutCancel for deferred cleanup operations
- Handle errors from IsOstreeAdminSetDefaultFeatureEnabled
- Log podman rmi errors instead of silently discarding
- Regenerate all mocks
@sebrandon1 sebrandon1 force-pushed the add-context-to-execute-interface branch from 70fcc6e to e88e675 Compare May 29, 2026 15:00
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
controllers/ipc_controller.go (1)

327-338: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate cancellation instead of collapsing all errors to false.

When GetUnbootedStaterootName(ctx) fails due to context cancellation or timeout, line 333 returns false, nil, converting the control signal into a state decision. Downstream logic in validNextStages may incorrectly allow config re-entry or mark rollback validation as failed when nothing was actually checked. Only the genuine "not available" case (empty name) should map to false.

🔧 Suggested fix
 func isUnbootedStaterootAvailable(ctx context.Context, rpmOstreeClient rpmostreeclient.IClient) (*bool, error) {
 	if rpmOstreeClient == nil {
 		return nil, fmt.Errorf("rpmOstreeClient is nil")
 	}

 	unbootedStaterootName, err := rpmOstreeClient.GetUnbootedStaterootName(ctx)
-	if err != nil || unbootedStaterootName == "" {
+	if err != nil {
+		return nil, err
+	}
+	if unbootedStaterootName == "" {
 		return lo.ToPtr(false), nil
 	}

 	return lo.ToPtr(true), nil
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@controllers/ipc_controller.go` around lines 327 - 338, The function
isUnbootedStaterootAvailable currently swallows all errors from
rpmOstreeClient.GetUnbootedStaterootName and returns (false, nil), which
converts cancellations/timeouts into a state; change it to propagate errors
instead: call GetUnbootedStaterootName(ctx) and if err != nil return nil, err
(so context.Canceled/DeadlineExceeded propagate), only treat an empty
unbootedStaterootName as lo.ToPtr(false), nil and otherwise return
lo.ToPtr(true), nil; update references to isUnbootedStaterootAvailable (used by
validNextStages) to handle the returned error accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@controllers/ipc_controller.go`:
- Around line 327-338: The function isUnbootedStaterootAvailable currently
swallows all errors from rpmOstreeClient.GetUnbootedStaterootName and returns
(false, nil), which converts cancellations/timeouts into a state; change it to
propagate errors instead: call GetUnbootedStaterootName(ctx) and if err != nil
return nil, err (so context.Canceled/DeadlineExceeded propagate), only treat an
empty unbootedStaterootName as lo.ToPtr(false), nil and otherwise return
lo.ToPtr(true), nil; update references to isUnbootedStaterootAvailable (used by
validNextStages) to handle the returned error accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: d54da558-e48d-422b-9c64-74552246cba8

📥 Commits

Reviewing files that changed from the base of the PR and between aaae2ba and e88e675.

📒 Files selected for processing (57)
  • .konflux/overlay/release.in.yaml
  • controllers/ibu_controller.go
  • controllers/ibu_controller_test.go
  • controllers/idle_handlers.go
  • controllers/idle_handlers_test.go
  • controllers/ipc_config_handlers.go
  • controllers/ipc_config_handlers_test.go
  • controllers/ipc_controller.go
  • controllers/ipc_controller_test.go
  • controllers/ipc_idle_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • controllers/ipc_rollback_handlers.go
  • controllers/ipc_rollback_handlers_test.go
  • controllers/prep_handlers.go
  • controllers/rollback_handlers.go
  • controllers/seedgen_controller.go
  • controllers/upgrade_handlers.go
  • controllers/upgrade_handlers_test.go
  • internal/imagemgmt/imagemgmt.go
  • internal/imagemgmt/imagemgmt_test.go
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • internal/ostreeclient/ostreeclient.go
  • internal/precache/workload/pullImages.go
  • internal/prep/prep.go
  • internal/reboot/mock_reboot.go
  • internal/reboot/reboot.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/ibi.go
  • lca-cli/cmd/ibuPrecacheWorkload.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/restore.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • lca-cli/initmonitor/initmonitor.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ipconfig/prepivot.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • lca-cli/ops/execute.go
  • lca-cli/ops/mock_execute.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/ops/ops.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/seedcreator/seedcreator.go
  • lca-cli/seedrestoration/seedrestoration.go
  • main/main.go
✅ Files skipped from review due to trivial changes (5)
  • .konflux/overlay/release.in.yaml
  • internal/imagemgmt/mock_imagemgmt.go
  • internal/ostreeclient/mock_ostreeclient.go
  • lca-cli/ostreeclient/mock_rpmostreeclient.go
  • lca-cli/ops/mock_execute.go
🚧 Files skipped from review as they are similar to previous changes (44)
  • controllers/ibu_controller.go
  • lca-cli/cmd/postpivotcmd.go
  • lca-cli/cmd/create.go
  • lca-cli/cmd/restore.go
  • controllers/ibu_controller_test.go
  • main/main.go
  • lca-cli/cmd/ipconfig/rollback.go
  • lca-cli/ipconfig/rollback_test.go
  • controllers/ipc_idle_handlers.go
  • lca-cli/cmd/ibuStaterootSetup.go
  • internal/reboot/reboot_test.go
  • lca-cli/cmd/ibi.go
  • lca-cli/postpivot/postpivot_test.go
  • lca-cli/cmd/ipconfig/postpivot.go
  • controllers/prep_handlers.go
  • controllers/ipc_idle_handlers_test.go
  • lca-cli/seedrestoration/seedrestoration.go
  • lca-cli/ops/execute.go
  • internal/imagemgmt/imagemgmt.go
  • internal/precache/workload/pullImages.go
  • lca-cli/ibi-preparation/ibipreparation.go
  • controllers/ipc_rollback_handlers_test.go
  • lca-cli/ipconfig/postpivot.go
  • lca-cli/initmonitor/initmonitor.go
  • internal/reboot/mock_reboot.go
  • controllers/idle_handlers_test.go
  • lca-cli/ipconfig/prepivot_test.go
  • lca-cli/ostreeclient/rpmostreeclient.go
  • internal/imagemgmt/imagemgmt_test.go
  • controllers/upgrade_handlers.go
  • controllers/idle_handlers.go
  • lca-cli/ipconfig/prepivot.go
  • controllers/rollback_handlers.go
  • controllers/ipc_controller_test.go
  • lca-cli/ipconfig/postpivot_test.go
  • lca-cli/ibi-preparation/ibipreparation_test.go
  • controllers/seedgen_controller.go
  • lca-cli/seedcreator/seedcreator.go
  • internal/prep/prep.go
  • lca-cli/postpivot/postpivot.go
  • lca-cli/ops/mock_ops.go
  • lca-cli/cmd/ipconfig/prepivot.go
  • internal/reboot/reboot.go
  • lca-cli/ops/ops.go

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@sebrandon1: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ci-job e88e675 link true /test ci-job
ci/prow/ibu-e2e-flow-v6v4 e88e675 link false /test ibu-e2e-flow-v6v4
ci/prow/ibi-e2e-flow-v4v6 e88e675 link false /test ibi-e2e-flow-v4v6
ci/prow/ibi-e2e-flow-v6v4 e88e675 link false /test ibi-e2e-flow-v6v4
ci/prow/ipc-e2e-flow-v4v6 e88e675 link false /test ipc-e2e-flow-v4v6
ci/prow/ibu-e2e-flow-v4v6 e88e675 link false /test ibu-e2e-flow-v4v6
ci/prow/ibu-e2e-flow e88e675 link false /test ibu-e2e-flow
ci/prow/ipc-e2e-flow e88e675 link true /test ipc-e2e-flow
ci/prow/ipc-e2e-flow-v6v4 e88e675 link false /test ipc-e2e-flow-v6v4
ci/prow/ibi-e2e-flow e88e675 link false /test ibi-e2e-flow

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Replace usages of exec.Command with exec.CommandContext

3 participants