From 9598cb807464aa848011ee388c094b45783c68ee Mon Sep 17 00:00:00 2001 From: Mark Chmarny Date: Tue, 26 May 2026 13:27:13 -0700 Subject: [PATCH 1/2] fix(bundler): pin OCI sync for argocd-helm mixed-component -pre/-post children MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Issue #1018 reported that `--deployer argocd-helm` bundles published to OCI registries fail to sync on the synthetic `-pre` / `-post` child Applications. The structural fix landed in #1035 (path-based child template now appends `/{{ .Chart.Name }}` to the rendered `source.repoURL` so Argo CD's generic OCI source resolves to the actual artifact). However, two stale doc references still pointed operators at the pre-#1035 contract — installing per those examples re-produces the original failure with a double-suffixed reference. Changes: 1. `pkg/bundler/deployer/argocdhelm/argocdhelm.go` — `writeParentApplicationTemplate` docstring example updated to pass `--set repoURL=` (no chart name). The docstring is the canonical reference for the deployer contract; the bundle's emitted README, the parent App's `required` directive, and the path-based child's `required` directive all already use this form. 2. `docs/user/cli-reference.md` — the argocd-helm "Recommended deploy flow" example also still showed `--set repoURL=oci://// aicr-bundle`. Replaced with the parent-namespace form, added the contract rationale inline, and added a new troubleshooting entry covering the three common failure modes for OCI sync (double-suffixed repoURL, Argo CD < v2.13, missing tag). 3. `pkg/bundler/deployer/argocdhelm/argocdhelm_test.go` — new `TestHelmTemplate_MixedComponentPreChildResolvesFromOCI` reproduces the exact recipe shape from #1018 (Helm component + sibling ComponentPreManifests producing an NNN--pre folder), runs `helm template --set repoURL=oci://`, and asserts the rendered `gpu-operator-pre` Application's `source.repoURL` equals `/`. Pins the post-#1035 contract for the mixed-component synthetic-split code path, which `TestHelmTemplate_RendersWithSetRepoURL` only covered for manifest-only components. Fixes #1018 Related #1034, #1035 --- docs/user/cli-reference.md | 21 ++- pkg/bundler/deployer/argocdhelm/argocdhelm.go | 8 +- .../deployer/argocdhelm/argocdhelm_test.go | 137 ++++++++++++++++++ 3 files changed, 162 insertions(+), 4 deletions(-) diff --git a/docs/user/cli-reference.md b/docs/user/cli-reference.md index 60bfc3fce..5167a83aa 100644 --- a/docs/user/cli-reference.md +++ b/docs/user/cli-reference.md @@ -1589,13 +1589,24 @@ helm package ./bundle -d /tmp/ helm push /tmp/aicr-bundle-*.tgz oci:/// # 3. Install — the URL is supplied here, not at bundle time +# `--set repoURL` is the PARENT NAMESPACE (no trailing chart name). +# The parent Application appends `.Chart.Name` via its `source.chart` +# field, and path-based children append it directly into their +# rendered `source.repoURL`. Including the chart name in --set +# repoURL double-appends it and the children fail to resolve. helm install aicr-bundle oci:////aicr-bundle --version \ -n argocd \ - --set repoURL=oci:////aicr-bundle \ + --set repoURL=oci:/// \ --set targetRevision= ``` -The chart's `templates/aicr-stack.yaml` renders the parent Argo Application with `.Values.repoURL` and `.Values.targetRevision` substituted in. The parent Application then triggers Argo to render the chart again from the OCI source, creating the per-component child Applications with sync-wave ordering preserved. Child Applications whose source is path-based (manifest-only and mixed-component `-post` folders) inherit `.Values.repoURL` so they too pull from the same published location. +The chart's `templates/aicr-stack.yaml` renders the parent Argo Application with `.Values.repoURL` and `.Values.targetRevision` substituted in. The parent Application then triggers Argo to render the chart again from the OCI source, creating the per-component child Applications with sync-wave ordering preserved. Child Applications whose source is path-based (manifest-only and mixed-component `-pre` / `-post` folders) inherit `.Values.repoURL` and append `.Chart.Name` so they pull from the same published artifact as the parent. + +**Argo CD OCI prerequisites.** Path-based child Applications use Argo CD's generic OCI artifact source type (introduced in Argo CD v2.13). The argocd-helm bundle therefore requires: +- Argo CD **≥ v2.13** on the target cluster. +- A registry that serves Helm-pushed OCI artifacts through the generic OCI manifest fetch path (most modern registries — ECR, GHCR, GAR, Harbor, Artifactory, plain `oras`-compatible registries — support this). + +If the cluster runs Argo CD < v2.13 *or* the recipe is pure-Helm (no manifest-only / mixed components), the path-based children are not exercised and the bundle works on older Argo versions too. See the troubleshooting section below if `Failed to load target state` appears on `aicr-stack` or any `-pre` / `-post` Application. **`helm install ./bundle` from a local directory** *also* works, but with a caveat: child Applications whose source is path-based require Argo's repo-server to fetch the bundle from a remote (git or OCI) — there is no local-filesystem source type for an Argo Application. Local `helm install` is therefore end-to-end only when the recipe contains pure-Helm components. For everything else, publish first. @@ -2428,6 +2439,12 @@ aicr --debug bundle -r recipe.yaml **"failed to load manifest \ for component \"** — the recipe references a manifest path that does not exist in the current AICR binary's embedded data. This usually means the recipe was generated by an older binary and a referenced manifest has since been removed or relocated. Regenerate the recipe with the current binary (`aicr recipe ...`) and re-bundle. AICR recipes are a point-in-time artifact of the binary that produced them; bundling a stale recipe against a newer binary is not supported. +**`--deployer argocd-helm`: `aicr-stack` or `-pre` / `-post` Application stuck at `Unknown` sync status / "Failed to load target state: ... `/:: not found`"** — Argo CD cannot resolve the OCI artifact the parent or path-based child Application points at. Three common causes, in order of likelihood: + +1. **Chart name doubled in `--set repoURL`.** Under the current contract, `--set repoURL` carries the **parent namespace only** (e.g., `oci://ghcr.io/myorg`). The parent Application appends `.Chart.Name` via its `source.chart` field, and path-based children append it directly into their rendered `source.repoURL`. Passing `--set repoURL=oci://ghcr.io/myorg/aicr-bundle` produces a double-suffixed reference (`.../aicr-bundle/aicr-bundle:`) that does not exist. Drop the trailing chart segment. +2. **Argo CD older than v2.13.** Path-based children rely on Argo CD's generic OCI artifact source type, added in v2.13. Older Argo treats the source as Git and fails to resolve. Check with `kubectl -n argocd get deploy argocd-repo-server -o jsonpath='{.spec.template.spec.containers[0].image}'`. Upgrade Argo, or use `--deployer helm` if Argo upgrade is not an option. +3. **Tag missing from the registry.** Verify the published artifact exists at the exact tag the parent expects: `oras manifest fetch //:`. If `aicr bundle` is invoked without a tag (`oci:////` with no `:` suffix), the CLI version is used as the default — make sure `--set targetRevision=` at install time matches. + ## External Data Directory The `--data` flag enables extending or overriding the embedded recipe data with external files. This allows customization without rebuilding the CLI. diff --git a/pkg/bundler/deployer/argocdhelm/argocdhelm.go b/pkg/bundler/deployer/argocdhelm/argocdhelm.go index c8850f28b..3f61323ee 100644 --- a/pkg/bundler/deployer/argocdhelm/argocdhelm.go +++ b/pkg/bundler/deployer/argocdhelm/argocdhelm.go @@ -481,10 +481,14 @@ spec: // helm package ./bundle -d /tmp/ // helm push /tmp/aicr-bundle-*.tgz oci://ghcr.io/myorg // -// # install — same URL passed once, used for parent + children +// # install — --set repoURL is the PARENT NAMESPACE (no chart name). +// # The parent App appends .Chart.Name via source.chart, and path-based +// # children append it directly into their rendered source.repoURL — +// # including the chart name in --set repoURL double-suffixes both and +// # the children fail to resolve. See issues #1018 / #1034. // helm install aicr-bundle oci://ghcr.io/myorg/aicr-bundle --version \ // -n argocd \ -// --set repoURL=oci://ghcr.io/myorg/aicr-bundle \ +// --set repoURL=oci://ghcr.io/myorg \ // --set targetRevision= // // The user could also `kubectl apply` the rendered parent App directly diff --git a/pkg/bundler/deployer/argocdhelm/argocdhelm_test.go b/pkg/bundler/deployer/argocdhelm/argocdhelm_test.go index 98534b014..8249f1ceb 100644 --- a/pkg/bundler/deployer/argocdhelm/argocdhelm_test.go +++ b/pkg/bundler/deployer/argocdhelm/argocdhelm_test.go @@ -1369,6 +1369,143 @@ func TestGenerate_AppNameValidatedAtBoundary(t *testing.T) { } } +// TestHelmTemplate_MixedComponentPreChildResolvesFromOCI is the live-render +// regression test for issue #1018: a recipe with a Helm component AND +// ComponentPreManifests for that same component (the gke-cos / EKS GB200 +// shape from the bug report) produces a synthetic NNN--pre folder +// rendered into a path-based child Application. Before the post-#1035 +// contract, the path-based child's `source.repoURL` did not append +// `.Chart.Name`, so when the user `--set repoURL=oci:///` +// (the parent namespace) Argo CD's generic OCI source resolved +// `/:` — an artifact that does not exist — and the +// child stuck at `Unknown / Failed to load target state`. +// +// This test exercises the full chain: generator → helm template render +// → rendered child Application asserts. It pins both the parent and the +// `-pre` child to the same expected `/: +// ` OCI artifact reference. The sibling upstream-helm child +// (`gpu-operator`) is also asserted to confirm its `repoURL` stays as +// the upstream Helm chart registry (not templated from .Values.repoURL). +func TestHelmTemplate_MixedComponentPreChildResolvesFromOCI(t *testing.T) { + if _, err := exec.LookPath("helm"); err != nil { + t.Skip("helm not available; skipping live-render test") + } + + outputDir := t.TempDir() + rr := newRecipeResult("v1.0.0", []recipe.ComponentRef{ + { + Name: "gpu-operator", Namespace: "gpu-operator", Chart: "gpu-operator", + Version: "v25.3.3", Type: recipe.ComponentTypeHelm, + Source: "https://helm.ngc.nvidia.com/nvidia", + }, + }) + rr.DeploymentOrder = []string{"gpu-operator"} + + g := &Generator{ + RecipeResult: rr, + ComponentValues: map[string]map[string]any{ + "gpu-operator": {"driver": map[string]any{"version": "580"}}, + }, + Version: "v0.0.0-test", + // Pre-manifest in the gke-cos / EKS GB200 shape from #1018 — a + // Namespace primer that must apply before the gpu-operator chart. + ComponentPreManifests: map[string]map[string][]byte{ + "gpu-operator": { + "components/gpu-operator/manifests/kernel-module-params.yaml": []byte( + "apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: nvidia-kernel-module-params\n namespace: gpu-operator\n", + ), + }, + }, + } + if _, err := g.Generate(context.Background(), outputDir); err != nil { + t.Fatalf("Generate() error = %v", err) + } + + const setRepoURL = "oci://example.test/myorg" + const wantPathChildRepoURL = setRepoURL + "/" + DefaultChartName // post-#1035 fix + const wantTagName = "v9.9.9-issue-1018" + + cmdCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second) + defer cancel() + cmd := exec.CommandContext(cmdCtx, "helm", "template", "test-release", outputDir, //nolint:gosec // controlled args + "--set", "repoURL="+setRepoURL, + "--set", "targetRevision="+wantTagName, + ) + out, err := cmd.CombinedOutput() + if err != nil { + t.Fatalf("helm template failed: %v\noutput:\n%s", err, out) + } + + dec := yaml.NewDecoder(strings.NewReader(string(out))) + type appLite struct { + Kind string `yaml:"kind"` + Metadata struct{ Name string } `yaml:"metadata"` + Spec struct { + Source struct { + RepoURL string `yaml:"repoURL"` + Chart string `yaml:"chart"` + TargetRevision string `yaml:"targetRevision"` + Path string `yaml:"path"` + } `yaml:"source"` + } `yaml:"spec"` + } + found := map[string]appLite{} + for { + var a appLite + decErr := dec.Decode(&a) + if errors.Is(decErr, io.EOF) { + break + } + if decErr != nil { + t.Fatalf("failed to decode rendered YAML: %v\noutput:\n%s", decErr, out) + } + if a.Kind == "Application" && a.Metadata.Name != "" { + found[a.Metadata.Name] = a + } + } + + // Parent: uses Helm chart shape (source.chart = .Chart.Name). + parent, ok := found["aicr-stack"] + if !ok { + t.Fatalf("rendered output missing parent Application 'aicr-stack'\noutput:\n%s", out) + } + if parent.Spec.Source.RepoURL != setRepoURL { + t.Errorf("parent App repoURL: got %q, want %q (parent namespace, no chart name)", parent.Spec.Source.RepoURL, setRepoURL) + } + if parent.Spec.Source.Chart != DefaultChartName { + t.Errorf("parent App chart: got %q, want %q", parent.Spec.Source.Chart, DefaultChartName) + } + + // The path-based -pre child is the #1018 regression target. Argo CD's + // generic OCI source treats `repoURL` as the full artifact, so the + // rendered value MUST equal `/` for the + // `//:` lookup to resolve. + preChild, ok := found["gpu-operator-pre"] + if !ok { + t.Fatalf("rendered output missing path-based child 'gpu-operator-pre'\noutput:\n%s", out) + } + if preChild.Spec.Source.RepoURL != wantPathChildRepoURL { + t.Errorf("gpu-operator-pre child repoURL: got %q, want %q (issue #1018: must be parent-namespace + chart-name so the generic OCI source resolves to the published artifact)", preChild.Spec.Source.RepoURL, wantPathChildRepoURL) + } + if preChild.Spec.Source.Path != "001-gpu-operator-pre" { + t.Errorf("gpu-operator-pre path: got %q, want %q (NNN-folder name is structural; must not be templated)", preChild.Spec.Source.Path, "001-gpu-operator-pre") + } + if preChild.Spec.Source.TargetRevision != wantTagName { + t.Errorf("gpu-operator-pre child targetRevision: got %q, want %q", preChild.Spec.Source.TargetRevision, wantTagName) + } + + // The sibling upstream-helm child must keep the upstream chart + // registry — its source.repoURL is NOT templated from .Values.repoURL. + // Guards against accidentally widening the path-based template change. + upstream, ok := found["gpu-operator"] + if !ok { + t.Fatalf("rendered output missing upstream-helm child 'gpu-operator'\noutput:\n%s", out) + } + if upstream.Spec.Source.RepoURL != "https://helm.ngc.nvidia.com/nvidia" { + t.Errorf("gpu-operator upstream-helm child repoURL: got %q, want upstream Helm registry (must not be templated from .Values.repoURL)", upstream.Spec.Source.RepoURL) + } +} + // TestHelmTemplate_FailsWithoutRepoURL verifies the `required` directive // in the parent App template fires when the user omits --set repoURL. This // is the safety net that prevents users from accidentally publishing a From e168edc87a89a61c022688786be54dc1cc8771a5 Mon Sep 17 00:00:00 2001 From: Mark Chmarny Date: Tue, 26 May 2026 14:10:49 -0700 Subject: [PATCH 2/2] docs: clarify Argo CD v2.13 vs pure-Helm exception (PR #1039 review) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous "if Argo < v2.13 *or* pure-Helm" phrasing implied that an older Argo install alone exempted the bundle from the v2.13 requirement, which is backwards — path-based children are always emitted when the recipe has manifest-only / mixed components; older Argo just fails to resolve them. Split into two sentences per CodeRabbit's suggestion: the pure-Helm recipe is the actual exception that lets older Argo work; otherwise v2.13+ is required. --- docs/user/cli-reference.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/user/cli-reference.md b/docs/user/cli-reference.md index 5167a83aa..106186d1e 100644 --- a/docs/user/cli-reference.md +++ b/docs/user/cli-reference.md @@ -1606,7 +1606,7 @@ The chart's `templates/aicr-stack.yaml` renders the parent Argo Application with - Argo CD **≥ v2.13** on the target cluster. - A registry that serves Helm-pushed OCI artifacts through the generic OCI manifest fetch path (most modern registries — ECR, GHCR, GAR, Harbor, Artifactory, plain `oras`-compatible registries — support this). -If the cluster runs Argo CD < v2.13 *or* the recipe is pure-Helm (no manifest-only / mixed components), the path-based children are not exercised and the bundle works on older Argo versions too. See the troubleshooting section below if `Failed to load target state` appears on `aicr-stack` or any `-pre` / `-post` Application. +If the recipe is pure-Helm (no manifest-only / mixed components), path-based children are not exercised and the bundle can work on Argo CD versions older than v2.13. If path-based children are present, Argo CD v2.13+ is required. See the troubleshooting section below if `Failed to load target state` appears on `aicr-stack` or any `-pre` / `-post` Application. **`helm install ./bundle` from a local directory** *also* works, but with a caveat: child Applications whose source is path-based require Argo's repo-server to fetch the bundle from a remote (git or OCI) — there is no local-filesystem source type for an Argo Application. Local `helm install` is therefore end-to-end only when the recipe contains pure-Helm components. For everything else, publish first.