Add assisted installer deployment method for spoke TNF clusters #51
gamado wants to merge 9 commits into openshift-eng:main
Conversation
|
Hi @gamado. Thanks for your PR. I'm waiting for an openshift-eng member to verify that this patch is reasonable to test; once the patch is verified, the new status will be reflected here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
fonta-rh left a comment
Review: Add Assisted Installer Deployment for Spoke TNF Clusters
Solid architecture — clean separation between acm-install (hub-side) and assisted-spoke (spoke-side) with good reuse of existing roles (proxy-setup, common/update-cluster-inventory). The RHCOS ISO auto-detection with fallback and the BMH fencing credential setup are well done.
Found 5 critical issues that should be fixed before merge, 8 important issues for follow-up, and 5 minor suggestions. Details below.
Critical Issues
C1. SSH key hardcodes id_rsa.pub — will fail on ed25519-only hosts
File: roles/assisted/assisted-spoke/tasks/create-cluster-resources.yml
The task uses cat ~/.ssh/id_rsa.pub but the project convention is ~/.ssh/id_ed25519.pub (see CLAUDE.md, kcli role's prepare.yml). Modern RHEL 9 defaults to ed25519. If the hypervisor was provisioned with ed25519 only, this task silently fails.
Suggestion: Use the same multi-key detection pattern from the kcli role or common/:
- name: Get SSH public key
shell: |
for key in ~/.ssh/id_ed25519.pub ~/.ssh/id_rsa.pub ~/.ssh/id_ecdsa.pub; do
[ -f "$key" ] && cat "$key" && exit 0
done
echo "ERROR: No SSH public key found" >&2 && exit 1
register: ssh_pub_key
  changed_when: false

C2. pull_secret_path duplicated across both roles
Files:
- roles/assisted/acm-install/vars/main.yml:23
- roles/assisted/assisted-spoke/vars/main.yml:13
Both independently define pull_secret_path: /opt/dev-scripts/pull_secret.json. If the path changes, both must be updated independently.
Suggestion: Define once as a playbook-level variable in assisted-install.yml (or in the vars file loaded by vars_files), remove from both role vars/main.yml.
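A minimal sketch of the playbook-level definition (play name and host group are assumptions; role paths taken from the PR description):

```yaml
# assisted-install.yml — sketch: shared paths defined once at play scope
- name: Deploy assisted spoke TNF cluster
  hosts: hypervisor            # host group assumed
  vars:
    pull_secret_path: /opt/dev-scripts/pull_secret.json
  roles:
    - assisted/acm-install     # both roles now inherit pull_secret_path
    - assisted/assisted-spoke
```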
C3. Hub release image extraction duplicated between roles
Files:
- roles/assisted/acm-install/tasks/agent-service-config.yml:1-14 — fetches hub_release_image and hub_ocp_version
- roles/assisted/assisted-spoke/tasks/create-cluster-resources.yml:1-20 — fetches the same data
Same oc get clusterversion calls, same parsing, same variable names. Runs twice during a single deployment.
Suggestion: Move to a pre_tasks block in assisted-install.yml and pass as facts, or create a shared task file under roles/assisted/common/.
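A sketch of the pre_tasks option; the jsonpath and fact names are assumptions chosen to mirror the variable names in the review:

```yaml
# assisted-install.yml — sketch: fetch hub release data once, share as a fact
pre_tasks:
  - name: Get hub release image
    command: oc get clusterversion version -o jsonpath='{.status.desired.image}'
    register: hub_cv
    changed_when: false
    environment:
      KUBECONFIG: "{{ hub_kubeconfig }}"

  - name: Expose hub release image to both roles
    set_fact:
      hub_release_image: "{{ hub_cv.stdout }}"
```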
C4. RHCOS ISO fallback logic has edge cases that pass silently
File: roles/assisted/acm-install/tasks/agent-service-config.yml
Two issues:
- The primary extraction pipes `oc image info -o json 2>/dev/null` to Python. If `oc image info` fails, Python gets empty stdin → `JSONDecodeError` → rc=1. But `failed_when` only checks `'FAILED' in rhcos_iso_extraction.stdout`, not `rc`. The task succeeds with empty stdout.
- The `set_fact` uses `rhcos_iso_fallback.stdout | default(rhcos_iso_extraction.stdout)`. `default()` only triggers when the variable is undefined (i.e., fallback was skipped). If fallback runs but produces empty output, `rhcos_iso_url` becomes `""` — the `failed_when` checks for `'FAILED'` and `'NEEDS_FALLBACK'` but not the empty string.
Suggestion:
- name: Set RHCOS ISO URL fact
set_fact:
rhcos_iso_url: >-
{{ (rhcos_iso_fallback.stdout | default(rhcos_iso_extraction.stdout)) | trim }}
failed_when: >-
rhcos_iso_url == 'FAILED' or
rhcos_iso_url == 'NEEDS_FALLBACK' or
  rhcos_iso_url == ''

And add `failed_when: rhcos_iso_extraction.rc != 0 or 'FAILED' in rhcos_iso_extraction.stdout` to the primary extraction task.
C5. VM cleanup can cause stale disk reuse
File: roles/assisted/assisted-spoke/tasks/cleanup.yml
`virsh undefine --remove-all-storage` is wrapped in `2>/dev/null || true` plus `failed_when: false`. If storage removal fails (locked disk, permission denied), the VM definition is removed but the disk persists. On re-deployment, the `qemu-img create` task's `creates:` guard skips disk creation — the new VM boots from the OLD disk.
Suggestion: Separate the storage cleanup from VM undefine, or explicitly delete the qcow2 files in the cleanup task:
- name: Remove spoke VM disk images
file:
path: "{{ spoke_vm_image_dir }}/{{ spoke_cluster_name }}-master-{{ item }}.qcow2"
state: absent
loop: "{{ range(spoke_ctlplanes) | list }}"
  become: true

Important Issues
I6. Deploy script hardcodes spoke path
File: scripts/deploy-fencing-assisted.sh:42
Success message says KUBECONFIG=~/spoke-tnf/auth/kubeconfig but spoke_cluster_name is configurable in vars/assisted.yml. A user with spoke_cluster_name: my-cluster gets the wrong path in the output.
Suggestion: Either source the vars file to get the actual name, or note that the path depends on the configured spoke_cluster_name.
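A minimal sketch of the first option, assuming the vars file keeps the key on a single `spoke_cluster_name: <value>` line (quoted or unquoted); `spoke-tnf` is the fallback default:

```shell
# Sketch: derive the kubeconfig path from vars/assisted.yml instead of hardcoding it
vars_file="vars/assisted.yml"
spoke_cluster_name=$(sed -n 's/^spoke_cluster_name:[[:space:]]*//p' "$vars_file" 2>/dev/null | head -1 | tr -d "\"'")
spoke_cluster_name=${spoke_cluster_name:-spoke-tnf}   # fall back to the role default
echo "KUBECONFIG=~/${spoke_cluster_name}/auth/kubeconfig"
```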
I7. No input validation for spoke-specific variables
File: (missing — no validate.yml in assisted-spoke)
Both dev-scripts/install-dev and kcli/kcli-install have dedicated validate.yml files. The assisted roles validate hub health but not spoke inputs (CIDR format, VIP within CIDR, non-empty cluster name, valid BMC credentials format).
Suggestion: Add a validate.yml to assisted-spoke that checks at minimum:
- spoke_api_vip and spoke_ingress_vip are within spoke_network_cidr
- spoke_cluster_name is non-empty and DNS-safe
- spoke_ctlplanes >= 2
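A sketch of such a validate.yml, assuming the ansible.utils collection is available for the ipaddr filter:

```yaml
# roles/assisted/assisted-spoke/tasks/validate.yml — sketch
- name: Validate spoke input variables
  assert:
    that:
      - spoke_cluster_name | length > 0
      # DNS-safe: lowercase alphanumerics and hyphens, no leading/trailing hyphen
      - spoke_cluster_name is match('^[a-z0-9]([a-z0-9-]*[a-z0-9])?$')
      # ipaddr(<cidr>) evaluates false when the address is outside the network
      - spoke_api_vip | ansible.utils.ipaddr(spoke_network_cidr)
      - spoke_ingress_vip | ansible.utils.ipaddr(spoke_network_cidr)
      - spoke_ctlplanes | int >= 2
    fail_msg: "Spoke variable validation failed; check vars/assisted.yml"
```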
I8. No cluster state tracking — must integrate with existing system
Existing roles update cluster-vm-state.json via common/tasks/cluster-state.yml. The assisted roles skip this entirely, which means the deployment tracking system has no awareness of spoke clusters. This breaks make info, lifecycle management, and any tooling that depends on cluster-vm-state.json to know what's deployed.
The assisted-spoke role must call common/tasks/cluster-state.yml (or equivalent) to register spoke VMs and their state. This is a project convention — all deployment methods must support it.
I9. failed_when: false too broadly applied
Multiple locations silence legitimate errors:
- cleanup.yml:3-7 — namespace delete timeout silenced entirely
- enable-watch-all-namespaces.yml:17 — Provisioning CR patch hides API errors
- retrieve-credentials.yml:63-78 — VM start failure hidden, triggers a 2-minute blind pause
Suggestion: At minimum, log the error output. For the VM start case, if virsh start fails, don't wait 120 seconds.
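For the virsh start case, one possible shape that surfaces the error and fails fast instead of pausing blindly (task and variable names are assumptions; the tolerated message matches virsh's output for an already-running domain):

```yaml
# Sketch: fail fast with the real error instead of failed_when: false + pause
- name: Start spoke VM
  command: sudo virsh start {{ spoke_cluster_name }}-master-{{ item }}
  register: vm_start
  # Only tolerate the benign "already active" case; anything else fails loudly
  failed_when: vm_start.rc != 0 and 'already active' not in vm_start.stderr
  changed_when: "'started' in vm_start.stdout"
```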
I10. Wait loops provide no diagnostics on timeout
When wait for assisted-service pod or wait for agents to register times out, the user sees only the retry count. Consider adding a rescue block that dumps pod status, events, or logs.
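One possible shape, sketched for the assisted-service wait (the namespace and label selector are assumptions):

```yaml
# Sketch: dump context when a wait loop exhausts its retries
- block:
    - name: Wait for assisted-service pod to be ready
      shell: oc -n multicluster-engine get pod -l app=assisted-service -o jsonpath='{.items[0].status.phase}'
      register: pod_phase
      until: pod_phase.stdout == 'Running'
      retries: 30
      delay: 10
      changed_when: false
  rescue:
    - name: Collect pod status and recent events
      shell: |
        oc -n multicluster-engine get pods -o wide
        oc -n multicluster-engine get events --sort-by=.lastTimestamp | tail -20
      register: wait_diag
      changed_when: false

    - name: Fail with diagnostics
      fail:
        msg: "assisted-service never became ready:\n{{ wait_diag.stdout }}"
```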
I11. hub_auth_dir defined but never used
File: roles/assisted/assisted-spoke/vars/main.yml:21
Dead variable.
I12. storage.yml destructive rm -rf on re-run
File: roles/assisted/acm-install/tasks/storage.yml
rm -rf {{ assisted_images_path }}/* {{ assisted_db_path }}/*
Runs every time the role executes (not just on cleanup), wiping existing assisted-service data. This belongs in cleanup.yml, not storage.yml. Gate it with a when: force_cleanup or move to cleanup.
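A sketch of the gating approach, keeping the wipe available for explicit cleanup runs (the force_cleanup flag name is taken from the review; its default is an assumption):

```yaml
# Sketch: only wipe assisted-service data when cleanup is explicitly requested
- name: Wipe assisted-service storage
  shell: rm -rf {{ assisted_images_path }}/* {{ assisted_db_path }}/*
  become: true
  when: force_cleanup | default(false) | bool
```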
I13. No README for the new roles
File: (missing — no README.md in either assisted/acm-install/ or assisted/assisted-spoke/)
Every existing role in this project has a comprehensive README (dev-scripts/install-dev/README.md, kcli/kcli-install/README.md, etc.). READMEs are mandatory for new roles — they document prerequisites, variables, usage examples, and known limitations. The assisted.yml.template comments are not a substitute.
Suggestions
S14. changed_when: true on network/VM creation masks idempotency. Use actual command output to determine change status.
S15. setup-ksushy.yml assertion message assumes 2 hub nodes. Hubs can have 3+ nodes (e.g., 3-node HA).
S16. Template should document DHCP range constraints — VIPs must be outside .50-.150 range.
S17. Missing hub_network_cidr in template — if hub network differs from 192.168.111.0/24, nftables rules silently fail.
S18. Nearly all new files lack a trailing newline. Minor but should be fixed for consistency.
Strengths
- Architecture is sound — clean separation between acm-install and assisted-spoke with the playbook orchestrating both
- Good reuse of existing roles — proxy-setup and common/update-cluster-inventory in post_tasks
- RHCOS ISO auto-detection with fallback is genuinely useful
- Operational comments are excellent (e.g., "Requires both a ConfigMap AND an annotation on AgentServiceConfig")
- assisted.yml.template is well-structured as user-facing documentation
- BMH creation with fencing credentials correctly uses bmac.agent-install.openshift.io/fencing-credentials-secret-name
- Makefile integration correctly chains fencing-ipi before fencing-assisted
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: gamado. Needs approval from an approver in each of the listed files. The full list of commands accepted by this bot can be found here. |
|
Both clusters are healthy:

Spoke cluster (spoke-tnf):

Hub cluster:
The critical fixes (C1-C5) are fully validated. Ready to start on the important issues whenever you are. |
Review Fixes Status

Two commits pushed addressing the review findings:

Commit 1:
|
| Issue | Status | Description |
|---|---|---|
| C1 | Fixed | SSH key detection: multi-key loop (ed25519, rsa, ecdsa) instead of hardcoded id_rsa.pub |
| C2 | Fixed | pull_secret_path and hub_kubeconfig moved to playbook-level vars, removed from both roles |
| C3 | Fixed | Hub release image extraction moved to playbook pre_tasks, removed duplication from both roles |
| C4 | Fixed | RHCOS ISO extraction: added rc check on primary, empty string check on set_fact |
| C5 | Fixed | Explicit disk image removal after VM undefine to prevent stale disk reuse |
Commit 2: 38bfca3 — Important Fixes + Suggestions (I6, I8, I9, I10, I13, S16-S18)
| Issue | Status | Description |
|---|---|---|
| I6 | Fixed | Deploy script parses spoke_cluster_name from vars/assisted.yml instead of hardcoding path |
| I7 | Skipped | Input validation for spoke variables — deferred |
| I8 | Fixed | Cluster state tracking added to assisted-install.yml (deploying/deployed phases via common/cluster-state.yml) |
| I9 | Fixed | failed_when: false replaced: ignore_errors: true in cleanup, conditional check in provisioning patch, removed from virsh start |
| I10 | Fixed | Block/rescue diagnostic handlers added to all 9 wait loops across 4 files |
| I11 | Fixed | hub_auth_dir removed (was fixed in C2 commit) |
| I12 | Skipped | Destructive rm -rf in storage.yml — deferred |
| I13 | Fixed | READMEs added for acm-install and assisted-spoke roles |
| S14 | Skipped | changed_when: true idempotency — deferred |
| S15 | Skipped | setup-ksushy.yml assertion message — deferred |
| S16 | Fixed | DHCP range constraints documented in assisted.yml.template |
| S17 | Fixed | hub_network_cidr exposed in assisted.yml.template |
| S18 | Fixed | Trailing newlines added to all 22 new files |
Summary
- 14 of 18 review findings addressed
- 4 deferred (I7, I12, S14, S15 — low risk, can be follow-up)
- End-to-end deployment validated: make deploy fencing-assisted completed successfully (ok=155, changed=48, failed=0)
- Both hub and spoke clusters verified healthy (nodes Ready, all COs available, Pacemaker fencing operational)
Environment
Deployment Summary

Full end-to-end deployment completed successfully:

Hub Cluster Status
Spoke Cluster Status
Fixes Validated

All fixes from the two review-fix commits were validated in this run:
|
|
Hey @gamado — looks like the March 12 rebase dropped your two fix commits. The review fixes you validated on Feb 18-19 are no longer in the diff. A few concrete examples:
Looks like the fix commits weren't included during the interactive rebase. The original commits should still be recoverable via |
Deploy a spoke TNF cluster via ACM/MCE assisted installer on an existing hub cluster. This adds a third deployment path alongside dev-scripts (IPI) and kcli methods.

New Ansible roles:
- assisted/acm-install: installs ACM operator, MultiClusterHub, AgentServiceConfig with auto-detected RHCOS ISO, enables TNF support, and configures provisioning for external BMH management
- assisted/assisted-spoke: creates spoke libvirt network and VMs, deploys cluster resources (ClusterDeployment, AgentClusterInstall, InfraEnv, BareMetalHost with fencing credentials), monitors installation, and extracts spoke credentials

Usage: make deploy fencing-assisted

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- C1: SSH key detection now tries ed25519 first, falls back to rsa/ecdsa
- C2: Deduplicate pull_secret_path and hub_kubeconfig to playbook level
- C3: Move hub release image extraction to playbook pre_tasks (run once)
- C4: RHCOS ISO extraction checks rc and catches empty string failures
- C5: Explicit disk cleanup prevents stale qcow2 reuse on re-deploy
- Remove unused hub_auth_dir variable (I11)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- I6: Parse spoke_cluster_name from vars/assisted.yml in deploy script instead of hardcoding ~/spoke-tnf/auth/kubeconfig
- I8: Add cluster state tracking (deploying/deployed) to assisted-install.yml using common/cluster-state.yml, consistent with dev-scripts and kcli methods
- I9: Replace blanket failed_when:false with ignore_errors:true in cleanup, conditional error checking in enable-watch-all-namespaces, and remove failed_when:false from virsh start in retrieve-credentials
- I10: Add block/rescue diagnostic handlers to all 9 wait loops across install-operator, agent-service-config, wait-for-install, and retrieve-credentials, dumping relevant status on timeout
- I13: Add README.md for acm-install and assisted-spoke roles
- S16: Document DHCP range constraints in assisted.yml.template
- S17: Expose hub_network_cidr in assisted.yml.template
- S18: Add trailing newlines to all 22 new files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Adds a "fencing-assisted" deployment path: a Makefile target and script that run a new Ansible playbook to prepare an existing hub (ACM/MCE/assisted-service) and provision an assisted-installer spoke TNF cluster via two new roles (acm-install and assisted-spoke), templates, vars, and documentation.
Sequence Diagram(s)

sequenceDiagram
autonumber
participant User
participant Makefile
participant Script as deploy-fencing-assisted.sh
participant Ansible as assisted-install.yml
participant Hub as Hub KubeAPI
participant Assisted as Assisted-Service/ACM
participant Hypervisor as Hypervisor (ksushy/libvirt)
User->>Makefile: make deploy fencing-assisted
Makefile->>Script: exec deploy-fencing-assisted.sh
Script->>Ansible: ansible-playbook assisted-install.yml -i inventory.ini
Ansible->>Hub: validate hub kubeconfig and proxy (oc)
Ansible->>Assisted: install ACM/MCE, create MultiClusterHub
Ansible->>Assisted: extract RHCOS ISO, apply AgentServiceConfig
Ansible->>Hypervisor: create libvirt network, images, VMs (ksushy)
Ansible->>Hub: apply spoke resources (Namespace, ClusterImageSet, ClusterDeployment, AgentClusterInstall, InfraEnv)
Ansible->>Hub: apply BareMetalHost + fencing secrets
Assisted->>Hub: agents register and progress installation
Hub->>Ansible: provide spoke kubeconfig and admin password
Ansible->>User: print spoke credentials and access instructions
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
|
- Add OPENSHIFT_INSTALL_EXPERIMENTAL_DISABLE_IMAGE_POLICY to assisted-unsupported-config to prevent spoke MCD signature verification failures during package layering
- Add retries (30x30s) to packagemanifest auto-detection to handle catalogsource sync delays (up to 25 minutes observed)
- Add fencing-assisted to VALID_CLUSTER_TYPES in Makefile
Actionable comments posted: 12
🧹 Nitpick comments (17)
deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml (1)
9-18: Consider adding nodeAffinity to hostPath PVs.

For multi-node clusters, hostPath PVs should include nodeAffinity to ensure pods are scheduled on the node where the storage exists. Without it, pods may be scheduled on a different node and fail to access the data.

♻️ Example nodeAffinity addition

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: assisted-pv-images
    spec:
      capacity:
        storage: {{ assisted_images_size }}
      accessModes: [ReadWriteOnce]
      hostPath:
        path: {{ assisted_images_path }}
      storageClassName: {{ assisted_storage_class }}
      nodeAffinity:
        required:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                    - {{ hub_nodes.stdout_lines[0] }}

Note: This requires restructuring to capture node names before creating PVs, or templating both PVs per node.

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/main.yml (1)
4-30: Consider adding a rescue block for diagnostic output on failure.

Per the PR objectives, commit 38bfca3 was supposed to add block/rescue diagnostic handlers (the I10 fix). The current code lacks a rescue block, which would help capture diagnostic information when spoke deployment fails. This may have been lost during the rebase.

♻️ Example rescue block for diagnostics

      - block:
          - name: Cleanup existing spoke resources
            include_tasks: cleanup.yml
            when: force_cleanup | bool

          # ... other tasks ...

          - name: Retrieve spoke cluster credentials
            include_tasks: retrieve-credentials.yml
    +   rescue:
    +     - name: Collect diagnostic information on failure
    +       shell: |
    +         echo "=== AgentClusterInstall status ==="
    +         oc get agentclusterinstall -n {{ spoke_cluster_name }} -o yaml || true
    +         echo "=== Agent status ==="
    +         oc get agent -n {{ spoke_cluster_name }} -o wide || true
    +       register: diagnostics
    +       changed_when: false
    +       ignore_errors: true
    +
    +     - name: Display diagnostics
    +       debug:
    +         var: diagnostics.stdout_lines
    +
    +     - name: Fail with context
    +       fail:
    +         msg: "Spoke cluster deployment failed. See diagnostics above."
        environment:
          KUBECONFIG: "{{ hub_kubeconfig }}"

deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh (1)
38-42: Fragile YAML parsing - consider using a proper YAML parser.

The grep | awk approach won't correctly handle quoted values containing colons, inline comments, or multi-line values. Consider using yq or Python for robust YAML parsing.

♻️ More robust parsing using yq

     # Parse spoke_cluster_name from vars/assisted.yml
    -SPOKE_CLUSTER_NAME=$(grep '^spoke_cluster_name:' vars/assisted.yml | awk '{print $2}' | tr -d '"' | tr -d "'")
    +if command -v yq &> /dev/null; then
    +  SPOKE_CLUSTER_NAME=$(yq -r '.spoke_cluster_name // "spoke-tnf"' vars/assisted.yml)
    +else
    +  # Fallback to basic parsing (may not handle all edge cases)
    +  SPOKE_CLUSTER_NAME=$(grep '^spoke_cluster_name:' vars/assisted.yml | awk '{print $2}' | tr -d '"' | tr -d "'" || true)
    +fi
     if [[ -z "${SPOKE_CLUSTER_NAME}" ]]; then
       SPOKE_CLUSTER_NAME="spoke-tnf"
     fi

CLAUDE.md (1)
135-139: Consider adding a link to the template file or noting key variables.

The Assisted Installer Method section mentions copying vars/assisted.yml.template but doesn't indicate which key variables users should customize. This differs from the Kcli Method section (lines 131-133), which mentions specific configuration details.

Consider adding a brief note about key variables like spoke_cluster_name and spoke_network_cidr, or referencing the role README for full variable documentation.

As per coding guidelines: "Document configuration options and environment variables in documentation".

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-cluster-resources.yml (2)
39-43: changed_when detection may miss updates.

The changed_when: "'created' in ..." pattern only detects new resource creation. If oc apply updates an existing resource (showing "configured"), the task incorrectly reports no change. This doesn't affect functionality but misrepresents actual changes.

♻️ Detect both create and update

    -  changed_when: "'created' in cis_result.stdout"
    +  changed_when: "'created' in cis_result.stdout or 'configured' in cis_result.stdout"

Apply a similar change to all four Apply tasks.

Also applies to: 51-55, 63-67, 75-79
22-31: Pull secret creation is idempotent but has a minor race condition.

The check-then-create pattern has a small TOCTOU window. In practice, this is unlikely to matter for this use case, but using oc apply with a templated secret would be more robust.

♻️ Alternative using oc apply for true idempotency

Consider templating the secret and using oc apply similar to the other resources, which avoids the check-then-create pattern.

deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-tnf.yml (1)
34-41: Pod selection could fail silently if no pods exist.

If no assisted-service pods exist when this runs (e.g., rollout is still in progress despite the previous wait), head -1 returns an empty string, causing oc exec to fail with an unclear error.

♻️ Add guard for empty pod selection

     - name: Verify TNF support is enabled
       shell: |
    -    oc exec -n {{ assisted_service_namespace }} \
    -      $(oc get pod -n {{ assisted_service_namespace }} -l app=assisted-service -o name | head -1) \
    -      -c assisted-service -- env | grep -i TNF_CLUSTERS_SUPPORT
    +    POD=$(oc get pod -n {{ assisted_service_namespace }} -l app=assisted-service -o name 2>/dev/null | head -1)
    +    if [ -z "$POD" ]; then
    +      echo "ERROR: No assisted-service pod found" >&2
    +      exit 1
    +    fi
    +    oc exec -n {{ assisted_service_namespace }} "$POD" -c assisted-service -- env | grep -i TNF_CLUSTERS_SUPPORT
       register: tnf_verify
       changed_when: false
       failed_when: "'TNF_CLUSTERS_SUPPORT=true' not in tnf_verify.stdout"

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-network.yml (2)
33-44: nftables rule persistence and check logic.

Two observations:

- The grep pattern on line 36 only checks for rules in one direction (spoke → hub). If only the reverse rule exists, both rules would still be inserted, potentially creating duplicates for the hub→spoke direction.
- nftables rules added via nft insert are not persisted across reboots. Consider whether persistence is needed for your use case.

♻️ More robust rule checking

     - name: Add cross-bridge nftables FORWARD rules for spoke<->hub connectivity
       shell: |
         # Check if rules already exist
    -    if sudo nft list chain ip filter FORWARD 2>/dev/null | grep -q "{{ spoke_network_cidr }}.*{{ hub_network_cidr }}"; then
    +    FORWARD_RULES=$(sudo nft list chain ip filter FORWARD 2>/dev/null || echo "")
    +    if echo "$FORWARD_RULES" | grep -q "{{ spoke_network_cidr }}.*{{ hub_network_cidr }}" && \
    +       echo "$FORWARD_RULES" | grep -q "{{ hub_network_cidr }}.*{{ spoke_network_cidr }}"; then
           echo "Rules already exist"
         else
23-31: changed_when: true doesn't accurately reflect actual state changes.

Both virsh net-start and virsh net-autostart tasks report changed_when: true unconditionally, even when the network was already started or autostart was already enabled. This breaks idempotent reporting.

♻️ Suggested improvement for accurate change detection

     - name: Start spoke libvirt network
       shell: |
    -    sudo virsh net-start {{ spoke_network_name }} 2>/dev/null || true
    -  changed_when: true
    +    if sudo virsh net-info {{ spoke_network_name }} 2>/dev/null | grep -q "Active:.*yes"; then
    +      echo "ALREADY_RUNNING"
    +    else
    +      sudo virsh net-start {{ spoke_network_name }} && echo "STARTED"
    +    fi
    +  register: net_start_result
    +  changed_when: "'STARTED' in net_start_result.stdout"

     - name: Set spoke libvirt network to autostart
       shell: |
    -    sudo virsh net-autostart {{ spoke_network_name }}
    -  changed_when: true
    +    if sudo virsh net-info {{ spoke_network_name }} 2>/dev/null | grep -q "Autostart:.*yes"; then
    +      echo "ALREADY_AUTOSTART"
    +    else
    +      sudo virsh net-autostart {{ spoke_network_name }} && echo "ENABLED"
    +    fi
    +  register: net_autostart_result
    +  changed_when: "'ENABLED' in net_autostart_result.stdout"

As per coding guidelines: "Make deployments idempotent in Ansible playbooks and scripts".

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/setup-ksushy.yml (2)
19-25: Redfish endpoint query lacks error handling for connection failures.

If sushy-tools is running but the curl request fails (network issue, wrong IP/port), the task will fail with a JSON parsing error rather than a clear connection error message.
♻️ Add connection error handling
```diff
 - name: Verify spoke VMs are visible via sushy-tools
   shell: |
-    curl -sk https://{{ spoke_ksushy_ip }}:{{ spoke_ksushy_port }}/redfish/v1/Systems/ \
+    RESPONSE=$(curl -sk --connect-timeout 10 https://{{ spoke_ksushy_ip }}:{{ spoke_ksushy_port }}/redfish/v1/Systems/ \
       -u {{ spoke_bmc_user }}:{{ spoke_bmc_password }} \
-      | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['Members@odata.count'])"
+      2>&1) || { echo "ERROR: Failed to connect to sushy-tools at {{ spoke_ksushy_ip }}:{{ spoke_ksushy_port }}" >&2; exit 1; }
+    echo "$RESPONSE" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['Members@odata.count'])"
   register: sushy_systems
   changed_when: false
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/setup-ksushy.yml` around lines 19 - 25, The Redfish curl+python task "Verify spoke VMs are visible via sushy-tools" can raise a JSON parse error when curl fails; update this shell task to detect curl failures before JSON parsing by using curl with --fail/-f and capturing its exit code and output into the registered variable sushy_systems, then add a failed_when that checks curl succeeded (e.g., sushy_systems.rc == 0) and otherwise fail with a clear message including the curl stderr/output and the spoke_ksushy_ip/spoke_ksushy_port; only run the python/json extraction when curl succeeded (or guard the python command to return a sane empty JSON on failure) so the JSON parser never receives non-JSON input.
27-34: Assert may fail if spoke VMs aren't created yet.

The assertion checks for `spoke_ctlplanes + 2` systems, and according to the task flow in the README (and main.yml context), setup-ksushy.yml runs after create-spoke-vms.yml. However, if the task order changes or VMs fail to register with sushy-tools immediately, this could produce confusing failures.

Consider whether this is intentional validation or whether the check should be more lenient.
The current logic assumes spoke VMs exist and are visible to sushy-tools. If this is a pre-flight check, the assertion message could clarify that spoke VMs must already be created and registered.
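The pre-flight intent could be made explicit with an assertion message that names the expected count. A minimal shell sketch of that logic, with the Redfish response mocked (the real task curls sushy-tools, and the `+ 2` simply mirrors the existing assertion quoted in this review; all values here are illustrative):

```shell
#!/usr/bin/bash
# Sketch only: the expected-count check behind the assert, with a mocked
# Redfish payload instead of a live curl to sushy-tools.
spoke_ctlplanes=2
expected=$(( spoke_ctlplanes + 2 ))   # mirrors the existing assertion
redfish_json='{"Members@odata.count": 4}'

count=$(echo "$redfish_json" | python3 -c \
  "import json,sys; print(json.load(sys.stdin)['Members@odata.count'])")

if [ "$count" -ne "$expected" ]; then
  echo "ERROR: sushy-tools reports $count systems, expected $expected;" \
       "spoke VMs must already be created and registered" >&2
  exit 1
fi
echo "sushy-tools sees all $count expected systems"
```

The clarified error message tells the operator what precondition was violated instead of surfacing a bare assert failure.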
deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/agentclusterinstall.yml.j2 (1)
14-19: Consider making `hostPrefix` configurable.

The `hostPrefix: 23` is hardcoded, which works for the current /14 pod network and 2-node TNF topology. For future flexibility with different cluster network CIDRs or scaling needs, consider making this configurable via a role variable with a sensible default.

♻️ Proposed change to make hostPrefix configurable
In defaults/main.yml:

```yaml
spoke_host_prefix: 23
```

In the template:

```diff
 networking:
   clusterNetwork:
     - cidr: "{{ spoke_cluster_network_cidr }}"
-      hostPrefix: 23
+      hostPrefix: {{ spoke_host_prefix }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/agentclusterinstall.yml.j2` around lines 14 - 19, The template currently hardcodes hostPrefix: 23 in agentclusterinstall.yml.j2; make this value configurable by introducing a role variable (e.g., spoke_host_prefix) in defaults/main.yml with a sensible default of 23 and replace the literal 23 in the template with the variable reference (spoke_host_prefix) so different clusterNetwork CIDRs or topologies can override it; update any role README or variable docs to mention spoke_host_prefix and its purpose.

deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-watch-all-namespaces.yml (1)
30-38: Consider adding rescue diagnostics for metal3 pod wait timeout.

Unlike other wait loops in this PR that use block/rescue patterns for timeout diagnostics, this wait loop will fail with minimal context if the metal3 pod doesn't stabilize. Consider adding a rescue block to collect pod status and events on timeout for easier troubleshooting.
♻️ Proposed improvement with rescue diagnostics
```diff
-- name: Wait for metal3 pod to stabilize after provisioning change
+- block:
+    - name: Wait for metal3 pod to stabilize after provisioning change
+      shell: |
+        oc get pods -n openshift-machine-api \
+          -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state \
+          --no-headers 2>/dev/null | grep -v Terminating | grep -q Running
+      register: metal3_pod
+      until: metal3_pod.rc == 0
+      retries: "{{ (metal3_stabilize_timeout / 15) | int }}"
+      delay: 15
+  rescue:
+    - name: Collect metal3 pod diagnostics
+      shell: |
+        echo "=== Metal3 Pods ==="
+        oc get pods -n openshift-machine-api -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state
+        echo ""
+        echo "=== Recent Events ==="
+        oc get events -n openshift-machine-api --sort-by='.lastTimestamp' | tail -15
+      register: metal3_diag
+      changed_when: false
+      failed_when: false
+
+    - name: Display metal3 pod diagnostics
+      debug:
+        msg: "{{ metal3_diag.stdout }}"
+
+    - name: Fail after metal3 pod timeout
+      fail:
+        msg: "metal3 pod did not stabilize within timeout"
-  shell: |
-    oc get pods -n openshift-machine-api \
-      -l baremetal.openshift.io/cluster-baremetal-operator=metal3-state \
-      --no-headers 2>/dev/null | grep -v Terminating | grep -q Running
-  register: metal3_pod
-  until: metal3_pod.rc == 0
-  retries: "{{ (metal3_stabilize_timeout / 15) | int }}"
-  delay: 15
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-watch-all-namespaces.yml` around lines 30 - 38, Wrap the existing task "Wait for metal3 pod to stabilize after provisioning change" (the shell task that sets metal3_pod and uses until with metal3_stabilize_timeout) into an Ansible block with a rescue section; on timeout (rescue) run diagnostic commands like oc get pods -n openshift-machine-api -o wide, oc describe pod for the metal3 pod(s) and oc get events -n openshift-machine-api, register their outputs and log them (via debug or copy to a file) so you surface pod status, descriptions and recent events when metal3_pod.wait fails; ensure you still preserve the same retry/delay logic and variables (metal3_pod, metal3_stabilize_timeout) inside the block.

deploy/openshift-clusters/assisted-install.yml (1)
148-154: Clarify proxy.env source in deployment summary.

Line 149 instructs users to `source proxy.env` for spoke access, but this file gets overwritten by the proxy-setup role. The message should clarify that this is the spoke's proxy.env. Consider making the instruction more explicit.

♻️ Proposed clarification
```diff
 Access spoke cluster:
-  source proxy.env
+  source proxy.env  # (spoke cluster proxy - written by proxy-setup role)
   KUBECONFIG={{ spoke_kubeconfig_path }} oc get nodes

 Access hub cluster:
-  source hub-proxy.env
+  source hub-proxy.env  # (hub cluster proxy - preserved from original deployment)
   KUBECONFIG=~/auth/kubeconfig oc get nodes
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/assisted-install.yml` around lines 148 - 154, The deployment summary currently says to "source proxy.env" which is ambiguous because proxy.env is overwritten by the proxy-setup role; update the text to explicitly state "source the spoke proxy.env (generated by the proxy-setup role for the spoke cluster)" when instructing spoke access and reference the spoke_kubeconfig_path variable in the same sentence; similarly, clarify hub access as "source hub-proxy.env (for the hub cluster)" to avoid confusion between spoke and hub proxy files.

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-vms.yml (1)
37-42: Consider reporting change when VMs are actually destroyed.

The `changed_when: false` suppresses change reporting even when VMs are successfully destroyed. While this is functionally correct (the goal is to ensure the VMs are off), it reduces visibility in Ansible output. Consider checking whether destruction actually occurred.

♻️ Proposed improvement for accurate change reporting
```diff
 - name: Ensure spoke VMs are shut off
   shell: |
-    sudo virsh destroy {{ spoke_cluster_name }}-master-{{ item }} 2>/dev/null || true
+    if sudo virsh domstate {{ spoke_cluster_name }}-master-{{ item }} 2>/dev/null | grep -q running; then
+      sudo virsh destroy {{ spoke_cluster_name }}-master-{{ item }}
+      echo "DESTROYED"
+    else
+      echo "ALREADY_OFF"
+    fi
   loop: "{{ range(spoke_ctlplanes) | list }}"
-  changed_when: false
-  failed_when: false
+  register: destroy_result
+  changed_when: "'DESTROYED' in destroy_result.stdout"
+  failed_when: false
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-vms.yml` around lines 37 - 42, The task "Ensure spoke VMs are shut off" currently forces no change reporting; instead, register the shell result (e.g., register: virsh_destroy) for the loop that runs "sudo virsh destroy {{ spoke_cluster_name }}-master-{{ item }}" over "{{ range(spoke_ctlplanes) | list }}" and set changed_when based on the command outcome (mark changed when virsh actually destroyed a domain and not when it returned "cannot find domain" or similar). Update failed_when accordingly to preserve suppression of real errors while allowing changed_when to reflect actual destructions.

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-bmh.yml (1)
52-53: Treat `configured` output as a change too.

`oc apply` reports updates as `configured`, so this predicate hides real Secret/BMH reconciliations on reruns.

Suggested change
```diff
-  changed_when: "'created' in bmh_result.stdout"
+  changed_when: >
+    'created' in bmh_result.stdout or
+    'configured' in bmh_result.stdout
```

As per coding guidelines, `deploy/**/*.{yml,yaml,sh}`: Make deployments idempotent in Ansible playbooks and scripts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-bmh.yml` around lines 52 - 53, The current changed_when predicate in create-bmh.yml only treats "'created' in bmh_result.stdout" as a change, so updates reported as "configured" by oc apply are ignored; update the changed_when condition that references bmh_result to also consider "configured" (and possibly "patched"/other positive indicators) in bmh_result.stdout so that Secret/BMH reconciliations on reruns are marked as changed (e.g., expand the predicate for the registered variable bmh_result to detect "configured" in addition to "created").

deploy/openshift-clusters/vars/assisted.yml.template (1)
24-29: Don't document `spoke_ctlplanes` as two-node-only here.

`spoke_ctlplanes` is parameterized, but this comment hardcodes the assisted path to 2. Please document the supported values instead of saying it is always two-node, so the new workflow stays aligned with the repo's topology requirements.

As per coding guidelines, `deploy/openshift-clusters/**`: Support both SNO (Single Node OpenShift) and two-node topologies in cluster management code.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/vars/assisted.yml.template` around lines 24 - 29, The comment for the template variable spoke_ctlplanes incorrectly states it is "Always 2 for TNF"; update the comment next to the spoke_ctlplanes declaration to document the supported values (e.g., 1 for SNO, 2 for two-node/TNF) and note that it is parameterized rather than fixed, so the variable (spoke_ctlplanes) can be set to either 1 or 2 depending on topology; locate the spoke_ctlplanes line in assisted.yml.template and replace the hardcoded "Always 2 for TNF" text with a brief description of allowed values and intended use.
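The documented values could also be enforced at runtime. A minimal shell sketch of such a topology guard, under the assumption (from this review) that only 1 (SNO) and 2 (two-node/TNF) are supported; the value below is illustrative:

```shell
#!/usr/bin/bash
# Sketch only: validate spoke_ctlplanes against the supported topologies.
spoke_ctlplanes=2

case "$spoke_ctlplanes" in
  1) msg="topology: SNO" ;;
  2) msg="topology: two-node (TNF)" ;;
  *) echo "ERROR: spoke_ctlplanes must be 1 (SNO) or 2 (TNF), got: $spoke_ctlplanes" >&2
     exit 1 ;;
esac
echo "$msg"
```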
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/Makefile`:
- Around line 79-81: The Makefile defines a target fencing-assisted but the
validation variable VALID_CLUSTER_TYPES does not include "fencing-assisted",
causing make deploy fencing-assisted to fail; update the VALID_CLUSTER_TYPES
list (the variable named VALID_CLUSTER_TYPES) to include the token
"fencing-assisted" or adjust the validation logic that checks cluster types to
accept this alias so the fencing-assisted target (the fencing-assisted recipe
that calls fencing-ipi and
./openshift-clusters/scripts/deploy-fencing-assisted.sh) can run without being
blocked by arg validation.
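The Makefile fix above amounts to adding one token to a whitelist. A shell sketch of that validation, assuming VALID_CLUSTER_TYPES is a space-separated list (the names mirror the Makefile for illustration only):

```shell
#!/usr/bin/bash
# Sketch only: mirrors a Makefile-style cluster-type whitelist in shell.
# "fencing-assisted" is the token the review says must be added.
VALID_CLUSTER_TYPES="fencing-ipi fencing-assisted"

validate_cluster_type() {
  local candidate="$1" t
  for t in $VALID_CLUSTER_TYPES; do
    [ "$t" = "$candidate" ] && return 0
  done
  echo "ERROR: unknown cluster type '$candidate' (valid: $VALID_CLUSTER_TYPES)" >&2
  return 1
}

validate_cluster_type "fencing-assisted" && echo "accepted"
```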
In
`@deploy/openshift-clusters/roles/assisted/acm-install/tasks/agent-service-config.yml`:
- Around line 72-74: The task "Get RHCOS version from ISO URL" currently sets
rhcos_version by applying regex_search(... ) | first which can fail if the regex
doesn't match; update this task to validate the extraction from rhcos_iso_url by
capturing the regex result into a temporary variable (e.g.,
rhcos_version_matches) using regex_search, then assert or fail with a clear
error via the fail module if the list is empty, or set rhcos_version using a
safe default via the default filter only after confirming matches exist;
reference the existing symbols rhcos_iso_url, rhcos_version, regex_search and
the task name to locate and modify the code.
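The fail-fast extraction described above can be sketched in plain shell; the sample URL is made up, and the real value comes from rhcos_iso_url:

```shell
#!/usr/bin/bash
# Sketch only: extract the RHCOS version and fail with a clear message
# when the URL does not match the expected naming pattern.
rhcos_iso_url="https://example.com/rhcos-417.94.202502051822-0-live.x86_64.iso"

if [[ "$rhcos_iso_url" =~ rhcos-([0-9.]+-[0-9]+)-live ]]; then
  rhcos_version="${BASH_REMATCH[1]}"
else
  echo "ERROR: could not extract RHCOS version from: $rhcos_iso_url" >&2
  exit 1
fi
echo "rhcos_version=$rhcos_version"
```

The same guard-before-use shape carries over to the Ansible version: capture the regex result first, then fail or set the fact.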
In
`@deploy/openshift-clusters/roles/assisted/acm-install/tasks/install-operator.yml`:
- Around line 107-153: The play currently only creates and waits for
MultiClusterHub when hub_operator == 'acm', so when hub_operator == 'mce' no
operand CR is created; add an equivalent branch for MCE: template a
multiclusterengine.yml.j2 (similar to multiclusterhub.yml.j2) and write it to
/tmp/multiclusterengine.yml, then oc apply -f /tmp/multiclusterengine.yml
(registering result as e.g. mce_result), and add a wait block that polls the
MultiClusterEngine resource (oc get multiclusterengine multiclusterengine -n {{
op_config.namespace }} -o jsonpath='{.status.phase}' or appropriate ready field)
with retries/delay and a rescue block that collects the same diagnostics and
fails with a clear message; guard all these tasks with when: hub_operator ==
'mce' and reuse naming patterns (task names, registered vars) analogous to the
existing MultiClusterHub tasks.
- Around line 31-40: Add an explicit guard after the "Auto-detect channel from
packagemanifest" task to fail fast when detection fails or returns empty: insert
a task (e.g., "Fail if auto-detected channel empty") that uses the fail module
and triggers when detected_channel.rc != 0 or detected_channel.stdout | trim ==
'' and include a clear message mentioning op_config.package_name and the
detected_channel output; keep the existing "Set operator channel fact" task only
executed when detected_channel.stdout | trim is not empty (or rely on the fail
task to stop execution).
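The empty-channel guard can be sketched as follows; detected_channel is stubbed here, whereas the playbook would populate it from `oc get packagemanifest`:

```shell
#!/usr/bin/bash
# Sketch only: fail fast when channel auto-detection returns nothing.
package_name="advanced-cluster-management"
detected_channel="release-2.12"   # stub; imagine this came from oc

require_channel() {
  if [ -z "${1// /}" ]; then
    echo "ERROR: could not auto-detect channel for $package_name" >&2
    return 1
  fi
  echo "Using channel: $1"
}

require_channel "$detected_channel"
```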
In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml`:
- Around line 48-58: The task "Fix hostPath permissions and SELinux context on
each hub node" unconditionally runs rm -rf on {{ assisted_images_path }} and {{
assisted_db_path }}, risking data loss on re-runs; remove the unconditional
deletion from that task and instead create a separate "Clean hostPath
directories" task that runs the rm -rf only when a boolean flag (e.g.,
force_cleanup | default(false) | bool) is true, loop over
hub_nodes.stdout_lines, and keep the permissions/SELinux chown/chcon steps in
the original task so the selinux_fix registration still applies.
In `@deploy/openshift-clusters/roles/assisted/acm-install/vars/main.yml`:
- Around line 7-17: The hard-coded catalog source values under
operator_config.acm.source and operator_config.mce.source prevent disconnected
deployments; replace those literals with configurable variables (e.g.,
acm_catalog_source and mce_catalog_source or a single operator_catalog_source)
and update references in the vars file so operator_config.acm.source and
operator_config.mce.source use those variables (provide sensible defaults like
"redhat-operators" to preserve current behavior). Ensure the variable names are
declared in the same vars file so mirrored catalogs can be supplied in
disconnected environments and that both the acm and mce blocks reference the new
variables.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/retrieve-credentials.yml`:
- Around line 10-18: The "Wait for admin-kubeconfig secret" task uses fixed
retries/delay (10 × 15s) which is inconsistent with the password wait that uses
spoke_credentials_timeout; update this task (named "Wait for admin-kubeconfig
secret" in retrieve-credentials.yml) to derive retries and delay from the same
spoke_credentials_timeout variable used elsewhere (use the same chunk size as
the password wait, e.g. 30s) so the total timeout is configurable and consistent
with the password secret wait.
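The retries-from-timeout arithmetic suggested above is simple division, matching the existing `retries: "{{ (spoke_credentials_timeout / 30) | int }}"` pattern; the values here are illustrative:

```shell
#!/usr/bin/bash
# Sketch only: derive the retry count from a shared timeout variable so
# both secret waits honor the same configurable total.
spoke_credentials_timeout=600   # seconds (illustrative)
chunk=30                        # delay between retries, as in the password wait

retries=$(( spoke_credentials_timeout / chunk ))
echo "retries=$retries delay=${chunk}s total=$(( retries * chunk ))s"
```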
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/wait-for-install.yml`:
- Around line 107-138: The task "Wait for all agents to reach Done stage"
currently succeeds when every currently listed agent is Done, but it must also
ensure the cluster has the expected agent count; update the shell/python check
so it only exits 0 when done == total AND total == spoke_ctlplanes. Inject the
expected count (spoke_ctlplanes) into the embedded Python (via Jinja variable
interpolation or an exported env var) and compare int(total) to
int(spoke_ctlplanes) before exiting; also print a clear message when counts
mismatch and keep the registered name agents_done and the until condition
semantics unchanged.
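The stricter completion check above can be sketched with a mocked agent list; the field names assume the Agent CR's `status.progress.currentStage`, and the real task would pipe `oc get agents -o json` instead:

```shell
#!/usr/bin/bash
# Sketch only: succeed solely when done == total AND total == expected.
spoke_ctlplanes=2
agents_json='{"items":[
  {"status":{"progress":{"currentStage":"Done"}}},
  {"status":{"progress":{"currentStage":"Done"}}}]}'

result=$(echo "$agents_json" | EXPECTED="$spoke_ctlplanes" python3 -c '
import json, os, sys
expected = int(os.environ["EXPECTED"])
items = json.load(sys.stdin)["items"]
total = len(items)
done = sum(1 for a in items
           if a["status"]["progress"]["currentStage"] == "Done")
if done == total == expected:
    print(f"all {total} agents Done")
else:
    print(f"not ready: done={done} total={total} expected={expected}")
    sys.exit(1)
')
echo "$result"
```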
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/clusterimageset.yml.j2`:
- Line 6: The YAML template sets releaseImage: {{ effective_release_image }}
without quotes which can break when the pull spec contains special characters
(':' or '@'); update the clusterimageset.yml.j2 template to render the
releaseImage value as a quoted string (wrap the Jinja expression for
effective_release_image in quotes) so releaseImage is always emitted as a proper
YAML string.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/spoke-network.xml.j2`:
- Line 8: The template uses {{ spoke_network_name }} (derived from
spoke_cluster_name) as the libvirt bridge name which is limited to 15
characters; add either a pre-flight validation that asserts
len(spoke_cluster_name) ≤ 15 (fail early with a clear message referencing
spoke_cluster_name/spoke_network_name and virsh net-define) or update the README
variable documentation for spoke_cluster_name to explicitly state the
15-character Linux bridge name limit and show an example; ensure the check/log
mentions the exact constraint and references the libvirt/virsh failure mode so
maintainers know why the restriction exists.
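The 15-character constraint comes from the Linux interface-name limit (IFNAMSIZ), so the pre-flight check is a simple length test; the names below are illustrative:

```shell
#!/usr/bin/bash
# Sketch only: pre-flight check for the 15-char Linux bridge-name limit.
spoke_cluster_name="spoke-tnf"
spoke_network_name="$spoke_cluster_name"   # derived name used as the bridge

if [ "${#spoke_network_name}" -gt 15 ]; then
  echo "ERROR: '$spoke_network_name' exceeds the 15-char bridge name limit;" \
       "virsh net-define/net-start would fail" >&2
  exit 1
fi
echo "bridge name '$spoke_network_name' OK (${#spoke_network_name} chars)"
```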
In `@deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh`:
- Line 1: The script's shebang uses /bin/bash but must follow guidelines to use
/usr/bin/bash; update the top line of deploy-fencing-assisted.sh (the existing
"#!/bin/bash") to "#!/usr/bin/bash" so the script invokes the expected bash
binary location.
In `@deploy/openshift-clusters/vars/assisted.yml.template`:
- Around line 4-5: Update the usage comment in assisted.yml.template to
reference the correct make target: change the second sentence from "run 'make
fencing-assisted' to deploy a spoke TNF cluster via ACM/assisted installer." to
use "run 'make deploy fencing-assisted'" so it matches the PR description and
the actual entrypoint; ensure the wording stays consistent with the first
sentence that references 'make deploy fencing-ipi'.
---
Nitpick comments:
In `@CLAUDE.md`:
- Around line 135-139: Update the "Assisted Installer Method" section to
reference the vars/assisted.yml.template and call out the key variables users
must customize (e.g., spoke_cluster_name, spoke_network_cidr) and/or add a link
to the role README for full variable documentation; ensure you mention that
users should copy vars/assisted.yml.template to vars/assisted.yml and populate
those variables before running the installer and optionally list a couple of
other common variables to customize.
In `@deploy/openshift-clusters/assisted-install.yml`:
- Around line 148-154: The deployment summary currently says to "source
proxy.env" which is ambiguous because proxy.env is overwritten by the
proxy-setup role; update the text to explicitly state "source the spoke
proxy.env (generated by the proxy-setup role for the spoke cluster)" when
instructing spoke access and reference the spoke_kubeconfig_path variable in the
same sentence; similarly, clarify hub access as "source hub-proxy.env (for the
hub cluster)" to avoid confusion between spoke and hub proxy files.
In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-tnf.yml`:
- Around line 34-41: The current task "Verify TNF support is enabled" uses a
shell pipeline with head -1 and then oc exec, which will error unclearly if no
assisted-service pod exists; change the task to first capture the pod name (from
oc get pod -n {{ assisted_service_namespace }} -l app=assisted-service -o name |
head -1) into a variable, add an explicit guard/failure if that variable is
empty (fail with a clear message referencing assisted_service_namespace and the
app=assisted-service selector), and only then run the oc exec to check
TNF_CLUSTERS_SUPPORT, keeping the tnf_verify registration and failed_when logic
intact but relying on the pre-check to avoid silent/unclear failures.
In
`@deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-watch-all-namespaces.yml`:
- Around line 30-38: Wrap the existing task "Wait for metal3 pod to stabilize
after provisioning change" (the shell task that sets metal3_pod and uses until
with metal3_stabilize_timeout) into an Ansible block with a rescue section; on
timeout (rescue) run diagnostic commands like oc get pods -n
openshift-machine-api -o wide, oc describe pod for the metal3 pod(s) and oc get
events -n openshift-machine-api, register their outputs and log them (via debug
or copy to a file) so you surface pod status, descriptions and recent events
when metal3_pod.wait fails; ensure you still preserve the same retry/delay logic
and variables (metal3_pod, metal3_stabilize_timeout) inside the block.
In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml`:
- Around line 9-18: Add nodeAffinity to the PersistentVolume definition for
assisted-pv-images so the hostPath PV binds only to the node that actually
contains {{ assisted_images_path }}; update the PersistentVolume resource
(metadata.name: assisted-pv-images, spec.hostPath) to include a
spec.nodeAffinity.required.nodeSelectorTerms matching the node's hostname (e.g.,
kubernetes.io/hostname) or otherwise ensure PVs are templated per-node, and
adjust the playbook/template to capture a target node name (hub_nodes or similar
variable) before creating the PV so the nodeAffinity can reference that node.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-bmh.yml`:
- Around line 52-53: The current changed_when predicate in create-bmh.yml only
treats "'created' in bmh_result.stdout" as a change, so updates reported as
"configured" by oc apply are ignored; update the changed_when condition that
references bmh_result to also consider "configured" (and possibly
"patched"/other positive indicators) in bmh_result.stdout so that Secret/BMH
reconciliations on reruns are marked as changed (e.g., expand the predicate for
the registered variable bmh_result to detect "configured" in addition to
"created").
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-cluster-resources.yml`:
- Around line 39-43: The task "Apply ClusterImageSet" currently sets
changed_when to only detect "'created' in cis_result.stdout" which misses
updates; change the condition to detect both creation and updates (e.g., use
"changed_when: \"'created' in cis_result.stdout or 'configured' in
cis_result.stdout\"" or include 'patched' if desired) for the cis_result
variable, and apply the same change to the other three "Apply ..." tasks that
register their own result variables so they report changes on both create and
update.
- Around line 22-31: The current "Create spoke pull secret" shell task uses a
check-then-create pattern and should be replaced with an idempotent apply;
change the task that references spoke_cluster_name and pull_secret_path to
generate a secret manifest (or run oc create secret generic ...
--from-file=.dockerconfigjson={{ pull_secret_path }}
--type=kubernetes.io/dockerconfigjson --dry-run=client -o yaml) and pipe that
into oc apply -n {{ spoke_cluster_name }} -f - so the operation is atomic and
avoids the TOCTOU race; remove reliance on ps_result and adjust changed_when to
rely on the apply result (or let Ansible detect changes) instead.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-network.yml`:
- Around line 33-44: Update the shell task "Add cross-bridge nftables FORWARD
rules for spoke<->hub connectivity" to check and insert each direction
separately using the variables spoke_network_cidr and hub_network_cidr (e.g.,
grep for both "{{ spoke_network_cidr }}.*{{ hub_network_cidr }}" and "{{
hub_network_cidr }}.*{{ spoke_network_cidr }}" or run two independent checks) so
you only insert missing rules and avoid duplicates; adjust the echo messages and
changed_when (registered nft_result.stdout) to reflect per-direction additions;
additionally, if persistence is required, save the nftables ruleset after
changes (e.g., run "sudo nft list ruleset > /etc/nftables.conf" or use the
distro's nftables persistence mechanism) so rules survive reboots.
- Around line 23-31: The two tasks "Start spoke libvirt network" and "Set spoke
libvirt network to autostart" should stop unconditionally reporting changes;
register each command's result (e.g., start_result, autostart_result) and set
changed_when to a real change-detection expression instead of true — for
example, for "Start spoke libvirt network" check the network's current state
with virsh net-list --all or inspect start_result.rc/output to only mark changed
when the network was actually started, and for "Set spoke libvirt network to
autostart" query virsh net-info or parse autostart_result to only mark changed
when autostart was newly enabled. Ensure both tasks use the registered result in
failed_when/changed_when to preserve idempotence and accurate reporting.
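The per-direction nftables check described for create-spoke-network.yml can be sketched as below. The ruleset is mocked and the insert is a placeholder echo; the real task would read `sudo nft list chain ... FORWARD` and run `sudo nft insert rule ...`, and all CIDRs are illustrative:

```shell
#!/usr/bin/bash
# Sketch only: check each direction independently so only missing
# FORWARD rules are added and duplicates are avoided.
spoke_network_cidr="192.168.144.0/24"
hub_network_cidr="192.168.122.0/24"
ruleset='ip saddr 192.168.122.0/24 ip daddr 192.168.144.0/24 accept'  # mock: hub->spoke present

ensure_rule() {  # args: <src_cidr> <dst_cidr>
  if echo "$ruleset" | grep -q "saddr $1 ip daddr $2"; then
    echo "rule $1 -> $2 already present"
  else
    echo "ADDED rule $1 -> $2"   # placeholder for: sudo nft insert rule ...
  fi
}

first=$(ensure_rule "$spoke_network_cidr" "$hub_network_cidr")
second=$(ensure_rule "$hub_network_cidr" "$spoke_network_cidr")
echo "$first"
echo "$second"
```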
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-vms.yml`:
- Around line 37-42: The task "Ensure spoke VMs are shut off" currently forces
no change reporting; instead, register the shell result (e.g., register:
virsh_destroy) for the loop that runs "sudo virsh destroy {{ spoke_cluster_name
}}-master-{{ item }}" over "{{ range(spoke_ctlplanes) | list }}" and set
changed_when based on the command outcome (mark changed when virsh actually
destroyed a domain and not when it returned "cannot find domain" or similar).
Update failed_when accordingly to preserve suppression of real errors while
allowing changed_when to reflect actual destructions.
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/main.yml`:
- Around line 4-30: The block that runs spoke deployment tasks (the block
including include_tasks: cleanup.yml, create-spoke-network.yml,
create-spoke-vms.yml, setup-ksushy.yml, create-cluster-resources.yml,
create-bmh.yml, wait-for-install.yml, retrieve-credentials.yml) needs a rescue:
section to capture diagnostics on failure; add a rescue: that runs a diagnostic
include_tasks (e.g. include_tasks: diagnostics.yml or include_tasks:
gather-spoke-diagnostics.yml) and any necessary cleanup/notify steps, ensure the
rescue runs with the same environment KUBECONFIG: "{{ hub_kubeconfig }}", and
reference the same task names so failures during Create spoke VMs / Wait for
spoke cluster installation / Retrieve spoke cluster credentials trigger
diagnostic collection.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/setup-ksushy.yml`:
- Around line 19-25: The Redfish curl+python task "Verify spoke VMs are visible
via sushy-tools" can raise a JSON parse error when curl fails; update this shell
task to detect curl failures before JSON parsing by using curl with --fail/-f
and capturing its exit code and output into the registered variable
sushy_systems, then add a failed_when that checks curl succeeded (e.g.,
sushy_systems.rc == 0) and otherwise fail with a clear message including the
curl stderr/output and the spoke_ksushy_ip/spoke_ksushy_port; only run the
python/json extraction when curl succeeded (or guard the python command to
return a sane empty JSON on failure) so the JSON parser never receives non-JSON
input.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/agentclusterinstall.yml.j2`:
- Around line 14-19: The template currently hardcodes hostPrefix: 23 in
agentclusterinstall.yml.j2; make this value configurable by introducing a role
variable (e.g., spoke_host_prefix) in defaults/main.yml with a sensible default
of 23 and replace the literal 23 in the template with the variable reference
(spoke_host_prefix) so different clusterNetwork CIDRs or topologies can override
it; update any role README or variable docs to mention spoke_host_prefix and its
purpose.
In `@deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh`:
- Around line 38-42: The current extraction of SPOKE_CLUSTER_NAME using grep/awk
is fragile; replace the pipeline that sets SPOKE_CLUSTER_NAME by invoking a
YAML-aware parser (e.g., yq or a small Python snippet) to read the
spoke_cluster_name key from vars/assisted.yml and fall back to "spoke-tnf" if
empty; update the logic surrounding the SPOKE_CLUSTER_NAME variable so the
script uses the parsed value (keep the same variable name SPOKE_CLUSTER_NAME)
and preserve the existing empty-check fallback.
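The grep/awk replacement above can be sketched with a small Python lookup (yq or PyYAML would be the robust choice where available; this minimal keyed parse and the file contents are illustrative):

```shell
#!/usr/bin/bash
# Sketch only: read spoke_cluster_name from a vars file, keeping the
# existing "spoke-tnf" fallback when the key is missing or empty.
vars_file=$(mktemp)
printf 'spoke_cluster_name: "spoke-tnf-1"\nspoke_ctlplanes: 2\n' > "$vars_file"

SPOKE_CLUSTER_NAME=$(VARS_FILE="$vars_file" python3 -c '
import os
for line in open(os.environ["VARS_FILE"]):
    line = line.split("#")[0].strip()       # drop trailing comments
    if line.startswith("spoke_cluster_name:"):
        print(line.split(":", 1)[1].strip().strip("\x27\x22"))  # strip quotes
        break
')
SPOKE_CLUSTER_NAME="${SPOKE_CLUSTER_NAME:-spoke-tnf}"   # preserve fallback
echo "SPOKE_CLUSTER_NAME=$SPOKE_CLUSTER_NAME"
rm -f "$vars_file"
```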
In `@deploy/openshift-clusters/vars/assisted.yml.template`:
- Around line 24-29: The comment for the template variable spoke_ctlplanes
incorrectly states it is "Always 2 for TNF"; update the comment next to the
spoke_ctlplanes declaration to document the supported values (e.g., 1 for SNO, 2
for two-node/TNF) and note that it is parameterized rather than fixed, so the
variable (spoke_ctlplanes) can be set to either 1 or 2 depending on topology;
locate the spoke_ctlplanes line in assisted.yml.template and replace the
hardcoded "Always 2 for TNF" text with a brief description of allowed values and
intended use.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 85da9815-129b-4cc2-8bbf-4b85d5ca6970
📒 Files selected for processing (37)

- CLAUDE.md
- deploy/Makefile
- deploy/aws-hypervisor/scripts/create.sh
- deploy/openshift-clusters/assisted-install.yml
- deploy/openshift-clusters/collections/requirements.yml
- deploy/openshift-clusters/roles/assisted/acm-install/README.md
- deploy/openshift-clusters/roles/assisted/acm-install/defaults/main.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/agent-service-config.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-tnf.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/enable-watch-all-namespaces.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/install-operator.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/main.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/validate.yml
- deploy/openshift-clusters/roles/assisted/acm-install/templates/agentserviceconfig.yml.j2
- deploy/openshift-clusters/roles/assisted/acm-install/templates/multiclusterhub.yml.j2
- deploy/openshift-clusters/roles/assisted/acm-install/templates/operator-subscription.yml.j2
- deploy/openshift-clusters/roles/assisted/acm-install/vars/main.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/README.md
- deploy/openshift-clusters/roles/assisted/assisted-spoke/defaults/main.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/cleanup.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-bmh.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-cluster-resources.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-network.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/create-spoke-vms.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/main.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/retrieve-credentials.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/setup-ksushy.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/wait-for-install.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/agentclusterinstall.yml.j2
- deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/clusterdeployment.yml.j2
- deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/clusterimageset.yml.j2
- deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/infraenv.yml.j2
- deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/spoke-network.xml.j2
- deploy/openshift-clusters/roles/assisted/assisted-spoke/vars/main.yml
- deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh
- deploy/openshift-clusters/vars/assisted.yml.template
```yaml
- name: Get RHCOS version from ISO URL
  set_fact:
    rhcos_version: "{{ rhcos_iso_url | regex_search('rhcos-([\\d.]+-\\d+)-live', '\\1') | first }}"
```
Add validation for rhcos_version extraction.
The regex extraction assumes the RHCOS ISO URL follows a specific naming pattern. If the URL format changes or the regex doesn't match, rhcos_version could be empty or cause a Jinja2 error (if | first is called on an empty list). Consider adding validation.
🛡️ Proposed validation

```diff
 - name: Get RHCOS version from ISO URL
   set_fact:
-    rhcos_version: "{{ rhcos_iso_url | regex_search('rhcos-([\\d.]+-\\d+)-live', '\\1') | first }}"
+    rhcos_version: "{{ rhcos_iso_url | regex_search('rhcos-([\\d.]+-\\d+)-live', '\\1') | first | default('unknown') }}"
+
+- name: Warn if RHCOS version could not be extracted
+  debug:
+    msg: "WARNING: Could not extract RHCOS version from URL, using 'unknown'"
+  when: rhcos_version == 'unknown'
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Get RHCOS version from ISO URL
  set_fact:
    rhcos_version: "{{ rhcos_iso_url | regex_search('rhcos-([\\d.]+-\\d+)-live', '\\1') | default([], true) | first | default('unknown') }}"

- name: Warn if RHCOS version could not be extracted
  debug:
    msg: "WARNING: Could not extract RHCOS version from URL, using 'unknown'"
  when: rhcos_version == 'unknown'
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@deploy/openshift-clusters/roles/assisted/acm-install/tasks/agent-service-config.yml`
around lines 72 - 74, The task "Get RHCOS version from ISO URL" currently sets
rhcos_version by applying regex_search(... ) | first which can fail if the regex
doesn't match; update this task to validate the extraction from rhcos_iso_url by
capturing the regex result into a temporary variable (e.g.,
rhcos_version_matches) using regex_search, then assert or fail with a clear
error via the fail module if the list is empty, or set rhcos_version using a
safe default via the default filter only after confirming matches exist;
reference the existing symbols rhcos_iso_url, rhcos_version, regex_search and
the task name to locate and modify the code.
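The guarded extraction the review asks for can be exercised outside Ansible; this is a minimal Python sketch of the same regex-plus-fallback logic (the sample URLs are hypothetical, invented to illustrate the match and no-match cases):

```python
import re

# Same pattern the Ansible task uses with regex_search
RHCOS_VERSION_RE = re.compile(r"rhcos-([\d.]+-\d+)-live")

def extract_rhcos_version(iso_url, fallback="unknown"):
    # Guard the no-match case explicitly instead of calling `first`
    # on a possibly-empty result
    m = RHCOS_VERSION_RE.search(iso_url)
    return m.group(1) if m else fallback

# Hypothetical URLs for illustration
good = "https://mirror.example.com/rhcos-418.94.202501221327-0-live.x86_64.iso"
bad = "https://mirror.example.com/rhcos-live.x86_64.iso"
print(extract_rhcos_version(good))  # → 418.94.202501221327-0
print(extract_rhcos_version(bad))   # → unknown
```

A caller can then branch on the sentinel value the same way the proposed `when: rhcos_version == 'unknown'` task does.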
```yaml
# Create MultiClusterHub (for ACM) or MultiClusterEngine (for MCE)
- name: Create MultiClusterHub CR
  template:
    src: multiclusterhub.yml.j2
    dest: /tmp/multiclusterhub.yml
    mode: '0644'
  when: hub_operator == 'acm'

- name: Apply MultiClusterHub CR
  shell: |
    oc apply -f /tmp/multiclusterhub.yml
  register: mch_result
  changed_when: "'created' in mch_result.stdout"
  when: hub_operator == 'acm'

- block:
    - name: Wait for MultiClusterHub to reach Running phase
      shell: |
        oc get multiclusterhub multiclusterhub -n {{ op_config.namespace }} \
          -o jsonpath='{.status.phase}'
      register: mch_phase
      until: mch_phase.stdout == 'Running'
      retries: "{{ (multiclusterhub_timeout / 30) | int }}"
      delay: 30
  rescue:
    - name: Collect MCH timeout diagnostics
      shell: |
        echo "=== MultiClusterHub Status ==="
        oc get multiclusterhub multiclusterhub -n {{ op_config.namespace }} -o yaml 2>/dev/null | grep -A 50 'status:'
        echo ""
        echo "=== Non-Running Pods ==="
        oc get pods -n {{ op_config.namespace }} --no-headers 2>/dev/null | grep -v Running
        echo ""
        echo "=== Recent Events ==="
        oc get events -n {{ op_config.namespace }} --sort-by='.lastTimestamp' 2>/dev/null | tail -20
      register: mch_diag
      changed_when: false
      failed_when: false

    - name: Display MCH timeout diagnostics
      debug:
        msg: "{{ mch_diag.stdout }}"

    - name: Fail after MCH timeout
      fail:
        msg: "MultiClusterHub did not reach Running phase within timeout"
  when: hub_operator == 'acm'
```
hub_operator: mce never creates the operand CR.
This task file exposes ACM/MCE selection, but the operand creation path only exists for ACM: it templates/applies multiclusterhub.yml.j2 and waits on MultiClusterHub. When hub_operator is mce, the CSV may install successfully but no MultiClusterEngine resource is ever created, so the assisted-service stack is never brought up.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@deploy/openshift-clusters/roles/assisted/acm-install/tasks/install-operator.yml`
around lines 107 - 153, The play currently only creates and waits for
MultiClusterHub when hub_operator == 'acm', so when hub_operator == 'mce' no
operand CR is created; add an equivalent branch for MCE: template a
multiclusterengine.yml.j2 (similar to multiclusterhub.yml.j2) and write it to
/tmp/multiclusterengine.yml, then oc apply -f /tmp/multiclusterengine.yml
(registering result as e.g. mce_result), and add a wait block that polls the
MultiClusterEngine resource (oc get multiclusterengine multiclusterengine -n {{
op_config.namespace }} -o jsonpath='{.status.phase}' or appropriate ready field)
with retries/delay and a rescue block that collects the same diagnostics and
fails with a clear message; guard all these tasks with when: hub_operator ==
'mce' and reuse naming patterns (task names, registered vars) analogous to the
existing MultiClusterHub tasks.
```yaml
- block:
    - name: Wait for all agents to reach Done stage
      shell: |
        oc get agents -n {{ spoke_cluster_name }} -o json 2>/dev/null \
          | python3 -c "
        import json
        import sys

        data = json.load(sys.stdin)
        agents = data.get('items', [])
        total = len(agents)
        done = 0
        stuck = []
        for a in agents:
            stage = a.get('status', {}).get('progress', {}).get('currentStage', 'unknown')
            if stage == 'Done':
                done += 1
            else:
                state = a.get('status', {}).get('debugInfo', {}).get('state', 'unknown')
                hostname = a.get('spec', {}).get('hostname', 'unknown')
                stuck.append(f'{hostname}: state={state}, stage={stage}')

        print(f'Agents Done: {done} / {total}')
        for s in stuck:
            print(f'  {s}')
        sys.exit(0 if done == total else 1)
        "
      register: agents_done
      until: agents_done.rc == 0
      retries: "{{ (spoke_install_timeout / 30) | int }}"
      delay: 30
      changed_when: false
```
Also require the expected agent count in the final Done gate.
This block passes whenever every currently listed agent is Done. If the namespace only has 1 agent left after a partial cleanup or host loss, done == total still succeeds even though spoke_ctlplanes was not satisfied at the end. Please make this check require total == spoke_ctlplanes as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/wait-for-install.yml`
around lines 107 - 138, The task "Wait for all agents to reach Done stage"
currently succeeds when every currently listed agent is Done, but it must also
ensure the cluster has the expected agent count; update the shell/python check
so it only exits 0 when done == total AND total == spoke_ctlplanes. Inject the
expected count (spoke_ctlplanes) into the embedded Python (via Jinja variable
interpolation or an exported env var) and compare int(total) to
int(spoke_ctlplanes) before exiting; also print a clear message when counts
mismatch and keep the registered name agents_done and the until condition
semantics unchanged.
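The count-gated check this comment describes can be sketched as a plain Python function (the agent dicts below are minimal hypothetical fixtures, not real `oc get agents` output):

```python
def agents_ready(agents, expected_count):
    """Return (ok, summary): ok only when every listed agent is Done AND
    the number of agents matches the expected control-plane count."""
    done = sum(
        1 for a in agents
        if a.get("status", {}).get("progress", {}).get("currentStage") == "Done"
    )
    total = len(agents)
    ok = done == total and total == expected_count
    return ok, f"Agents Done: {done} / {total} (expected {expected_count})"

# One agent vanished after a partial cleanup: every present agent is Done,
# but the expected count (2) is not met, so the gate must fail.
agents = [{"status": {"progress": {"currentStage": "Done"}}}]
ok, msg = agents_ready(agents, expected_count=2)
print(ok, msg)  # → False Agents Done: 1 / 1 (expected 2)
```

With `done == total` alone, this scenario would wrongly pass; the extra `total == expected_count` comparison is what closes the gap.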
```yaml
metadata:
  name: "{{ effective_ocp_version }}.0"
spec:
  releaseImage: {{ effective_release_image }}
```
Quote releaseImage to safely handle pull spec special characters.
At Line 6, this value should be quoted (:, @ are common in release images).
💡 Suggested change

```diff
-  releaseImage: {{ effective_release_image }}
+  releaseImage: "{{ effective_release_image }}"
```

As per coding guidelines, "**/*.{yml,yaml}: Quote strings when they contain special characters in YAML files".
📝 Committable suggestion

```yaml
  releaseImage: "{{ effective_release_image }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/clusterimageset.yml.j2`
at line 6, The YAML template sets releaseImage: {{ effective_release_image }}
without quotes which can break when the pull spec contains special characters
(':' or '@'); update the clusterimageset.yml.j2 template to render the
releaseImage value as a quoted string (wrap the Jinja expression for
effective_release_image in quotes) so releaseImage is always emitted as a proper
YAML string.
```xml
    <port start="1024" end="65535"/>
  </nat>
</forward>
<bridge name="{{ spoke_network_name }}" stp="on" delay="0"/>
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any documentation or validation for bridge name length
rg -n "15|IFNAMSIZ|bridge.*name.*length|name.*limit" deploy/openshift-clusters/roles/assisted/
```

Repository: openshift-eng/two-node-toolbox

Length of output: 1100

🏁 Script executed:

```shell
# Read the spoke-network.xml.j2 template file
cat -n deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/spoke-network.xml.j2
```

Length of output: 1462

🏁 Script executed:

```shell
# Check vars/main.yml for spoke_network_name definition
cat -n deploy/openshift-clusters/roles/assisted/assisted-spoke/vars/main.yml
```

Length of output: 930

🏁 Script executed:

```shell
# Search for README files in the assisted role
find deploy/openshift-clusters/roles/assisted -name "README*" -o -name "readme*" | head -20
```

Length of output: 202

🏁 Script executed:

```shell
# Look for any validation logic related to network or bridge names
rg -n "assert|fail|validate" deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/ | grep -i "name\|bridge\|network" | head -20
```

Length of output: 56

🏁 Script executed:

```shell
# Search for any comments or documentation about network or bridge naming constraints
rg -n "bridge.*name|network.*name" deploy/openshift-clusters/roles/assisted/assisted-spoke/ -A 2 | head -40
```

Length of output: 3889

🏁 Script executed:

```shell
# Read the assisted-spoke README to check for naming constraints documentation
cat -n deploy/openshift-clusters/roles/assisted/assisted-spoke/README.md
```

Length of output: 6375

🏁 Script executed:

```shell
# Check if there are any default values or examples that might set spoke_cluster_name
find deploy/openshift-clusters/roles/assisted/assisted-spoke -name "defaults" -type d
cat deploy/openshift-clusters/roles/assisted/assisted-spoke/defaults/main.yml 2>/dev/null || echo "No defaults file found"
```

Length of output: 1055

🏁 Script executed:

```shell
# Search for any examples or documentation about cluster names in the entire assisted role
rg -n "spoke_cluster_name|cluster.*name" deploy/openshift-clusters/roles/assisted/ -B 2 -A 2 | grep -i "example\|default\|must\|should\|limit\|constraint\|validation" | head -30
```

Length of output: 1202
Add bridge name length validation or documentation.

The `spoke_network_name` (derived from `spoke_cluster_name`) is used directly as the libvirt bridge name. Linux bridge interface names are limited to 15 characters. While the default `spoke-tnf` (9 chars) is safe, custom cluster names exceeding 15 characters would cause `virsh net-define` to fail. The README documents that `spoke_cluster_name` must be "DNS-safe" but does not mention this constraint.

Either:

- Add validation in a pre-flight check to assert `spoke_cluster_name` length ≤ 15 characters, or
- Document the 15-character limit in the README under the `spoke_cluster_name` variable description
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@deploy/openshift-clusters/roles/assisted/assisted-spoke/templates/spoke-network.xml.j2`
at line 8, The template uses {{ spoke_network_name }} (derived from
spoke_cluster_name) as the libvirt bridge name which is limited to 15
characters; add either a pre-flight validation that asserts
len(spoke_cluster_name) ≤ 15 (fail early with a clear message referencing
spoke_cluster_name/spoke_network_name and virsh net-define) or update the README
variable documentation for spoke_cluster_name to explicitly state the
15-character Linux bridge name limit and show an example; ensure the check/log
mentions the exact constraint and references the libvirt/virsh failure mode so
maintainers know why the restriction exists.
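A minimal sketch of the pre-flight length check (the `validate_bridge_name` helper name is invented for illustration; the 15-character limit comes from Linux's `IFNAMSIZ`, which is 16 bytes including the NUL terminator):

```python
IFNAMSIZ = 16  # Linux interface-name buffer: 15 usable chars + NUL terminator

def validate_bridge_name(name):
    """Fail early with a clear message instead of letting virsh net-define
    fail later with a less obvious libvirt error."""
    if len(name) > IFNAMSIZ - 1:
        raise ValueError(
            f"bridge name '{name}' is {len(name)} chars; Linux limits "
            f"interface names to {IFNAMSIZ - 1} (virsh net-define would fail)"
        )
    return name

validate_bridge_name("spoke-tnf")  # 9 chars: within the limit
```

The same comparison translates directly into an Ansible `assert` on `spoke_cluster_name | length`.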
```diff
@@ -0,0 +1,59 @@
+#!/bin/bash
```
🛠️ Refactor suggestion (🟠 Major)
Use #!/usr/bin/bash shebang per coding guidelines.
As per coding guidelines, shell scripts should use #!/usr/bin/bash shebang.
♻️ Proposed fix

```diff
-#!/bin/bash
+#!/usr/bin/bash
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/openshift-clusters/scripts/deploy-fencing-assisted.sh` at line 1, The
script's shebang uses /bin/bash but must follow guidelines to use /usr/bin/bash;
update the top line of deploy-fencing-assisted.sh (the existing "#!/bin/bash")
to "#!/usr/bin/bash" so the script invokes the expected bash binary location.
```yaml
# Usage: After deploying a hub cluster with 'make deploy fencing-ipi',
# run 'make fencing-assisted' to deploy a spoke TNF cluster via ACM/assisted installer.
```
The usage comment points to the wrong make target.

The PR description introduces this flow as `make deploy fencing-assisted`, but this template tells users to run `make fencing-assisted`. That will send readers to the wrong entrypoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/openshift-clusters/vars/assisted.yml.template` around lines 4 - 5,
Update the usage comment in assisted.yml.template to reference the correct make
target: change the second sentence from "run 'make fencing-assisted' to deploy a
spoke TNF cluster via ACM/assisted installer." to use "run 'make deploy
fencing-assisted'" so it matches the PR description and the actual entrypoint;
ensure the wording stays consistent with the first sentence that references
'make deploy fencing-ipi'.
- I7: Add validate.yml for assisted-spoke role that checks cluster name is DNS-safe, ctlplanes >= 2, VIPs are within spoke CIDR and outside DHCP range, and API/Ingress VIPs are different
- I12: Remove rm -rf from storage.yml that wiped assisted-service data on every run. Directories are now created idempotently with mkdir -p without destroying existing data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml (1)

5-40: Consider improving `changed_when` accuracy and hostPath data locality. Two minor observations:

- `changed_when` underreports: `oc apply` outputs "configured" when updating existing resources, so this condition only detects initial creation. Consider: `changed_when: "'created' in storage_result.stdout or 'configured' in storage_result.stdout"`
- hostPath without nodeAffinity: in a multi-node hub cluster, if the assisted-service pod migrates to a different node, it will access an empty directory instead of its existing data. If high availability isn't required, consider adding `nodeAffinity` to pin the PV to a specific node, or document this as a single-node limitation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml` around lines 5 - 40, The play currently only marks changes when oc apply prints "created" (registered to storage_result), so update the changed_when for storage_result to also check for "configured" in storage_result.stdout (e.g., "'created' in storage_result.stdout or 'configured' in storage_result.stdout") to accurately reflect updates; additionally, to avoid hostPath data loss when pods move nodes, add nodeAffinity to the PersistentVolume specs (assisted-pv-images and assisted-pv-db) to pin them to the host node where assisted_images_path/assisted_db_path reside, or explicitly document the limitation that these PVs require a single-node hub if you choose not to add nodeAffinity.

deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml (1)

42-48: Refactor DHCP exclusion check for readability/maintainability. The logic is correct, but repeated conversions make this block hard to audit. Consider task-level `vars` for precomputed integer values to keep the `that` conditions compact.

♻️ Suggested refactor

```diff
 - name: Validate VIPs are not in DHCP range (.50-.150)
+  vars:
+    dhcp_start_int: "{{ spoke_dhcp_start | ansible.utils.ipaddr('int') | int }}"
+    dhcp_end_int: "{{ spoke_dhcp_end | ansible.utils.ipaddr('int') | int }}"
+    api_vip_int: "{{ spoke_api_vip | ansible.utils.ipaddr('int') | int }}"
+    ingress_vip_int: "{{ spoke_ingress_vip | ansible.utils.ipaddr('int') | int }}"
   assert:
     that:
-      - spoke_api_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
-        spoke_api_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
-      - spoke_ingress_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
-        spoke_ingress_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
+      - api_vip_int < dhcp_start_int or api_vip_int > dhcp_end_int
+      - ingress_vip_int < dhcp_start_int or ingress_vip_int > dhcp_end_int
```

As per coding guidelines, "Comment complex configurations in YAML files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml` around lines 42 - 48, Precompute integer forms of the IPs and DHCP bounds to simplify the assert: create task-level vars or a preceding set_fact that converts spoke_api_vip, spoke_ingress_vip, spoke_dhcp_start and spoke_dhcp_end to integers (e.g. api_vip_int, ingress_vip_int, dhcp_start_int, dhcp_end_int) using the ansible.utils.ipaddr('int') filter, then replace the long repeated expressions in the "Validate VIPs are not in DHCP range (.50-.150)" task's assert/that with concise comparisons using those new names (e.g. api_vip_int < dhcp_start_int or api_vip_int > dhcp_end_int and similarly for ingress_vip_int) to improve readability and maintainability.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml`:
- Around line 21-24: The assert checks use "ansible.utils.ipaddr(...) is not
none" which is unsafe because the filter returns False for invalid/mismatched
inputs and "False is not none" passes; change each assertion (e.g., the task
named "Validate spoke_network_cidr is a valid CIDR" and the other two assertions
at the same block) to rely on the filter's truthiness instead of "is not none" —
e.g., replace "spoke_network_cidr | ansible.utils.ipaddr('network/prefix') is
not none" with simply "spoke_network_cidr |
ansible.utils.ipaddr('network/prefix')" (do the same for the other two
variables/filters referenced at lines 31 and 38).
---
Nitpick comments:
In `@deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml`:
- Around line 5-40: The play currently only marks changes when oc apply prints
"created" (registered to storage_result), so update the changed_when for
storage_result to also check for "configured" in storage_result.stdout (e.g.,
"'created' in storage_result.stdout or 'configured' in storage_result.stdout")
to accurately reflect updates; additionally, to avoid hostPath data loss when
pods move nodes, add nodeAffinity to the PersistentVolume specs
(assisted-pv-images and assisted-pv-db) to pin them to the host node where
assisted_images_path/assisted_db_path reside, or explicitly document the
limitation that these PVs require a single-node hub if you choose not to add
nodeAffinity.
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml`:
- Around line 42-48: Precompute integer forms of the IPs and DHCP bounds to
simplify the assert: create task-level vars or a preceding set_fact that
converts spoke_api_vip, spoke_ingress_vip, spoke_dhcp_start and spoke_dhcp_end
to integers (e.g. api_vip_int, ingress_vip_int, dhcp_start_int, dhcp_end_int)
using the ansible.utils.ipaddr('int') filter, then replace the long repeated
expressions in the "Validate VIPs are not in DHCP range (.50-.150)" task's
assert/that with concise comparisons using those new names (e.g. api_vip_int <
dhcp_start_int or api_vip_int > dhcp_end_int and similarly for ingress_vip_int)
to improve readability and maintainability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a50f92a1-91e7-458d-8888-5acf115e7240
📒 Files selected for processing (3)
- deploy/openshift-clusters/roles/assisted/acm-install/tasks/storage.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/main.yml
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/main.yml
Thanks @gamado — confirmed the fix commits are restored.

All 5 critical fixes present and correct:

Important fixes (7/9): I6 deploy script, I7 validate.yml, I8 cluster state, I9

One remaining item — I12:

Otherwise LGTM — nice work getting these back in.
The ipaddr() filter returns False (not None) for invalid inputs, so 'is not none' always passes. Use truthiness check instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml (1)
45-48: Quote complex `that` expressions containing special characters. Please quote these boolean expressions to match repository YAML rules and avoid parser ambiguity.
Proposed fix
```diff
-      - spoke_api_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
-        spoke_api_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
-      - spoke_ingress_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
-        spoke_ingress_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
+      - "spoke_api_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or spoke_api_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int"
+      - "spoke_ingress_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or spoke_ingress_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int"
```

As per coding guidelines, "Quote strings in YAML files when they contain special characters".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml` around lines 45 - 48, The boolean expressions comparing spoke_api_vip and spoke_ingress_vip against spoke_dhcp_start and spoke_dhcp_end contain special characters (pipes, comparison operators) and must be quoted to avoid YAML parsing issues; update the conditional lines that reference spoke_api_vip, spoke_ingress_vip, spoke_dhcp_start and spoke_dhcp_end in validate.yml so each complex that expression is wrapped in quotes (e.g. quote the entire expression that uses ansible.utils.ipaddr('int') | int < ... or ... > ...) preserving the exact logic and spacing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml`:
- Around line 4-13: The assert task "Validate spoke_cluster_name is non-empty
and DNS-safe" currently checks definition, non-empty, and a regex but not DNS
label length; update the assert to add a condition that spoke_cluster_name |
length <= 63 and update the fail_msg to mention the 63-character maximum so
invalid names are rejected early. Ensure you reference the existing task name
and the variable spoke_cluster_name when making the change.
- Around line 42-52: Add a pre-assert before the VIP range checks that verifies
the DHCP bounds are derivable and valid: ensure spoke_dhcp_start and
spoke_dhcp_end are defined, are valid IP addresses (parsable with
ansible.utils.ipaddr) and that the integer value of spoke_dhcp_start is less
than spoke_dhcp_end; if not, fail with a clear message explaining the CIDR/DHCP
range could not be derived so subsequent VIP comparisons (spoke_api_vip,
spoke_ingress_vip) won't run against invalid/empty bounds.
---
Nitpick comments:
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml`:
- Around line 45-48: The boolean expressions comparing spoke_api_vip and
spoke_ingress_vip against spoke_dhcp_start and spoke_dhcp_end contain special
characters (pipes, comparison operators) and must be quoted to avoid YAML
parsing issues; update the conditional lines that reference spoke_api_vip,
spoke_ingress_vip, spoke_dhcp_start and spoke_dhcp_end in validate.yml so each
complex that expression is wrapped in quotes (e.g. quote the entire expression
that uses ansible.utils.ipaddr('int') | int < ... or ... > ...) preserving the
exact logic and spacing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b8d37fbd-4b5b-4553-b128-7312a4ed199b
📒 Files selected for processing (1)
deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml
```yaml
- name: Validate spoke_cluster_name is non-empty and DNS-safe
  assert:
    that:
      - spoke_cluster_name is defined
      - spoke_cluster_name | length > 0
      - spoke_cluster_name is match('^[a-z0-9]([a-z0-9-]*[a-z0-9])?$')
    fail_msg: >-
      spoke_cluster_name '{{ spoke_cluster_name | default("") }}' is invalid.
      Must be non-empty, lowercase, start/end with alphanumeric, and contain only [a-z0-9-].
```
Enforce DNS label max length for spoke_cluster_name.
Line 9 validates characters/format, but not DNS-label length. Add <= 63 to fail early before downstream Kubernetes/OpenShift resource creation fails.
Proposed fix

```diff
 - name: Validate spoke_cluster_name is non-empty and DNS-safe
   assert:
     that:
       - spoke_cluster_name is defined
       - spoke_cluster_name | length > 0
+      - spoke_cluster_name | length <= 63
       - spoke_cluster_name is match('^[a-z0-9]([a-z0-9-]*[a-z0-9])?$')
     fail_msg: >-
       spoke_cluster_name '{{ spoke_cluster_name | default("") }}' is invalid.
-      Must be non-empty, lowercase, start/end with alphanumeric, and contain only [a-z0-9-].
+      Must be 1-63 chars, lowercase, start/end with alphanumeric, and contain only [a-z0-9-].
```

📝 Committable suggestion
```yaml
- name: Validate spoke_cluster_name is non-empty and DNS-safe
  assert:
    that:
      - spoke_cluster_name is defined
      - spoke_cluster_name | length > 0
      - spoke_cluster_name | length <= 63
      - spoke_cluster_name is match('^[a-z0-9]([a-z0-9-]*[a-z0-9])?$')
    fail_msg: >-
      spoke_cluster_name '{{ spoke_cluster_name | default("") }}' is invalid.
      Must be 1-63 chars, lowercase, start/end with alphanumeric, and contain only [a-z0-9-].
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml`
around lines 4 - 13, The assert task "Validate spoke_cluster_name is non-empty
and DNS-safe" currently checks definition, non-empty, and a regex but not DNS
label length; update the assert to add a condition that spoke_cluster_name |
length <= 63 and update the fail_msg to mention the 63-character maximum so
invalid names are rejected early. Ensure you reference the existing task name
and the variable spoke_cluster_name when making the change.
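The combined format-plus-length rule can be sketched as a small Python validator (the `valid_dns_label` helper name is hypothetical; the regex mirrors the one in the Ansible task):

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', must start/end alphanumeric
DNS_LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")

def valid_dns_label(name):
    # The regex alone misses the length rule; a 64-char name passes the
    # character check but is still an invalid DNS label (max 63 chars)
    return len(name) <= 63 and bool(DNS_LABEL_RE.match(name))

print(valid_dns_label("spoke-tnf"))   # → True
print(valid_dns_label("a" * 64))      # → False (over the 63-char limit)
print(valid_dns_label("-bad-start"))  # → False
```

This is exactly why the proposed fix adds `spoke_cluster_name | length <= 63` alongside the existing `match(...)` assertion.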
```yaml
- name: Validate VIPs are not in DHCP range (.50-.150)
  assert:
    that:
      - spoke_api_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
        spoke_api_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
      - spoke_ingress_vip | ansible.utils.ipaddr('int') | int < spoke_dhcp_start | ansible.utils.ipaddr('int') | int or
        spoke_ingress_vip | ansible.utils.ipaddr('int') | int > spoke_dhcp_end | ansible.utils.ipaddr('int') | int
    fail_msg: >-
      VIPs must be outside the DHCP range ({{ spoke_dhcp_start }} - {{ spoke_dhcp_end }}).
      API VIP: {{ spoke_api_vip }}, Ingress VIP: {{ spoke_ingress_vip }}.
```
Validate DHCP bounds are derivable before VIP range checks.
This block assumes spoke_dhcp_start/spoke_dhcp_end are valid IPs. Add an explicit pre-assert so unsupported or too-small CIDRs fail with a clear message instead of reaching ambiguous integer comparisons.
Proposed fix:

```diff
+- name: Validate DHCP bounds are derivable from spoke_network_cidr
+  assert:
+    that:
+      - spoke_dhcp_start | ansible.utils.ipaddr
+      - spoke_dhcp_end | ansible.utils.ipaddr
+      - (spoke_dhcp_start | ansible.utils.ipaddr('int') | int) < (spoke_dhcp_end | ansible.utils.ipaddr('int') | int)
+    fail_msg: >-
+      DHCP range could not be derived from spoke_network_cidr '{{ spoke_network_cidr }}'.
+      Ensure the CIDR is large enough for configured DHCP offsets.
+
 - name: Validate VIPs are not in DHCP range (.50-.150)
   assert:
```

🤖 Prompt for AI Agents
> Verify each finding against the current code and only fix it if needed. In `@deploy/openshift-clusters/roles/assisted/assisted-spoke/tasks/validate.yml` around lines 42-52, add a pre-assert before the VIP range checks that verifies the DHCP bounds are derivable and valid: ensure spoke_dhcp_start and spoke_dhcp_end are defined, are valid IP addresses (parsable with ansible.utils.ipaddr), and that the integer value of spoke_dhcp_start is less than that of spoke_dhcp_end; if not, fail with a clear message explaining that the CIDR/DHCP range could not be derived, so subsequent VIP comparisons (spoke_api_vip, spoke_ingress_vip) won't run against invalid or empty bounds.
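The range check being hardened here can be sketched in plain Python. This is an illustrative helper (names and addresses are assumptions, with the stdlib `ipaddress` module standing in for `ansible.utils.ipaddr`):

```python
import ipaddress

def vip_outside_dhcp(vip: str, dhcp_start: str, dhcp_end: str) -> bool:
    """A VIP is acceptable only if it falls strictly outside [dhcp_start, dhcp_end]."""
    v = int(ipaddress.ip_address(vip))
    lo = int(ipaddress.ip_address(dhcp_start))
    hi = int(ipaddress.ip_address(dhcp_end))
    return v < lo or v > hi

# A VIP below the .50-.150 DHCP window passes; one inside it fails:
assert vip_outside_dhcp("192.168.125.10", "192.168.125.50", "192.168.125.150")
assert not vip_outside_dhcp("192.168.125.100", "192.168.125.50", "192.168.125.150")

# The pre-assert the review proposes guards the analogous failure mode:
# an underivable bound errors out before any range comparison is attempted.
try:
    vip_outside_dhcp("192.168.125.10", "not-an-ip", "192.168.125.150")
except ValueError:
    pass  # invalid bound rejected up front
```

In the Ansible version the equivalent of that `ValueError` is `ipaddr` returning `False`, which is why the proposed pre-assert validates both bounds before any integer comparison runs.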
Ansible `assert` requires boolean conditionals. The `ipaddr()` filter returns a string on success and `False` on failure; adding `| bool` converts both cases to proper booleans.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The `ipaddr()` filter returns a string on valid input, and `| bool` coerces strings like "192.168.125.0/24" to `False`. Use `!= false` to correctly distinguish valid results (strings) from invalid results (`False`).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
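The distinction these two follow-up commits walk through can be demonstrated with a small sketch. The `ansible_bool` helper below is an assumption that mimics Ansible's boolean conversion (only explicit truthy tokens map to `True`), not the real filter:

```python
def ansible_bool(value):
    """Mimic of Ansible's `bool` filter: only explicit truthy tokens are True."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"y", "yes", "on", "1", "true", "t"}

# ansible.utils.ipaddr returns its input on success and False on failure:
valid_cidr = "192.168.125.0/24"   # what ipaddr yields for a valid CIDR
invalid_cidr = False              # what ipaddr yields for garbage input

# `| bool` collapses the valid case to False too -- the assert can never pass:
assert ansible_bool(valid_cidr) is False
assert ansible_bool(invalid_cidr) is False

# `!= false` keeps the two cases distinct:
assert (valid_cidr != False) is True
assert (invalid_cidr != False) is False
```

This is why the second commit replaces `| bool` with `!= false`: a valid CIDR string is not a truthy token under Ansible's boolean conversion, so the first fix made the pre-assert reject valid input as well.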
Deploy a spoke TNF cluster via the ACM/MCE assisted installer on an existing hub cluster. This adds a third deployment path alongside the dev-scripts (IPI) and kcli methods.

New Ansible roles:

Usage: `make deploy fencing-assisted`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verified by:
Spoke Cluster Verification

Nodes:

```
NAME                 STATUS   ROLES                         AGE   VERSION
spoke-tnf-master-0   Ready    control-plane,master,worker   46m   v1.34.2
spoke-tnf-master-1   Ready    control-plane,master,worker   27m   v1.34.2
```

Cluster Operators: 34/34 Available, 0 Progressing, 0 Degraded

Pacemaker status:

```
Full List of Resources:
  * Clone Set: kubelet-clone [kubelet]:
    * Started: [ spoke-tnf-master-0 spoke-tnf-master-1 ]
  * spoke-tnf-master-0_redfish  (stonith:fence_redfish):  Started
  * spoke-tnf-master-1_redfish  (stonith:fence_redfish):  Started
  * Clone Set: etcd-clone [etcd]:
    * Started: [ spoke-tnf-master-0 spoke-tnf-master-1 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

Result: PASS