Merged
74 commits
bc94810
modify templates for doca ofed
VrindaMarwah Dec 31, 2025
2766c97
doca ofed installation changes for k8s
VrindaMarwah Dec 31, 2025
a71819d
add ansible builtin
VrindaMarwah Jan 2, 2026
07dbf6b
Merge pull request #3826 from VrindaMarwah/pub/ib_support
jagadeeshnv Jan 5, 2026
30d5135
Update ansible-lint.yml
VrindaMarwah Jan 5, 2026
b20eb56
Update pylint.yml
VrindaMarwah Jan 5, 2026
46a8a92
Merge pull request #3827 from VrindaMarwah/pub/ib_support
jagadeeshnv Jan 5, 2026
8f7dec7
Update image-build to use docker.io/dellhpcomniaaisolution/image-buil…
balajikumaran-c-s Jan 6, 2026
94e7e5e
Remove rpmdb rebuild commands from base_image_commands
balajikumaran-c-s Jan 6, 2026
f120e16
Add retry logic for image pull with pull_image_retries and pull_image…
balajikumaran-c-s Jan 6, 2026
942f0ce
Merge pull request #3829 from balajikumaran-c-s/pub/ib_support
abhishek-sa1 Jan 6, 2026
50c87bd
doca changes to build image
VrindaMarwah Jan 7, 2026
4261525
slurm user uid set to 6001
jagadeeshnv Jan 7, 2026
9d97814
Merge pull request #3834 from jagadeeshnv/pub/ib_support
snarthan Jan 8, 2026
27974c5
add static ip for ib interface
VrindaMarwah Jan 8, 2026
c07acc6
Merge branch 'dell:pub/ib_support' into pub/ib_support
VrindaMarwah Jan 8, 2026
40f36b9
Update openchami_image_cmd.yml
VrindaMarwah Jan 8, 2026
5096861
Update slurm_custom.json
VrindaMarwah Jan 8, 2026
43827ef
Update slurm_custom.json
VrindaMarwah Jan 8, 2026
6df6515
Update service_k8s.json
VrindaMarwah Jan 8, 2026
6630ec7
Update local_repo_config.yml
VrindaMarwah Jan 8, 2026
f03b32c
remove unused vars main.yml
balajikumaran-c-s Jan 10, 2026
322ccd0
Updated image tag in main.yml
balajikumaran-c-s Jan 10, 2026
0407906
Update image tag in default_packages.json
balajikumaran-c-s Jan 10, 2026
6f17b12
Merge pull request #3838 from balajikumaran-c-s/pub/ib_support
abhishek-sa1 Jan 10, 2026
951a5e2
Merge branch 'dell:pub/ib_support' into pub/ib_support
VrindaMarwah Jan 10, 2026
9974216
add package mounts for doca installation
VrindaMarwah Jan 11, 2026
2a86f1c
updating comments in network_spec
VrindaMarwah Jan 11, 2026
3abb36c
passwordless_ssh changes
sakshi-singla-1735 Jan 12, 2026
216a06c
ansible lint fixes
sakshi-singla-1735 Jan 12, 2026
cd729f5
input validation for ib network
sakshi-singla-1735 Jan 12, 2026
d38cf10
Merge pull request #3841 from VrindaMarwah/pub/ib_support
snarthan Jan 12, 2026
2cca244
Merge branch 'pub/ib_support' into pub/input_validation_ib
sakshi-singla-1735 Jan 12, 2026
a12179e
Merge pull request #3844 from sakshi-singla-1735/pub/input_validation_ib
snarthan Jan 12, 2026
f015e98
removing duplicate code
sakshi-singla-1735 Jan 13, 2026
e0b1fe5
Merge branch 'pub/v2.1_rc1' into pub/passwordlessssh
sakshi-singla-1735 Jan 13, 2026
684cfc6
Update README.md
vasanthsathya Jan 13, 2026
fd2f0fe
Merge pull request #3854 from vasanthsathya/main
abhishek-sa1 Jan 13, 2026
e770d86
variablize filenames
sakshi-singla-1735 Jan 13, 2026
d3ac541
Merge branch 'pub/passwordlessssh' of github.com:sakshi-singla-1735/o…
sakshi-singla-1735 Jan 13, 2026
a700dd3
Merge pull request #3843 from sakshi-singla-1735/pub/passwordlessssh
snarthan Jan 13, 2026
078997e
extract cuda in nfs
Nagachandan-P Jan 14, 2026
ddc00f8
making path changes
sakshi-singla-1735 Jan 14, 2026
64d4b28
Update ci-group-login_compiler_node_aarch64.yaml.j2
Nagachandan-P Jan 14, 2026
34aea37
Update ci-group-login_compiler_node_x86_64.yaml.j2
Nagachandan-P Jan 14, 2026
53290e6
Merge pull request #3857 from Nagachandan-P/pub/v2.1_rc1
jagadeeshnv Jan 14, 2026
e3dc75a
adding the repo for apptainer
sakshi-singla-1735 Jan 14, 2026
66661de
add set pipefail to doca-ofed script
VrindaMarwah Jan 14, 2026
6670061
Update ansible-lint.yml
VrindaMarwah Jan 14, 2026
05c1146
Update pylint.yml
VrindaMarwah Jan 14, 2026
72e5971
Merge pull request #3858 from VrindaMarwah/pub/v2.1_rc1
snarthan Jan 14, 2026
a7c3a62
Merge pull request #3856 from sakshi-singla-1735/pub/passwordlessssh
jagadeeshnv Jan 14, 2026
be91349
variablize the cuda version
Nagachandan-P Jan 16, 2026
63106ba
Merge branch 'pub/v2.1_rc1' of https://github.com/Nagachandan-P/omnia…
Nagachandan-P Jan 16, 2026
eeda08f
dynamic extraction of cuda version
Nagachandan-P Jan 19, 2026
e392595
lint issue fixed
Nagachandan-P Jan 19, 2026
7851138
Merge pull request #3862 from Nagachandan-P/pub/v2.1_rc1
snarthan Jan 19, 2026
83a5625
file path change
sakshi-singla-1735 Jan 20, 2026
503a295
Update image-builder version to 1.1
balajikumaran-c-s Jan 20, 2026
2d74de0
Update image-builder version to 1.1 in default_packages.json
balajikumaran-c-s Jan 20, 2026
fad0025
Merge pull request #3875 from balajikumaran-c-s/pub/v2.1_rc1
abhishek-sa1 Jan 20, 2026
3b770c0
Merge branch 'pub/v2.1_rc1' into main
balajikumaran-c-s Jan 21, 2026
072d557
Merge pull request #3878 from balajikumaran-c-s/main
abhishek-sa1 Jan 21, 2026
5dd6678
Merge pull request #3873 from sakshi-singla-1735/origin/pub/ssh
snarthan Jan 21, 2026
f5f4f57
Update configure-ib-network for fixing race condition
Katakam-Rakesh Jan 21, 2026
719da55
Merge pull request #3879 from Katakam-Rakesh/pub/v2.1_rc1
snarthan Jan 21, 2026
7640fa7
Added powervault input
jagadeeshnv Jan 22, 2026
112681f
added powervault packages
balajikumaran-c-s Jan 22, 2026
903157f
Merge branch 'dell:pub/v2.1_rc1' into pub/v2.1_rc1
balajikumaran-c-s Jan 22, 2026
ed551a7
Update storage_config.yml
jagadeeshnv Jan 22, 2026
0c28ab6
Commented powervault details
balajikumaran-c-s Jan 22, 2026
38a7bc2
powervault cloud-init changes
balajikumaran-c-s Jan 22, 2026
e892dc8
Merge pull request #3882 from balajikumaran-c-s/pub/v2.1_rc1
jagadeeshnv Jan 22, 2026
5462626
Merge branch 'pub/q1_dev' into pub/v2.1_rc1
nethramg Jan 23, 2026
1 change: 1 addition & 0 deletions .github/workflows/ansible-lint.yml
@@ -6,6 +6,7 @@ on:
- main
- staging
- release_1.7.1
- pub/v2.1_rc1
- pub/q1_dev

jobs:
1 change: 1 addition & 0 deletions .github/workflows/pylint.yml
@@ -6,6 +6,7 @@ on:
- main
- staging
- release_1.7.1
- pub/v2.1_rc1
- pub/q1_dev

jobs:
2 changes: 1 addition & 1 deletion README.md
@@ -23,7 +23,7 @@ Omnia 1.x Documentation is hosted on [Read The Docs 1.x](https://omnia-doc.readt

Omnia 2.x Documentation is hosted on [Read The Docs 2.x](https://omnia.readthedocs.io/en/latest/index.html).

Current Status: ![GitHub](https://readthedocs.org/projects/omnia-doc/badge/?version=latest)
Current Status: ![GitHub](https://readthedocs.org/projects/omnia/badge/?version=latest)

## Licensing

2 changes: 1 addition & 1 deletion build_image_aarch64/roles/prepare_arm_node/tasks/main.yml
@@ -167,7 +167,7 @@

- name: Build full Podman image path
  ansible.builtin.set_fact:
    pulp_aarch_image: "{{ hostvars['localhost']['oim_pxe_ip'] }}:2225/dellhpcomniaaisolution/image-build-aarch64:1.0"
    pulp_aarch_image: "{{ hostvars['localhost']['oim_pxe_ip'] }}:2225/dellhpcomniaaisolution/image-build-aarch64:1.1"

- name: Pull aarch64 image using Podman
  ansible.builtin.command:
15 changes: 5 additions & 10 deletions build_image_x86_64/roles/image_creation/tasks/build_image_tag.yml
@@ -13,21 +13,16 @@
# limitations under the License.
---

- name: Pull specific OpenCHAMI image by version tag
- name: Pull image-build image
  ansible.builtin.command:
    cmd: "podman pull {{ openchami_image_sha }}"
    cmd: "podman pull {{ image_build_el10 }}"
  register: pull_result
  retries: "{{ pull_image_retries }}"
  delay: "{{ pull_image_delay }}"
  until: pull_result.rc == 0
  changed_when: "'Image is up to date' not in pull_result.stdout"

- name: Fail if image not pulled successfully
  ansible.builtin.fail:
    msg: "{{ pull_result.stdout }}"
  when: pull_result.rc != 0

- name: Tagging OpenCHAMI image with stable name
  ansible.builtin.command:
    cmd: "{{ ochami_stable_image_tag }}"
  args:
    creates: "{{ ochami_stable_image_path }}"
  register: tag_result
  changed_when: "'Tagged' in tag_result.stdout"
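The new pull task relies on Ansible's `until`/`retries`/`delay` keywords to re-run `podman pull` until it exits with return code 0, waiting between attempts. As an illustration only, the same retry-until-success pattern can be sketched in Python. `run_with_retries`, `image_changed`, and `podman_runner` are hypothetical helpers, not part of Omnia, and the sketch does not try to reproduce Ansible's exact counting of `retries` versus total attempts.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Hypothetical runner type: takes an argv list, returns a CompletedProcess.
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run_with_retries(run: Runner, cmd: Sequence[str], attempts: int = 3,
                     delay: float = 10.0) -> subprocess.CompletedProcess:
    """Run `cmd` until it exits with rc 0 or `attempts` runs are used up.

    Returns the last CompletedProcess so the caller can inspect rc/stdout,
    similar to how the task registers `pull_result`.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result: Optional[subprocess.CompletedProcess] = None
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:  # same condition as `until: pull_result.rc == 0`
            break
        if attempt < attempts:
            time.sleep(delay)  # plays the role of `delay: "{{ pull_image_delay }}"`
    return result


def image_changed(result: subprocess.CompletedProcess) -> bool:
    """Mirror of the task's changed_when expression."""
    return "Image is up to date" not in (result.stdout or "")


def podman_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Real runner: shells out without raising on a non-zero exit."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


# Example (requires podman on PATH, so it is left commented out):
# res = run_with_retries(podman_runner,
#                        ["podman", "pull", "docker.io/dellhpcomniaaisolution/image-build-el10:1.0"])
# print(res.returncode, image_changed(res))
```

Injecting the runner keeps the retry logic testable without podman installed; a fake runner that fails twice and then succeeds exercises the loop.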
10 changes: 4 additions & 6 deletions build_image_x86_64/roles/image_creation/vars/main.yml
@@ -12,7 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
---
openchami_image_sha: "ghcr.io/openchami/image-build@sha256:52dd9d546951ce4f2f6f9febd08a228cfcb5b9e8e204ca4f5ee232f6be65d3a4"
image_build_el10: "docker.io/dellhpcomniaaisolution/image-build-el10:1.0"
pull_image_retries: "3"
pull_image_delay: "10"
input_project_dir: "{{ hostvars['localhost']['input_project_dir'] }}"
omnia_metadata_file: "/opt/omnia/.data/oim_metadata.yml"
dir_permissions_644: "0644"
@@ -33,7 +35,7 @@ ochami_compute_mounts:

ochami_x86_64_image:
  - --entrypoint /bin/bash
  - ghcr.io/openchami/image-build:stable
  - docker.io/dellhpcomniaaisolution/image-build-el10:1.0
ochami_base_command:
  - -c 'update-ca-trust extract && image-build --config /home/builder/config.yaml --log-level DEBUG'

@@ -52,7 +54,3 @@ compute_image_failure_msg: |
# build_compute_image.yml
openchami_compute_image_vars_template: "{{ role_path }}/templates/compute_images_templates.j2"
openchami_compute_image_vars_path: "/opt/omnia/openchami/compute_images_template.yaml"

# build_image_tag.yml
ochami_stable_image_tag: "podman tag {{ openchami_image_sha }} ghcr.io/openchami/image-build:stable"
ochami_stable_image_path: "/var/lib/containers/storage/overlay-images/{{ openchami_image_sha }}"
@@ -326,6 +326,12 @@ def json_file_mandatory(file_path):
    "Please ensure the CSV file has the required headers."
)
NETWORK_SPEC_FILE_NOT_FOUND_MSG = "network_spec.yml file not found in input folder."
IB_NETMASK_BITS_MISMATCH_MSG = (
    "netmask_bits configured for ib_network must match admin_network netmask_bits in network_spec.yml."
)
IB_SUBNET_IN_ADMIN_RANGE_MSG = (
    "ib_network subnet must be outside the admin network range derived from primary_oim_admin_ip/netmask_bits in network_spec.yml."
)

# telemetry
MANDATORY_FIELD_FAIL_MSG = "must not be empty"
@@ -427,3 +433,4 @@ def get_logic_failed(input_file_path):
def get_logic_success(input_file_path):
    """Returns a formatted message indicating logic validation success for a file."""
    return f"{'#' * 10} Logic validation successful for {input_file_path} {'#' * 10}"

@@ -100,9 +100,35 @@
}
},
"additionalProperties": false
},
{
"type": "object",
"required": ["ib_network"],
"properties": {
"ib_network": {
"type": "object",
"required": [
"subnet",
"netmask_bits"
],
"properties": {
"subnet": {
"type": "string",
"pattern": "^(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\\.){3}(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})$"
},
"netmask_bits": {
"type": "string",
"pattern": "^(1[0-9]|2[0-9]|[1-9])$|^3[0-2]$"
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
]
}
}
}
}
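The `ib_network` schema above accepts `subnet` as a dotted-quad IPv4 string and `netmask_bits` as a string from 1 to 32. A quick way to sanity-check those two patterns is to run them through Python's `re` module; the sample values are illustrative only. JSON Schema `pattern` uses ECMA-262 regular expressions and is not implicitly anchored, so `re.search` is used here; Python's `$` also matches just before a trailing newline, so this is a close approximation rather than an exact replica of a schema validator.

```python
import re

# Patterns copied from the ib_network schema (the JSON "\\." becomes "\." here).
SUBNET_PATTERN = r"^(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})\.){3}(?:25[0-5]|2[0-4][0-9]|1?[0-9]{1,2})$"
NETMASK_BITS_PATTERN = r"^(1[0-9]|2[0-9]|[1-9])$|^3[0-2]$"


def matches(pattern: str, value: str) -> bool:
    """Return True if `value` satisfies the schema pattern (search semantics)."""
    return re.search(pattern, value) is not None


for value in ["10.5.0.0", "256.1.1.1", "10.5.0"]:
    print("subnet", value, matches(SUBNET_PATTERN, value))

for value in ["0", "8", "24", "32", "33"]:
    print("netmask_bits", value, matches(NETMASK_BITS_PATTERN, value))
```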

@@ -49,6 +49,36 @@
]
},
"minItems": 1
},
"powervault_config": {
"required": ["ip", "isci_initiators", "volume_id"],
"properties": {
"ip": {
"description": "List of target controller IP addresses",
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"format": "ipv4"
},
"uniqueItems": true
},

"port": {
"description": "TCP port for iSCSI (default 3260)",
"type": "integer"
},

"isci_initiators": {
"description": "iSCSI initiator IQN",
"type": "string"
},

"volume_id": {
"description": "Volume identifier (hex string)",
"type": "string"
}
}
}
},
"required": [
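The new `powervault_config` block requires `ip` (a non-empty list of unique IPv4 controller addresses), `isci_initiators`, and `volume_id`, with an optional integer `port` described as defaulting to 3260. The sketch below re-expresses those rules in plain Python with the standard `ipaddress` module so they can be tried without a JSON Schema library. `check_powervault_config` is a hypothetical helper, the key names are copied verbatim from the schema (including the `isci_initiators` spelling), and the sample IQN and volume ID are placeholders. Many JSON Schema validators treat `"format": "ipv4"` as an annotation unless format checking is enabled, whereas this sketch always checks it.

```python
import ipaddress

# Key names copied verbatim from the powervault_config schema.
REQUIRED_KEYS = ("ip", "isci_initiators", "volume_id")


def check_powervault_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passes (sketch)."""
    errors = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            errors.append(f"missing required key: {key}")

    if "ip" in cfg:
        ips = cfg["ip"]
        if not isinstance(ips, list) or not ips:  # "type": "array", "minItems": 1
            errors.append("ip must be a non-empty list")
        else:
            for addr in ips:
                if not isinstance(addr, str):
                    errors.append(f"ip entry must be a string: {addr!r}")
                    continue
                try:
                    ipaddress.IPv4Address(addr)  # stands in for "format": "ipv4"
                except ValueError:
                    errors.append(f"not an IPv4 address: {addr!r}")
            str_ips = [a for a in ips if isinstance(a, str)]
            if len(set(str_ips)) != len(str_ips):  # "uniqueItems": true
                errors.append("ip entries must be unique")

    if "port" in cfg:
        port = cfg["port"]
        # JSON Schema "integer" excludes booleans, so reject bool explicitly.
        if isinstance(port, bool) or not isinstance(port, int):
            errors.append("port must be an integer")

    for key in ("isci_initiators", "volume_id"):
        if key in cfg and not isinstance(cfg[key], str):
            errors.append(f"{key} must be a string")
    return errors


# Placeholder values for illustration only.
sample = {
    "ip": ["192.0.2.10", "192.0.2.11"],
    "port": 3260,
    "isci_initiators": "iqn.2024-01.com.example:host1",
    "volume_id": "0a1b2c3d",
}
print(check_powervault_config(sample))  # []
```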
@@ -21,6 +21,7 @@
import itertools
import csv
import yaml
import ipaddress
from ansible.module_utils.input_validation.common_utils import validation_utils
from ansible.module_utils.input_validation.common_utils import config
from ansible.module_utils.input_validation.common_utils import en_us_validation_msg
@@ -744,6 +745,54 @@ def validate_network_spec(
)
return errors

    # Extract admin and IB parameters for cross-validation
    admin_netmask_bits = None
    admin_primary_ip = None
    ib_netmask_bits = None
    ib_subnet = None
    ib_present = False

    for network in data["Networks"]:
        if "admin_network" in network and isinstance(network["admin_network"], dict):
            admin_net = network["admin_network"]
            admin_netmask_bits = admin_net.get("netmask_bits", admin_netmask_bits)
            admin_primary_ip = admin_net.get("primary_oim_admin_ip", admin_primary_ip)

        if "ib_network" in network and isinstance(network["ib_network"], dict):
            ib_net = network["ib_network"]
            # Consider IB network present only when config is non-empty
            if ib_net:
                ib_present = True
                ib_netmask_bits = ib_net.get("netmask_bits", ib_netmask_bits)
                ib_subnet = ib_net.get("subnet", ib_subnet)

    # If IB network is configured and both netmask bits are available, they must match
    if ib_present and ib_netmask_bits and admin_netmask_bits and ib_netmask_bits != admin_netmask_bits:
        errors.append(
            create_error_msg(
                "ib_network.netmask_bits",
                ib_netmask_bits,
                en_us_validation_msg.IB_NETMASK_BITS_MISMATCH_MSG,
            )
        )

    # If IB subnet and admin primary IP are available, ensure IB subnet is not in admin range
    if ib_present and ib_subnet and admin_primary_ip and admin_netmask_bits:
        try:
            admin_network = ipaddress.IPv4Network(f"{admin_primary_ip}/{admin_netmask_bits}", strict=False)
            ib_ip = ipaddress.IPv4Address(ib_subnet)
            if ib_ip in admin_network:
                errors.append(
                    create_error_msg(
                        "ib_network.subnet",
                        ib_subnet,
                        en_us_validation_msg.IB_SUBNET_IN_ADMIN_RANGE_MSG,
                    )
                )
        except ValueError:
            # If IPs/netmask are invalid, rely on existing validations to report issues
            pass

    for network in data["Networks"]:
        errors.extend(_validate_admin_network(network))

@@ -941,3 +990,4 @@ def _validate_ip_ranges(dynamic_range, network_type, netmask_bits):
)

return errors
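The cross-check added to `validate_network_spec` builds the admin network from `primary_oim_admin_ip` and `netmask_bits` (with `strict=False`, so host bits are masked off), rejects an `ib_network.subnet` address that falls inside it, and separately requires the two `netmask_bits` values to match. The standalone sketch below reproduces just those two rules with the standard `ipaddress` module so they can be tried outside Ansible. The function name and sample addresses are hypothetical, and it returns plain strings instead of the `create_error_msg` entries used in the PR.

```python
import ipaddress


def check_ib_against_admin(admin_ip, admin_bits, ib_subnet, ib_bits):
    """Return error strings for the two IB-vs-admin rules (sketch)."""
    errors = []
    if ib_bits != admin_bits:
        errors.append("netmask_bits mismatch")
    try:
        # strict=False turns e.g. 172.16.107.254/16 into 172.16.0.0/16.
        admin_net = ipaddress.IPv4Network(f"{admin_ip}/{admin_bits}", strict=False)
        if ipaddress.IPv4Address(ib_subnet) in admin_net:
            errors.append("ib subnet inside admin range")
    except ValueError:
        # Malformed input is left to the other validators, as in the PR.
        pass
    return errors


print(check_ib_against_admin("172.16.107.254", "16", "192.168.0.0", "16"))   # []
print(check_ib_against_admin("172.16.107.254", "16", "172.16.200.0", "16"))  # ['ib subnet inside admin range']
```

Checking only the IB base address is enough once the prefix lengths match: two IPv4 networks with the same prefix length are either identical or disjoint, so an IB address outside the admin block implies the whole IB network is outside it.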

2 changes: 0 additions & 2 deletions common/vars/openchami_image_cmd.yml
@@ -20,8 +20,6 @@ rhel_aarch64_base_image_name: "rhel-aarch64_base"
base_image_commands:
  - "dracut --add 'dmsquash-live livenet network-manager' --install '/usr/lib/systemd/systemd-sysroot-fstab-check' --kver $(basename /lib/modules/*) -N -f --logfile /tmp/dracut.log 2>/dev/null" # noqa: yaml[line-length]
  - "echo DRACUT LOG:; cat /tmp/dracut.log"
  - "rm -f /var/lib/rpm/__db*"
  - "rpmdb --rebuilddb"

# x86_64 compute commands
default_x86_64_compute_commands:
17 changes: 17 additions & 0 deletions discovery/discovery.yml
@@ -75,6 +75,18 @@
name: discovery_validations
tasks_from: validate_oim_timezone.yml

- name: Build cluster host lists from PXE mapping
  hosts: localhost
  connection: local
  roles:
    - passwordless_ssh

- name: Configure OIM SSH from cluster host lists
  hosts: oim
  connection: ssh
  roles:
    - passwordless_ssh

- name: Validate discovery parameters
  hosts: oim
  connection: ssh
@@ -102,6 +114,11 @@
ansible.builtin.include_role:
name: configure_ochami
tasks_from: discover_mapping_nodes.yml

- name: Read nodes.yaml and derive Omnia node facts
ansible.builtin.include_role:
name: passwordless_ssh
tasks_from: read_nodes_yaml.yml
roles:
- nfs_client
- k8s_config
@@ -66,6 +66,12 @@
  register: read_ssh_key
  no_log: true

- name: Read the ssh private key
  ansible.builtin.command: cat {{ ssh_private_key_path }}
  changed_when: false
  register: read_ssh_private_key
  no_log: true

- name: Hash the password
  ansible.builtin.command: openssl passwd -6 "{{ hostvars['localhost']['provision_password'] }}"
  changed_when: false