Skip to content

NO-JIRA: feat: add ssh-node target for quick SSH access to cluster nodes#55

Open
clobrano wants to merge 1 commit intoopenshift-eng:mainfrom
clobrano:feature/simplify-ssh-into-node
Open

NO-JIRA: feat: add ssh-node target for quick SSH access to cluster nodes#55
clobrano wants to merge 1 commit intoopenshift-eng:mainfrom
clobrano:feature/simplify-ssh-into-node

Conversation

@clobrano
Copy link
Copy Markdown
Contributor

@clobrano clobrano commented Mar 12, 2026

Add 'make ssh-node' command to simplify SSH access to cluster nodes.

Usage:
make ssh-node node=master-0 # Interactive SSH session
make ssh-node node=1 # SSH to master-1
make ssh-node node=0 cmd='pcs status' # Run command and return

Features:

  • Supports multiple node name formats (master-0, master_0, node0, 0)
  • Handles known_hosts issues from cluster redeploys
  • Uses ProxyJump through the hypervisor automatically

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 12, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot
Copy link
Copy Markdown

@clobrano: This pull request explicitly references no jira issue.

Details

In response to this:

Add 'make ssh-node' command to simplify SSH access to cluster nodes.

Usage:
make ssh-node master-0 # Interactive SSH session
make ssh-node 1 # SSH to master-1
make ssh-node 0 'pcs status' # Run command and return

Features:

  • Supports multiple node name formats (master-0, master_0, node0, 0)
  • Handles known_hosts issues from cluster redeploys
  • Uses ProxyJump through the hypervisor automatically

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clobrano

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 12, 2026
@clobrano clobrano force-pushed the feature/simplify-ssh-into-node branch 2 times, most recently from 9ac5588 to 3c99bea Compare March 12, 2026 08:58
@clobrano clobrano marked this pull request as ready for review March 12, 2026 08:59
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 12, 2026
@openshift-ci openshift-ci bot requested review from fonta-rh and qJkee March 12, 2026 08:59
Copy link
Copy Markdown
Contributor

@fonta-rh fonta-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! For attacking both nodes, ad hoc ansible like this works well:
ansible -a "sudo podman inspect sushy-tools" cluster_vms -i openshift-clusters/inventory.ini

But adapting it work constantly on a single node was a pain, so this is a great addition!

Out of the review comments, I only think the unreachable echos are necessary, the rest is just nitpicking

@@ -0,0 +1,92 @@
#!/bin/bash

set -euo pipefail
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to make a few lines of helpful errors unreachable, I'll point them out below

# Extract node IP from inventory
NODE_IP=$(grep -E "master_0|master_1" "${INVENTORY_FILE}" | grep "${NODE_PATTERN}" | grep -oP "ansible_host='\\K[^']+")

if [[ -z "${NODE_IP}" ]]; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be unreachable: if line 55 grep doesn't find anything, it will have exit code 1 and script should exit.

Adding "|| true" to the command on 55 should be enough, we're checking for content of grep output anyway

# Extract hypervisor info from inventory
HYPERVISOR=$(grep -E "^[^#]*@" "${INVENTORY_FILE}" | head -1 | awk '{print $1}')

if [[ -z "${HYPERVISOR}" ]]; then
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be unreachable: if line 64 grep doesn't find anything, it will have exit code 1 and script should exit

fi

# Extract hypervisor info from inventory
HYPERVISOR=$(grep -E "^[^#]*@" "${INVENTORY_FILE}" | head -1 | awk '{print $1}')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to parse the ProxyJump, but it might also hit the @ in "ansible_ssh_common_args" if the inventory were ordered differently, so although it works with the current inventory structure, it's a bit fragile.
Claude suggests parsing just in the metal_machine subsection
HYPERVISOR=$(awk '/^\[metal_machine\]/{found=1;next} /^\[/{found=0} found && /@/{print $1; exit}' "${INVENTORY_FILE}")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust you on this one :D

deploy/Makefile Outdated
@echo " make ssh-node node=1 cmd=\"sudo pcs status\""
@exit 1
endif
@./openshift-clusters/scripts/ssh-node.sh $(node) $(cmd)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the $(cmd) might be expanded as an "empty" argument to make if it's not provided. To allow for no command safely, might need something like this:

@./openshift-clusters/scripts/ssh-node.sh $(node) $(if $(cmd),$(cmd))

esac

# Extract node IP from inventory
NODE_IP=$(grep -E "master_0|master_1" "${INVENTORY_FILE}" | grep "${NODE_PATTERN}" | grep -oP "ansible_host='\\K[^']+")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this'll happen, but maybe we need to parse for the full word (master_0 or master_1, as this will match master10, master100, and also things like ostest_master_10)

Suggestion might be changing grep -E to grep -W to parse only full words

SSH_OPTS=(
-o StrictHostKeyChecking=no
-o UserKnownHostsFile=/dev/null
-o LogLevel=ERROR
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this to WARN to make debugging easier

@clobrano clobrano force-pushed the feature/simplify-ssh-into-node branch from 3c99bea to 174fa33 Compare April 8, 2026 09:52
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

Walkthrough

Adds a Makefile ssh-node target and a new Bash script deploy/openshift-clusters/scripts/ssh-node.sh to SSH into OpenShift cluster nodes via a hypervisor jump host by resolving node IP and hypervisor from inventory.ini and proxying the connection.

Changes

Cohort / File(s) Summary
Makefile SSH Target
deploy/Makefile
Added ssh-node target requiring node=<...> (optional cmd="...") that forwards args to ./openshift-clusters/scripts/ssh-node.sh. Adjusted get-tnf-logs recipe spacing and updated help output to document ssh-node.
SSH Node Script
deploy/openshift-clusters/scripts/ssh-node.sh
New script implementing ssh-node.sh <node> [command]. Validates/normalizes node aliases, reads ansible_host and hypervisor from inventory.ini, constructs SSH with ProxyCommand via hypervisor, and runs interactive or non-interactive SSH sessions.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Makefile
    participant Script
    participant Inventory
    participant Hypervisor
    participant Node

    User->>Makefile: make ssh-node node=0 [cmd="..."]
    Makefile->>Script: ./openshift-clusters/scripts/ssh-node.sh 0 "..."
    Script->>Script: validate args & normalize node id
    Script->>Inventory: read inventory.ini (ansible_host, metal_machine)
    Script->>Script: select node IP and hypervisor entry
    Script->>Hypervisor: ssh -o ProxyCommand="ssh -W %h:%p core@hypervisor"
    Hypervisor->>Node: forward connection to node IP
    Node->>User: command output / interactive session
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
deploy/openshift-clusters/scripts/ssh-node.sh (1)

55-56: ⚠️ Potential issue | 🔴 Critical

The NODE_IP extraction will always fail due to invalid grep option syntax and inventory format mismatch.

Line 56 uses grep -W, which is not a valid grep option (only -w for word-regexp exists). This causes the grep command to fail silently due to || true, resulting in an empty NODE_IP and a "Could not find" error on line 58.

Additionally, the extraction assumes ansible_host='...' format, but the sample inventory file (helpers/inventory_ocp_hosts.sample:2-4) uses core@IP format without this field. These must be fixed for the script to work correctly.

Proposed fix
-# NOTE: Preventing this command to fail with `|| true`. The script setting at the top will make it exit immediately in case of failure.
-NODE_IP=$(grep -W "master_0|master_1" "${INVENTORY_FILE}" | grep "${NODE_PATTERN}" | grep -oP "ansible_host='\\K[^']+" || true)
+# Extract node IP from inventory
+# Supports both: ansible_host='IP' and core@IP formats
+NODE_LINE=$(grep -Ew "${NODE_PATTERN}" "${INVENTORY_FILE}" || true)
+NODE_IP=$(printf '%s\n' "${NODE_LINE}" | grep -oP "ansible_host='\\K[^']+" || true)
+if [[ -z "${NODE_IP}" ]]; then
+    NODE_IP=$(printf '%s\n' "${NODE_LINE}" | awk '{print $1}' | cut -d'@' -f2 || true)
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/openshift-clusters/scripts/ssh-node.sh` around lines 55 - 56, The
NODE_IP extraction fails because grep uses an invalid -W option and the
inventory format may be "core@IP" instead of "ansible_host='...'", so update the
pipeline that sets NODE_IP (the line assigning NODE_IP using INVENTORY_FILE and
NODE_PATTERN) to use the correct grep option (-w) and an alternation-aware
extraction that handles both formats (e.g., extract either ansible_host='IP' or
core@IP with a single regex), and keep a proper failure check (don’t silently
swallow errors with || true) so that if NODE_IP is empty you log the "Could not
find" error immediately. Ensure you reference and update the exact variable
assignment for NODE_IP and maintain use of INVENTORY_FILE and NODE_PATTERN.
🧹 Nitpick comments (1)
deploy/Makefile (1)

84-87: Help text for ssh-node is narrower than actual accepted node formats.

ssh-node.sh accepts master-0, master_0, node0, 0 (and same for node 1), but Lines 84-87 and 131-132 only advertise 0|1. Consider aligning docs to reduce user confusion.

Also applies to: 131-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/Makefile` around lines 84 - 87, The help text for the Makefile target
ssh-node currently advertises only "0|1" but ssh-node.sh accepts multiple node
formats (e.g., master-0, master_0, node0, 0 and similarly for node1); update the
echo lines in the ssh-node help and the other help block referenced by the
second occurrence so they list accepted formats (master-0, master_0, node0, 0
and master-1, master_1, node1, 1) and adjust the usage/examples to show at least
one non-numeric form (e.g., make ssh-node node=master-0 cmd="...") so
documentation matches ssh-node.sh’s accepted inputs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deploy/Makefile`:
- Line 93: The get-tnf-logs Make target currently invokes the wrong script and
has a malformed recipe prefix: change the command invoked in the get-tnf-logs
target from openshift-clusters/scripts/ssh-node.sh to
openshift-clusters/scripts/get-tnf-logs.sh (so it runs the TNF log collection
script), and remove the stray leading space before the @ in the recipe line so
the @ is recognized as Make’s silent-command modifier (i.e., ensure the recipe
line begins with a tab immediately followed by
@./openshift-clusters/scripts/get-tnf-logs.sh $(node) $(if $(cmd),$(cmd))).

---

Duplicate comments:
In `@deploy/openshift-clusters/scripts/ssh-node.sh`:
- Around line 55-56: The NODE_IP extraction fails because grep uses an invalid
-W option and the inventory format may be "core@IP" instead of
"ansible_host='...'", so update the pipeline that sets NODE_IP (the line
assigning NODE_IP using INVENTORY_FILE and NODE_PATTERN) to use the correct grep
option (-w) and an alternation-aware extraction that handles both formats (e.g.,
extract either ansible_host='IP' or core@IP with a single regex), and keep a
proper failure check (don’t silently swallow errors with || true) so that if
NODE_IP is empty you log the "Could not find" error immediately. Ensure you
reference and update the exact variable assignment for NODE_IP and maintain use
of INVENTORY_FILE and NODE_PATTERN.

---

Nitpick comments:
In `@deploy/Makefile`:
- Around line 84-87: The help text for the Makefile target ssh-node currently
advertises only "0|1" but ssh-node.sh accepts multiple node formats (e.g.,
master-0, master_0, node0, 0 and similarly for node1); update the echo lines in
the ssh-node help and the other help block referenced by the second occurrence
so they list accepted formats (master-0, master_0, node0, 0 and master-1,
master_1, node1, 1) and adjust the usage/examples to show at least one
non-numeric form (e.g., make ssh-node node=master-0 cmd="...") so documentation
matches ssh-node.sh’s accepted inputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 06cc907b-101b-449f-950b-b5f8e94a47bf

📥 Commits

Reviewing files that changed from the base of the PR and between c2c0209 and 174fa33.

📒 Files selected for processing (2)
  • deploy/Makefile
  • deploy/openshift-clusters/scripts/ssh-node.sh

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2026
@fonta-rh
Copy link
Copy Markdown
Contributor

fonta-rh commented Apr 8, 2026

/hold testing

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 8, 2026
@fonta-rh
Copy link
Copy Markdown
Contributor

fonta-rh commented Apr 8, 2026

/lgtm

Add 'make ssh-node' command to simplify SSH access to cluster nodes.

Usage:
  make ssh-node node=master-0               # Interactive SSH session
  make ssh-node node=1                      # SSH to master-1
  make ssh-node node=0 cmd='pcs status'     # Run command and return

Features:
- Supports multiple node name formats (master-0, master_0, node0, 0)
- Handles known_hosts issues from cluster redeploys
- Uses ProxyJump through the hypervisor automatically
@clobrano clobrano force-pushed the feature/simplify-ssh-into-node branch from 174fa33 to bb23774 Compare April 8, 2026 11:49
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 8, 2026
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 8, 2026

New changes are detected. LGTM label has been removed.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
deploy/Makefile (1)

93-93: ⚠️ Potential issue | 🔴 Critical

Fix malformed recipe prefix in get-tnf-logs.

Line 93 still has a space between the tab and @, so @ is treated as part of the command text instead of Make’s silent prefix. This can break target execution.

Proposed fix
 get-tnf-logs:
-	 @./openshift-clusters/scripts/get-tnf-logs.sh
+	@./openshift-clusters/scripts/get-tnf-logs.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/Makefile` at line 93, The recipe line contains a leading space before
the silent prefix so the '@' in "@./openshift-clusters/scripts/get-tnf-logs.sh"
is treated as part of the command; remove the space so the line is a tab
followed immediately by '@./openshift-clusters/scripts/get-tnf-logs.sh' (i.e.,
ensure the line starts with a real TAB character then
@./openshift-clusters/scripts/get-tnf-logs.sh) to restore Make’s silent prefix
behavior.
🧹 Nitpick comments (1)
deploy/Makefile (1)

84-87: Broaden ssh-node usage/help text to match supported formats.

The docs currently suggest only node=<0|1>, but this feature supports multiple aliases (e.g., master-0, master_0, node0). Updating help text will prevent user confusion.

Proposed wording update
-	`@echo` "Usage: make ssh-node node=<0|1> [cmd=\"command\"]"
+	`@echo` "Usage: make ssh-node node=<master-0|master_0|node0|0|1> [cmd=\"command\"]"
...
-	`@echo` "  ssh-node node=<0|1> [cmd=\"..\"] - SSH into cluster node"
+	`@echo` "  ssh-node node=<node-id> [cmd=\"..\"] - SSH into cluster node (e.g. master-0, node0, 0)"

Also applies to: 131-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deploy/Makefile` around lines 84 - 87, The ssh-node target's help text in the
Makefile is too narrow (shows only node=<0|1>); update the echo lines for the
ssh-node target (and the similar block around the second occurrence at lines
131-133) to show supported aliases such as node=<0|1|node0>,
node=<master-0|master_0|node0> or note that node accepts numeric or
master_*/node* forms (e.g., node=0, node=node0, node=master-0); keep the
examples consistent (e.g., make ssh-node node=0 and make ssh-node node=master-0
cmd="...") so users understand alternate accepted formats.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@deploy/Makefile`:
- Line 93: The recipe line contains a leading space before the silent prefix so
the '@' in "@./openshift-clusters/scripts/get-tnf-logs.sh" is treated as part of
the command; remove the space so the line is a tab followed immediately by
'@./openshift-clusters/scripts/get-tnf-logs.sh' (i.e., ensure the line starts
with a real TAB character then @./openshift-clusters/scripts/get-tnf-logs.sh) to
restore Make’s silent prefix behavior.

---

Nitpick comments:
In `@deploy/Makefile`:
- Around line 84-87: The ssh-node target's help text in the Makefile is too
narrow (shows only node=<0|1>); update the echo lines for the ssh-node target
(and the similar block around the second occurrence at lines 131-133) to show
supported aliases such as node=<0|1|node0>, node=<master-0|master_0|node0> or
note that node accepts numeric or master_*/node* forms (e.g., node=0,
node=node0, node=master-0); keep the examples consistent (e.g., make ssh-node
node=0 and make ssh-node node=master-0 cmd="...") so users understand alternate
accepted formats.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5607f60-af0d-4a0e-98eb-57fbb601f81f

📥 Commits

Reviewing files that changed from the base of the PR and between 174fa33 and bb23774.

📒 Files selected for processing (2)
  • deploy/Makefile
  • deploy/openshift-clusters/scripts/ssh-node.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • deploy/openshift-clusters/scripts/ssh-node.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants