Node registration/connection issue on NUC: Apiserver uses wrong node IP from ETCD #391

@akshaylg0314

When setting up the system on a NUC, I encountered a node registration and connection issue between the NodeAgent and the Apiserver. The flow and problem are as follows:

Flow:

1. Register the node from the NUC by running NodeAgent.
2. On RHIVOS, the Apiserver receives the node info and saves it to ETCD.
3. Apply a YAML from the NUC by running `./demo.sh` (with the RHIVOS IP in the curl command).
4. The Apiserver reports a nodeagent connection error.

ETCD node info (example):

```
cluster/nodes/HPC
{"node_id":"HPC-0.0.0.0","hostname":"HPC","ip_address":"0.0.0.0", ...}

cluster/nodes/localhost.localdomain
{"node_id":"localhost.localdomain","hostname":"localhost.localdomain","ip_address":"192.168.10.100", ...}

nodes/0.0.0.0
HPC
nodes/192.168.10.100
localhost.localdomain
nodes/HPC
0.0.0.0
nodes/localhost.localdomain
192.168.10.100
```

Root Cause (code):
The Apiserver fetches the node IP by simply taking the first key with the prefix nodes/:

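```rust
// Find a node by IP address from simplified node keys
pub async fn find_node_by_simple_key() -> Option<String> {
    ...
    if let Some(kv) = kvs.first() {
        let ip_address = kv.key.trim_start_matches("nodes/");
        return Some(ip_address.to_string());
    }
    ...
}
```

The `nodes/` prefix holds both IP-keyed and hostname-keyed entries, so the first key is not guaranteed to belong to the target node, or even to be an IP address. With the ETCD contents above, the first listed key is `nodes/0.0.0.0`, so the function would return 0.0.0.0 instead of the real node IP, causing connection errors.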
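To illustrate why the first key is unreliable, here is a minimal sketch that splits the `nodes/` keys from the listing above into IP-keyed and hostname-keyed entries. `split_node_keys` is a hypothetical helper written for this issue, not a function in the codebase, and it works on plain string slices rather than on the project's ETCD client types.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: split `nodes/<suffix>` keys into suffixes that parse
/// as IPv4 addresses and suffixes that do not (treated as hostnames).
pub fn split_node_keys<'a>(keys: &[&'a str]) -> (Vec<Ipv4Addr>, Vec<&'a str>) {
    let mut ips = Vec::new();
    let mut hostnames = Vec::new();
    for &key in keys {
        // Keys without the expected prefix are classified by their full text.
        let suffix = key.strip_prefix("nodes/").unwrap_or(key);
        match suffix.parse::<Ipv4Addr>() {
            Ok(ip) => ips.push(ip),
            Err(_) => hostnames.push(suffix),
        }
    }
    (ips, hostnames)
}
```

Fed the four `nodes/` keys from the example, this yields two IPs (one of them the unspecified address 0.0.0.0) and two hostnames, so taking whichever key comes first cannot be relied on to return the target node's address.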

Expected:

The Apiserver should select the correct node IP (matching the actual node or the one specified in the YAML/scenario), not just the first one in ETCD.

Actual:

The Apiserver may use 0.0.0.0 or another incorrect IP, leading to nodeagent connection errors.
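One possible direction for a fix, sketched under assumptions: look up the IP through the `nodes/<hostname>` entry for the node named in the scenario, and reject the unspecified address 0.0.0.0. The `Kv` struct and `find_node_ip_by_hostname` are hypothetical names for illustration; the real code would use the project's own ETCD wrapper and may need to handle IPv6 or other cases not covered here.

```rust
use std::net::Ipv4Addr;

/// Hypothetical stand-in for a key/value pair returned by an ETCD prefix query.
pub struct Kv {
    pub key: String,
    pub value: String,
}

/// Resolve the IP for `hostname` from its `nodes/<hostname>` entry.
/// Returns `None` if the entry is missing, is not a valid IPv4 address,
/// or is the unspecified address 0.0.0.0.
pub fn find_node_ip_by_hostname(kvs: &[Kv], hostname: &str) -> Option<Ipv4Addr> {
    let wanted = format!("nodes/{}", hostname);
    kvs.iter()
        .find(|kv| kv.key == wanted)
        .and_then(|kv| kv.value.parse::<Ipv4Addr>().ok())
        .filter(|ip| !ip.is_unspecified())
}
```

With the example data, a lookup for `localhost.localdomain` would resolve to 192.168.10.100, while a lookup for `HPC` would return `None` because its registered address is 0.0.0.0. That turns a silent wrong connection into an explicit "no usable address" case the Apiserver can report.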

Labels: bug
