Skip to content

Conversation

@huangmingxia
Copy link
Contributor

@huangmingxia huangmingxia commented Jan 13, 2026

What does this PR do

  • Simplifies ownership detection in the MachinePool controller: an object (e.g. MachineSet / MachineAutoscaler) is treated as controlled by a MachinePool only if its hive.openshift.io/machine-pool label value equals MachinePool.spec.name. The previous name-prefix / hive.openshift.io/managed fallback is removed.
  • Removes unused parameters introduced by the simplification (e.g. dropping cd from syncMachineAutoscalers and updating call sites).

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 13, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 13, 2026

@huangmingxia: This pull request references HIVE-2199 which is a valid jira issue.

Details

In response to this:

Fix isControlledByMachinePool to use infraID when building the prefix for identifying MachineSets. MachineSet names follow the format {infraID}-{poolName}-{zone}, not {clusterName}-{poolName}-{zone}.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@huangmingxia
Copy link
Contributor Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2026
@openshift-ci openshift-ci bot requested review from 2uasimojo and suhanime January 13, 2026 11:13
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: huangmingxia
Once this PR has been reviewed and has the lgtm label, please assign 2uasimojo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
Copy link

codecov bot commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 50.39%. Comparing base (b574bb2) to head (bc52047).
⚠️ Report is 14 commits behind head on master.

Files with missing lines Patch % Lines
...g/controller/machinepool/machinepool_controller.go 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2829      +/-   ##
==========================================
- Coverage   50.40%   50.39%   -0.01%     
==========================================
  Files         279      279              
  Lines       34194    34189       -5     
==========================================
- Hits        17235    17231       -4     
- Misses      15595    15597       +2     
+ Partials     1364     1361       -3     
Files with missing lines Coverage Δ
...g/controller/machinepool/machinepool_controller.go 63.02% <80.00%> (-0.55%) ⬇️

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

// actually managed by Hive. MachineSets managed by Hive (either created by Hive or adopted by Hive)
// will have the HiveManagedLabel.
prefix := strings.Join([]string{cd.Spec.ClusterName, pool.Spec.Name, ""}, "-")
// MachineSet names use infraID, not cluster name (format: {infraID}-{poolName}-{zone})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did you get this from? Assume by looking at the installer code. My concerns:

  • Has it always been this way?
  • Does it apply to all cloud platforms?

I'm also concerned that this can still result in false positives, e.g. if we have pools named worker and worker-2. (Guess I could have caught that on the last PR...)

I wonder if we're trying too hard with the name matching. If we have a "managed" label but no "name" label, that seems like it should be an error. It should be impossible to get into that situation unless the user munged something up.

So here's what I think we should do:

  • Change the spec of this func to include an error return.
  • Error if "managed" is set but "name" is not.
  • I think probably error the other way around as well (if "name" is set but "managed" is not). Can you think of a use case where this would be valid?*
  • Don't attempt to match against the mset name at all, ever.

Thoughts?

*Just to get this on paper: if both labels must either be set or unset, we really don't need the "managed" label at all -- the existence of the "name" label is sufficient to indicate that it's hive-managed. However, for backward-compatibility reasons, speculative/paranoid though they may be, I reckon we should just keep them both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed review! I noticed this when testing the hiveutil machinepool CLI:) I have confirmed from the installer code that all platforms use InfraID instead of ClusterName. All MachineSet names start with {InfraID}-{PoolName}- (except for Nutanix when there is no failure domain, use {InfraID}-{PoolName} ).

When a user manually labels a MachineSet with hive.openshift.io/machine-pool during adoption, there's a brief period before Hive's reconcile logic adds the hive.openshift.io/managed=true label. So name label present but
managed label absent is a valid transitional state.

I agree with your points, I now also feel that the managed label is not necessary - the name label should be sufficient. I'll remove the mset name-matching logic and only check the hive.openshift.io/machine-pool label for control determination. Does this sound good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user manually labels a MachineSet with hive.openshift.io/machine-pool during adoption, there's a brief period before Hive's reconcile logic adds the hive.openshift.io/managed=true label. So name label present but managed label absent is a valid transitional state.

This behavior is an artifact of this function, though, right? Since we don't (yet) document that adding the name label is a supported thing, I think we have the flexibility to decide what the UX should be.

I agree with your points, I now also feel that the managed label is not necessary - the name label should be sufficient. I'll remove the mset name-matching logic and only check the hive.openshift.io/machine-pool label for control determination. Does this sound good?

Yes.
I've now been through all the possible permutations in my head :)
I think we should continue adding the "managed" label, just in case anyone is using it. It's also used for other spoke artifacts, like syncset resources -- in that regard it's a good signal for the spoke admin that they shouldn't muck with the object.
BUT
I think we can stop reading it in our code. (AFAICS this is the only spot we were doing that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is an artifact of this function, though, right? Since we don't (yet) document that adding the name label is a supported thing, I think we have the flexibility to decide what the UX should be.

Yes, we can update the UX to avoid this issue with the function.
For the adopt process: I've implemented both labels being added together in the hiveutil machinepool adopt CLI, and tests are passing. This avoids the transition state. I'm still thinking through whether this approach is reasonable, but I haven't identified any issues yet :)

For the control logic: remove the MachineSet name matching and check the hive.openshift.io/machine-pool label.

@huangmingxia huangmingxia force-pushed the HIVE-2199 branch 2 times, most recently from 1424218 to f8ab8d0 Compare January 15, 2026 08:44
@2uasimojo
Copy link
Member

This is looking good. Please remember to update the PR title/description and commit message.

@huangmingxia
Copy link
Contributor Author

huangmingxia commented Jan 15, 2026

/retitle HIVE-2199: Drop prefix-based ownership checks in MachinePool controller

@openshift-ci openshift-ci bot changed the title HIVE-2199: Use infraID for MachineSet identification HIVE-2199: Drop prefix-based ownership checks in MachinePool controller Jan 15, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 15, 2026

@huangmingxia: This pull request references HIVE-2199 which is a valid jira issue.

Details

In response to this:

What does this PR do

  • Simplifies ownership detection in the MachinePool controller: an object (e.g. MachineSet / MachineAutoscaler) is treated as controlled by a MachinePool only if its hive.openshift.io/machine-pool label value equals MachinePool.spec.name. The previous name-prefix / hive.openshift.io/managed fallback is removed.
  • Removes unused parameters introduced by the simplification (e.g. dropping cd from syncMachineAutoscalers and updating call sites).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@huangmingxia
Copy link
Contributor Author

@2uasimojo Thank you very much for your review :)
I would appreciate it if you could help fix the security job issue.

@2uasimojo
Copy link
Member

@huangmingxia Almost there. Need to update the commit message as well (which will unfortunately require a fresh push and CI run).

The security issue is being tracked via HIVE-3053. As long as the failure matches, and is limited to, what's in that card, we can /override the job in PRs. You may not have the power to do that, but I can take care of it when I do the final review.

The isControlledByMachinePool function previously used a two-step
approach: first checking for the machinePoolNameLabel, then falling
back to prefix-based matching with HiveManagedLabel. This fallback
logic was error-prone and could cause false positives.

This commit simplifies the ownership detection to only check the
machinePoolNameLabel, making it explicit and safer. Objects are
now considered controlled by a MachinePool only if they have the
hive.openshift.io/machine-pool label matching the pool name.

Changes:
- Remove prefix-based fallback logic from isControlledByMachinePool
- Remove unused cd parameter from syncMachineAutoscalers function
- Update all call sites to match the simplified function signatures
- Update unit tests to reflect the label-only ownership model

This ensures that MachineSets and MachineAutoscalers are only
managed when they are explicitly labeled, preventing accidental
modification or deletion of objects that happen to match naming
patterns but aren't actually managed by Hive.
@huangmingxia
Copy link
Contributor Author

@2uasimojo Ah, my bad - I missed that. Thanks for the reminder!

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 16, 2026

@huangmingxia: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security bc52047 link true /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants