Skip to content

Add Machine API Integration for bootstrap Path#133

Open
wenxuan0923 wants to merge 7 commits intomainfrom
wenx/machine-inte
Open

Add Machine API Integration for bootstrap Path#133
wenxuan0923 wants to merge 7 commits intomainfrom
wenx/machine-inte

Conversation

@wenxuan0923
Copy link
Copy Markdown
Collaborator

@wenxuan0923 wenxuan0923 commented Mar 23, 2026

Introduces a new aksmachine component that registers the local machine with the AKS control plane during bootstrap using the ARM Machine and AgentPool APIs.

ensureMachine will:

  1. Create an aksflexnodes pool with mode "Machines" — if it already exists, return early (idempotent).
  2. Create a machine with "aks-flex-node" tag and Kubernetes profile (version, max pods, labels, taints, kubelet config) into the aksflexnodes pool. If machine already exists, return early. Any update of the machine property should be done through AKS RP API.

This action will be added into bootstrap steps once RP side of the change is released, right now it is commented out.

Testing:
image

@wenxuan0923 wenxuan0923 changed the title [WIP] Add Machine Integration for Create and Delete Path Add Machine Integration for bootstrap Path Mar 25, 2026
@wenxuan0923 wenxuan0923 marked this pull request as ready for review March 25, 2026 19:26
Copilot AI review requested due to automatic review settings March 25, 2026 19:26
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new aksmachine component (action + server implementation) intended to register the local host as an AKS “Machine” during bootstrap, and upgrades the Azure ContainerService ARM SDK dependency to v8 to access the required APIs. The machine-registration bootstrap step is present but intentionally not yet wired into the bootstrap sequence (commented out) until the RP side is live.

Changes:

  • Upgrade armcontainerservice SDK usage across the repo from v5 to v8 (plus corresponding go.mod/go.sum updates).
  • Introduce the new components/aksmachine action proto, generated code, redaction, and v20260301 server implementation + unit tests.
  • Add an enabled flag to the Arc install action and skip Arc installation when disabled; tighten cluster-config enricher completion criteria to require both ServerURL and CA data.

Reviewed changes

Copilot reviewed 20 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/spec/collector.go Switch ContainerService SDK import to armcontainerservice/v8.
pkg/spec/collector_test.go Switch ContainerService SDK import to armcontainerservice/v8.
pkg/kube/client.go Switch ContainerService SDK import to armcontainerservice/v8.
pkg/components/arc/arc_base.go Switch ContainerService SDK import to armcontainerservice/v8.
pkg/bootstrapper/components.go Add ensureMachine action resolver + AzureCredential builder; pass enabled into Arc action.
pkg/bootstrapper/cluster_config_enricher.go Switch SDK import to v8; consider step complete only when both ServerURL + CA are present.
pkg/bootstrapper/bootstrapper.go Add (commented) TODO/bootstrap hook for ensure-machine step.
go.mod Bump ContainerService SDK dependency to armcontainerservice/v8.
go.sum Update sums for ContainerService v8 and transitive ARM deps.
components/exports.go Register new aksmachine/v20260301 component via blank import.
components/arc/action.proto Add enabled field to InstallArcSpec.
components/arc/action.pb.go Regenerated code for the new enabled field.
components/arc/v20260301/install_arc.go Skip Arc installation when spec enabled=false.
components/arc/v20260301/arc_registration.go Switch ContainerService SDK import to armcontainerservice/v8.
components/arc/v20260301/arc_helpers.go Switch ContainerService SDK import to armcontainerservice/v8.
components/aksmachine/action.proto New proto defining EnsureMachine action and credential types.
components/aksmachine/action.pb.go Generated Go code for the new aksmachine proto.
components/aksmachine/redact.go Redact SP client secret from EnsureMachine action.
components/aksmachine/v20260301/exports.go Register EnsureMachine action server.
components/aksmachine/v20260301/ensure_machine.go Implement agent-pool + machine ensure logic using ARM clients.
components/aksmachine/v20260301/ensure_machine_test.go Unit tests for helper functions in ensure-machine implementation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +266 to +283
func buildARMClientOptions(endpointOverride string) *arm.ClientOptions {
if endpointOverride == "" {
return nil
}
return &arm.ClientOptions{
ClientOptions: azcore.ClientOptions{
Cloud: cloud.Configuration{
Services: map[cloud.ServiceName]cloud.ServiceConfiguration{
cloud.ResourceManager: {
Endpoint: endpointOverride,
// No audience needed for local servers that don't validate tokens.
Audience: endpointOverride,
},
},
},
InsecureAllowCredentialWithHTTP: true,
},
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildARMClientOptions enables InsecureAllowCredentialWithHTTP for any non-empty endpointOverride. If this override is ever made configurable (env/flag), it could accidentally allow sending ARM credentials over plaintext HTTP to a non-local endpoint. Consider restricting the override to loopback (localhost/127.0.0.1) and/or guarding this behind a test-only build tag to prevent insecure production use.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for development testing only, will never be made configurable.

Comment on lines +5 to +6
// Redact the service principal client secret carried in the spec.
if sp := x.GetSpec().GetAzureCredential().GetServicePrincipal(); sp != nil {
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnsureMachine.Redact can panic when Spec or AzureCredential is nil (e.g., when using Azure CLI credential fallback). Hub.ApplyAction always calls Redact() on the response action, so this needs to be nil-safe before dereferencing nested fields; only redact the SP secret when all intermediate pointers are non-nil and the service principal oneof is set.

Suggested change
// Redact the service principal client secret carried in the spec.
if sp := x.GetSpec().GetAzureCredential().GetServicePrincipal(); sp != nil {
// If the receiver or any nested fields are nil, there's nothing to redact.
if x == nil {
return
}
spec := x.GetSpec()
if spec == nil {
return
}
cred := spec.GetAzureCredential()
if cred == nil {
return
}
// Redact the service principal client secret carried in the spec, if present.
if sp := cred.GetServicePrincipal(); sp != nil {

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope it won't panic. GetXXX() has nil pointer protection internally

Comment on lines +61 to +66
spec := action.GetSpec()

// Skip all Azure operations when drift detection/remediation is disabled.
if !spec.GetEnabled() {
a.logger.Info("EnsureMachine: drift detection and remediation is disabled, skipping")
item, err := anypb.New(action)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyAction assumes action.Spec is non-nil (spec := action.GetSpec(); then spec.GetEnabled()/GetSubscriptionId/...); a malformed request without spec will cause a nil-pointer panic and crash the action server. Consider validating spec != nil and returning codes.InvalidArgument when missing.

Copilot uses AI. Check for mistakes.
@wenxuan0923 wenxuan0923 changed the title Add Machine Integration for bootstrap Path Add Machine API Integration for bootstrap Path Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants