diff --git a/CHANGELOG.md b/CHANGELOG.md index 40d60ff..b753d2c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Optional Helm [Config Connector](https://docs.cloud.google.com/config-connector/docs/overview) resources for GKE - Determine cloud provider from pv csi driver name +- Optional Helm [ASO](https://azure.github.io/azure-service-operator) Azure Service Operator resources for AKS +- Azure Disk labelling and a testing guide for AKS ## [0.2.1] - 2026-02-26 diff --git a/Cargo.toml b/Cargo.toml index 5737c82..21529f5 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -23,6 +23,7 @@ serde_json = "1.0.149" reqwest = { version = "0.13", default-features = false, features = [ "json", "query", + "form", "rustls-no-provider", "http2", ] } diff --git a/README.md b/README.md index fcc78ce..04f703c 100644 --- a/README.md +++ b/README.md @@ -79,6 +79,8 @@ helm template k8s-cloud-tagger helm/k8s-cloud-tagger/ --set serviceMonitor.enabl ## Label sanitisation +### GCP + Kubernetes label keys and values can contain characters that are not valid in GCP labels. GCP labels only allow lowercase letters, digits, hyphens, and underscores (`[a-z0-9_-]`), with keys limited to 63 characters and required to start with a lowercase letter. @@ -96,6 +98,22 @@ For more detail on GCP label requirements, see the [Google Cloud labeling best p | `upgrades.dev/managed-by: k8s-cloud-tagger` | `upgrades-dev-managed-by: k8s-cloud-tagger` | | `Team: Platform` | `team: platform` | +### Azure + +Azure resource tag keys may contain any Unicode character except `<`, `>`, `%`, `&`, `\`, `?`, and `/`. +Keys are limited to 512 characters and values to 256 characters. +k8s-cloud-tagger replaces each disallowed character in a key with a hyphen, and truncates keys and values to their respective limits. 
+Unlike GCP, Azure tags are not lowercased — tag names are case-insensitive in Azure but case is preserved as supplied, and tag values are case-sensitive. +For more detail on Azure tag requirements, see the [Azure tag limitations](https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/tag-resources). + +| Kubernetes label | Azure tag | +| --- | --- | +| `app.kubernetes.io/name: frontend` | `app.kubernetes.io-name: frontend` | +| `helm.sh/chart: myapp-1.2.0` | `helm.sh-chart: myapp-1.2.0` | +| `env: production` | `env: production` | +| `upgrades.dev/managed-by: k8s-cloud-tagger` | `upgrades.dev-managed-by: k8s-cloud-tagger` | +| `Team: Platform` | `Team: Platform` | + ## Release 1. Check out a new branch diff --git a/docs/azure.md b/docs/azure.md new file mode 100644 index 0000000..7e5e960 --- /dev/null +++ b/docs/azure.md @@ -0,0 +1,211 @@ +# Azure (AKS) + +This document covers deploying `k8s-cloud-tagger` on AKS using +[Workload Identity](https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview) for +authentication and [Azure Service Operator](https://azure.github.io/azure-service-operator/) (ASO) +to manage the required Azure resources. + +## How it works + +The chart sets the `azure.workload.identity/use: "true"` label on the pod template and the +`azure.workload.identity/client-id` annotation on the ServiceAccount. At pod creation time the AKS +Workload Identity webhook injects `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_AUTHORITY_HOST`, and +`AZURE_FEDERATED_TOKEN_FILE` into the pod. The controller uses these to obtain an ARM bearer token +and call the Tags API. 
+ +When `azure.serviceOperator.enabled=true`, ASO creates and manages: +- A `UserAssignedIdentity` (the managed identity) +- A `FederatedIdentityCredential` (the OIDC trust binding between the identity and the ServiceAccount) +- A `RoleAssignment` granting [Tag Contributor](https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/management-and-governance#tag-contributor) at subscription scope + +All ASO resources are `detach-on-delete` — `helm uninstall` will not delete them in Azure. + +> **Note:** The managed identity must be pre-created before `helm install` because its `clientId` +> must be known at install time to annotate the ServiceAccount. ASO will adopt and manage the +> identity going forward. + +## 1. Set environment variables + +```bash +export RESOURCE_GROUP= +export LOCATION= +export CLUSTER_NAME= +export SUBSCRIPTION_ID= +export TAG= # image tag, e.g. sha-63d1b9b +``` + +## 2. Create the AKS cluster + +```bash +az group create \ + --name $RESOURCE_GROUP \ + --location $LOCATION + +az aks create \ + --resource-group $RESOURCE_GROUP \ + --name $CLUSTER_NAME \ + --location $LOCATION \ + --node-count 1 \ + --node-vm-size Standard_D2s_v3 \ + --enable-oidc-issuer \ + --enable-workload-identity \ + --generate-ssh-keys + +az aks get-credentials \ + --resource-group $RESOURCE_GROUP \ + --name $CLUSTER_NAME +``` + +## 3. Install cert-manager + +Required by ASO. + +```bash +kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml + +kubectl wait --namespace cert-manager \ + --for=condition=Ready pod --all --timeout=300s +``` + +## 4. Install Azure Service Operator + +Scoped to only the CRD groups this project needs. 
+ +```bash +helm repo add aso2 https://raw.githubusercontent.com/Azure/azure-service-operator/main/v2/charts + +helm upgrade --install aso2 aso2/azure-service-operator \ + --create-namespace \ + --namespace azureserviceoperator-system \ + --set crdPattern='resources.azure.com/*;managedidentity.azure.com/*;authorization.azure.com/*' + +kubectl wait --namespace azureserviceoperator-system \ + --for=condition=Ready pod --all --timeout=300s +``` + +## 5. Create a service principal for ASO + +ASO uses this credential to manage Azure resources on your behalf. + +> **Note:** `Owner` is used here for convenience. Fine-grained permissions should be configured +> before production use. + +```bash +ASO_SP=$(az ad sp create-for-rbac \ + --name "aso-${CLUSTER_NAME}" \ + --role Owner \ + --scopes "/subscriptions/${SUBSCRIPTION_ID}" \ + --output json) + +export ASO_CLIENT_ID=$(echo $ASO_SP | jq -r .appId) +export ASO_CLIENT_SECRET=$(echo $ASO_SP | jq -r .password) +export TENANT_ID=$(echo $ASO_SP | jq -r .tenant) +``` + +## 6. Configure ASO credentials + +ASO uses per-namespace credentials. Create the secret in the same namespace as the chart resources. + +```bash +kubectl create namespace k8s-cloud-tagger + +cat < - # Azure AKS (Workload Identity): - # azure.workload.identity/client-id: + # Azure AKS (Workload Identity): use azure.clientId value instead # -- Labels to add to the ServiceAccount labels: {} # -- Disable automatic API token mounting at the SA level @@ -84,6 +83,24 @@ serviceMonitor: service: annotations: {} +# -- Azure +azure: + # Client ID of the user-assigned managed identity used for AKS Workload Identity. + # Required when cloudProvider=azure so the webhook can inject AZURE_CLIENT_ID into the pod. + # Obtain with: az identity show --resource-group --name k8s-cloud-tagger --query clientId -o tsv + clientId: "" + # Name of the Azure user-assigned managed identity resource (and the ConfigMap ASO writes its IDs into). 
+ # Change this if the name conflicts with an existing identity in your resource group. + identityName: "k8s-cloud-tagger-identity" + serviceOperator: + enabled: false + resourceGroup: "" + location: "" + subscriptionId: "" + # OIDC issuer URL of the AKS cluster + # (az aks show --query oidcIssuerProfile.issuerUrl) + oidcIssuerUrl: "" + # -- Google Cloud gcp: projectId: "" @@ -92,4 +109,4 @@ gcp: configConnector: enabled: false # Name of the role in the cloud provider: [a-zA-Z0-9_\.]{3,64} - customRoleName: "k8s_cloud_tagger" \ No newline at end of file + customRoleName: "k8s_cloud_tagger" diff --git a/src/cloud/azure.rs b/src/cloud/azure.rs new file mode 100644 index 0000000..83b9363 --- /dev/null +++ b/src/cloud/azure.rs @@ -0,0 +1,281 @@ +use crate::cloud::{CloudClient, Labels}; +use crate::error::Error; +use crate::tls::http_client; +use async_trait::async_trait; +use reqwest::Client; +use serde::Serialize; +use std::collections::BTreeMap; + +const TAGS_API_VERSION: &str = "2021-04-01"; +const DEFAULT_AUTHORITY_HOST: &str = "https://login.microsoftonline.com/"; +const ARM_SCOPE: &str = "https://management.azure.com/.default"; +const CLIENT_ASSERTION_TYPE: &str = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"; + +/// A parsed Azure Managed Disk ARM resource ID. +/// +/// The CSI volume handle for `disk.csi.azure.com` is the full ARM resource ID: +/// `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<name>` +pub struct AzureDisk { + pub resource_id: String, +} + +impl AzureDisk { + /// Validate and wrap an ARM resource ID for a managed disk. + /// + /// Expected shape (case-insensitive provider segment): + /// `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/disks/<name>` + pub fn parse(resource_id: &str) -> Option<Self> { + // Split on '/'; the leading '/' yields an empty first element. 
+ let parts: Vec<&str> = resource_id.split('/').collect(); + // ["", "subscriptions", sub, "resourceGroups", rg, + // "providers", "Microsoft.Compute", "disks", name] + if parts.len() != 9 { + return None; + } + if !parts[0].is_empty() + || !parts[1].eq_ignore_ascii_case("subscriptions") + || !parts[3].eq_ignore_ascii_case("resourceGroups") + || !parts[5].eq_ignore_ascii_case("providers") + || !parts[6].eq_ignore_ascii_case("Microsoft.Compute") + || !parts[7].eq_ignore_ascii_case("disks") + { + return None; + } + Some(Self { + resource_id: resource_id.to_string(), + }) + } + + /// Build the ARM Tags API URL for this disk. + pub fn tags_url(&self) -> String { + format!( + "https://management.azure.com{}/providers/Microsoft.Resources/tags/default?api-version={}", + self.resource_id, TAGS_API_VERSION + ) + } +} + +/// Sanitise a string for use as an Azure resource tag key or value. +/// +/// Azure tag constraints: +/// - Keys: max 512 chars; must not contain `<`, `>`, `%`, `&`, `\`, `?`, `/` +/// - Values: max 256 chars +/// +/// We replace disallowed characters with `-` and truncate to the limit. +fn sanitise_azure_tag_key(input: &str) -> String { + input + .chars() + .map(|c| match c { + '<' | '>' | '%' | '&' | '\\' | '?' 
| '/' => '-', + _ => c, + }) + .take(512) + .collect() +} + +fn sanitise_azure_tag_value(input: &str) -> String { + input.chars().take(256).collect() +} + +fn sanitise_tags(labels: &Labels) -> BTreeMap<String, String> { + labels + .iter() + .map(|(k, v)| (sanitise_azure_tag_key(k), sanitise_azure_tag_value(v))) + .collect() +} + +#[derive(serde::Deserialize)] +struct TokenResponse { + access_token: String, +} + +#[derive(Serialize)] +struct TagsPatch { + operation: &'static str, + properties: TagsProperties, +} + +#[derive(Serialize)] +struct TagsProperties { + tags: BTreeMap<String, String>, +} + +pub struct AzureClient { + http: Client, + client_id: String, + tenant_id: String, + authority_host: String, + federated_token_file: String, +} + +impl AzureClient { + pub fn new() -> Result<Self, Error> { + let client_id = std::env::var("AZURE_CLIENT_ID") + .map_err(|_| Error::Azure("AZURE_CLIENT_ID not set".into()))?; + let tenant_id = std::env::var("AZURE_TENANT_ID") + .map_err(|_| Error::Azure("AZURE_TENANT_ID not set".into()))?; + let authority_host = std::env::var("AZURE_AUTHORITY_HOST") + .unwrap_or_else(|_| DEFAULT_AUTHORITY_HOST.to_string()); + let federated_token_file = std::env::var("AZURE_FEDERATED_TOKEN_FILE") + .map_err(|_| Error::Azure("AZURE_FEDERATED_TOKEN_FILE not set".into()))?; + Ok(Self { + http: http_client()?, + client_id, + tenant_id, + authority_host, + federated_token_file, + }) + } + + /// Obtain a bearer token using AKS Workload Identity. + /// + /// The AKS Workload Identity webhook injects four environment variables: + /// - `AZURE_FEDERATED_TOKEN_FILE`: path to the projected K8s service account token + /// - `AZURE_CLIENT_ID`: the managed identity's client ID + /// - `AZURE_TENANT_ID`: the Azure AD tenant ID + /// - `AZURE_AUTHORITY_HOST`: AAD endpoint (defaults to https://login.microsoftonline.com/) + /// + /// The K8s token is exchanged for an ARM bearer token via the OAuth 2.0 + /// client credentials flow with a federated assertion. 
+ async fn workload_identity_token(&self) -> Result<String, Error> { + let assertion = std::fs::read_to_string(&self.federated_token_file).map_err(|e| { + Error::Azure(format!("Failed to read {}: {e}", self.federated_token_file)) + })?; + + let url = format!( + "{}{}/oauth2/v2.0/token", + self.authority_host, self.tenant_id + ); + + let resp: TokenResponse = self + .http + .post(&url) + .form(&[ + ("grant_type", "client_credentials"), + ("client_assertion_type", CLIENT_ASSERTION_TYPE), + ("client_assertion", assertion.trim()), + ("client_id", &self.client_id), + ("scope", ARM_SCOPE), + ]) + .send() + .await? + .error_for_status()? + .json() + .await?; + + Ok(resp.access_token) + } +} + +#[async_trait] +impl CloudClient for AzureClient { + fn provider_name(&self) -> &'static str { + "azure" + } + + async fn set_tags(&self, resource_id: &str, labels: &Labels) -> Result<(), Error> { + let disk = AzureDisk::parse(resource_id) + .ok_or_else(|| Error::CloudApi(format!("Invalid Azure resource ID: {resource_id}")))?; + + let token = self.workload_identity_token().await?; + + let sanitised = sanitise_tags(labels); + let body = TagsPatch { + operation: "Merge", + properties: TagsProperties { + tags: sanitised.clone(), + }, + }; + + self.http + .patch(disk.tags_url()) + .bearer_auth(&token) + .json(&body) + .send() + .await? 
+ .error_for_status()?; + + tracing::debug!( + disk = %resource_id, + tags = ?sanitised, + "Azure: tags merged" + ); + + Ok(()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn parse_valid_disk() { + let id = + "/subscriptions/sub-id/resourceGroups/my-rg/providers/Microsoft.Compute/disks/my-disk"; + let disk = AzureDisk::parse(id).unwrap(); + assert_eq!(disk.resource_id, id); + assert_eq!( + disk.tags_url(), + format!( + "https://management.azure.com{}/providers/Microsoft.Resources/tags/default?api-version={}", + id, TAGS_API_VERSION + ) + ); + } + + #[test] + fn parse_invalid() { + assert!(AzureDisk::parse("not-a-resource-id").is_none()); + assert!(AzureDisk::parse("").is_none()); + // Wrong provider + assert!( + AzureDisk::parse( + "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/sa" + ) + .is_none() + ); + // Too few segments + assert!( + AzureDisk::parse( + "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Compute/disks" + ) + .is_none() + ); + } + + #[test] + fn sanitise_tag_key_replaces_disallowed() { + assert_eq!( + sanitise_azure_tag_key("app.kubernetes.io/name"), + "app.kubernetes.io-name" + ); + assert_eq!( + sanitise_azure_tag_key("key<with>bad%chars"), + "key-with-bad-chars" + ); + assert_eq!(sanitise_azure_tag_key("normal-key"), "normal-key"); + } + + #[test] + fn sanitise_tag_key_truncates() { + let long = "a".repeat(600); + assert_eq!(sanitise_azure_tag_key(&long).len(), 512); + } + + #[test] + fn sanitise_tag_value_truncates() { + let long = "v".repeat(300); + assert_eq!(sanitise_azure_tag_value(&long).len(), 256); + } + + #[test] + fn sanitise_labels() { + let mut labels = BTreeMap::new(); + labels.insert("app.kubernetes.io/name".to_string(), "frontend".to_string()); + labels.insert("env".to_string(), "prod".to_string()); + let result = sanitise_tags(&labels); + assert_eq!(result["app.kubernetes.io-name"], "frontend"); + assert_eq!(result["env"], "prod"); + } +} diff --git a/src/cloud/mod.rs 
b/src/cloud/mod.rs index c55c1e2..aed154e 100644 --- a/src/cloud/mod.rs +++ b/src/cloud/mod.rs @@ -1,8 +1,10 @@ +mod azure; mod gcp; mod mock; pub use mock::MockClient; +use crate::cloud::azure::AzureClient; use crate::cloud::gcp::GcpClient; use crate::error::Error; use crate::metrics::API_CALL_DURATION; @@ -65,7 +67,7 @@ pub async fn create_client(provider: &CloudProvider) -> Result Ok(Box::new(MockClient::default())), CloudProvider::Aws => Err(Error::Config("not implemented".into())), - CloudProvider::Azure => Err(Error::Config("not implemented".into())), + CloudProvider::Azure => Ok(Box::new(AzureClient::new()?)), CloudProvider::Gcp => Ok(Box::new(GcpClient::new().await?)), CloudProvider::Other => Err(Error::Config( "cloudProvider 'other' is not a valid configuration value".into(), diff --git a/src/error.rs b/src/error.rs index b574e81..68e5a46 100644 --- a/src/error.rs +++ b/src/error.rs @@ -17,6 +17,9 @@ pub enum Error { #[error("GCP auth error: {0}")] Gcp(#[from] gcp_auth::Error), + + #[error("Azure auth error: {0}")] + Azure(String), } impl Error { @@ -28,6 +31,7 @@ impl Error { Error::CloudApi(_) => "cloud_api", Error::Config(_) => "config", Error::Gcp(_) => "gcp", + Error::Azure(_) => "azure", Error::Reqwest(_) => "http", } } diff --git a/src/tls.rs b/src/tls.rs index fa41ef6..f03abcb 100644 --- a/src/tls.rs +++ b/src/tls.rs @@ -32,14 +32,12 @@ pub fn client_config() -> ClientConfig { } /// Build a `reqwest::Client` using our TLS configuration. 
-#[allow(dead_code)] // TODO(afharvey) issue 14 will use this pub fn http_client() -> reqwest::Result<reqwest::Client> { reqwest::Client::builder() .use_preconfigured_tls(client_config()) .build() } -#[allow(dead_code)] // TODO(afharvey) issue 14 will use this fn root_cert_store() -> RootCertStore { let mut roots = RootCertStore::empty(); diff --git a/src/traits.rs b/src/traits.rs index 53b53b9..07865fb 100644 --- a/src/traits.rs +++ b/src/traits.rs @@ -11,7 +11,7 @@ use std::str::FromStr; #[derive(Debug, Clone)] pub struct CloudResource { /// The cloud provider that owns this resource. - pub provider: CloudProvider, + pub provider: CloudProvider, // TODO https://github.com/upgrades-dev/k8s-cloud-tagger/issues/85 /// Provider-specific resource identifier (e.g. `vol-0abc123`). pub resource_id: String, /// Labels to propagate from Kubernetes to the cloud resource.
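
The tag rules described in the README hunk and the disk ID check in `src/cloud/azure.rs` can be tried outside the crate. The sketch below is a standalone approximation, not the project's code: `sanitise_key`, `sanitise_value`, and `is_managed_disk_id` are hypothetical names, and the only assumptions are the ones stated in the diff (replace `<`, `>`, `%`, `&`, `\`, `?`, `/` in keys with `-`; cap keys at 512 and values at 256 characters; accept only nine-segment managed disk IDs). It uses the standard library alone.

```rust
// Standalone sketch; the function names are illustrative, not the crate's API.

/// Replace characters Azure rejects in tag keys with '-' and keep at most 512 chars.
fn sanitise_key(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '<' | '>' | '%' | '&' | '\\' | '?' | '/' => '-',
            _ => c,
        })
        .take(512)
        .collect()
}

/// Tag values keep their characters; only the 256-char cap applies.
fn sanitise_value(input: &str) -> String {
    input.chars().take(256).collect()
}

/// True when `id` splits into the nine segments of a managed disk ARM ID:
/// "", "subscriptions", sub, "resourceGroups", rg, "providers",
/// "Microsoft.Compute", "disks", name (fixed segments compared case-insensitively).
fn is_managed_disk_id(id: &str) -> bool {
    let parts: Vec<&str> = id.split('/').collect();
    parts.len() == 9
        && parts[0].is_empty()
        && parts[1].eq_ignore_ascii_case("subscriptions")
        && parts[3].eq_ignore_ascii_case("resourceGroups")
        && parts[5].eq_ignore_ascii_case("providers")
        && parts[6].eq_ignore_ascii_case("Microsoft.Compute")
        && parts[7].eq_ignore_ascii_case("disks")
}
```

Calling `sanitise_key("app.kubernetes.io/name")` yields `app.kubernetes.io-name`, matching the README table. Note that truncation here counts `char`s, so a key made of multi-byte characters can exceed 512 bytes while staying within 512 characters.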