Context
Phase A (Kamaji + CAPI + Karpenter on Scaleway VMs) is shipping. The next step is bare-metal autoscaling so that sustained workloads (>12h/day) can migrate automatically from cloud VMs to Scaleway Elastic Metal via Karpenter consolidation.
No OSS CAPI provider supports Scaleway Elastic Metal today (CAPS v0.2.1 is VM-only). Matchbox cannot reach Scaleway EM's managed network. The cleanest path is a native Karpenter CloudProvider that wraps the Scaleway EM API directly — no CAPI indirection.
See ADR-025 §3.5 and ADR-024 for the architectural rationale.
Scope
New repo (or subfolder): providers/karpenter-scaleway-em/
providers/karpenter-scaleway-em/
├── cmd/karpenter-scaleway-em/ # main.go — registers with karpenter core
├── pkg/
│ ├── apis/v1alpha1/ # ScalewayElasticMetalNodeClass CRD
│ ├── cloudprovider/ # 7 methods of pkg/cloudprovider/types.go
│ │ ├── create.go
│ │ ├── delete.go
│ │ ├── get_list.go
│ │ ├── instance_types.go # EM offer catalog → InstanceType map
│ │ ├── drift.go
│ │ └── repair.go
│ └── talos/ # rescue + dd orchestration (reusable package)
│ ├── install.go # SSH rescue → curl | xz | dd
│ ├── config.go # render machine config per NodeClaim
│ └── wait.go # poll Scaleway + Talos API states
├── charts/karpenter-scaleway-em/ # Helm chart
└── Makefile + go.mod + Dockerfile
Implementation approach
Chosen path: no-preinstalled-image + rescue + dd Talos RAW
Rationale (vs cloud-init+kexec and BMC+ISO): fastest (~10-12 min, vs ~14 min for cloud-init+kexec and ~30 min for BMC+ISO), fully API-driven (SSH key auto-injected in rescue), atomic disk write, debuggable via SSH fallback.
Create(NodeClaim) flow
- POST /baremetal/v1/zones/{zone}/servers with install: null
- Poll GET /servers/{id} until status: ready (~3-5 min)
- POST /servers/{id}/reboot with boot_type: rescue (~3-5 min)
- SSH rescue@<ip>:
  - Auto-detect disk: lsblk -dno NAME,TYPE | awk '$2=="disk"{print "/dev/"$1; exit}'
  - curl -fsSL <talos.raw.xz> | xz -d | dd of=$DISK bs=4M oflag=direct
- POST /servers/{id}/reboot with boot_type: normal
- Poll Talos API :50000 (maintenance mode) for ~3 min
- talosctl apply-config with rendered config (kubelet join token, labels, taints, hostname)
- Return NodeClaim hydrated with providerID: scaleway-em://{zone}/{server_id}
- Karpenter core takes over: watches Node Ready, binds NodeClaim.
Delete/List/Drift/InstanceTypes
Delete: DELETE /servers/{id} + optional cooldown (EM billed monthly — disruption.consolidateAfter: 1h minimum)
List/Get: filter by tag karpenter.sh/nodepool=<name>
IsDrifted: compare live server image hash (stored in tags) vs NodeClass
GetInstanceTypes: static catalog mapping Scaleway EM offers (EM-I220E, EM-L520E, etc.) to Karpenter InstanceType resources (CPU, memory, zones, price)
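Delete, Get and List all need to map a NodeClaim back to a Scaleway server, which comes down to the providerID format scaleway-em://{zone}/{server_id} and the karpenter.sh/nodepool=<name> tag. A minimal sketch of those two helpers is below. The function names are hypothetical, and representing tags as a list of key=value strings is an assumption about how the tag would be stored.

```go
package main

import (
	"fmt"
	"strings"
)

const providerIDPrefix = "scaleway-em://"

// formatProviderID builds scaleway-em://{zone}/{server_id}.
func formatProviderID(zone, serverID string) string {
	return providerIDPrefix + zone + "/" + serverID
}

// parseProviderID is the inverse of formatProviderID. It rejects IDs with a
// different scheme, a missing part, or extra path segments.
func parseProviderID(id string) (zone, serverID string, err error) {
	if !strings.HasPrefix(id, providerIDPrefix) {
		return "", "", fmt.Errorf("provider ID %q: missing %q prefix", id, providerIDPrefix)
	}
	parts := strings.Split(strings.TrimPrefix(id, providerIDPrefix), "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("provider ID %q: want %s{zone}/{server_id}", id, providerIDPrefix)
	}
	return parts[0], parts[1], nil
}

// hasNodePoolTag reports whether tags contains karpenter.sh/nodepool=<pool>.
// Assumption: tags are plain strings of the form key=value.
func hasNodePoolTag(tags []string, pool string) bool {
	want := "karpenter.sh/nodepool=" + pool
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

func main() {
	id := formatProviderID("fr-par-2", "11111111-2222-3333-4444-555555555555")
	fmt.Println(id)

	zone, serverID, err := parseProviderID(id)
	fmt.Println(zone, serverID, err)

	fmt.Println(hasNodePoolTag([]string{"env=prod", "karpenter.sh/nodepool=metal"}, "metal"))
}
```

Keeping the zone inside the providerID means Delete and Get can address the zonal API without a lookup.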
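Key gotchas
- Disk naming varies by offer: nvme0n1 (most offers), sda (legacy A-series). Auto-detect via lsblk.
- NIC naming differs between the rescue system (eth0) and the Talos kernel (enpXsY). Need to detect via talosctl --insecure get links post-boot or use MAC-based match in machine config.
- Out-of-stock: surface UnavailableOffering so Karpenter falls back to VM pools.
- Monthly billing: set disruption.consolidateAfter: 1h.
- Image hosting: storage (stacks/storage) for the .raw.xz files. One per schematic-sha7.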
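The disk auto-detect step of the Create flow runs lsblk -dno NAME,TYPE over SSH and keeps the first device of type disk. If the provider prefers to parse that output itself rather than rely on awk in the rescue shell, the same rule could be sketched in Go as follows. firstDisk is a hypothetical name, and the sample output is illustrative, not captured from a real server.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// firstDisk mirrors awk '$2=="disk"{print "/dev/"$1; exit}': it returns the
// first device whose TYPE column is "disk" in `lsblk -dno NAME,TYPE` output.
func firstDisk(lsblkOutput string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(lsblkOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[1] == "disk" {
			return "/dev/" + fields[0], nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("no device of type disk in lsblk output")
}

func main() {
	// Illustrative output only: a loop device followed by two NVMe disks.
	out := "loop0   loop\nnvme0n1 disk\nnvme1n1 disk\n"
	disk, err := firstDisk(out)
	fmt.Println(disk, err)

	disk, err = firstDisk("sda disk\n")
	fmt.Println(disk, err)
}
```

On servers with several disks, "first disk listed" may not be the intended boot device, so a NodeClass-level override might be worth considering.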
Estimated effort
| Task | Days |
|---|---|
| Skeleton (copy from kwok provider) + CRD + main.go | 2-3 |
| Scaleway SDK wrappers (7 methods) | 3-4 |
| Rescue+dd orchestration (SSH robust, retry, timeouts) | 3-5 |
| Talos machine config rendering per-node | 2 |
| Out-of-stock + drift + pricing catalog | 2 |
| E2E tests on fr-par-2 + doc + OSS release | 3-4 |
| Total | 15-20 days (3-4 weeks) |
Prerequisites (to do before starting)
Alternatives considered
| Approach | Effort | Verdict |
|---|---|---|
| Upstream PR to CAPS adding ScalewayElasticMetalMachine | 25-35 days + Scaleway review cycle | Slower, uncertain timeline, more CAPI ceremony for no functional gain |
| Siderolabs Omni BMIP (commercial) | Negligible | License cost, vendor lock-in |
| Matchbox | N/A | Incompatible: Scaleway EM doesn't expose DHCP/PXE control |
| Cloud-init + kexec on Debian | Similar effort | Slower (~14 min), riskier (kexec fails = rescue mandatory) |
| BMC + ISO manual | N/A | Not scriptable (HTML5/Java KVM) |
| Native Karpenter + rescue + dd | 15-20 days | ✅ Winner |
References
Related tasks: supersedes task #20 (bare-metal CAPI path decision) — outcome: path D (custom Karpenter provider) chosen over A (CAPS upstream) and B (Matchbox, which is incompatible with Scaleway EM).