Phase B — Karpenter-native provider for Scaleway Elastic Metal (no-image + rescue + dd) #1

@Destynova2

Context

Phase A (Kamaji + CAPI + Karpenter on Scaleway VMs) is shipping. The next step is bare-metal autoscaling so that sustained workloads (>12h/day) can migrate automatically from cloud VMs to Scaleway Elastic Metal via Karpenter consolidation.

No OSS CAPI provider supports Scaleway Elastic Metal today (CAPS v0.2.1 is VM-only). Matchbox cannot reach Scaleway EM's managed network. The cleanest path is a native Karpenter CloudProvider that wraps the Scaleway EM API directly — no CAPI indirection.

See ADR-025 §3.5 and ADR-024 for the architectural rationale.

Scope

New repo (or subfolder): providers/karpenter-scaleway-em/

providers/karpenter-scaleway-em/
├── cmd/karpenter-scaleway-em/       # main.go — registers with karpenter core
├── pkg/
│   ├── apis/v1alpha1/               # ScalewayElasticMetalNodeClass CRD
│   ├── cloudprovider/               # 7 methods of pkg/cloudprovider/types.go
│   │   ├── create.go
│   │   ├── delete.go
│   │   ├── get_list.go
│   │   ├── instance_types.go        # EM offer catalog → InstanceType map
│   │   ├── drift.go
│   │   └── repair.go
│   └── talos/                       # rescue + dd orchestration (reusable package)
│       ├── install.go               # SSH rescue → curl | xz | dd
│       ├── config.go                # render machine config per NodeClaim
│       └── wait.go                  # poll Scaleway + Talos API states
├── charts/karpenter-scaleway-em/    # Helm chart
└── Makefile + go.mod + Dockerfile

Implementation approach

Chosen path: no-preinstalled-image + rescue + dd Talos RAW

Rationale (vs cloud-init+kexec and BMC+ISO): fastest (~10-12 min, vs ~14 min for cloud-init+kexec and ~30 min for BMC+ISO), fully API-driven (SSH key auto-injected in rescue), atomic disk write, debuggable via SSH fallback.

Create(NodeClaim) flow

  1. POST /baremetal/v1/zones/{zone}/servers with install: null
  2. Poll GET /servers/{id} until status: ready (~3-5 min)
  3. POST /servers/{id}/reboot with boot_type: rescue (~3-5 min)
  4. SSH rescue@<ip>:
    • Auto-detect disk: lsblk -dno NAME,TYPE | awk '$2=="disk"{print "/dev/"$1; exit}'
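    • curl -fsSL <talos.raw.xz> | xz -d | dd of="$DISK" bs=4M iflag=fullblock oflag=direct (iflag=fullblock avoids short reads from the pipe, which can break direct I/O writes)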
  5. POST /servers/{id}/reboot with boot_type: normal
  6. Poll Talos API :50000 (maintenance mode) for ~3 min
  7. talosctl apply-config with rendered config (kubelet join token, labels, taints, hostname)
  8. Return NodeClaim hydrated with providerID: scaleway-em://{zone}/{server_id}

Karpenter core takes over — watches Node Ready, binds NodeClaim.

Delete/List/Drift/InstanceTypes

  • Delete: DELETE /servers/{id} + optional cooldown (EM billed monthly — disruption.consolidateAfter: 1h minimum)
  • List/Get: filter by tag karpenter.sh/nodepool=<name>
  • IsDrifted: compare live server image hash (stored in tags) vs NodeClass
  • GetInstanceTypes: static catalog mapping Scaleway EM offers (EM-I220E, EM-L520E, etc.) to Karpenter InstanceType resources (CPU, memory, zones, price)
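
The tag-based List filter and drift check above could look roughly like the following. This is a sketch under assumptions: it assumes server tags are plain `key=value` strings, the `Server` struct is a minimal stand-in rather than an SDK type, and the `talos-image-hash` tag key is a hypothetical name (the issue only says the hash is stored in tags).

```go
package main

import "strings"

const (
	nodePoolTagKey  = "karpenter.sh/nodepool" // tag key named in the issue
	imageHashTagKey = "talos-image-hash"      // hypothetical tag key for this sketch
)

// Server is a minimal stand-in for an Elastic Metal server record.
type Server struct {
	ID   string
	Tags []string // assumed to be "key=value" strings
}

// tagValue returns the value of the first tag whose key matches.
func tagValue(tags []string, key string) (string, bool) {
	for _, t := range tags {
		k, v, ok := strings.Cut(t, "=")
		if ok && k == key {
			return v, true
		}
	}
	return "", false
}

// FilterByNodePool keeps the servers tagged with the given NodePool name.
func FilterByNodePool(servers []Server, pool string) []Server {
	var out []Server
	for _, s := range servers {
		if v, ok := tagValue(s.Tags, nodePoolTagKey); ok && v == pool {
			out = append(out, s)
		}
	}
	return out
}

// IsDrifted reports whether the image hash recorded on the server differs
// from the hash the NodeClass expects. A missing tag counts as drifted.
func IsDrifted(s Server, wantHash string) bool {
	got, ok := tagValue(s.Tags, imageHashTagKey)
	return !ok || got != wantHash
}
```

Treating a missing hash tag as drift is a design choice for the sketch: it forces untagged servers to be replaced rather than silently trusted.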

Key gotchas

  • Disk name variability: nvme0n1 (most offers), sda (legacy A-series). Auto-detect via lsblk.
  • Interface name mismatch between rescue (eth0) and Talos kernel (enpXsY). Need to detect via talosctl --insecure get links post-boot or use MAC-based match in machine config.
  • Out-of-stock — surface as UnavailableOffering so Karpenter falls back to VM pools.
  • Quota — default 5 servers/project, needs ticket to raise.
  • IPv6 — Scaleway gives /128, not /64. Use static config.
  • Monthly billing — no refund on Delete. Enforce min-lifetime via NodePool disruption.consolidateAfter: 1h.
  • Talos image hosting — use Garage S3 bucket (already in stacks/storage) for the .raw.xz files. One per schematic-sha7.

Estimated effort

| Task | Days |
| --- | --- |
| Skeleton (copy from kwok provider) + CRD + main.go | 2-3 |
| Scaleway SDK wrappers (7 methods) | 3-4 |
| Rescue+dd orchestration (SSH robust, retry, timeouts) | 3-5 |
| Talos machine config rendering per-node | 2 |
| Out-of-stock + drift + pricing catalog | 2 |
| E2E tests on fr-par-2 + doc + OSS release | 3-4 |
| Total | 15-20 days (3-4 weeks) |

Prerequisites (to do before starting)

  • Decision trigger: Phase A stable in prod for 2-3 months
  • Cost justification: measure ≥5 nodes saturated >12h/day, OR massive workloads (DB, Spark, ML training) that justify I/O on bare-metal
  • Stock / quota check: raise EM quota via Scaleway support ticket
  • 1-2 test servers pre-provisioned manually to validate rescue+dd flow before coding

Alternatives considered

| Approach | Effort | Verdict |
| --- | --- | --- |
| Upstream PR to CAPS adding ScalewayElasticMetalMachine | 25-35 days + Scaleway review cycle | Slower, uncertain timeline, more CAPI ceremony for no functional gain |
| Siderolabs Omni BMIP (commercial) | Negligible | License cost, vendor lock-in |
| Matchbox | N/A | Incompatible: Scaleway EM doesn't expose DHCP/PXE control |
| Cloud-init + kexec on Debian | Similar effort | Slower (~14 min), riskier (kexec fails = rescue mandatory) |
| BMC + ISO manual | N/A | Not scriptable (HTML5/Java KVM) |
| Native Karpenter + rescue + dd | 15-20 days | ✅ Winner |

References


Related tasks: supersedes task #20 (bare-metal CAPI path decision) — outcome: path D (custom Karpenter provider) chosen over A (CAPS upstream) and B (Matchbox, which is incompatible with Scaleway EM).

Labels: enhancement (New feature or request)
