Summary
The CPU Overprovisioning and Resource Planning page covers VM-level CPU oversubscription thoroughly but does not explicitly address tenant node CPU oversubscription. Support needs authoritative wording it can cite in customer RCAs.
Type
Conceptual (with a small Reference section on enforcement points)
Problem
A customer asked for documentation backing up the statement "you can't oversubscribe CPU at the tenant node level." The current docs do not say this anywhere. In fact, the FAQ on the same page acknowledges the opposite:
"When you oversubscribe CPU resources, you are sharing that pool of physical cores with other VMs and tenants."
This leaves support without a citable rule when customers ask whether the platform prevents, allows, or recommends against summing tenant node cores beyond physical cores on the host.
Suggested Content
Audience: System administrators planning multi-tenant deployments, support engineers writing RCAs.
Prerequisites: Reader should already understand VM vCPU vs. physical core scheduling (already covered earlier on the page).
Key sections to add:
- Aggregate tenant node CPU oversubscription — Whether 5 × 16-core tenant nodes can be placed on a 16-core physical node. What the platform allows vs. what it recommends. Whether there is a cluster-aggregate cap or only the per-machine cap (Max cores per machine).
- How tenant CPU oversubscription differs from VM oversubscription — Tenant nodes are LXC containers running KVM workloads. Both layers participate in CFS scheduling. Cover whether oversubscribing at the tenant-node layer compounds with oversubscription inside the tenant (VMs the tenant admin creates).
- The cgroup2 cpu.max ceiling — Each tenant container has a hard CPU bandwidth ceiling (lxc.cgroup2.cpu.max = cores × 1,000,000 1,000,000, that is, a quota of cores × 1,000,000 µs of CPU time per 1,000,000 µs period). Explain what this means in practice for the tenant admin: the tenant cannot exceed its allocation in aggregate, but vCPUs inside the tenant can still time-share among the tenant's VMs.
- Recommendation — State clearly whether VergeOS recommends against oversubscribing tenant node cores, and why (scheduling fairness across tenants, customer-perceived "stolen time" inside the tenant, predictability of multi-tenant SLAs, etc.).
- Enforcement summary table — What is hard-prevented (e.g., per-tenant-node cap via Max cores per machine, RAM availability check) vs. what is permitted but discouraged (e.g., aggregate tenant node cores exceeding host physical cores).
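To make the arithmetic behind the items above concrete, here is a minimal Python sketch of the 5 × 16-core example. It is an illustration only, not VergeOS code: the function names are hypothetical, the cpu.max value follows the "quota period" form quoted in this request, and the contended-share figure assumes every tenant container has equal scheduler weight and is fully busy, which this request does not confirm.

```python
PERIOD_US = 1_000_000  # scheduling period in microseconds, as quoted in this request


def cpu_max(cores: int, period_us: int = PERIOD_US) -> str:
    """Return a cgroup2 cpu.max value ("<quota_us> <period_us>") for a tenant node."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return f"{cores * period_us} {period_us}"


def oversubscription_ratio(tenant_node_cores: list[int], host_physical_cores: int) -> float:
    """Sum of tenant node cores divided by the host's physical cores."""
    return sum(tenant_node_cores) / host_physical_cores


def contended_share(tenant_node_cores: list[int], host_physical_cores: int) -> list[float]:
    """Cores each tenant node could expect if all are saturated at once.

    Simplification (assumption): equal weights, so the host is split evenly,
    and no tenant receives more than its own cpu.max ceiling. Capacity left
    over by a capped tenant is not redistributed here.
    """
    equal_split = host_physical_cores / len(tenant_node_cores)
    return [min(float(cores), equal_split) for cores in tenant_node_cores]


tenant_nodes = [16] * 5   # five 16-core tenant nodes
host_cores = 16           # one 16-core physical node

print(cpu_max(16))                                   # 16000000 1000000
print(oversubscription_ratio(tenant_nodes, host_cores))  # 5.0
print(contended_share(tenant_nodes, host_cores))
```

Under these assumptions, each tenant node's ceiling still reads as 16 cores, yet the aggregate is 5:1 and a fully contended host would leave each tenant with roughly 3.2 cores' worth of CPU time. That gap between the per-tenant ceiling and the contended share is what the Recommendation and Enforcement sections would need to address.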
Context
Requested via support interaction while writing a customer RCA. The customer is auditing post-incident provisioning decisions and wants to cite official VergeOS documentation. Without this section, support cannot point to anything authoritative on tenant CPU oversubscription.
Similar gaps exist on these pages and likely need cross-references once the canonical content lands here: