rfcs: add RFC-18 link classification flex-algo #3288

Open
ben-malbeclabs wants to merge 2 commits into main from bc/link-classification-flex-algo

ben-malbeclabs commented Mar 16, 2026

Summary

RFC-18 introduces a link classification model for DoubleZero using IS-IS
Flexible Algorithm (flex-algo). DZF assigns named color labels to links
onchain; the controller translates these into IS-IS TE admin-groups and
flex-algo topology definitions on Arista EOS devices. BGP color extended
communities steer VPN unicast traffic onto constrained topologies, while
multicast continues to use all links via IS-IS algo 0.

What this RFC specifies:

  • LinkColorInfo onchain account — defines a color with auto-assigned
    admin-group bit (from a new AdminGroupBits ResourceExtension),
    flex-algo number, EOS color value, and include/exclude constraint
  • link_colors: Vec<Pubkey> on the Link account — assigns one or more
    colors to a link; controller renders all assigned colors as a single
    overwrite traffic-engineering administrative-group command
  • include_topology_colors: Vec<Pubkey> on the Tenant account — assigns
    specific topology colors to a tenant; defaults to color 1
    (UNICAST-DEFAULT) if unset
  • Controller features.yaml — gates flex-algo topology config, link
    tagging, and BGP color community stamping independently for staged rollout
  • Full revert: enabled: false removes all flex-algo config from all devices
  • Migration command for existing Vpn4v loopbacks to allocate
    flex_algo_node_segment_idx; controller blocks enablement until complete
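
As an illustration of the "single overwrite" rendering of `link_colors`, here is a minimal sketch of folding the admin-group bits of every color assigned to a link into one 128-bit mask. The function name `admin_group_mask` is hypothetical and not taken from the RFC or the controller; the sketch only assumes that each color carries an admin-group bit in the range 0–127.

```rust
/// Hypothetical helper (not from the RFC): fold the admin-group bits of all
/// colors assigned to a link into one 128-bit mask, where bit N set means
/// admin-group bit N is present on the link.
/// Returns None if any bit falls outside the 0..=127 range.
pub fn admin_group_mask(bits: &[u8]) -> Option<u128> {
    let mut mask: u128 = 0;
    for &bit in bits {
        if bit > 127 {
            return None;
        }
        // Setting the same bit twice is harmless, so duplicates collapse.
        mask |= 1u128 << bit;
    }
    Some(mask)
}
```

Because the full set is recomputed from `link_colors` each time, removing a color from a link only requires re-rendering the remaining colors; no targeted removal of a single group is needed.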

Introduces onchain link color model using IS-IS Flexible Algorithm
(RFC 9350) to separate VPN unicast and multicast forwarding topologies.
Defines LinkColorInfo PDA, link_color field on Link, FlexAlgo feature
flag, and controller changes for admin-group tagging, flex-algo
definitions, system-colored-tunnel-rib BGP resolution, and per-tunnel
color extended community stamping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://datatracker.ietf.org/doc/html/rfc2119).

DoubleZero contributors operate links with different physical characteristics — low latency, high bandwidth, or both. Today all traffic uses the same IS-IS topology, so every service follows the same paths regardless of what those paths are optimized for. This RFC introduces a link classification model that allows DZF to assign named color labels to links onchain and use IS-IS Flexible Algorithm (flex-algo) to compute separate constraint-based forwarding topologies per color. Different traffic classes — VPN unicast and IP multicast — can then use different topologies.

Contributor

How about MTU?

Contributor Author

I assume you just mean that MTU could be used to define a topology, rather than concerns about MTU size based on additional labels?

Contributor

Yes, I meant MTU as one of the link characteristics.

Contributor Author

I think there are multiple potential use cases, but I am not sure they are worth explicitly calling out in this RFC.

DZF creates a `LinkColorInfo` PDA per color. It stores the color's name and auto-assigned routing parameters. The program MUST auto-assign the next available admin-group bit (starting at 0) and the corresponding flex-algo number and EOS color value using the formula:

```
admin_group_bit = next available bit in 0–127
```

Contributor

We would need to define a tracking mechanism for admin-group bits; a persistent bitmap or counter on GlobalState or something. Otherwise we'd have to scan all existing LinkColorInfo accounts at instruction time.


The program MUST validate `admin_group_bit <= 127` on `create` and MUST return an explicit error if all 128 slots are exhausted. This is a hard constraint: EOS supports bits 0–127 only, and `128 + 127 = 255` is the maximum representable value in `flex_algo_number: u8`.
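
The range check and the `128 + 127 = 255` arithmetic above can be sketched as follows. This is an illustration under stated assumptions, not the program's actual code: the base of 128 is inferred from that arithmetic and from the 128–255 flex-algo range in RFC 9350, and the names `flex_algo_number`, `ColorError`, `MAX_ADMIN_GROUP_BIT`, and `FLEX_ALGO_BASE` are hypothetical.

```rust
/// Highest admin-group bit the RFC allows (EOS supports bits 0..=127).
pub const MAX_ADMIN_GROUP_BIT: u8 = 127;
/// Assumed base, inferred from "128 + 127 = 255"; not quoted from the RFC text.
pub const FLEX_ALGO_BASE: u8 = 128;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ColorError {
    AdminGroupBitOutOfRange(u8),
}

/// Derive the flex-algo number for an admin-group bit, rejecting bits above 127
/// so that the addition can never overflow a u8.
pub fn flex_algo_number(admin_group_bit: u8) -> Result<u8, ColorError> {
    if admin_group_bit > MAX_ADMIN_GROUP_BIT {
        return Err(ColorError::AdminGroupBitOutOfRange(admin_group_bit));
    }
    Ok(FLEX_ALGO_BASE + admin_group_bit)
}
```

Checking the bit before adding keeps the result inside `u8` without relying on checked arithmetic: bit 0 maps to 128 and bit 127 maps to 255.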

Admin-group bits from deleted colors MUST NOT be reused. Color deletion is not supported in this RFC, so this constraint applies to any future deletion implementation: reusing a bit before all devices have had their config updated would cause those devices to apply the new color's constraints to interfaces still carrying the old bit's admin-group. At current scale (128 available slots), exhaustion is not a practical concern.

Contributor

Once delete removes the PDA (as in line 160), we cannot enforce the no-reuse requirement without a persistent record of previously allocated bits that survives PDA deletion.
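
One way to keep such a persistent record is a counter that only ever increases and lives outside the per-color account, so deleting a color cannot free its bit. The sketch below is a minimal illustration of that idea; the struct name borrows the `AdminGroupBits` label from the PR description, but its layout, the `allocate` and `release` methods, and the `AllocError` type are assumptions and not the actual ResourceExtension implementation.

```rust
/// Hypothetical allocator state kept separately from any LinkColorInfo account.
#[derive(Debug, Default)]
pub struct AdminGroupBits {
    /// Number of bits ever handed out. It only increases, so a bit that
    /// belonged to a deleted color is never issued again.
    next_bit: u16,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocError {
    Exhausted,
}

impl AdminGroupBits {
    /// Hand out the next unused bit, or fail once all 128 slots are used.
    pub fn allocate(&mut self) -> Result<u8, AllocError> {
        if self.next_bit > 127 {
            return Err(AllocError::Exhausted);
        }
        let bit = self.next_bit as u8;
        self.next_bit += 1;
        Ok(bit)
    }

    /// Deleting a color deliberately leaves the allocator untouched.
    pub fn release(&mut self, _bit: u8) {}
}
```

A counter is the smallest state that satisfies the no-reuse rule; a bitmap would only be needed if bits could be assigned out of order.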

**Scope:**
- Delivers traffic-class-level segregation: multicast vs. VPN unicast at the network level
- All unicast tenants share a single constrained topology today — the architecture is forward-compatible with per-tenant path differentiation without rework
- Per-tenant steering (directing one tenant to a different constrained topology) requires adding a `topology_color` field to the `Tenant` account — deferred to a future RFC that builds on the link color model defined here

Contributor

If in the future we want to allow a tenant to have their traffic avoid a link color, then topology_color should maybe be called include_topology_colors, and then in the future we could add exclude_topology_colors. Note the plural since we should make these vectors in case we want to allow multiple colors in the future.
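
A rough sketch of what the vector-based fields could look like, including the fall-back to the default color when nothing is assigned. The `Pubkey` alias is a stand-in for the real Solana key type, `exclude_topology_colors` is the possible future field from this comment rather than something the RFC defines, and `effective_include_colors` is a hypothetical helper name.

```rust
/// Stand-in for a Solana public key; the real type would come from the SDK.
pub type Pubkey = [u8; 32];

/// Hypothetical slice of the Tenant account, limited to the fields discussed here.
pub struct Tenant {
    /// Colors whose topologies the tenant's traffic may use.
    pub include_topology_colors: Vec<Pubkey>,
    /// Possible future field from the review suggestion; unused in this sketch.
    pub exclude_topology_colors: Vec<Pubkey>,
}

/// Resolve the colors to apply: the tenant's own list, or the default
/// color when the list is empty (treated here as "unset").
pub fn effective_include_colors(tenant: &Tenant, unicast_default: Pubkey) -> Vec<Pubkey> {
    if tenant.include_topology_colors.is_empty() {
        vec![unicast_default]
    } else {
        tenant.include_topology_colors.clone()
    }
}
```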

#[derive(BorshSerialize, BorshDeserialize, Debug)]
pub struct LinkColorInfo {
    pub name: String,        // e.g. "unicast-default"
    pub admin_group_bit: u8, // auto-assigned, 0–127

Contributor

auto-assigned from global ResourceExtension "AdminGroupBits"

…ulti-color, cleanup

- Replace onchain feature flag with controller features.yaml config file
- Add LinkColorInfo account with AdminGroupBits ResourceExtension for
  persistent bit allocation; bits never reused after deletion
- Change link_color: Pubkey to link_colors: Vec<Pubkey> (cap 8)
- Add include_topology_colors: Vec<Pubkey> on Tenant for per-tenant
  color assignment; defaults to UNICAST-DEFAULT (color 1)
- Redesign interface admin-group cleanup: overwrite remaining colors
  on deletion rather than targeted named no command
- Add full revert: enabled: false removes all flex-algo config
- Pin UNICAST-DEFAULT as protocol invariant (bit 0, first color created)
- Add controller startup check blocking enabled: true if any Vpn4v
  loopback has unset flex_algo_node_segment_idx
- Clarify clear sweep atomicity and idempotency
- Address all PR review comments (nikw9944, vihu, elitegreg)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
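
The startup check named in the commit message above could look roughly like the sketch below. It models an unset `flex_algo_node_segment_idx` as `None`, which is an assumption about the representation; the `Vpn4vLoopback` struct, its `name` field, and the `check_flex_algo_ready` function are hypothetical and exist only for illustration.

```rust
/// Hypothetical view of a Vpn4v loopback; only the field named in the
/// commit message is modeled, plus a name for error reporting.
pub struct Vpn4vLoopback {
    pub name: String,
    /// None stands for "not yet migrated" in this sketch.
    pub flex_algo_node_segment_idx: Option<u16>,
}

/// Returns Ok(()) when every loopback has an index allocated; otherwise
/// returns the names of the loopbacks that still need migration.
pub fn check_flex_algo_ready(loopbacks: &[Vpn4vLoopback]) -> Result<(), Vec<String>> {
    let missing: Vec<String> = loopbacks
        .iter()
        .filter(|l| l.flex_algo_node_segment_idx.is_none())
        .map(|l| l.name.clone())
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Returning the list of unmigrated loopbacks, rather than a bare failure, would let the controller tell the operator exactly which devices block `enabled: true`.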