Skip to content

feat: add CSIL schemas for Tailscale integration#22

Open
Soypete wants to merge 2 commits intocatalystcommunity:mainfrom
Soypete:feat/tailscale-schemas
Open

feat: add CSIL schemas for Tailscale integration#22
Soypete wants to merge 2 commits intocatalystcommunity:mainfrom
Soypete:feat/tailscale-schemas

Conversation

@Soypete
Copy link
Copy Markdown
Contributor

@Soypete Soypete commented Mar 27, 2026

Summary

  • Add tailscale.csil schema for Tailscale component configuration
  • Add allow_cgnat_vip field to k3s and network-simple configs
  • Add VIP validation support for RFC6598 (100.64.0.0/10) CGNAT range used by Tailscale
  • Fix bug: ParseConfig now properly parses allow_cgnat_vip field

@Soypete Soypete marked this pull request as draft March 27, 2026 18:04
@Soypete Soypete force-pushed the feat/tailscale-schemas branch 2 times, most recently from cfe9ac9 to cd8c85c Compare April 16, 2026 03:07
@Soypete Soypete marked this pull request as ready for review April 16, 2026 03:08
Copy link
Copy Markdown
Contributor

@todpunk todpunk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The review from Claude that I'm leaning on for this one is below. I think there's some overlap with PR 23 in changes, so that's probably an issue that needs to be looked at in the workflow here.

--- 21:56 [26/3559] Blocking Issues 1. use_tailscale is dead code with unclear future semantics UseTailscale *bool is added to ClusterConfig in the schema and generated types, but nothing in the codebase reads it. No install path, no validation, no conditional logic. This is a concern coupling problem — it embeds awareness of a specific overlay network technology into the core cluster config layer. The existing allow_cgnat_vip is technology-agnostic (it permits a broader IP range). use_tailscale is not. Questions for the author: - What is use_tailscale supposed to control? If it will eventually auto-install the Tailscale operator, that should be a component config, not a cluster-level flag. - If it's meant to alter VIP validation or kube-vip behavior, that couples cluster setup to a specific networking product. What happens when someone uses Nebula or ZeroTier with CGNAT IPs?

Recommendation: Remove use_tailscale from this PR until there's code that consumes it and a design decision about whether it belongs in ClusterConfig vs. component config.

  1. Tailscale component is schema-only — no Component interface implementation

The tailscale package has generated types and serialization tests but no: - types.go implementing the component.Component interface

  • install.go with install/upgrade logic
  • Registration in registry/init.go
  • Entry in any k8sComponents map

Other components (openbao, velero, grafana, etc.) follow a clear pattern: CSIL schema → generated types → Component struct with Name(), Install(), Status(), Dependencies(), Uninstall() → registration. This PR stops at step 2. A user seeing csil/v1/components/tailscale.csil might reasonably expect foundry component
install tailscale to work.

Recommendation: Either add a stub Component implementation that returns a clear "not yet implemented" error, or hold the schema until the component is ready. At minimum, document the intent (is this schema-first design, or incomplete work?).

  1. The etcd_args CSIL addition is fine but should be called out

The diff shows etcd_args added to k3s.csil, but types.gen.go already has EtcdArgs on main — meaning someone previously added the field to the generated code without updating the schema. This PR brings the schema back into sync, which is correct. But the PR description doesn't mention this, which makes it look like a
new feature rather than a schema sync fix. Minor, but worth noting in the commit message.

--- Non-Blocking Issues

  1. Error message change introduces product-specific language
    In vip.go, the hint changes from:
    "hint: set allow_cgnat_vip: true to use CGNAT IPs in the 100.64.0.0/10 range"
    to:
    "hint: set allow_cgnat_vip: true to use Tailscale/CGNAT IPs"

The original message was technically accurate and product-neutral. The new message assumes CGNAT = Tailscale. This is fine today but if someone uses a different overlay (Nebula, ZeroTier, Headscale), the mention of "Tailscale" is misleading. The RFC range was actually more helpful — it told users which IPs qualify.

Recommendation: Keep the RFC range reference, optionally add "e.g. Tailscale" in parentheses:
"hint: set allow_cgnat_vip: true to use CGNAT IPs in the 100.64.0.0/10 range (e.g. Tailscale)"

  1. DetermineVIPConfig behavior change is subtle but safe

The old code always set AllowCGNATVIP to a pointer (even when false). The new code only sets it when true, leaving it nil otherwise. Since all consumers use cfg.AllowCGNATVIP != nil && *cfg.AllowCGNATVIP, the behavior is identical. This is actually more consistent with the "nil means not specified" pointer pattern.
Fine.

  1. Whitespace-only changes in vip.go and k3sComponents map

The CIDR alignment change in isPrivateIP() (removing trailing spaces) and the k3sComponents map realignment are cosmetic. Same feedback as PR 23 — minimize whitespace churn on lines you're not otherwise changing. Not a big deal here since the lines are closely related to the functional changes.

  1. Doc changes introduce "Option B" (VIP = control plane IP) without guardrails

The updated tailscale-integration.md now suggests using the control plane's Tailscale IP directly as the VIP for single control plane setups. But the existing validateK8sVIPUniqueness() in config/types.go explicitly validates that the VIP is NOT any host's IP. So this documented "Option B" would actually fail
validation.

Either:

  • The validation needs to be updated to allow VIP = control plane IP when use_tailscale is true (but use_tailscale isn't wired up), or
  • The docs are documenting a workflow that doesn't work
  1. Tailscale tests only cover YAML serialization

types_test.go tests marshal/unmarshal and pointer semantics of the generated struct. This is fine for verifying codegen output, but the tests don't exercise any business logic because there isn't any. Once component logic exists, real tests will be needed. --- Impact on Non-Tailscale Installs Low risk overall. The changes that touch the install path are: - allow_cgnat_vip parsing bug fix in k3s ParseConfig — this is additive. Before this fix, the field was silently ignored in ParseConfig even though the CSIL schema defined it. Non-Tailscale installs don't set this field, so they won't be affected. - DetermineVIPConfig change — nil vs &false, functionally identical for all consumers. - UseTailscale in ClusterConfig — nothing reads it, so it's inert. Will deserialize from YAML if present, otherwise nil. No impact. The only concern is the doc update suggesting "Option B" (VIP = host IP), which contradicts existing validation logic. If someone follows the docs, they'll hit a validation error. ---
Summary

┌───────────────────────────────────────────┬──────────┬────────────────────────────────┐
│ Item │ Severity │ Status │
├───────────────────────────────────────────┼──────────┼────────────────────────────────┤
│ use_tailscale dead field in ClusterConfig │ Blocking │ Should remove or justify │
├───────────────────────────────────────────┼──────────┼────────────────────────────────┤
│ No Component implementation for tailscale │ Blocking │ Stub or defer the schema │
├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ etcd_args schema sync │ Fine │ Note in commit message │
├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ Error message product-specificity │ Minor │ Keep RFC range │
├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ DetermineVIPConfig change │ Fine │ Safe │ ├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ Whitespace churn │ Minor │ Minimize │ ├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ "Option B" docs vs validation conflict │ Medium │ Docs promise what code rejects │ ├───────────────────────────────────────────┼──────────┼────────────────────────────────┤ │ Test coverage │ Minor │ Adequate for schema-only PR │ └───────────────────────────────────────────┴──────────┴────────────────────────────────┘

- Add allow_cgnat_vip field to ClusterConfig for Tailscale CGNAT IP support (100.64.0.0/10)
- Add use_tailscale flag to ClusterConfig (reserved for future operator integration)
- Create tailscale.csil component schema with OAuth credentials, operator image, advertise routes
- Fix VIP validation to accept RFC6598 CGNAT range when allow_cgnat_vip is true
- Fix error message to include RFC range reference
- Update docs: remove VIP = host IP recommendation (contradicts validation)

Code Review Changes:
- Remove use_tailscale from schema (dead code with no implementation)
- Remove tailscale component (schema-only, no Component interface implementation)
- Fix docs: VIP must be different from any host IP
- Fix error message: use RFC range instead of product-specific language

Testing:
- Config validates successfully with CGNAT VIP
- Build passes
- Unit tests pass
@Soypete Soypete force-pushed the feat/tailscale-schemas branch from cd8c85c to 9edde40 Compare April 22, 2026 02:49
@Soypete
Copy link
Copy Markdown
Contributor Author

Soypete commented Apr 22, 2026

Testing performed

  1. Config validation - Validated the cluster's stack.yaml with CGNAT VIP (100.81.89.100) and allow_cgnat_vip: true:

    ✓ Configuration is valid
    Cluster: pedro-ops-cluster (soypetetech.local)
    Hosts: 3
    Components: 9
    
  2. Build - Code builds successfully

  3. Unit tests - All tests pass in config and k3s packages

Code review changes made

  • Removed use_tailscale field from ClusterConfig (dead code - no implementation)
  • Removed tailscale component (schema-only, no Component interface)
  • Fixed docs: removed "Option B" VIP = control plane IP (contradicts validateK8sVIPUniqueness())
  • Fixed error message to use RFC range (100.64.0.0/10) instead of product-specific language
  • Cleaned up user's config file to remove deprecated fields

@Soypete Soypete requested a review from todpunk April 22, 2026 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants