docs: FIP draft — testing & acceptance criteria for onboarding new validators #910

Open: manan19 wants to merge 2 commits into main from docs/validator-onboarding-testing-fip

Conversation

manan19 (Contributor) commented Jun 9, 2026

What

Adds a draft FIP / RFC at docs/proposals/fip-validator-onboarding-testing.md defining the testing and acceptance bar that should be met before a new validator is appended to validators.toml.

This is a documentation-only change (no code). It's intended for review here, then posting to farcasterxyz/protocol Discussions.

Why

Two upcoming changes raise the risk of adding a validator:

  1. Geo/datacenter diversity (primary) — a validator further from the existing (largely us-east) cluster stresses the latency-sensitive BFT timing.
  2. Node (consensus-client) diversity (secondary) — interest in a validator built from a different codebase (reimplementation or fork), which can break consensus determinism.

There's currently no documented, staged acceptance procedure. Adding a validator is a one-line effective_at edit, but with equal voting power and a fault budget of 1 on a 6-node shard, a faulty/slow/divergent validator can stall a shard.
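
The fault-budget arithmetic behind that claim can be sketched as follows. This is a minimal illustration, assuming the standard BFT bounds for equal-weight validators (at most f = floor((n - 1) / 3) faulty nodes, and a quorum of strictly more than 2/3 of the votes); the function names are hypothetical and are not taken from the Snapchain codebase.

```rust
// Assumption: equal voting power and the standard BFT thresholds.
// None of these helpers exist in Snapchain; they only illustrate the arithmetic.

/// Largest number of faulty validators tolerated among `n`: floor((n - 1) / 3).
fn max_faulty(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Smallest vote count that is strictly greater than 2/3 of `n`.
fn quorum(n: u32) -> u32 {
    2 * n / 3 + 1
}

/// A shard can still commit blocks only if the responsive validators reach quorum.
fn can_make_progress(n: u32, unavailable: u32) -> bool {
    n.saturating_sub(unavailable) >= quorum(n)
}

fn main() {
    let n = 6;
    println!("n = {}, fault budget = {}, quorum = {}", n, max_faulty(n), quorum(n));
    println!("one validator down:  progress = {}", can_make_progress(n, 1));
    println!("two validators down: progress = {}", can_make_progress(n, 2));
}
```

Under these assumptions a 6-node shard needs 5 votes to commit, so one slow or divergent validator uses up the whole budget, and a second simultaneous failure halts the shard.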

What the doc covers

  • Three cumulative risk tiers — A (stock binary, new geo) → B (fork) → C (independent reimplementation).
  • Failure modes — consensus/determinism, networking/liveness, operational/security, each tagged by tier.
  • Layered test taxonomy with gates — L0 determinism vectors → L1 unit/validation parity → L2 multi-node (tests/consensus_test.rs) → L3 production-like full-network testnet (real QUIC gossip from the target DC, prod timing, perf load).
  • Geo-specific tests — RTT budget, multi-day soak from the real DC, partition drills, NTP.
  • Staged rollout — read-node → testnet → mainnet one shard at a time, with a pre-staged rollback entry.
  • Copy-pasteable go/no-go acceptance checklist.
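
As a rough illustration of the staged rollout listed above, the sequence can be modelled as a small state machine in which a validator advances only when the gate for its current stage passes. The `Stage` enum, the `advance` function and the shard count are hypothetical names chosen for this sketch; the proposal's actual gates and its rollback entry are not modelled here.

```rust
// Hypothetical model of the read-node -> testnet -> mainnet rollout.
// A failed gate holds the validator at its current stage; rollback is out of scope.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    ReadNode,
    Testnet,
    /// Validating on mainnet for `shards_joined` shards so far.
    Mainnet { shards_joined: u32 },
    Done,
}

/// Move to the next stage only if the current stage's gate passed.
/// Assumes `total_shards >= 1`.
fn advance(stage: Stage, gate_passed: bool, total_shards: u32) -> Stage {
    if !gate_passed {
        return stage;
    }
    match stage {
        Stage::ReadNode => Stage::Testnet,
        Stage::Testnet => Stage::Mainnet { shards_joined: 1 },
        // Join one more shard at a time until every shard is covered.
        Stage::Mainnet { shards_joined } if shards_joined < total_shards => {
            Stage::Mainnet { shards_joined: shards_joined + 1 }
        }
        Stage::Mainnet { .. } | Stage::Done => Stage::Done,
    }
}

fn main() {
    let total_shards = 2; // illustrative value only
    let mut stage = Stage::ReadNode;
    while stage != Stage::Done {
        println!("{:?}", stage);
        stage = advance(stage, true, total_shards);
    }
    println!("{:?}", stage);
}
```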

Review asks

A few claims I'd most like a consensus engineer to sanity-check:

  • Because signatures are computed over encode_to_vec() bytes and headers are BLAKE3-hashed, a non-snapchain client must be byte-for-byte deterministic — hence the Tier C L0 vector suite. Is this framing correct?
  • test_validator_set_rotation currently exercises the remove path because the add path is flaky in CI; the doc treats hardening the add path as a prerequisite. Agree?
  • Should a Byzantine/equivocation fault-injection harness be required for Tier C?
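
To make the first review ask concrete, here is a toy sketch of what a golden determinism vector checks. It assumes nothing about Snapchain's real types: `ToyHeader`, both encoders and the hex vector are invented for illustration, and a fixed-width big-endian layout stands in for the protobuf bytes produced by `encode_to_vec()`. The point is only that two clients can agree on every field value and still emit different bytes, so anything hashed or signed over those bytes will not match.

```rust
// Toy illustration of a golden-vector check. All names here are hypothetical;
// the real header type and its protobuf encoding are not shown in this thread.

#[derive(Debug, Clone, PartialEq)]
struct ToyHeader {
    shard_index: u32,
    height: u64,
}

/// Reference encoding: shard_index, then height, both big-endian.
fn encode_reference(h: &ToyHeader) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(&h.shard_index.to_be_bytes());
    out.extend_from_slice(&h.height.to_be_bytes());
    out
}

/// A hypothetical alternative client that writes the same fields in the other order.
fn encode_divergent(h: &ToyHeader) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(&h.height.to_be_bytes());
    out.extend_from_slice(&h.shard_index.to_be_bytes());
    out
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// A vector passes only if the encoder reproduces the golden bytes exactly.
fn check_vector(encode: fn(&ToyHeader) -> Vec<u8>, input: &ToyHeader, golden_hex: &str) -> bool {
    to_hex(&encode(input)) == golden_hex
}

fn main() {
    let header = ToyHeader { shard_index: 1, height: 42 };
    let golden = "00000001000000000000002a";
    println!("reference client passes: {}", check_vector(encode_reference, &header, golden));
    println!("divergent client passes: {}", check_vector(encode_divergent, &header, golden));
}
```

A real L0 suite would presumably pin vectors generated by the reference client for each signed or hashed message type, which is the gap tracked later in this thread.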

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 9, 2026 20:22
vercel Bot commented Jun 9, 2026

The latest updates on your projects.

Project: snapchain-docs. Deployment: Ready. Updated (UTC): Jun 15, 2026 3:53pm.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a draft FIP/RFC document that defines a staged testing and acceptance bar for onboarding new Snapchain validators before appending them to validators.toml, with emphasis on cross-geo latency risk and client determinism risk (forks/reimplementations).

Changes:

  • Introduces a 3-tier risk model (A/B/C) and maps validator failure modes to each tier.
  • Defines layered acceptance gates (L0 determinism vectors → L1 unit/validation parity → L2 multi-node harness → L3 production-like testnet soak) plus geo/DC-specific tests.
  • Proposes an operational rollout/rollback procedure and a copy-paste go/no-go checklist.

Comment thread docs/proposals/fip-validator-onboarding-testing.md Outdated

## Risks & open questions

- **Flaky validator-add test.** `test_validator_set_rotation` currently relies on the *remove* path

Contributor

seems to be leaning on this pretty heavily

Contributor Author

Agreed — reframed in f52ee60. The open question no longer leans on CI flakiness; it now states the underlying point: the validator add path has thinner end-to-end coverage than remove, and strengthening it is a prerequisite.

topocount previously approved these changes Jun 10, 2026
manan19 (Contributor Author) commented Jun 10, 2026

Filed the test-coverage gaps this FIP surfaces as tracking issues: #924 (umbrella) covering #917–#923. Highest-leverage prerequisites for alt-client onboarding are #917 (golden determinism vectors) and #919 (validator-add path).

Comment thread docs/proposals/fip-validator-onboarding-testing.md
Draft RFC defining the requirements and testing criteria for admitting a
new validator to the Farcaster consensus set — including alternative client
implementations (forks and independent reimplementations) and validators in
new geographies/datacenters.

Covers the alternative-client determinism contract (R1–R7), the verification
layers that prove it (L0 conformance vectors → L3 production-like testnet),
deployment requirements (latency budget, soak, reachability), operator
requirements (incident-response collaboration, repo maintainership), a staged
read-node → testnet → mainnet rollout with rollback, and a go/no-go
acceptance checklist. Open testing gaps are tracked in #924.

Intended for posting to farcasterxyz/protocol Discussions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
manan19 force-pushed the docs/validator-onboarding-testing-fip branch from 1804705 to e890a9d on June 15, 2026 15:35
manan19 (Contributor Author) commented Jun 15, 2026

Posted to farcasterxyz/protocol Discussions (FIP Stage 1: Ideas): farcasterxyz/protocol#272 — links in that copy are absolute so they resolve from the protocol repo. This PR remains the canonical source/review thread.

4 participants