docs: FIP draft - testing & acceptance criteria for onboarding new validators #910
manan19 wants to merge 2 commits into
Pull request overview
Adds a draft FIP/RFC document that defines a staged testing and acceptance bar for onboarding new Snapchain validators before appending them to validators.toml, with emphasis on cross-geo latency risk and client determinism risk (forks/reimplementations).
Changes:
- Introduces a 3-tier risk model (A/B/C) and maps validator failure modes to each tier.
- Defines layered acceptance gates (L0 determinism vectors → L1 unit/validation parity → L2 multi-node harness → L3 production-like testnet soak) plus geo/DC-specific tests.
- Proposes an operational rollout/rollback procedure and a copy-paste go/no-go checklist.
## Risks & open questions
- **Flaky validator-add test.** `test_validator_set_rotation` currently relies on the *remove* path
seems to be leaning on this pretty heavily
Agreed — reframed in f52ee60. The open question no longer leans on CI flakiness; it now states the underlying point: the validator add path has thinner end-to-end coverage than remove, and strengthening it is a prerequisite.
Draft RFC defining the requirements and testing criteria for admitting a new validator to the Farcaster consensus set, including alternative client implementations (forks and independent reimplementations) and validators in new geographies/datacenters. Covers the alternative-client determinism contract (R1–R7), the verification layers that prove it (L0 conformance vectors → L3 production-like testnet), deployment requirements (latency budget, soak, reachability), operator requirements (incident-response collaboration, repo maintainership), a staged read-node → testnet → mainnet rollout with rollback, and a go/no-go acceptance checklist. Open testing gaps are tracked in #924. Intended for posting to farcasterxyz/protocol Discussions.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1804705 to e890a9d
Posted to farcasterxyz/protocol Discussions (FIP Stage 1: Ideas): farcasterxyz/protocol#272. Links in that copy are absolute so they resolve from the protocol repo. This PR remains the canonical source/review thread.
What
Adds a draft FIP / RFC at `docs/proposals/fip-validator-onboarding-testing.md` defining the testing and acceptance bar that should be met before a new validator is appended to `validators.toml`.
This is a documentation-only change (no code). It's intended for review here, then posting to farcasterxyz/protocol Discussions.
Why
Two upcoming changes raise the risk of adding a validator:
There's currently no documented, staged acceptance procedure. Adding a validator is a one-line `effective_at` edit, but with equal voting power and a fault budget of 1 on a 6-node shard, a faulty/slow/divergent validator can stall a shard.
What the doc covers
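The "fault budget of 1 on a 6-node shard" figure from the Why section can be checked with a minimal sketch, assuming the usual BFT bound n ≥ 3f + 1 and a commit quorum of strictly more than two thirds of equally weighted votes. The function names `max_faults` and `quorum` are illustrative and are not taken from the Snapchain codebase.

```rust
/// Maximum number of faulty validators tolerated by `n` equally weighted
/// validators under the assumed bound n >= 3f + 1.
fn max_faults(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Smallest vote count that is strictly greater than two thirds of `n`.
fn quorum(n: u32) -> u32 {
    2 * n / 3 + 1
}

fn main() {
    let n = 6;
    // For a 6-node shard this prints: n=6 f=1 quorum=5
    println!("n={} f={} quorum={}", n, max_faults(n), quorum(n));
}
```

Under these assumptions a 6-node shard needs 5 votes to commit, so if one validator is already faulty, a second slow or divergent validator leaves only 4 healthy votes and the shard cannot make progress.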
tests/consensus_test.rs) → L3 production-like full-network testnet (real QUIC gossip from the target DC, prod timing, perf load).
Review asks
A few claims I'd most like a consensus engineer to sanity-check:
`encode_to_vec()` bytes and headers are BLAKE3-hashed, a non-snapchain client must be byte-for-byte deterministic, hence the Tier C L0 vector suite. Is this framing correct?
`test_validator_set_rotation` currently exercises the remove path because the add path is flaky in CI; the doc treats hardening the add path as a prerequisite. Agree?
🤖 Generated with Claude Code
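To illustrate the byte-for-byte determinism question in the review asks, here is a minimal sketch of why two protobuf encoders can agree on a message's meaning yet disagree on its bytes. It uses a hand-rolled varint encoder rather than prost, and the two-field message (field numbers 1 and 2) is hypothetical, not Snapchain's block schema. Protobuf parsers accept fields in any order, so both byte strings decode to the same message, but any hash taken over the bytes (BLAKE3 in the PR's framing) would differ.

```rust
/// Append `value` as a protobuf base-128 varint.
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Append one varint-typed field (wire type 0) with the given field number.
fn put_field(buf: &mut Vec<u8>, field_number: u64, value: u64) {
    put_varint(buf, field_number << 3); // tag = (field_number << 3) | wire type 0
    put_varint(buf, value);
}

/// Encode (field_number, value) pairs in exactly the order given.
fn encode(fields: &[(u64, u64)]) -> Vec<u8> {
    let mut buf = Vec::new();
    for &(number, value) in fields {
        put_field(&mut buf, number, value);
    }
    buf
}

fn main() {
    // Same logical message, fields written in two different orders.
    let canonical = encode(&[(1, 150), (2, 1)]);
    let reordered = encode(&[(2, 1), (1, 150)]);
    println!("{:02x?}", canonical);
    println!("{:02x?}", reordered);
    // The bytes differ, so a hash over them would differ as well.
    assert_ne!(canonical, reordered);
}
```

A conformance vector in this style would pin the expected bytes for a given input, so a reimplementation that emits fields in another order fails the comparison before any hashing happens.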