Extend /eth/v1/node/peers with peer scoring and disconnect reasons#606
barnabasbusa wants to merge 8 commits into
Introduces a new debug endpoint that returns the consensus client's current per-peer scoring snapshot. Each entry includes the client-native blended score, the per-subsystem score components the client chooses to expose, the most recent score-affecting action (if any), and the most recent disconnect (if any).

The schema is intentionally permissive about what each client surfaces, because the visibility eth2 clients have into their own scoring varies widely:
- `components` is a flexible map; `gossipsub` is optional because some clients (notably Nimbus) have no application-layer access to the underlying libp2p scores.
- `last_action` and `last_disconnect` are both optional - gossipsub-driven disconnects in particular often bypass the client's reason-capture path.
- `score_range` is required so consumers can normalize across clients whose native score ranges differ wildly: [-100, +100] for Lighthouse / Lodestar / Grandine, [-10, +20] for Teku, [0, 1000] for Nimbus, [-100, +1] for Prysm.

`PeerScoreReason` and `PeerDisconnectReason` are controlled vocabularies that group the common cross-client causes; the original client-side string is preserved in `native_reason` so consumers can distinguish e.g. multiple `rpc_*` flavors that map to the same controlled code.

Prior art motivating this proposal:
- Lighthouse `GET /lighthouse/peers`
- Lodestar `GET /eth/v1/lodestar/lodestar_peer_score_stats`
- Teku `GET /teku/v1/nodes/peer_scores`
- Prysm internal `ScoreInfo` proto + the WIP REST endpoint on `OffchainLabs/prysm:peer-scores-ui`
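To illustrate how a consumer might have used `score_range` in this first revision, here is a minimal sketch that rescales a client-native score onto [0, 1]. The function name `normalize_score` and the clamping behaviour are assumptions for illustration, not part of the proposal, and later commits in this PR drop `score_range` altogether.

```python
from typing import Tuple


def normalize_score(score: float, score_range: Tuple[float, float]) -> float:
    """Linearly rescale a client-native score onto [0, 1].

    `score_range` is the (min, max) pair the client reports. Values outside
    the range are clamped; that is an assumption, not a rule from the PR.
    """
    lo, hi = score_range
    if hi <= lo:
        raise ValueError("score_range must satisfy min < max")
    clamped = min(max(score, lo), hi)
    return (clamped - lo) / (hi - lo)


# The ranges quoted in the commit message above.
lighthouse_mid = normalize_score(0, (-100, 100))
teku_mid = normalize_score(5, (-10, 20))
```

A linear rescale only aligns the numeric ranges. It does not make a Teku score and a Nimbus score mean the same thing, so the result is best read as a rough per-client position.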
Matches the existing flat snake_case URL convention used by /eth/v1/node/peer_count and the file name peer_scores.yaml. Prior revision used /eth/v1/debug/node/peers/scores which mixed subdirectory-style with the rest of the API.
Drop the separate /eth/v1/debug/node/peer_scores endpoint and the
PeerScore/PeerScoreRange/PeerScoreComponents/PeerScoreAction/
PeerDisconnectAction object hierarchy in favour of four optional
fields on the existing Peer schema: agent_version, score,
disconnect_reason, downscore_reasons.
Trim PeerScoreReason from 51 to 15 controlled values and
PeerDisconnectReason from 20 to 8, keeping the cross-client realistic
union rather than the full taxonomy. Implementations that compute
finer-grained internal tags are expected to map them onto the closest
listed value.
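
A minimal sketch of that mapping step, assuming a client keeps a lookup table from its internal tags to the controlled vocabulary. The 15 controlled values are the ones listed in the `PeerScoreReason` enum in this PR; the native tag names and the helper `to_controlled_reason` are hypothetical.

```python
# Controlled values as listed in the PeerScoreReason enum of this PR.
PEER_SCORE_REASONS = frozenset({
    "rpc_invalid_request", "rpc_invalid_response", "rpc_rate_limited",
    "rpc_timeout", "rpc_io_error", "rpc_bad_blocks_by_range",
    "rpc_bad_blocks_by_root", "gossip_invalid_block",
    "gossip_invalid_attestation", "gossip_invalid_blob_sidecar",
    "gossip_invalid_data_column_sidecar", "sync_bad_batch",
    "status_unviable_fork", "behaviour_penalty", "unknown",
})

# Hypothetical client-internal tags; a real client would list its own.
NATIVE_TO_CONTROLLED = {
    "blocks_by_range_wrong_count": "rpc_bad_blocks_by_range",
    "blocks_by_range_bad_order": "rpc_bad_blocks_by_range",
    "attestation_bad_signature": "gossip_invalid_attestation",
    "request_deadline_exceeded": "rpc_timeout",
}


def to_controlled_reason(native_tag: str) -> str:
    """Map a native tag to the closest controlled value, else "unknown"."""
    mapped = NATIVE_TO_CONTROLLED.get(native_tag, "unknown")
    # Guard against typos in the table itself.
    return mapped if mapped in PEER_SCORE_REASONS else "unknown"
```

Several native tags can collapse onto one controlled value, which is the trade-off the trimmed vocabulary accepts.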
Strictly additive change to /eth/v1/node/peers and /eth/v1/node/peers/
{peer_id}: all new fields are optional so existing consumers are
unaffected.
… state Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Wrap the existing `last_disconnect()` lookup in both the single-peer and list handlers so the field is omitted (None) for connected/connecting peers.
… state Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Compute the node-peer view first and only attach a mapped `disconnectReason` when the resolved state matches.
…ing state Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Teku exposes only `connected`/`disconnected` via `Eth2Peer#isConnected()`, so suppress the field for connected peers.
Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Only look up the last goodbye when the (lowercased) peer state matches one of those values.
Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Gate the `peer.lastDisconnectReason` lookup in both the list and single-peer handlers on `peer.connectionState` so the field is omitted (Opt.none) for connected/connecting peers.
Per the proposed beacon-API spec (ethereum/beacon-APIs#606), `disconnect_reason` MUST only be populated when the peer's `state` is `disconnected` or `disconnecting`. Only map `last_disconnect()` when the resolved `PeerState` matches one of those variants.
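The gating rule these commits implement can be sketched in a few lines. This is not any client's actual handler: the function `build_peer_entry` and its arguments are hypothetical, and only the `state` values and the `disconnect_reason` field name come from the PR.

```python
from typing import Optional

# States in which disconnect_reason may be populated, per the commit messages.
DISCONNECT_STATES = frozenset({"disconnected", "disconnecting"})


def build_peer_entry(
    peer_id: str, state: str, last_disconnect_reason: Optional[str]
) -> dict:
    """Build a peer response entry, attaching disconnect_reason only when allowed."""
    entry = {"peer_id": peer_id, "state": state}
    # Resolve the state first, then decide whether the field may appear at all.
    if state.lower() in DISCONNECT_STATES and last_disconnect_reason is not None:
        entry["disconnect_reason"] = last_disconnect_reason
    return entry
```

For a peer that reconnected after an earlier goodbye, the stored reason is simply left out of the response rather than reported as stale data.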
| $ref: "./p2p.yaml#/PeerConnectionState" | ||
| direction: | ||
| $ref: "./p2p.yaml#/PeerConnectionDirection" | ||
| agent_version: |
we might wanna do a v2 instead of adding new fields, but curious what others think
I'm happy with new fields as long as they're not mandatory.
yeah none of this should be mandatory.
if we ever do a v2 we should have these mandatory.
for this specific api it's fine to add new optional fields, it's also more of a debug api, but mostly it's fine because we don't or rather can't support ssz here because of the format/return values, I just wanna make it clear that this is not a general pattern we wanna have, or encourage
I'm not sure how common it is, but we only appear to have 'large_penalty' and 'small_penalty' for downscore reasons, so an initial implementation from teku may not have that - are the other fields still useful if we can't easily provide a downscore reason or disconnect reason @barnabasbusa ?
| enum: | ||
| - rpc_invalid_request | ||
| - rpc_invalid_response | ||
| - rpc_rate_limited | ||
| - rpc_timeout | ||
| - rpc_io_error | ||
| - rpc_bad_blocks_by_range | ||
| - rpc_bad_blocks_by_root | ||
| - gossip_invalid_block | ||
| - gossip_invalid_attestation | ||
| - gossip_invalid_blob_sidecar | ||
| - gossip_invalid_data_column_sidecar | ||
| - sync_bad_batch | ||
| - status_unviable_fork | ||
| - behaviour_penalty | ||
| - unknown |
any reason why these are enum instead of just example? it seems to me that we wanna keep this list open to extend and not restrict to these values, also not all clients might return all the examples from here
also, why is there no gossip_invalid_payload_envelope for example? and many others are missing, so if it's not an exhaustive list might as well limit it to just a few examples as guidance for implementers
they wanted a limited list I think, so it's semi-standard across all clients...
we can make it a limited list sure, but then someone has to take time and make it useful and relevant
there are 0 gloas related gossip errors besides the data column one, nothing related to bid, payload envelopes or ptc attestations
also why is there gossip_invalid_blob_sidecar? that doesn't make any sense, if we want a limited list, then it needs to be well-defined and useful
| - rate_limited | ||
| - io_error | ||
| - client_shutdown | ||
| - unknown |
same as above, consider using example
| Client-native peer score. OPTIONAL. The scale and meaning is | ||
| implementation-defined - consumers SHOULD treat it as a relative | ||
| signal within a single client, not directly comparable across | ||
| clients. Lower values indicate worse standing. Clients that do | ||
| not maintain a per-peer score MAY omit this field. |
are we fine with having all these vibe coded descriptions? I definitely don't have time to polish this but if someone can it would be great
@barnabasbusa can you add some rationale to the PR body/description on why we want/need this change? for posterity, this would be useful to have/persist here
| downscore_reasons: | ||
| type: array | ||
| description: | | ||
| Reasons that the client has been down scored in their current session. OPTIONAL. |
Is this supposed to be added per occurrence, or once if it occurred at least once?
E.g., if `gossip_invalid_block` happened twice, should we return
`["gossip_invalid_block", "gossip_invalid_block"]`
or
`["gossip_invalid_block"]`
I assume the latter, but it might be worth clarifying this in the description.
I would expect the latter. The description was a mess so I reduced it substantially because it was AI word slop... can potentially describe it as a distinct set...
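
A short sketch of the "distinct set" reading discussed in this thread: each reason appears at most once, however many times it occurred. The helper name `distinct_reasons` is hypothetical, and keeping first-occurrence order is an assumption, since the thread does not say whether order matters.

```python
from typing import Iterable, List


def distinct_reasons(events: Iterable[str]) -> List[str]:
    """Collapse repeated downscore events into a distinct list.

    dict.fromkeys drops duplicates while preserving first-occurrence order.
    """
    return list(dict.fromkeys(events))


# Two invalid-block events and one timeout recorded during a session.
session_events = ["gossip_invalid_block", "rpc_timeout", "gossip_invalid_block"]
downscore_reasons = distinct_reasons(session_events)
```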

Summary
Extends the existing `/eth/v1/node/peers` and `/eth/v1/node/peers/{peer_id}` responses with four optional fields exposing per-peer scoring and disconnect information:
- `agent_version`: the peer's libp2p identify agent string (e.g. `Lighthouse/v8.1.3-...`)
- `score`: the client-native peer score
- `disconnect_reason`: why the client last disconnected from the peer (controlled vocabulary)
- `downscore_reasons`: the distinct set of reasons the peer was downscored this session (controlled vocabulary)

Two new string enums, `PeerScoreReason` and `PeerDisconnectReason`, define the controlled vocabularies.

Motivation
Peer scoring is one of the least observable parts of a running consensus node, yet it drives most peer churn. Every client already computes rich scoring internally but exposes it only through divergent, non-standard endpoints:
- Lighthouse: `GET /lighthouse/peers`
- Lodestar: `GET /eth/v1/lodestar/lodestar_peer_score_stats`
- Teku: `GET /teku/v1/nodes/peer_scores`
- Prysm: internal `ScoreInfo` proto (WIP REST endpoint)

This fragmentation means operators, monitoring tools, and testing frameworks (e.g. Kurtosis / interop debugging) can't ask "why is this node dropping peers?" in a client-agnostic way. Standardizing these fields lets tooling diagnose connectivity, fork/network mismatches, and rate-limiting uniformly across the network.

Design notes
- All new fields are optional, so existing consumers of `/eth/v1/node/peers` are unaffected.
- An earlier revision added a separate `/eth/v1/debug/node/peer_scores` endpoint; feedback moved it onto the existing `Peer` object to avoid a parallel peer-listing API.
- `score` is only meaningful relative to other peers on the same client.
- Controlled vocabularies with an `unknown` fallback. The reason enums capture the realistic cross-client union rather than a full taxonomy; clients map finer-grained internal tags onto the closest listed value, and consumers are told to tolerate unknown values for forward compatibility.
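
On the consumer side, the design notes above suggest a parser that treats every new field as optional and keeps reasons as plain strings, so an unrecognized value does not raise. The sketch below assumes that shape; the `PeerInfo` class and `parse_peer` function are hypothetical, and the wire type of `score` (number or numeric string) is an assumption, so both are accepted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeerInfo:
    peer_id: str
    state: str
    # The four fields added by this PR; all optional.
    agent_version: Optional[str] = None
    score: Optional[float] = None
    disconnect_reason: Optional[str] = None
    downscore_reasons: List[str] = field(default_factory=list)


def parse_peer(raw: dict) -> PeerInfo:
    """Parse one peer entry, tolerating absent fields and unknown reason values."""
    score = raw.get("score")
    return PeerInfo(
        peer_id=raw["peer_id"],
        state=raw["state"],
        agent_version=raw.get("agent_version"),
        # float() accepts both 12.5 and "12.5".
        score=float(score) if score is not None else None,
        # Reasons stay plain strings: a value added later still parses.
        disconnect_reason=raw.get("disconnect_reason"),
        downscore_reasons=list(raw.get("downscore_reasons", [])),
    )
```

A response from a client that has not implemented the new fields parses to the same object with the optional fields left at their defaults.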