feat(tonic-xds): add OutlierDetectionConfig types (gRFC A50)#2604
feat(tonic-xds): add OutlierDetectionConfig types (gRFC A50)#2604LYZJU2019 wants to merge 2 commits intohyperium:masterfrom
Conversation
Define the validated config types consumed by the outlier-detection algorithm: OutlierDetectionConfig with the global timing/percentage parameters, plus SuccessRateConfig and FailurePercentageConfig for the two ejection algorithms. This PR contains only the type definitions. Proto parsing from envoy.config.cluster.v3.OutlierDetection and the ClusterResource field land in a follow-up PR alongside the load-balancing-pipeline wiring, keeping the algorithm PR self-contained and easy to review. Refs: https://github.com/grpc/proposal/blob/master/A50-xds-outlier-detection.md
e58a837 to
9229eaa
Compare
| /// for ejection. | ||
| pub threshold: u32, | ||
| /// Probability (0-100) that a candidate is actually ejected. | ||
| pub enforcement_percentage: u32, |
There was a problem hiding this comment.
maybe also add validation logic? also u16 should suffice here? as max is 100? same with other config values
There was a problem hiding this comment.
^, better enforce invariants through type system for all Resouce types since we have the validation and NACK workflow
There was a problem hiding this comment.
Done. Added a Percentage(u8) newtype with a fallible new(value: u32) -> Result<Self, PercentageError> constructor (returns the proto wire type u32 so the parser doesn't have to cast at every site, then narrows to u8 once validated). The four percentage fields now hold Percentage instead of u32: max_ejection_percent, enforcing_success_rate, threshold, enforcing_failure_percentage. Once constructed, the algorithm never has to re-validate.
| pub stdev_factor: u32, | ||
| /// Probability (0-100) that a candidate is actually ejected. | ||
| pub enforcement_percentage: u32, | ||
| /// Minimum number of candidate endpoints required to run the algorithm. |
There was a problem hiding this comment.
what about enforcing_success_rate as per the spec?
| /// Failure-percentage ejection parameters. `None` if the algorithm is | ||
| /// disabled. | ||
| pub failure_percentage: Option<FailurePercentageConfig>, | ||
| } |
There was a problem hiding this comment.
Discussed offline with @ankurmittal. For now we only support P2C balancer so omit child_policy in this PR. We can add this field later if needed.
There was a problem hiding this comment.
we will still need child policy to wrap the balancer so that we are not picking failed endpoint again and again, but can be done in future PR
| //! | ||
| //! These are the validated config inputs consumed by the outlier-detection | ||
| //! algorithm. Parsing them from `envoy.config.cluster.v3.OutlierDetection` | ||
| //! and exposing them on `ClusterResource` lands in a follow-up PR alongside |
There was a problem hiding this comment.
For "progress tracking" comments like these better put them in TODO comments so they can be cleaned up later, or just leave them as PR comments instead.
- Trim module docstring; drop "lands in a follow-up PR" framing. - Note in the docstring why there is no `child_policy` yet (tonic-xds has only one balancer; the field will land alongside more balancers). - Rename `enforcement_percentage` → `enforcing_success_rate` / `enforcing_failure_percentage` to match the Envoy proto field names. - Introduce a local `Percentage(u8)` newtype with a fallible constructor and use it for `max_ejection_percent`, `enforcing_success_rate`, `threshold`, and `enforcing_failure_percentage` so the 0..=100 invariant is enforced through the type system. Add tests covering the constructor's range checks.
Summary
First of a 4-PR series implementing gRFC A50: xDS Outlier Detection in
tonic-xds. This PR is type definitions only — no behavior change.Defines three validated config types that the algorithm will consume:
OutlierDetectionConfig— sweep interval, base/max ejection time, max ejection percent, plus the two optional sub-configs.SuccessRateConfig—stdev_factor,enforcement_percentage,minimum_hosts,request_volume.FailurePercentageConfig—threshold,enforcement_percentage,minimum_hosts,request_volume.OutlierDetectionConfig::is_enabled()— true iff at least one ejection algorithm is configured.Sub-configs are
Option<_>so the absence of either algorithm is part of the type, not a sentinel value.Why split this out
The full A50 implementation pulls in (a) proto parsing, (b) the statistical algorithms + sweep engine, (c) per-RPC outcome interception, and (d) wiring into the load-balancing pipeline. Reviewing them in one PR is hard. This series:
Servicewrapper).ClusterResource, wire-up inXdsClusterDiscovery(preserving connections on ejection), end-to-end tests, and thelib.rsfeature-table update.Test plan
cargo test -p tonic-xds --lib outlier— 3 tests foris_enabled().cargo fmt -p tonic-xds --checkclean.cargo clippy -p tonic-xds --lib --all-features -- -D warningsclean.