diff --git a/CHANGELOG.md b/CHANGELOG.md index 5fc5a75..ec3a120 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed - OpenAPI schema is now sourced from the versioned `hyperfleet-api-spec` Go module instead of being downloaded from `hyperfleet-api` main branch +- Documented single-instance deployment limitation — running multiple replicas with overlapping resource selectors causes duplicate events. Added recommended deployment configuration and scaling guidance ### Deprecated diff --git a/charts/README.md b/charts/README.md index 5a728dd..30ab868 100644 --- a/charts/README.md +++ b/charts/README.md @@ -26,7 +26,7 @@ helm install hyperfleet-sentinel oci://quay.io/redhat-services-prod/hyperfleet-t | Key | Type | Default | Description | |-----|------|---------|-------------| -| replicaCount | int | `1` | Number of sentinel replicas | +| replicaCount | int | `1` | Number of sentinel replicas. Setting >1 duplicates events; scale via separate Helm releases with non-overlapping resourceSelector values instead. See docs/multi-instance-deployment.md. | | image.registry | string | `"CHANGE_ME"` | Container image registry (no default — must be set) | | image.repository | string | `"CHANGE_ME"` | Container image repository (no default — must be set) | | image.pullPolicy | string | `"Always"` | Image pull policy | diff --git a/charts/values.yaml b/charts/values.yaml index 85f4a41..813c904 100644 --- a/charts/values.yaml +++ b/charts/values.yaml @@ -1,7 +1,9 @@ # Default values for sentinel. # This is a YAML-formatted file. -# -- Number of sentinel replicas +# -- Number of sentinel replicas. Setting >1 duplicates events; scale via +# separate Helm releases with non-overlapping resourceSelector values instead. +# See docs/multi-instance-deployment.md. replicaCount: 1 image: diff --git a/docs/multi-instance-deployment.md b/docs/multi-instance-deployment.md index 9568e9d..f864608 100644 --- a/docs/multi-instance-deployment.md +++ b/docs/multi-instance-deployment.md @@ -4,7 +4,7 @@ Sentinel supports horizontal scaling through multiple dimensions: by resource type (separate instances for clusters vs nodepools) and by label-based resource filtering within the same resource type. Deploy multiple Sentinel instances with different `resource_selector` values to distribute the workload. -> **Important**: There is no coordination between Sentinel instances. Operators must ensure all resources are covered by the combined selectors to avoid gaps. +> **Important**: There is no coordination between Sentinel instances. Operators must ensure selectors are **non-overlapping** (to avoid duplicate events) and that all resources are covered by the combined selectors (to avoid gaps). See [Known Limitations](#known-limitations) for details. ## Using Helm for Multi-Instance Deployment @@ -117,6 +117,48 @@ Scale to multiple instances as your cluster count grows or when you need regiona --- +## Known Limitations + +### No Built-In Deduplication or Leader Election + +Sentinel has no inter-instance coordination. Running multiple replicas with the same or overlapping resource selector (`resource_selector` in Sentinel config YAML, `resourceSelector` in Helm values) produces proportionally more duplicate events on the broker. Each replica independently polls the API and publishes events for every matching resource, resulting in: + +- Increased load on the API, PostgreSQL, broker, and adapters — without benefit +- Adapters processing the same cluster multiple times per poll cycle +- No deduplication at the Sentinel or broker layer + +> **Important**: Do not increase `replicaCount` to scale Sentinel. Multiple replicas with the same selector will duplicate events. Scale by deploying separate Sentinel instances with **non-overlapping** `resource_selector` values instead. + +This is an architectural decision documented in ADR-0004 (Sentinel as a Stateless Polling Reconciliation Loop). Sentinel is intentionally stateless with at-least-once delivery semantics — adapters are expected to be idempotent. + +### Recommended Deployment Configuration + +Deploy **one Sentinel instance per distinct resource partition** (label selector subset). Scale horizontally by adding instances with non-overlapping selectors: + +```yaml +# Instance A: watches region=us-east +config: + resourceType: clusters + resourceSelector: + - label: region + value: us-east + +# Instance B: watches region=us-west (no overlap with Instance A) +config: + resourceType: clusters + resourceSelector: + - label: region + value: us-west +``` + +Do **not** run multiple replicas of the same Sentinel configuration. If you need high availability for a single partition, rely on Kubernetes restart policies and `PodDisruptionBudget` (see below) rather than replica scaling. + +### Future: Automated Partitioning + +The current label-based partitioning model is a known MVP limitation. The architecture repo (sentinel.md, Technical Debt section) documents a planned remediation path: automated shard coverage validation or coordinated sharding with a registry. A future Epic will address both automatic partition assignment and gap detection (resources not matched by any Sentinel instance). + +--- + ## PodDisruptionBudget **What**: Ensures minimum Sentinel availability during cluster maintenance. @@ -226,7 +268,7 @@ Memory: 64Mi + (5 × 32Mi) + 16Mi = 240Mi └───────────────────┘ ``` -**Important**: This is **NOT leader election**. Multiple Sentinels can overlap resource selectors if needed. Operators must ensure appropriate coverage. +**Important**: This is **NOT leader election**. Multiple Sentinels with overlapping resource selectors will produce duplicate events. Ensure selectors are **non-overlapping** to avoid event duplication. See [Known Limitations](#known-limitations). #### Resource Selector Strategies diff --git a/docs/sentinel-operator-guide.md b/docs/sentinel-operator-guide.md index b30c6fa..5322fcd 100644 --- a/docs/sentinel-operator-guide.md +++ b/docs/sentinel-operator-guide.md @@ -227,8 +227,9 @@ graph TB 1. **No Coordination**: Sentinel instances operate independently with no coordination 2. **Coverage Responsibility**: Operators must ensure all resources are covered by selectors -3. **Overlap Allowed**: Multiple instances can watch the same resource (events will be duplicated) -4. **Gaps Dangerous**: Resources not matching any selector will never reconcile +3. **Gaps Dangerous**: Resources not matching any selector will never reconcile + +> **Important**: Multiple instances watching the same resource **will** produce duplicate events. Always use non-overlapping `resource_selector` values across instances. Do not increase `replicaCount` to scale — deploy separate instances with distinct selectors instead. See the [Known Limitations](multi-instance-deployment.md#known-limitations) section for details. **Broker Topic Isolation:**