1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed
- OpenAPI schema is now sourced from the versioned `hyperfleet-api-spec` Go module instead of being downloaded from `hyperfleet-api` main branch
- Documented single-instance deployment limitation — running multiple replicas with overlapping resource selectors causes duplicate events. Added recommended deployment configuration and scaling guidance

### Deprecated

2 changes: 1 addition & 1 deletion charts/README.md
@@ -26,7 +26,7 @@ helm install hyperfleet-sentinel oci://quay.io/redhat-services-prod/hyperfleet-t

| Key | Type | Default | Description |
|-----|------|---------|-------------|
-| replicaCount | int | `1` | Number of sentinel replicas |
+| replicaCount | int | `1` | Number of sentinel replicas. Setting >1 duplicates events; scale via separate Helm releases with non-overlapping resourceSelector values instead. See docs/multi-instance-deployment.md. |
| image.registry | string | `"CHANGE_ME"` | Container image registry (no default — must be set) |
| image.repository | string | `"CHANGE_ME"` | Container image repository (no default — must be set) |
| image.pullPolicy | string | `"Always"` | Image pull policy |
4 changes: 3 additions & 1 deletion charts/values.yaml
@@ -1,7 +1,9 @@
# Default values for sentinel.
# This is a YAML-formatted file.

-# -- Number of sentinel replicas
+# -- Number of sentinel replicas. Setting >1 duplicates events; scale via
+# separate Helm releases with non-overlapping resourceSelector values instead.
+# See docs/multi-instance-deployment.md.
replicaCount: 1

image:
46 changes: 44 additions & 2 deletions docs/multi-instance-deployment.md
@@ -4,7 +4,7 @@

Sentinel supports horizontal scaling through multiple dimensions: by resource type (separate instances for clusters vs nodepools) and by label-based resource filtering within the same resource type. Deploy multiple Sentinel instances with different `resource_selector` values to distribute the workload.

-> **Important**: There is no coordination between Sentinel instances. Operators must ensure all resources are covered by the combined selectors to avoid gaps.
+> **Important**: There is no coordination between Sentinel instances. Operators must ensure selectors are **non-overlapping** (to avoid duplicate events) and that all resources are covered by the combined selectors (to avoid gaps). See [Known Limitations](#known-limitations) for details.

## Using Helm for Multi-Instance Deployment

@@ -117,6 +117,48 @@ Scale to multiple instances as your cluster count grows or when you need regiona

---

## Known Limitations

### No Built-In Deduplication or Leader Election

Sentinel has no inter-instance coordination. Running multiple replicas with the same or overlapping resource selector (`resource_selector` in Sentinel config YAML, `resourceSelector` in Helm values) multiplies the events on the broker: with N replicas sharing a selector, each matching resource yields N events per poll cycle. Each replica independently polls the API and publishes events for every matching resource, resulting in:

- Increased load on the API, PostgreSQL, broker, and adapters — without benefit
- Adapters processing the same cluster multiple times per poll cycle
- No deduplication at the Sentinel or broker layer

> **Important**: Do not increase `replicaCount` to scale Sentinel. Multiple replicas with the same selector will duplicate events. Scale by deploying separate Sentinel instances with **non-overlapping** `resource_selector` values instead.
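
The multiplication effect can be illustrated with a small simulation. This is a sketch, not Sentinel code: `publish_cycle` and the cluster IDs are hypothetical, and it assumes every replica publishes exactly one event per matching resource per poll cycle with no deduplication downstream.

```python
from collections import Counter


def publish_cycle(replicas: int, matching_resources: list[str]) -> Counter:
    """Count the events published per resource during one poll cycle.

    Assumption: each replica independently publishes one event for every
    resource that matches its selector, and nothing deduplicates them.
    """
    events: Counter = Counter()
    for _ in range(replicas):
        for resource_id in matching_resources:
            events[resource_id] += 1
    return events


clusters = ["cluster-a", "cluster-b", "cluster-c"]  # hypothetical resource IDs

single = publish_cycle(1, clusters)
tripled = publish_cycle(3, clusters)

# Total events per cycle with one replica versus three replicas.
print(sum(single.values()), sum(tripled.values()))
```

Under these assumptions, three replicas turn 3 events per cycle into 9, and every adapter handles each cluster three times.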

This is an architectural decision documented in ADR-0004 (Sentinel as a Stateless Polling Reconciliation Loop). Sentinel is intentionally stateless with at-least-once delivery semantics — adapters are expected to be idempotent.

### Recommended Deployment Configuration

Deploy **one Sentinel instance per distinct resource partition** (label selector subset). Scale horizontally by adding instances with non-overlapping selectors:

```yaml
# Instance A: watches region=us-east
config:
  resourceType: clusters
  resourceSelector:
    - label: region
      value: us-east

# Instance B: watches region=us-west (no overlap with Instance A)
config:
  resourceType: clusters
  resourceSelector:
    - label: region
      value: us-west
```

Do **not** run multiple replicas of the same Sentinel configuration. If you need high availability for a single partition, rely on Kubernetes restart policies and `PodDisruptionBudget` (see below) rather than replica scaling.
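
Because Sentinel does not validate selectors across instances, a pre-deployment check can help catch overlaps. The sketch below is illustrative only. It assumes that the entries of a selector are combined with AND and matched by exact equality, which this document does not confirm. The instance names and the helpers `selectors_overlap` and `find_overlaps` are hypothetical.

```python
from itertools import combinations

# A selector is modelled as label -> required value (assumed AND semantics).
Selector = dict[str, str]


def selectors_overlap(a: Selector, b: Selector) -> bool:
    """Return True if some resource could match both selectors.

    Two equality selectors are disjoint only when they require different
    values for at least one label they share. Selectors with no shared
    labels are treated as overlapping, since one resource could carry both.
    """
    return all(a[key] == b[key] for key in a.keys() & b.keys())


def find_overlaps(instances: dict[str, Selector]) -> list[tuple[str, str]]:
    """List every pair of instance names whose selectors may overlap."""
    names = sorted(instances)
    return [
        (x, y)
        for x, y in combinations(names, 2)
        if selectors_overlap(instances[x], instances[y])
    ]


instances = {
    "sentinel-us-east": {"region": "us-east"},
    "sentinel-us-west": {"region": "us-west"},
    "sentinel-prod": {"env": "prod"},  # different label: may overlap both
}

overlaps = find_overlaps(instances)
print(overlaps)
```

The check is deliberately conservative: it flags `sentinel-prod` against both regional instances because a cluster labelled `env=prod` could also carry either `region` value.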

### Future: Automated Partitioning

The current label-based partitioning model is a known MVP limitation. The architecture repo (sentinel.md, Technical Debt section) documents a planned remediation path: automated shard coverage validation or coordinated sharding with a registry. A future Epic will address both automatic partition assignment and gap detection (resources not matched by any Sentinel instance).
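
Until such tooling exists, operators can approximate gap detection offline. The following sketch is not the planned design. It assumes a snapshot of resource labels is available and that selectors match by exact equality on every listed label. `matches`, `coverage_report`, and the sample data are hypothetical.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """True if the resource labels satisfy every entry of the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def coverage_report(
    selectors: dict[str, dict[str, str]],
    resources: dict[str, dict[str, str]],
) -> tuple[list[str], list[str]]:
    """Return (gaps, duplicates) for a snapshot of resources.

    gaps: resources matched by no instance (they would never reconcile).
    duplicates: resources matched by more than one instance.
    """
    gaps, duplicates = [], []
    for resource_id, labels in resources.items():
        owners = [name for name, sel in selectors.items() if matches(sel, labels)]
        if not owners:
            gaps.append(resource_id)
        elif len(owners) > 1:
            duplicates.append(resource_id)
    return sorted(gaps), sorted(duplicates)


selectors = {
    "sentinel-us-east": {"region": "us-east"},
    "sentinel-us-west": {"region": "us-west"},
}
resources = {  # hypothetical snapshot of cluster labels
    "c1": {"region": "us-east"},
    "c2": {"region": "us-west"},
    "c3": {"region": "eu-central"},
}

gaps, duplicates = coverage_report(selectors, resources)
print(gaps, duplicates)
```

Here `c3` is reported as a gap because no instance selects `region=eu-central`. An empty duplicates list indicates that the selectors do not overlap for this snapshot.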

---

## PodDisruptionBudget

**What**: Ensures minimum Sentinel availability during cluster maintenance.
@@ -226,7 +268,7 @@ Memory: 64Mi + (5 × 32Mi) + 16Mi = 240Mi
└───────────────────┘
```

-**Important**: This is **NOT leader election**. Multiple Sentinels can overlap resource selectors if needed. Operators must ensure appropriate coverage.
+**Important**: This is **NOT leader election**. Multiple Sentinels with overlapping resource selectors will produce duplicate events. Ensure selectors are **non-overlapping** to avoid event duplication. See [Known Limitations](#known-limitations).

#### Resource Selector Strategies

5 changes: 3 additions & 2 deletions docs/sentinel-operator-guide.md
@@ -227,8 +227,9 @@ graph TB

1. **No Coordination**: Sentinel instances operate independently with no coordination
2. **Coverage Responsibility**: Operators must ensure all resources are covered by selectors
-3. **Overlap Allowed**: Multiple instances can watch the same resource (events will be duplicated)
-4. **Gaps Dangerous**: Resources not matching any selector will never reconcile
+3. **Gaps Dangerous**: Resources not matching any selector will never reconcile
+
+> **Important**: Multiple instances watching the same resource **will** produce duplicate events. Always use non-overlapping `resource_selector` values across instances. Do not increase `replicaCount` to scale; deploy separate instances with distinct selectors instead. See the [Known Limitations](multi-instance-deployment.md#known-limitations) section for details.

**Broker Topic Isolation:**
