Add service-pool / load-balancing layer for multiple identical service instances #974

## Summary

Add a **service-pool / load-balancing** layer so a user or org can register several interchangeable instances of the *same* logical service and have the NyxID proxy spread traffic across them, with health-aware failover. Today a `UserService` binds to exactly one endpoint (plus at most one optional node). Running N identical backends (e.g. several GPU/compute workers, several upstream API instances, mirrored regions) therefore means manually picking one slug per call. There is no way to say "balance across these."

This is the generic counterpart to what PR #969 (`integrations/compute-pool-service`) deliberately keeps *out* of core. That PR's own framing:

> This is not a NyxID service-pool framework. Cross-service counting, quotas, metering, and load balancing should be handled by a future generic NyxID service-pool design rather than by a compute-specific core API. The compute service exposes `/v1/status` as a capacity signal that such a layer could use later.

So #969 ships the data-plane queue, and this issue tracks the NyxID-side control-plane feature it points at.

## Motivation / use case

- Operator stands up 3 instances of the same backend (same auth, same API shape) for capacity or redundancy.
- Agents / org members keep calling **one stable slug**; NyxID decides which instance serves each request.
- When one instance is unhealthy or at capacity, traffic shifts to the others automatically instead of erroring.

## Existing precedent to build on (not from scratch)

NyxID already does **node-level** failover, just not service-level load balancing:

- `services/node_routing_service.rs` → `resolve_node_route()` returns `NodeRoute { fallback_node_ids: Vec<String> }`, and the proxy already walks primary → fallback when a node is offline (test: `resolve_node_route_fails_over_from_offline_node_to_online_fallback`).
- `models/user_service.rs` binds a single `endpoint_id` + optional `node_id`.
- `services/proxy_service.rs` → `resolve_proxy_target_from_user_service()` resolves one target.

The ask is to **generalize that failover into capacity-aware balancing across multiple service instances**, not just node fallback within one service.
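As a rough illustration of what "generalize that failover" could mean, here is a minimal sketch that lifts the primary-then-fallback walk from nodes to pool members. It assumes a simple three-state health model. `Health`, `PoolMember`, and `failover_order` are hypothetical names, not existing NyxID types, and this is not how `resolve_node_route()` is actually implemented.

```rust
/// Hypothetical member health; not an existing NyxID type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// Hypothetical pool member: one interchangeable `UserService` instance.
#[derive(Debug, Clone)]
struct PoolMember {
    service_id: String,
    health: Health,
}

/// Generalizes node-level primary -> fallback walking to pool members.
/// Preferred order is preserved within each health tier: healthy members
/// are tried first, degraded members next, and unhealthy members are skipped.
fn failover_order(preferred: &[PoolMember]) -> Vec<&str> {
    let healthy = preferred.iter().filter(|m| m.health == Health::Healthy);
    let degraded = preferred.iter().filter(|m| m.health == Health::Degraded);
    healthy.chain(degraded).map(|m| m.service_id.as_str()).collect()
}

fn main() {
    let members = vec![
        PoolMember { service_id: "svc-a".into(), health: Health::Unhealthy },
        PoolMember { service_id: "svc-b".into(), health: Health::Degraded },
        PoolMember { service_id: "svc-c".into(), health: Health::Healthy },
    ];
    // Tries svc-c first, then svc-b; svc-a is skipped entirely.
    println!("{:?}", failover_order(&members));
}
```

Within each health tier, the operator-configured order still decides who goes first, so a healthy "primary" keeps winning as it does in the node case. Keeping a `Degraded` tier instead of a binary flag is one way to "deprioritize" a member rather than drop it.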

## Proposed scope (for architect review — not final)

1. **Pool model** — a `ServicePool` grouping N member `UserService`s (or N endpoints under one service) that share a stable slug, owned by the same person/org user (reuse `org_service::resolve_owner_access` for ACL, consistent with Node / UserService).
2. **Balancing strategies** — at minimum round-robin and least-in-flight; ideally weighted and **capacity-aware** using a pluggable health/status signal (a member can expose something like #969's `GET /v1/status` → `{queued, dispatched, active_workers}`).
3. **Health checks + failover** — periodic or proxy-time health probe; skip/deprioritize unhealthy members; generalize `fallback_node_ids` to fallback *members*.
4. **Proxy integration** — pool resolution slots into `proxy_service` target resolution; a slug can resolve to a pool, then to a concrete member per-request.
5. **Sticky routing (optional)** — affinity by session/`client_ref` for multi-turn/stateful backends.
6. **Observability** — per-member request counts / errors for audit + the metering story below.
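To make items 2 and 5 more concrete, the sketch below shows round-robin, least-in-flight, and a capacity-aware strategy side by side, plus optional sticky routing keyed on `client_ref`. It assumes that each member may report a snapshot shaped like #969's `GET /v1/status` (`{queued, dispatched, active_workers}`). It also assumes a per-member `weight` and a NyxID-side `in_flight` counter. `StatusSnapshot`, `Member`, `Strategy`, `Selector`, and `load_score` are all hypothetical names, and the scoring formula is only one plausible choice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical capacity snapshot shaped after #969's `GET /v1/status`.
#[derive(Debug, Clone, Copy)]
struct StatusSnapshot {
    queued: u64,
    dispatched: u64,
    active_workers: u64,
}

/// Hypothetical pool member as seen by the selector.
#[derive(Debug, Clone)]
struct Member {
    id: String,
    weight: u32,                    // operator-assigned weight (hypothetical field)
    in_flight: u32,                 // requests NyxID is currently proxying to it
    status: Option<StatusSnapshot>, // last polled status, if the member exposes one
}

enum Strategy {
    RoundRobin,
    LeastInFlight,
    CapacityAware,
}

/// Lower is better: outstanding work per unit of capacity, scaled by 1000
/// so the comparison stays in integers.
fn load_score(m: &Member) -> u64 {
    let (load, workers) = match m.status {
        Some(s) => (s.queued + s.dispatched, s.active_workers.max(1)),
        None => (u64::from(m.in_flight), 1), // no status: use the in-flight count
    };
    load * 1000 / (workers * u64::from(m.weight.max(1)))
}

struct Selector {
    strategy: Strategy,
    cursor: usize, // round-robin position, shared across requests
}

impl Selector {
    /// `members` is assumed to be pre-filtered to healthy members.
    fn pick<'a>(&mut self, members: &'a [Member], client_ref: Option<&str>) -> Option<&'a Member> {
        if members.is_empty() {
            return None;
        }
        // Optional sticky routing: hash client_ref onto a member so a multi-turn
        // session keeps landing on the same instance. DefaultHasher is only
        // stable within one Rust build; a real design would pin the hash function.
        if let Some(key) = client_ref {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            return members.get((h.finish() % members.len() as u64) as usize);
        }
        match self.strategy {
            Strategy::RoundRobin => {
                let m = &members[self.cursor % members.len()];
                self.cursor = self.cursor.wrapping_add(1);
                Some(m)
            }
            Strategy::LeastInFlight => members.iter().min_by_key(|m| m.in_flight),
            Strategy::CapacityAware => members.iter().min_by_key(|m| load_score(m)),
        }
    }
}

fn main() {
    let members = vec![
        Member { id: "a".into(), weight: 1, in_flight: 5, status: None },
        Member { id: "b".into(), weight: 1, in_flight: 2, status: None },
        Member {
            id: "c".into(),
            weight: 2,
            in_flight: 9,
            status: Some(StatusSnapshot { queued: 1, dispatched: 1, active_workers: 4 }),
        },
    ];
    let mut rr = Selector { strategy: Strategy::RoundRobin, cursor: 0 };
    let order: Vec<&str> = (0..4).map(|_| rr.pick(&members, None).unwrap().id.as_str()).collect();
    println!("round-robin:    {:?}", order); // ["a", "b", "c", "a"]
    let mut lif = Selector { strategy: Strategy::LeastInFlight, cursor: 0 };
    println!("least-in-flight: {}", lif.pick(&members, None).unwrap().id); // b
    let mut cap = Selector { strategy: Strategy::CapacityAware, cursor: 0 };
    println!("capacity-aware:  {}", cap.pick(&members, None).unwrap().id); // c
}
```

In the capacity-aware case, member `c` wins despite having the most in-flight requests, because its reported queue is small relative to 4 workers and weight 2. That is the behavior a `/v1/status` signal would enable. The modulo-based sticky hash reshuffles most sessions whenever membership changes. Rendezvous or consistent hashing would limit that if affinity matters.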

## Explicitly out of scope here (track separately if wanted)

- Org/agent **quotas, usage counting, and metering** across pool members. #969 notes these belong to the same future layer; they're a distinct, larger workstream and shouldn't block basic balancing.

## Open questions for architecture

- Pool as a **new model** vs. extending `UserService` with `member_service_ids` / a `pool_id`?
- Health signal: **standardized contract** (a `/status`-style endpoint NyxID polls) vs. passive (infer health from proxy error rates)? #969's `/v1/status` is a candidate shape.
- Does balancing run over **node-routed members only**, **direct-HTTP members only**, or both?
- Interaction with existing `fallback_node_ids` — does the node fallback collapse into the pool layer, or stay underneath it?
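To help weigh the health-signal question, here is a minimal sketch of the *passive* option. It keeps a sliding window of recent proxy outcomes per member and ejects the member when the error rate crosses a threshold. `PassiveHealth` and its parameters are hypothetical, and the thresholds are illustrative only.

```rust
use std::collections::VecDeque;

/// Hypothetical passive health tracker: infers a member's health from the
/// outcomes of recent proxied requests instead of polling `/v1/status`.
struct PassiveHealth {
    window: VecDeque<bool>, // true = the proxied request failed
    capacity: usize,        // number of recent outcomes remembered
    min_samples: usize,     // below this, not enough evidence to eject
    max_error_rate: f64,    // e.g. 0.5 = unhealthy above 50% failures
}

impl PassiveHealth {
    fn new(capacity: usize, min_samples: usize, max_error_rate: f64) -> Self {
        let capacity = capacity.max(1); // a zero-sized window would never evict
        Self { window: VecDeque::with_capacity(capacity), capacity, min_samples, max_error_rate }
    }

    /// Record one outcome, evicting the oldest once the window is full.
    fn record(&mut self, failed: bool) {
        if self.window.len() == self.capacity {
            self.window.pop_front();
        }
        self.window.push_back(failed);
    }

    fn is_healthy(&self) -> bool {
        if self.window.is_empty() || self.window.len() < self.min_samples {
            return true; // too little data: keep the member in rotation
        }
        let failures = self.window.iter().filter(|&&f| f).count();
        failures as f64 / self.window.len() as f64 <= self.max_error_rate
    }
}

fn main() {
    let mut h = PassiveHealth::new(10, 4, 0.5);
    for _ in 0..3 {
        h.record(true);
    }
    println!("after 3 failures:  healthy = {}", h.is_healthy()); // true (below min_samples)
    h.record(true);
    println!("after 4 failures:  healthy = {}", h.is_healthy()); // false (4/4 failed)
    for _ in 0..6 {
        h.record(false);
    }
    println!("after 6 successes: healthy = {}", h.is_healthy()); // true (4/10 failed)
}
```

The sketch exposes one tradeoff. A purely passive tracker only observes members that receive traffic. Once a member is ejected, it produces no new outcomes and never recovers on its own. That argues for a hybrid: passive error rates eject a member quickly, and an active probe (for example, a `/v1/status`-style poll) decides when to re-admit it.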

## References

- PR #969 — `integrations/compute-pool-service` (the data-plane service that defers to this)
- Closed prototype PR #967 (proved the worker-pull UX / safety boundary)
