finding your way in COS applications is too complicated and unreliable #403
Description
I'm filing this here because it's a general issue with COS, not limited to a specific operator. Sorry for the wall of text, but it represents quite well our lives as SREs with COS.
Once you have COS deployed in production and monitoring workloads, you're going to (1) get alerts and want to see details about them, and (2) want to inspect metrics (whether because of an alert or not).
- Alert details
The URL to the COS instance firing the alert isn't included in the alert details (tracked in canonical/mimir-operators#16).
So we'll want to access COS directly and click our way to alertmanager. We'll remember that we have to go to the juju environment and run the show-proxied-endpoints action on two different services (actually, we likely won't remember that for a good while, and will curse the universe until we do):
- run `juju run traefik-internal/leader show-proxied-endpoints` to get a link to alertmanager, mimir and/or loki (whose links are currently broken; this is tracked in "broken links on catalogue (CLI edition)", catalogue-k8s-operator#200)
- run `juju run traefik-external/leader show-proxied-endpoints` and use the link for Grafana
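As a stopgap, the two commands above can be wrapped in a small shell function. This is only a sketch: the function name `cos_endpoints` is made up for illustration, and it assumes the current juju model is the COS model and that the two applications are named `traefik-internal` and `traefik-external` as in this issue.

```shell
# Hypothetical helper: print the proxied endpoints of both traefik
# applications, one after the other.
# Assumes the active juju model is the COS model.
cos_endpoints() {
    for app in traefik-internal traefik-external; do
        echo "== ${app} =="
        juju run "${app}/leader" show-proxied-endpoints
    done
}
```

Calling `cos_endpoints` after switching to the COS model prints both lists, which saves remembering which traefik application exposes which service.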
- Browse metrics and logs for an environment
Say I'm investigating something in a juju model (e.g. performance of a postgresql DB). There's no alert here, so solving canonical/mimir-operators#16 won't help. I can see that the app I'm interested in has grafana-agent subordinates, so I'll run:
- `juju controllers`
- `juju find-offers <controller_name>:`
- Follow the same process as above to get to prometheus/loki/grafana
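The offer lookup across several controllers can be sketched as a loop. The function name `find_cos_offers` is hypothetical, and the controller names are passed in by hand (taken from the `juju controllers` output) rather than parsed automatically.

```shell
# Hypothetical helper: list the offers of each controller given as argument.
# Controller names are assumed to come from `juju controllers`.
find_cos_offers() {
    for controller in "$@"; do
        echo "== ${controller} =="
        juju find-offers "${controller}:"
    done
}
```

For example, `find_cos_offers ctrl-a ctrl-b` (placeholder controller names) lists the offers of both controllers, so the one hosting the relevant COS stack can be spotted without repeating the command by hand.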
For a team managing tens of controllers, hundreds (probably thousands) of models, and several COS instances across multiple clouds, this is a lot of trouble, and very irritating when paged while on duty.
There's no silver bullet for the fact that if you have several COS stacks, you'll need to record the URL to them somewhere, and a juju action is as much as the charms can do for us here.
However, one major improvement would be to have all links available in one page, which would be https://<juju config traefik external_hostname>, and that's it.
Given that the ship seems to have sailed and there are now two traefik applications (internal and external), I can't see why they couldn't both display the full list of links at their root path.
Thanks!