NOTE: This blog post is still a work in progress. For now, the manifests can be found here: https://github.com/konstfish/shoal/tree/main/gitops/infra/observability/base/prometheus
Quick Overview
Built with the Prometheus Operator. Metrics are collected from every namespace by a cluster-scoped Prometheus instance. Tenants query metrics through a central, multi-tenancy-enabled query endpoint built with kube-rbac-proxy & prom-label-proxy.
flowchart LR
    A[Tenant] -->|1| grafana(Grafana)
    subgraph Tenant Namespace
        sm([ServiceMonitor]) --> svc(Services)
        qsa([Query Service Account]) <-.-> grafana
    end
    prom -.-> sm
    subgraph Monitoring Namespace
        subgraph Thanos Query Frontend
            grafana -->|2| krp(kube-rbac-proxy)
            krp -->|6| plp(prom-label-proxy)
        end
        plp -->|7| thanos
        subgraph Prometheus
            thanos(Thanos) --> prom(Prometheus)
        end
    end
    subgraph Kubernetes
        krp -->|3| sar{{SubjectAccessReview}}
        sar <-->|4| qsa
        sar -->|5| krp
    end
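To make step 2 of the flow a little more concrete, here is a minimal Go sketch of the kind of request a tenant's client (Grafana, in the diagram) might send to the query endpoint. It assumes the tenant is identified by a `namespace` URL parameter and authenticated with the bearer token of the tenant's query service account. The parameter name, the endpoint URL, and the helper `buildTenantQuery` are illustrative assumptions and are not taken from the manifests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildTenantQuery is a hypothetical helper: it prepares an instant query
// against the central query endpoint on behalf of one tenant namespace.
func buildTenantQuery(endpoint, namespace, token, promQL string) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	// Standard Prometheus HTTP API path for instant queries.
	u.Path = "/api/v1/query"

	q := url.Values{}
	q.Set("namespace", namespace) // assumed name of the tenant parameter
	q.Set("query", promQL)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Token of the tenant's query service account (placeholder value in main).
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := buildTenantQuery(
		"https://query.monitoring.example", // hypothetical endpoint
		"team-a",                           // hypothetical tenant namespace
		"<service-account-token>",
		"up",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

In a setup like this, the proxy in front would check the token and namespace (steps 3 to 5) before the label proxy restricts the query to that namespace (steps 6 and 7).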
Architecture Decisions
There are multiple ways to set up multi-tenant monitoring, most of which are covered in this great talk. This blog post covers the setup of a central Prometheus that scrapes every tenant on a single Kubernetes cluster.
I'm super frustrated with GET/POST because of https://github.com/brancz/kube-rbac-proxy/blob/master/pkg/proxy/proxy.go#L48-L60, but the only downside is smaller requests (see the info button in Grafana).
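For context, here is a rough Go sketch of why the HTTP method might matter at all. It assumes the linked lines translate the request's HTTP method into a Kubernetes API verb before the SubjectAccessReview is issued, so that a GET and a POST to the same query path end up needing different RBAC verbs. The function `verbForMethod` is made up for illustration, and the mapping is an assumption about the linked code, not a copy of it.

```go
package main

import (
	"fmt"
	"net/http"
)

// verbForMethod is a hypothetical stand-in for the method-to-verb
// translation assumed to happen in the linked kube-rbac-proxy code.
func verbForMethod(method string) string {
	switch method {
	case http.MethodGet:
		return "get"
	case http.MethodPost:
		return "create"
	case http.MethodPut:
		return "update"
	case http.MethodPatch:
		return "patch"
	case http.MethodDelete:
		return "delete"
	default:
		return "*" // assumed fallback for any other method
	}
}

func main() {
	// Print the verb each query method would be checked against.
	for _, m := range []string{http.MethodGet, http.MethodPost} {
		fmt.Printf("%s -> %s\n", m, verbForMethod(m))
	}
}
```

Under that assumption, a tenant role that only grants `get` would let GET queries through but not POST ones, which would tie the data source to GET and its smaller requests.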