NOTE: This blog post is still a work in progress. For now, the manifests can be found here: https://github.com/konstfish/shoal/tree/main/gitops/infra/observability/base/prometheus

Quick Overview

The setup is built with the Prometheus Operator. Metrics are collected from every namespace by a cluster-scoped Prometheus instance. Tenants query metrics through a central, multi-tenancy-enabled query endpoint built with kube-rbac-proxy and prom-label-proxy.

flowchart LR
    A[Tenant] -->|1| grafana(Grafana)
    subgraph Tenant Namespace
    sm([ServiceMonitor]) --> svc(Services)
    qsa([Query Service Account]) <-.-> grafana
    end
    prom -.-> sm
    subgraph Monitoring Namespace
    subgraph Thanos Query Frontend
    grafana -->|2| krp(kube-rbac-proxy)
    krp -->|6| plp(prom-label-proxy)
    end
    plp -->|7| thanos
    subgraph Prometheus
    thanos(Thanos) --> prom(Prometheus)
    end
    end
    subgraph Kubernetes
    krp -->|3| sar{{SubjectAccessReview}}
    sar <-->|4| qsa
    sar -->|5| krp
    end
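
From the tenant's side, steps 2 to 7 in the diagram come down to a single authenticated HTTP request. The following is a minimal Python sketch, not taken from the linked manifests. It assumes the query endpoint exposes the standard Prometheus HTTP API path `/api/v1/query`, that the tenant is selected through a `namespace` query parameter, and that the query service account's token is sent as a bearer token. The endpoint URL, token, and namespace are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_tenant_query(base_url: str, token: str, namespace: str, promql: str) -> Request:
    """Build (but do not send) an instant query for the tenant query endpoint.

    Assumptions: the `namespace` parameter name and the bearer-token header
    are illustrative; the real values depend on the proxy configuration.
    """
    params = urlencode({"query": promql, "namespace": namespace})
    url = f"{base_url.rstrip('/')}/api/v1/query?{params}"
    # Sent as GET with all parameters carried in the URL.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical values for illustration only.
req = build_tenant_query(
    "https://query.monitoring.example",
    token="<service-account-token>",
    namespace="tenant-a",
    promql="up",
)
print(req.get_method(), req.full_url)
```

In this sketch nothing is sent over the network; passing `req` to `urllib.request.urlopen` would perform the actual query against a real endpoint.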

Architecture Decisions

There are multiple ways to set up multi-tenant monitoring, most of which are covered in this great talk. This blog post covers the setup of a central Prometheus that scrapes every tenant on a single Kubernetes cluster.

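The GET/POST handling was super frustrating because of how kube-rbac-proxy treats the HTTP method of a request (see https://github.com/brancz/kube-rbac-proxy/blob/master/pkg/proxy/proxy.go#L48-L60). Sticking to GET works, and the only downside is that requests have to be smaller (see the info button on the HTTP method option in Grafana).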
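
To make the GET/POST issue more concrete, here is a small Python sketch of the idea behind the linked lines, as far as I can tell from reading them: the proxy derives a Kubernetes API verb from the HTTP method and uses that verb for the authorization check. The mapping below is an illustrative re-implementation under that assumption, not kube-rbac-proxy's actual code, and `allowed_verbs` is a hypothetical stand-in for whatever verbs the tenant's Role grants.

```python
# HTTP method -> Kubernetes verb. Assumption: this mirrors the switch in the
# linked proxy.go; it is a re-implementation for illustration only.
METHOD_TO_VERB = {
    "GET": "get",
    "POST": "create",
    "PUT": "update",
    "PATCH": "patch",
    "DELETE": "delete",
}


def is_allowed(method: str, allowed_verbs: set[str]) -> bool:
    """Return True if the verb derived from the HTTP method is granted."""
    # Unknown methods map to an empty verb, which is never granted.
    verb = METHOD_TO_VERB.get(method.upper(), "")
    return verb in allowed_verbs


# Hypothetical Role that only grants read access.
read_only = {"get"}
print(is_allowed("GET", read_only))   # True
print(is_allowed("POST", read_only))  # False: POST is checked as "create"
```

Under that assumption, a data source that sends queries as POST would need the `create` verb granted, while GET only needs `get`.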