Kubernetes Observability, Part 3 – Dashboards with Grafana

This post is part of a series on observability in Kubernetes clusters:

Part 1 – Collecting Logs with Loki
Part 2 – Collecting Metrics with Prometheus
Part 3 – Dashboards with Grafana (this post)
Part 4 – Using Linkerd for Service Observability
Part 5 – Using Mimir for long-term metric storage

What good is Loki’s log collection or Prometheus’ metrics scraping without a way to see it all? Loki comes from Grafana Labs, and although Prometheus started at SoundCloud and is now a CNCF project, Grafana supports it as a first-class data source. Grafana Labs is positioning its tools as an observability stack comparable to Elastic. I have used both, and I found Grafana’s stack much easier to get started with. For my home lab, it is perfect: simple to start and configure, fairly easy to monitor, and no real administrative work to do.

Installing Grafana

The Helm chart provided for Grafana was very easy to use, and the out-of-the-box configuration was sufficient to get started. I configured a few more features to get the most out of my instance:

Used an existing secret for admin credentials: this secret is created by an ExternalSecret resource that pulls secrets from my local HashiCorp Vault instance.
Configured login for Azure AD users. Grafana’s documentation details the additional steps that need to be done in your Azure AD instance. Note that envFromSecret names the Kubernetes Secret whose keys are expanded into environment variables; mine stores the Azure AD Client ID and Client Secret.
Added the grafana-piechart-panel plugin, as some of the dashboards I downloaded referenced that.
Enabled a Prometheus ServiceMonitor resource to scrape Grafana metrics.
Annotated the services and pods for Linkerd.
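
As a rough sketch, the settings in the list above could be expressed as a values file for the Grafana chart. The snippet below builds them in Python and prints JSON, which Helm should accept as a values file because JSON is valid YAML. The secret names (grafana-admin, grafana-azuread), the key names inside them, the environment variable names, and the tenant placeholder are all assumptions for illustration and are not taken from the real configuration. The service annotations are left empty because the right Linkerd annotations depend on your mesh setup.

```python
import json


def build_values(tenant_id: str) -> dict:
    """Build a Helm values dict for the Grafana chart (illustrative only)."""
    login_base = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0"
    return {
        # Admin credentials come from an existing Secret (hypothetical name).
        "admin": {
            "existingSecret": "grafana-admin",
            "userKey": "admin-user",
            "passwordKey": "admin-password",
        },
        # Every key in this Secret becomes an environment variable in the pod.
        "envFromSecret": "grafana-azuread",
        "plugins": ["grafana-piechart-panel"],
        "serviceMonitor": {"enabled": True},
        # Ask Linkerd to inject its proxy into the Grafana pod.
        "podAnnotations": {"linkerd.io/inject": "enabled"},
        # Placeholder: add whatever service annotations your mesh needs.
        "service": {"annotations": {}},
        "grafana.ini": {
            "auth.azuread": {
                "enabled": True,
                "name": "Azure AD",
                "allow_sign_up": True,
                # Assumed variable names; they must match keys in the Secret.
                "client_id": "$__env{AZURE_CLIENT_ID}",
                "client_secret": "$__env{AZURE_CLIENT_SECRET}",
                "scopes": "openid email profile",
                "auth_url": f"{login_base}/authorize",
                "token_url": f"{login_base}/token",
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_values("<tenant-id>"), indent=2))
```

Redirecting the output to a file such as values.json and passing it to helm with -f is one way to apply it. Check the key names against the chart version you install, since chart values do change between releases.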

Adding Data Sources

With Grafana up and running, I added five data sources. Each of my clusters has its own Prometheus instance, and I added my Loki instance for logs. Eventually, Thanos will aggregate my Prometheus metrics into a single data source, but that is a topic for another day.
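
Data sources can be added through the UI, but they can also be scripted. The sketch below shows one possible approach using Grafana's HTTP API (POST /api/datasources). The cluster names, URLs, and API token in the usage comment are hypothetical placeholders, and the helper names are made up for this example.

```python
import json
import urllib.request


def datasource_payload(name: str, ds_type: str, url: str) -> dict:
    """Build the JSON body for Grafana's POST /api/datasources endpoint."""
    return {"name": name, "type": ds_type, "url": url, "access": "proxy"}


def build_payloads(prometheus_urls: dict, loki_url: str) -> list:
    """One Prometheus data source per cluster, plus a single Loki source."""
    payloads = [
        datasource_payload(f"Prometheus ({cluster})", "prometheus", url)
        for cluster, url in sorted(prometheus_urls.items())
    ]
    payloads.append(datasource_payload("Loki", "loki", loki_url))
    return payloads


def create_datasource(grafana_url: str, api_token: str, payload: dict) -> dict:
    """Send one payload to Grafana. Needs a reachable instance and a token."""
    request = urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage:
# for p in build_payloads({"prod": "http://prom.prod:9090"}, "http://loki:3100"):
#     create_datasource("http://grafana.example", "<api-token>", p)
```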

Exploring

The Grafana interface is fairly intuitive. The Explore section lets you poke around your data sources to review the data you are collecting. There are “builders” available to help you construct your PromQL (Prometheus Query Language) or LogQL (Log Query Language) queries. Querying metrics automatically displays a line chart with your values, making it pretty easy to review your query results and prepare the query for inclusion in a dashboard.
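
To make that concrete, here is a small sketch of the kind of PromQL query you might try in Explore, together with the Prometheus range-query endpoint (/api/v1/query_range) that returns the time series behind a line chart. The query, host name, and timestamps are illustrative assumptions, not queries from my dashboards.

```python
from urllib.parse import urlencode

# Example PromQL: per-namespace CPU usage rate over the last five minutes.
# container_cpu_usage_seconds_total is a cAdvisor metric; whether it is
# available depends on what your Prometheus instance scrapes.
PROMQL = "sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))"


def range_query_url(base_url, promql, start, end, step="30s"):
    """Build a URL for Prometheus' range-query API.

    start and end are Unix timestamps in seconds; step is the resolution
    of the returned series.
    """
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and a one-hour window.
    print(range_query_url("http://prometheus.example:9090", PROMQL,
                          1700000000, 1700003600))
```

The same PromQL string can be pasted into the code view of the query editor in Explore, which is usually the quicker way to iterate on it.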

When debugging my applications, I use the Explore section almost exclusively to review incoming logs. A live log view and search with context make it very easy to find warnings and errors within the log entries and diagnose issues.
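
For a sense of what such a log search looks like, the sketch below assembles a LogQL query from a set of stream labels and an optional regular expression line filter. The label names and values are hypothetical; use whatever labels your Loki streams actually carry.

```python
def logql(labels, pattern=None):
    """Build a LogQL query: a stream selector plus an optional regex filter.

    Assumes label values contain no double quotes or backslashes and the
    pattern contains no backticks, so no escaping is attempted.
    """
    selector = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    query = "{" + selector + "}"
    if pattern:
        # Backticks delimit a raw string in LogQL, so the regex needs no escaping.
        query += f" |~ `{pattern}`"
    return query


if __name__ == "__main__":
    # Case-insensitive match on lines mentioning "warn" or "error".
    print(logql({"namespace": "my-app", "container": "api"}, "(?i)warn|error"))
```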

Building Dashboards

In my limited use of the ELK stack, one thing that always got me was the barrier to entry for Kibana. I have always found that having examples to tinker with is a much easier way for me to learn, and I could never find good examples of Kibana dashboards that I could make my own.

Grafana, however, has a pretty extensive list of community-built dashboards from which I could begin my learning. I started with some of the basics, like Cluster Monitoring for Kubernetes. And, well, even in that one, I ran into some issues. The current version in the community uses kubernetes_io_hostname as a label for the node name. It would seem that label has changed to node in kube-state-metrics, so I had to import that dashboard and make changes to the queries in order for the data to show.
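
Editing each query by hand works, but a rename like this can also be scripted against the exported dashboard JSON. The following is a minimal sketch, assuming the dashboard keeps its queries in the usual places (each panel's targets[].expr and legendFormat, plus template variable queries). It is not the exact set of changes I made, and the file names in the usage comment are placeholders.

```python
import copy
import json
import re

OLD_LABEL = "kubernetes_io_hostname"
NEW_LABEL = "node"
# Word boundaries keep us from touching longer names that merely contain the label.
_LABEL_RE = re.compile(rf"\b{re.escape(OLD_LABEL)}\b")


def rename_label(dashboard: dict) -> dict:
    """Return a copy of a dashboard dict with OLD_LABEL replaced by NEW_LABEL."""
    patched = copy.deepcopy(dashboard)

    def walk(panels):
        for panel in panels:
            for target in panel.get("targets", []):
                for field in ("expr", "legendFormat"):
                    if isinstance(target.get(field), str):
                        target[field] = _LABEL_RE.sub(NEW_LABEL, target[field])
            # Row panels can contain nested panels.
            walk(panel.get("panels", []))

    walk(patched.get("panels", []))
    # Older dashboard schemas keep panels under top-level rows.
    for row in patched.get("rows", []):
        walk(row.get("panels", []))
    # Template variables may query on the label as well.
    for variable in patched.get("templating", {}).get("list", []):
        if isinstance(variable.get("query"), str):
            variable["query"] = _LABEL_RE.sub(NEW_LABEL, variable["query"])
    return patched


# Hypothetical usage:
# with open("cluster-monitoring.json") as f:
#     fixed = rename_label(json.load(f))
# with open("cluster-monitoring-fixed.json", "w") as f:
#     json.dump(fixed, f, indent=2)
```

Whether node is the right replacement depends on which exporter produces each metric, so it is worth checking a query or two in Explore before importing the patched dashboard.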

I found a few other dashboards that illustrated what I could do with Grafana:

ArgoCD – If you add a ServiceMonitor to ArgoCD, you can collect a number of metrics around application status and synchronizations, and this dashboard gives a great view of how ArgoCD is running.
Unifi Poller Dashboards – Unifi Poller is an application that polls the Unifi controller to pull metrics and exposes them in the Prometheus exposition format for scraping. It includes dashboards for a number of different metrics.
NGINX Ingress – If you use NGINX for your Ingress controller, you can add a ServiceMonitor to scrape metrics on incoming network traffic.

With these examples in hand, I have started to build out my own dashboards for some of my internal applications.