Kubernetes Observability, Part 5 – Using Mimir for long-term metric storage

This post is part of a series on observability in Kubernetes clusters:

Part 1 – Collecting Logs with Loki
Part 2 – Collecting Metrics with Prometheus
Part 3 – Dashboards with Grafana
Part 4 – Using Linkerd for Service Observability
Part 5 – Using Mimir for long-term metric storage (this post)

For anyone who actually reads through my ramblings, I am sure they are asking themselves this question: “Didn’t he say Thanos for long-term metric storage?” Yes, yes I did say Thanos. Based on some blog posts from VMware that I had come across, my initial thought was to use Thanos as my long-term metrics storage solution. Having worked with Loki first, though, I became familiar with its distributed deployment and storage configuration. So, when I found that Mimir has similar configuration and compatibility, I thought I’d give it a try.

Additionally, remember, this is for my home lab: I do not have a whole lot of time for configuration and management. With that in mind, using the “Grafana Stack” seemed like the most expedient solution.

Getting Started with Mimir

As with Loki, I started with the mimir-distributed Helm chart that Grafana provides. The charts are well documented, including a Getting Started guide on their website. The Helm chart includes a MinIO dependency, but, as I had already set up MinIO, I disabled the included chart and configured a new bucket for Mimir.
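
For illustration, here is a minimal sketch of what that part of the values file for mimir-distributed might look like. It is not my exact configuration: the endpoint, bucket name, and credentials are hypothetical placeholders, and it only covers block storage, so check the chart's documentation for the other storage sections you may need.

```yaml
# Sketch of mimir-distributed values, assuming an existing MinIO instance.
# The endpoint, bucket name, and credentials below are hypothetical placeholders.
minio:
  enabled: false            # skip the bundled MinIO dependency

mimir:
  structuredConfig:
    common:
      storage:
        backend: s3         # MinIO speaks the S3 API
        s3:
          endpoint: minio.example.internal:9000
          access_key_id: REPLACE_ME        # better sourced from a secret
          secret_access_key: REPLACE_ME    # better sourced from a secret
          insecure: true    # assumes plain HTTP inside the lab network
    blocks_storage:
      s3:
        bucket_name: mimir-blocks
```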

As Mimir has all the APIs to stand in for Prometheus, the changes were pretty easy:

Get an instance of Mimir installed in my internal cluster.
Configure my current Prometheus instances to remote-write to Mimir.
Add Mimir as a data source in Grafana.
Change my Grafana dashboards to use Mimir, and modify those dashboards to filter based on the cluster label that is added.
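
The remote-write step is mostly a matter of pointing Prometheus at Mimir's push endpoint. A rough sketch for the Bitnami kube-prometheus chart follows, assuming the chart's prometheus.remoteWrite value; the hostname is a hypothetical placeholder for wherever your Mimir gateway is exposed.

```yaml
# Sketch of Bitnami kube-prometheus values for remote write.
# The hostname is a hypothetical placeholder, not a real address.
prometheus:
  remoteWrite:
    # Mimir accepts Prometheus remote-write traffic on /api/v1/push
    - url: http://mimir.example.internal/api/v1/push
      # Depending on how multi-tenancy is set up, a tenant header
      # (X-Scope-OrgID) may also be needed here.
```

For the Grafana data source, Mimir is added as a Prometheus-type data source pointed at its Prometheus-compatible query path (by default, /prometheus under the same host).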

As my GitOps repositories are public, have a look at my Mimir-based chart for details on my configuration and deployment.

Labels? What labels?

I am using the Bitnami kube-prometheus Helm chart. That particular chart allows you to define external labels using the prometheus.externalLabels value. In my case, I created a cluster label with a unique value for each of my clusters. This allows me to build dashboards against a single data source and filter them by cluster using dashboard variables.
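
As a small sketch, the values for one cluster might look like the following. The label value "internal" is just an example name; each cluster gets its own.

```yaml
# Sketch of Bitnami kube-prometheus values for one cluster.
# "internal" is an example value; use a unique name per cluster.
prometheus:
  externalLabels:
    cluster: internal
```

Every series that this Prometheus instance remote-writes then carries cluster="internal", so a dashboard query can select on it, for example up{cluster="$cluster"} with a Grafana dashboard variable named cluster (a hypothetical variable name).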

Well… that was easy

All in all, it took me probably two hours to get Mimir running and collecting data from Prometheus. It took far less time than I anticipated, and opened up some new doors to reduce my observability footprint, such as:

I immediately reduced the data retention on my local Prometheus installations to three days. This reduced my total disk usage in persistent volumes by about 40 GB.
I started researching the Grafana Agent as a replacement for a full Prometheus instance in each cluster. Generally, the agent should use less CPU and storage on each cluster.
Grafana Agent is also a replacement for Promtail, meaning I could remove both kube-prometheus and Promtail and replace them with a single instance of the Grafana Agent. This would greatly simplify the configuration of observability within the cluster.
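
The retention change from the first item is a one-line tweak. A sketch follows, assuming the Bitnami kube-prometheus chart's prometheus.retention value; verify the key against your chart version.

```yaml
# Sketch of Bitnami kube-prometheus values: keep only three days locally,
# since Mimir now holds the long-term copy of the data.
prometheus:
  retention: 3d
```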

So, what’s the catch? Well, I’m tying myself to an opinionated stack. Sure, it’s based on open standards and has the support of the Prometheus community, but it remains to be seen which features will become “pay to play” within the stack. Near the bottom of this blog post is some indication of what I am worried about: while the main features are licensed under AGPLv3, there are additional features that come with proprietary licensing. For my home lab, these features are of no consequence, but when it comes to making decisions about long-term Kubernetes observability at work, I wonder which proprietary features we will require and how much they will cost us in the long run.