Tag: Mimir

  • A Lesson in Occam’s Razor: Configuring Mimir Ruler with Grafana

    Occam’s Razor posits “Of two competing theories, the simpler explanation is to be preferred.” I believe my high school biology teacher taught the “KISS” method (Keep It Simple, Stupid) to convey a similar principle. As I was trying to get alerts set up in Mimir using the Grafana UI, I came across an issue that was only finally solved by going back to the simplest answer.

    The Original Problem

    One of the reasons I leaned towards Mimir as a single source of metrics is that it has its own ruler component. This would allow me to store alerts in, well, two places: the Mimir ruler for metrics rules, and the Loki ruler for log rules. Sure, I could use Grafana alerts, but I like being able to run the alerts in the native system.

    So, after getting Mimir running, I went to add a new alert in Grafana. Neither my Mimir nor Loki data source showed up.

    Now, as it turns out, the Loki one was easy: I had not enabled the Loki ruler component in the Helm chart. So, a quick update of the Loki configuration in my GitOps repo, and the Loki ruler was running. However, my Mimir data source was still missing.
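
    For reference, the Loki side came down to a single value. This is only a rough sketch, assuming the loki-distributed chart, which ships the ruler as a separately enabled component:

      # Loki Helm values (sketch): enable the ruler component so that log-based
      # rules can be managed through the ruler API.
      ruler:
        enabled: true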

    Asking for help

    I looked for some time at possible solutions, and posted to the GitHub discussion looking for assistance. Dimitar Dimitrov was kind enough to step in and give me some pointers, including looking at the Chrome network tab to figure out if there were any errors.

    At that point, I first wanted to bury my head in the sand, as, of all the things I had tried, that was not one of them. But, after getting over my embarrassment, I went about debugging. The errors pointed to the Access-Control-Allow-Origin header not being present, and there was also a 405 response when the browser attempted a preflight OPTIONS request.

    Dimitar suggested that I add the Access-Control-Allow-Origin header via the Mimir Helm chart’s nginx.nginxConfig section. So I did, but I still had the same problems.
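
    For illustration, the kind of override in question looks roughly like the excerpt below. It is only a sketch: nginx.nginxConfig replaces the chart’s entire nginx configuration template, so only the CORS-related lines are shown, and the Grafana origin is just my internal hostname.

      # mimir-distributed values (excerpt, sketch only): add CORS headers for the
      # Grafana origin inside the gateway's server block.
      nginx:
        nginxConfig: |
          # ... the rest of the chart's default nginx.conf template ...
          server {
            # Allow the Grafana origin to call the Mimir APIs cross-origin.
            add_header Access-Control-Allow-Origin "https://grafana.mydomain.local" always;
            add_header Access-Control-Allow-Methods "GET, POST, OPTIONS" always;
            add_header Access-Control-Allow-Headers "Authorization, Content-Type" always;

            # Answer CORS preflight requests rather than letting them return a 405.
            if ($request_method = OPTIONS) {
              return 204;
            }
            # ... proxy locations from the default template ...
          }

    As the rest of this post explains, none of this turned out to be necessary.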

    I had previously tried this on Microsoft Edge and it had worked, and had been assuming it was just Chrome being overly strict. However, just for fun, I tried in a different Chrome profile, and it worked…

    Applying Occam’s Razor

    It was at this point that I thought “There is no way that something like this would not have come up with Grafana/Mimir on Chrome.” I would have expected a quick note around the Access-Control-Allow-Origin or something similar when hosting Grafana in a different subdomain than Mimir. So I took a longer look at the network log in my instance of Chrome that was still throwing errors.

    There was a redirect in there that was listed as “cached.” I thought, well, that’s odd, why would it cache that redirect? So, as a test, I disabled the cache in the Chrome debugging tools, refreshed the page, and voilà! Mimir showed up as a data source for alerts.

    Looking at the resulting successful calls, I noted that ALL of them were proxied through grafana.mydomain.local, which made me think, “Do I really even need the Access-Control-Allow-Origin headers?” So, I removed those headers from my Mimir configuration, re-deployed, and tested with the cache disabled. It worked like a champ.

    What happened?

    The best answer I can come up with is that, at some point in my clicking around Grafana with Mimir as a data source, Chrome got a 301 redirect response from https://grafana.mydomain.local/api/datasources/proxy/14/api/v1/status/buildinfo, cached it, and used it in perpetuity. Disabling the cache fixed everything, without the need to further configure Mimir’s Nginx proxy to return special CORS headers.

    So, with many thanks to Dimitar for being my rubber duck, I am now able to start adding new rules for monitoring my cluster metrics.

  • Kubernetes Observability, Part 5 – Using Mimir for long-term metric storage

    This post is part of a series on observability in Kubernetes clusters.

    I am sure anyone who actually reads through my ramblings is asking themselves this question: “Didn’t he say Thanos for long-term metric storage?” Yes, yes I did say Thanos. Based on some blog posts from VMware that I had come across, my initial thought was to use Thanos as my long-term metric storage solution. Having worked with Loki first, though, I had become familiar with its distributed deployment and storage configuration. So, when I found that Mimir offers a similar deployment and storage setup, I thought I’d give it a try.

    Additionally, remember, this is for my home lab: I do not have a whole lot of time for configuration and management. With that in mind, using the “Grafana Stack” seemed an expedient solution.

    Getting Started with Mimir

    As with Loki, I started with the mimir-distributed Helm chart that Grafana provides. The charts are well documented, including a Getting Started guide on their website. The Helm chart includes a MinIO dependency, but, as I had already set up MinIO, I disabled the bundled chart and configured a new bucket for Mimir.
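
    The relevant part of the values looks something like the sketch below; the endpoint, bucket name, and credentials are placeholders rather than my real settings (those live in the GitOps repo linked at the end of this section).

      # mimir-distributed values (sketch): disable the bundled MinIO and point
      # Mimir's object storage at an existing MinIO bucket instead.
      minio:
        enabled: false

      mimir:
        structuredConfig:
          common:
            storage:
              backend: s3
              s3:
                endpoint: minio.minio.svc:9000   # placeholder MinIO service address
                access_key_id: mimir             # placeholder; in practice this comes from a secret
                secret_access_key: changeme      # placeholder; in practice this comes from a secret
                insecure: true                   # plain HTTP inside the cluster
          blocks_storage:
            s3:
              bucket_name: mimir-blocks          # the bucket created ahead of time in MinIO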

    As Mimir has all the APIs to stand in for Prometheus, the changes were pretty easy:

    1. Get an instance of Mimir installed in my internal cluster.
    2. Configure my current Prometheus instances to remote-write to Mimir (see the sketch after this list).
    3. Add Mimir as a data source in Grafana.
    4. Change my Grafana dashboards to use Mimir, and modify those dashboards to filter based on the cluster label that is added.
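
    Step 2 was the only one that needed new configuration on the Prometheus side. A minimal sketch, assuming the Bitnami kube-prometheus chart’s prometheus.remoteWrite value (the chart I describe below); the Mimir gateway address and tenant ID are placeholders for my setup:

      # kube-prometheus values (sketch): remote-write all scraped metrics to
      # Mimir's Prometheus-compatible push endpoint.
      prometheus:
        remoteWrite:
          - url: http://mimir-nginx.mimir.svc:80/api/v1/push
            headers:
              X-Scope-OrgID: homelab   # Mimir tenant ID; only needed if multi-tenancy is enabled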

    As my GitOps repositories are public, have a look at my Mimir-based chart for details on my configuration and deployment.

    Labels? What labels?

    I am using the Bitnami kube-prometheus Helm chart. That particular chart allows you to define external labels using the prometheus.externalLabels value. In my case, I created a cluster label with unique values for each of my clusters. This allows me to create dashboards with a single data source which can be filtered based on each cluster using dashboard variables.
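
    The values change itself is tiny; something like this, with a different label value per cluster (“internal” is just a placeholder):

      # kube-prometheus values (sketch): attach a cluster label to everything
      # this Prometheus instance remote-writes.
      prometheus:
        externalLabels:
          cluster: internal

    On the Grafana side, a dashboard variable populated with a query like label_values(up, cluster) is then enough to drive the per-cluster filtering.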

    Well… that was easy

    All in all, it took me probably two hours to get Mimir running and collecting data from Prometheus. It took far less time than I anticipated, and opened up some new doors to reduce my observability footprint, such as:

    1. I immediately reduced the data retention on my local Prometheus installations to three days (see the sketch after this list). This reduced my total disk usage in persistent volumes by about 40 GB.
    2. I started researching using the Grafana Agent as a replacement for a full Prometheus instance in each cluster. Generally, the agent should use less CPU and storage on each cluster.
    3. Grafana Agent is also a replacement for Promtail, meaning I could remove both my kube-prometheus and promtail tools and replace them with an instance of the Grafana Agent. This greatly simplifies the configuration of observability within the cluster.
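
    For the first item, the change was a single value; a sketch, assuming the Bitnami chart exposes a prometheus.retention setting:

      # kube-prometheus values (sketch): keep only a few days of local data now
      # that Mimir holds the long-term copy.
      prometheus:
        retention: 3d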

    So, what’s the catch? Well, I’m tying myself to an opinionated stack. Sure, it’s based on open standards and has the support of the Prometheus community, but it remains to be seen which features will become “pay to play” within the stack. Near the bottom of this blog post is some indication of what I am worried about: while the main features are licensed under AGPLv3, there are additional features that come with proprietary licensing. For my home lab, these features are of no consequence, but when it comes to making decisions about long-term Kubernetes observability at work, I wonder which proprietary features we will need and how much they will cost us in the long run.