A Lesson in Occam’s Razor: Configuring Mimir Ruler with Grafana

Occam’s Razor posits “Of two competing theories, the simpler explanation is to be preferred.” I believe my high school biology teacher taught the “KISS” method (Keep It Simple, Stupid) to convey a similar principle. As I was trying to get alerts set up in Mimir using the Grafana UI, I came across an issue that was only finally solved by going back to the simplest answer.

The Original Problem

One of the reasons I leaned towards Mimir as a single source of metrics is that it has its own ruler component. This would allow me to store alerts in, well, two places: the Mimir ruler for metrics rules, and the Loki ruler for log rules. Sure, I could use Grafana alerts, but I like being able to run the alerts in the native system.

So, after getting Mimir running, I went to add a new alert in Grafana. Neither my Mimir nor Loki data source showed up.

Now, as it turns out, the Loki one was easy: I had not enabled the Loki ruler component in the Helm chart. So, a quick update of the Loki configuration in my GitOps repo, and the Loki ruler was running. However, my Mimir data source was still missing.

Asking for help

I looked for some time at possible solutions, and posted to the GitHub discussion looking for assistance. Dimitar Dimitrov was kind enough to step in and give me some pointers, including looking at the Chrome network tab to figure out if there were any errors.

At that point, I wanted to bury my head in the sand: of all the things I had tried, checking the network tab was not one of them. But, after getting over my embarrassment, I went about debugging. The errors pointed to a missing Access-Control-Allow-Origin header, although there was also a 405 on the preflight OPTIONS request.

Dimitar suggested that I add the Access-Control-Allow-Origin header via the Mimir Helm chart’s nginx.nginxConfig section. So I did, but I still had the same problems.

I had previously tried this in Microsoft Edge, where it worked, so I had been assuming it was just Chrome being overly strict. However, just for fun, I tried a different Chrome profile, and it worked…

Applying Occam’s Razor

It was at this point that I thought “There is no way that something like this would not have come up with Grafana/Mimir on Chrome.” I would have expected a quick note around the Access-Control-Allow-Origin or something similar when hosting Grafana in a different subdomain than Mimir. So I took a longer look at the network log in my instance of Chrome that was still throwing errors.

There was a redirect in there that was listed as “cached.” I thought, well, that’s odd, why would it cache that redirect? So, as a test, I disabled the cache in the Chrome debugging tools, refreshed the page, and voilà! Mimir showed as a data source for alerts.

Looking at the resulting successful calls, I noted that ALL of the calls were proxied through grafana.mydomain.local, which made me think “Do I really even need the Access-Control-Allow-Origin headers?” So, I removed those headers from my Mimir configuration, re-deployed, and tested with the caching disabled. It worked like a champ.
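To make the "do I even need CORS?" reasoning concrete, here is a minimal Python sketch of the same-origin comparison a browser performs. It is a simplification and not Chrome's actual implementation. The Grafana page path and the direct Mimir URL are hypothetical, made up for illustration. Only the proxied buildinfo URL comes from my network log.

```python
from urllib.parse import urlsplit

# Default ports, so "https://host" and "https://host:443" compare as equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def is_cross_origin(page_url, request_url):
    """True when a request made from page_url would be subject to CORS checks."""
    return origin(page_url) != origin(request_url)


# Hypothetical page path; the host matches my Grafana instance.
page = "https://grafana.mydomain.local/alerting/list"

# Proxied call, as seen in the network log: same origin as the page.
proxied = "https://grafana.mydomain.local/api/datasources/proxy/14/api/v1/status/buildinfo"

# Hypothetical direct call to Mimir on another subdomain: a different origin.
direct = "https://mimir.mydomain.local/prometheus/api/v1/rules"

print(is_cross_origin(page, proxied))
print(is_cross_origin(page, direct))
```

Because the browser only ever talks to the Grafana host when the data source is proxied, the first comparison comes out same-origin, and no CORS headers from Mimir should be required.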

What happened?

The best answer I can come up with is that, at some point in my clicking around Grafana with Mimir as a data source, Chrome got a 301 redirect response from https://grafana.mydomain.local/api/datasources/proxy/14/api/v1/status/buildinfo, cached it, and used it in perpetuity. Disabling the response cache fixed everything, without the need to further configure Mimir’s Nginx proxy to return special CORS headers.
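As a rough illustration of why a 301 can stick around, here is a small Python sketch based on my simplified reading of the HTTP caching rules. As I understand it, 301 and 308 responses may be cached by default, while 302 and 307 need explicit freshness information. The function names are my own, and real browser caches apply more rules than this, so treat it as an assumption-laden sketch and not a description of Chrome's behavior.

```python
# Redirect status codes a cache may store without explicit freshness info
# (simplified; assumption based on the HTTP semantics and caching specs).
CACHEABLE_BY_DEFAULT = {301, 308}


def cache_directives(headers):
    """Return the set of lowercase Cache-Control directive names."""
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("cache-control", "")
    return {
        part.strip().split("=", 1)[0].lower()
        for part in value.split(",")
        if part.strip()
    }


def redirect_may_be_stored(status, headers):
    """Guess whether a cache is allowed to store this redirect response."""
    directives = cache_directives(headers)
    if "no-store" in directives:
        return False  # the server explicitly forbids storing the response
    has_expires = any(name.lower() == "expires" for name in headers)
    if "max-age" in directives or has_expires:
        return True  # explicit freshness information was supplied
    return status in CACHEABLE_BY_DEFAULT


print(redirect_may_be_stored(301, {}))
print(redirect_may_be_stored(302, {}))
print(redirect_may_be_stored(301, {"Cache-Control": "no-store"}))
```

If this reading is right, a bare 301 with no caching headers is exactly the kind of response a browser can keep reusing until its cache is cleared or bypassed.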

So, with many thanks to Dimitar for being my rubber duck, I am now able to start adding new rules for monitoring my cluster metrics.

