Maturing my Grafana setup

I may have lost some dashboards and configuration recently, and it got me thinking about how to mature my Grafana setup for better persistence.

Initial Setup

When I first got Grafana running, it was based on the packaged Grafana Helm chart. As such, my Grafana instance was using a SQLite database file stored in the persistent volume. This limits me to a single Grafana pod, since the volume is not set up to be shared across pods. Additionally, that SQLite database file is tied to the lifecycle of the claim associated with the volume.

And, well, at home, this is not a huge deal because of how the lab is set up for persistent volume claims. Since I use the nfs-subdir-external-provisioner, PVCs in my clusters automatically generate a subfolder in my NFS share. When the PVC is deleted, the subdir gets renamed with an archived- prefix, so I can usually dig through the folder to find the old database file.

However, with the default Azure persistence, Azure Disks are provisioned. When a PVC gets deleted, so too does the disk, or, well, I think it does. I have not had the opportunity to dig into the Azure Disk PVC provisioning to understand how that data is handled when PVCs go away. Suffice it to say that I lost our Grafana settings because of this.

The New Setup

The new plan is to use MySQL to store my Grafana dashboards and data sources. The configuration seems simple enough: add the appropriate database entries to the grafana.ini file. I already know how to expand secrets, so getting the database credentials into the configuration was easy using the grafana.ini section of the Helm chart.
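
For anyone who has not seen them, the entries in question live in the [database] section of grafana.ini. Here is a minimal sketch of that section, written as a small Python script so it can be run and checked. The host, database name, and environment variable names are placeholders made up for illustration, not the values from my setup. The $__env{...} syntax is Grafana's variable expansion, which reads the value from an environment variable at startup.

```python
import configparser
import io


def render_database_section(host: str, name: str) -> str:
    """Render a grafana.ini [database] section that points Grafana at MySQL.

    Credentials stay as $__env{...} references so no secret lands in the file.
    """
    # interpolation=None stops configparser from treating "$" or "%" specially.
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {
        "type": "mysql",
        "host": host,  # host:port of the MySQL service
        "name": name,  # database (schema) Grafana should use
        "user": "$__env{GRAFANA_DB_USER}",          # hypothetical variable name
        "password": "$__env{GRAFANA_DB_PASSWORD}",  # hypothetical variable name
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder host and database name, assumed for this example only.
ini_text = render_database_section("grafana-mysql:3306", "grafana")
print(ini_text)
```

In the Helm chart, the same keys would go under a database block inside the grafana.ini values, with the two environment variables supplied from a Kubernetes secret.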

For my home setup, I felt it was ok to run MySQL as another dependent chart for Grafana. Now, from the outside, you should be saying “But Matt, that only moves your persistence issues from the Grafana chart to the MySQL chart!” That is absolutely true. But, well, I have a pretty solid backup plan for those NFS shares, so for a home lab that should be fine. Plus I figured out how to back up and restore Grafana (see below).

The real reason is that, for the instance I am running in Azure at work, I want to provision an Azure MySQL instance. This will give me much better backup retention than I would have inside the cluster, and the configuration at work will match the configuration at home. Home lab in action!

Want to check out my home lab configuration? Check out my ops-internal infrastructure repository.

Backup and Restore for Grafana

As part of this move, I did not want to lose the settings I had in Grafana. This meant finding a backup/restore procedure that worked. An internet search led me to the Grafana Backup Tool. The tool provides backup and restore capabilities through Grafana’s APIs.

That said, it is written in Python, so my recent foray into Python coding served me well to get this tool up and running. Once I generated an API Key, I was off and running.

There really isn’t much to it: after configuring the URL and API Token, I ran a backup to get a .tar.gz file with my Grafana contents. Did I test the backup? No. It’s the home lab, worst that could happen is I have to re-import some dashboards and re-create some others.
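
To give a feel for what an API-based backup involves, here is a rough sketch of the dashboard portion only. It is not the Grafana Backup Tool's code, and it skips data sources, folders, and everything else the real tool handles. It assumes Grafana's /api/search and /api/dashboards/uid/<uid> HTTP endpoints with a bearer token; the function names and the archive layout are made up for illustration.

```python
import io
import json
import tarfile
import urllib.request
from typing import Any, Callable, List

# A Fetch takes an API path and returns the decoded JSON response.
Fetch = Callable[[str], Any]


def make_fetch(base_url: str, token: str) -> Fetch:
    """Return a function that GETs a Grafana API path and parses the JSON reply."""
    def fetch(path: str) -> Any:
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch


def backup_dashboards(fetch: Fetch, archive_path: str) -> List[str]:
    """Write every dashboard's JSON into a .tar.gz and return the UIDs saved."""
    saved = []
    with tarfile.open(archive_path, "w:gz") as archive:
        # type=dash-db asks the search endpoint for dashboards, not folders.
        for item in fetch("/api/search?type=dash-db"):
            uid = item["uid"]
            # The dashboard model sits under the "dashboard" key of the reply.
            model = fetch(f"/api/dashboards/uid/{uid}")["dashboard"]
            payload = json.dumps(model, indent=2).encode("utf-8")
            # Hypothetical layout: one JSON file per dashboard, named by UID.
            info = tarfile.TarInfo(name=f"dashboards/{uid}.json")
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
            saved.append(uid)
    return saved
```

Against a live instance, this would be called as backup_dashboards(make_fetch(url, token), "grafana-backup.tar.gz"), with the URL and token being whatever your instance uses. Passing the fetch function in also makes it easy to try the logic against canned responses without touching a real Grafana.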

After that, I updated my Grafana instance to include the MySQL instance and updated Grafana’s configuration to use the new MySQL service. As expected, all my dashboards and data sources disappeared.

I ran the restore function using my backup, refreshed Grafana in my browser, and I was back up and running! Testing, schmesting….
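
Restoring is roughly the reverse of backing up: read each saved dashboard and post it back through the API. The sketch below shows the idea for dashboards only, assuming an archive that holds one JSON file per dashboard (a layout I made up for the example, not the Grafana Backup Tool's format) and Grafana's POST /api/dashboards/db endpoint. The post function is passed in and is hypothetical; it stands in for whatever HTTP client you prefer.

```python
import json
import tarfile
from typing import Any, Callable, Dict, List

# A Post takes an API path and a JSON-serializable body.
Post = Callable[[str, Dict[str, Any]], Any]


def restore_payload(model: Dict[str, Any]) -> Dict[str, Any]:
    """Build the body for POST /api/dashboards/db from a saved dashboard model."""
    dashboard = dict(model)
    # The numeric id belongs to the old database, so clear it and let the new
    # instance assign one. The uid is kept so dashboard URLs stay the same.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": True}


def restore_dashboards(post: Post, archive_path: str) -> List[str]:
    """Post every dashboard JSON file in the archive; return the UIDs restored."""
    restored = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories and anything that is not a dashboard JSON file.
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            model = json.load(archive.extractfile(member))
            post("/api/dashboards/db", restore_payload(model))
            restored.append(model.get("uid", member.name))
    return restored
```

This version drops every dashboard into the default folder, since it does not record or restore folder assignments.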

What’s Next?

I am going to take my newfound learnings and apply them at work:

Get a new MySQL instance provisioned.
Back up Grafana.
Re-configure Grafana to use the new MySQL instance.
Restore Grafana.

Given how easily the home lab migration went, I do not expect to run into many issues.