Automating Grafana Backups

After a few data loss events, I took the time to automate my Grafana backups.

A bit of instability

It has been almost a year since I moved to a MySQL backend for Grafana. In that year, I’ve gotten a corrupted MySQL database twice now, forcing me to restore from a backup. I’m not sure if it is due to my setup or bad luck, but twice in less than a year is too much.

In my previous post, I mentioned the Grafana backup utility as a way to preserve this data. My short-sightedness prevented me from automating those backups, however, so I suffered some data loss. After the most recent event, I revisited the backup tool.

Keep your friends close…

My first thought was to simply write a quick Azure DevOps pipeline to pull the tool down, run a backup, and copy it to my SAN. I would have also had to have included some scripting to clean up old backups.

As I read through the grafana-backup-tool documents, though, I came across examples of running the tool as a Job in Kubernetes via a CronJob. This presented a very unique opportunity: configure the backup job as part of the Helm chart.

What would that look like? Well, I do not install any external charts directly. They are configured as dependencies for charts of my own. Now, usually, that just means a simple values file that sets the properties on the dependency. In the case of Grafana, though, I’ve already used this functionality to add two dependent charts (Grafana and MySQL) to create one larger application.

This setup also allows me to add additional templates to the Helm chart to create my own resources. I added two new resources to this chart:

grafana-backup-cron – A definition for the cronjob, using the ysde/grafana-backup-tool image.
grafana-backup-secret-es – An ExternalSecret definition to pull secrets from Hashicorp Vault and create a Secret for the job.

Since this is all built as part of the Grafana application, the secrets for Grafana were already available. I went so far as to add a section in the values file for the backup. This allowed me to enable/disable the backup and update the image tag easily.

Where to store it?

The other enhancement I noticed in the backup tool was the ability to store files in S3 compatible storage. In fact, their example showed how to connect to a MinIO instance. As fate would have it, I have a MinIO instance running on my SAN already.

So I configured a new bucket in my MinIO instance, added a new access key, and configured those secrets in Vault. After committing those changes and synchronizing in ArgoCD, the new resources were there and ready.

Can I test it?

Yes I can. Google, once again, pointed me to a way to create a Job from a CronJob:

kubectl create job --from=cronjob/<cronjob-name> <job-name> -n <namespace-name>

I ran the above command to create a test job. And, viola, I have backup files in MinIO!

Cleaning up

Unfortunately, there doesn’t seem to be a retention setting in the backup tool. It looks like I’m going to have to write some code to clean up my Grafana backups bucket, especially since I have daily backups scheduled. Either that, or look at this issue and see if I can add it to the tool. Maybe I’ll brush off my Python skills…