I have had this post in a draft for almost a month now. I had planned to include statistics around the amount of data that humans are generating (it is a lot) and how we are causing some of our own problems by having too much data at our fingertips.
What I realized is that a lengthy post about information overload is, well, somewhat oxymoronic. If you would like to learn about the theory, check it out. We are absolutely generating more data than could possibly be used. This came to the forefront as I investigated my metrics storage in my Grafana Mimir instance.
I got a lot of… data
Right now, I’m collecting over 300,000 series worth of data. That means there are about 300,000 unique streams of data, each with a data point roughly every 30 seconds. On average, it takes up 35 GB of disk space per month.
How many of those do I care about? Well, as of this moment, about 7. I have some alerts to monitor when applications are degraded, when I’m dropping logs, when some system temperatures get too high, and when my Ring Doorbell battery is low.
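To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a 30-day month, exactly one sample per series every 30 seconds, and decimal gigabytes; none of those are measured values, so treat the result as an order-of-magnitude estimate only.

```python
# Rough estimate of sample volume, using the figures quoted above.
# Assumptions (not measured): 30-day month, exactly one sample per
# series every 30 seconds, and 1 GB = 10**9 bytes.
SERIES = 300_000
SAMPLE_INTERVAL_S = 30
DAYS_PER_MONTH = 30
BYTES_PER_MONTH = 35 * 10**9

# Samples written per series in one month.
samples_per_series = DAYS_PER_MONTH * 24 * 3600 // SAMPLE_INTERVAL_S

# Samples written across all series in one month.
total_samples = SERIES * samples_per_series

# Average disk cost of a single sample.
bytes_per_sample = BYTES_PER_MONTH / total_samples

print(f"{samples_per_series:,} samples per series per month")
print(f"{total_samples:,} samples per month in total")
print(f"{bytes_per_sample:.2f} bytes per sample on disk")
```

Under those assumptions, that works out to roughly 26 billion data points a month, stored at a bit over one byte each. The storage is efficient; the volume is the problem.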
Now, I continue to find alerts to write that are helpful, so I anticipate expanding beyond 7. However, there is almost no way that I am going to have alerts across 300,000 series: I simply do not care about some of this data. And yet, I am storing it, to the tune of about 35 GB worth of data every month.
What to do?
For my home lab, the answer is relatively easy: I do not care about data older than 3 months, so I can set up retention rules and clean some of this up. But in business, retention rules become a question of legal and contractual obligations.
In other words, in business, not only are we generating a ton of data, but we can be penalized for not having the data that we generated, or even for not generating the appropriate data, such as audit histories. It is very much a downward spiral: the more we generate, the more we must store, which leads to larger and larger data stores.
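As a quick illustration of what a retention window buys, the sketch below compares disk usage with and without a 3-month cutoff. It assumes ingestion stays flat at 35 GB per month, which is a simplification; the `stored_gb` helper is a hypothetical name used only for this example, not part of any tool.

```python
# Disk usage over time, with and without a retention window.
# Assumption: ingestion stays constant at 35 GB per month.
GB_PER_MONTH = 35


def stored_gb(months_elapsed, retention_months=None):
    """Return GB on disk after `months_elapsed` months of ingestion.

    With no retention, everything ever written is kept. With retention,
    only the most recent `retention_months` of data remains.
    """
    if retention_months is None:
        kept_months = months_elapsed
    else:
        kept_months = min(months_elapsed, retention_months)
    return GB_PER_MONTH * kept_months


one_year_unbounded = stored_gb(12)
one_year_with_retention = stored_gb(12, retention_months=3)

print(f"After 12 months, no retention: {one_year_unbounded} GB")
print(f"After 12 months, 3-month retention: {one_year_with_retention} GB")
```

With a retention window the total plateaus after the third month, while without one it simply keeps growing.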
Where do we go from here?
We are overwhelming ourselves with data, and it is arguably causing problems across business, government, and general interpersonal culture. The problem is not getting any better, and there really is not a clear solution. All we can do is attempt to be smart data consumers. So before you take that random Facebook ad as fact, maybe do a little more digging to corroborate it. In an age where anyone can be a journalist, everyone has to be a journalist.