My off-hours time this week has been consumed by writing some Python scripts to help monitor uptime for some of my sites.
Build or Buy?
At this point in my career, “build or buy” is a question I ask more often than not. As a software engineer, there is no shortage of open source and commercial solutions for almost any imaginable task. Web Site monitoring is no different. Tools such as StatusCake, Pingdom, and LogicMonitor offer hosted platforms, while tools like Nagios and PRTG offer on-premise installations, there is so much to choose from, it’s hard to decide.
I had a few simple requirements:
- Simple at first, but expandable as needed.
- Runs on my own network so that I can monitor sites that are not available outside of my firewall.
- Since most of my servers are virtual machines consolidated on the one lab server, well, it does not make much sense to put it on that server. I needed something I could run on easily with little to no power.
Build it is!
I own a few Raspberry Pis, but the Model 3B and 4B are currently in use. The lone unused Pi is an old Model B (i.e., model 1B), so installing something like Nagios would be, well, not usable when it was all said and done. Given that the Raspberry Pi is at home with Python, I thought I would dust off my “language learning skills” and figure out how to make something useful.
As I started, though, I remembered my free version of Atlassian’s Status Page. Although the free version is limited in the number of subscribers and no text subscriptions, for my usage, it’s perfect. And, near and dear to my developer heart, it has a very well defined set of APIs for managing statuses and incidents.
So, with Python and some additional modules, I created a project that lets me do a quick request on a desired website. If the website is down, the Status Page status for the associated component is changed and an incident is created. If/when it comes back up, any open incidents associated with that component are closed.
After a few evening hours tinkering, I have the scripts doing some basic work. For now, a
cron job executes the script every 5 minutes, and if a site goes down it is reported to my statuspage site.
Long term, I plan on adding support for more in-depth checks of my own projects, which utilized .Net’s HealthChecks namespace to report service health automatically. I may also look into setting up the scripts as a service running on the Pi.
If you are interested, the code is shared on Github.