I spent the better part of the weekend recovering from crashing my RKE clusters last Friday. This put me on a path towards researching new Kubernetes clusters and determining the best path forward for my home lab.
Let me be clear: This is a home lab, whose purpose is not to help me build bulletproof, corporate production-ready clusters. I also do not want to run Minikube on a box somewhere. So, when I approached my “research” (you will see later why I put that term in quotes), I wanted to make sure I did not get bogged down in the minutiae of different Kubernetes installs or details. I stuck with Rancher Kubernetes Engine (RKE1) for a long time because it was quick to stand up, relatively stable, and easy to manage.
So, when I started looking for alternatives, my first research was into whether Rancher had any updated offerings. And, with that, I found RKE2.
RKE2, aka RKE Government
I already feel safer knowing that RKE2’s alter ego is RKE Government. All joking aside, as I dug into RKE2, it seemed a good mix of RKE1, which I am used to, and K3s, a lightweight implementation of Kubernetes. The RKE2 documentation was, frankly, much more intuitive and easier to navigate than the RKE1 documentation. I am not sure if that is because the documentation is that much better or because RKE2 is that much easier to configure.
I could spend pages upon pages explaining the experiments I ran over the last few evenings, but the proof is in the pudding, as they say. My provisioning-projects repository has a new PowerShell script (Create-Rke2Cluster.ps1) that outlines the steps needed to get a cluster configured. My work, then, came down to how I wanted to configure the cluster.
RKE1 Roles vs RKE2 Server/Agent
RKE1 had a notion of node roles which were divided into three categories:
- controlplane – Nodes with this role host the Kubernetes APIs.
- etcd – Nodes with this role host the etcd storage containers. There should be an odd number of them; 3 is a good minimum.
- worker – Nodes with this role can run workloads within the cluster.
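The odd-number advice for etcd comes down to majority voting: etcd stays available only while more than half of its members can reach each other. A quick sketch of that arithmetic (the function names here are mine, purely for illustration):

```python
def quorum(members: int) -> int:
    """Votes needed for a majority among `members` etcd nodes."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


# Compare small cluster sizes side by side.
for n in range(1, 6):
    print(f"{n} member(s): quorum={quorum(n)}, can lose {fault_tolerance(n)}")
```

Going from 3 to 4 members raises the quorum from 2 to 3 without letting you lose any more nodes, which is why even counts buy nothing.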
My RKE1 clusters typically have the following setup:
- One node with the controlplane, etcd, and worker roles
- Two nodes with the etcd and worker roles
- If needed, additional nodes with just the worker role
This seemed to work well: I had proper redundancy with etcd and enough workers to host all of my workloads. Sure, I only had one control plane, so if that node went down, well, the cluster would be in trouble. However, I usually did not have much problem with keeping the nodes running so I left it as it stood.
With RKE2, there is simply a notion of server and agent nodes. The server node runs etcd and the control plane components, while agents run only user-defined workloads. So, when I started planning my RKE2 clusters, I figured I would run one server and two agents. The lack of etcd redundancy would not have me losing sleep at night, but I really did not want to run 3 servers and then more agents for my workloads.
As I started down this road, I wondered how I would be able to cycle nodes. I asked the #rke2 channel on rancher-users.slack.com, and got an answer from Brad Davidson: I should always have at least 2 available servers, even when cycling. However, he did mention something that was not immediately apparent: the server can and will run user-defined workloads unless the appropriate taints have been applied. So, in that sense, an RKE2 server acts similarly to my “all roles” node, where it functions as a control plane, etcd, and worker node.
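For anyone who does want dedicated servers, the RKE2 documentation describes setting a taint in the server's config file. A minimal sketch, assuming the default config location; leaving it out gives you the "all roles" behavior described above:

```yaml
# /etc/rancher/rke2/config.yaml on a server node (default path; adjust if yours differs)
# This taint keeps ordinary workloads off the server unless they explicitly tolerate it.
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
```

In a small lab, skipping the taint means every server also pulls its weight as a worker, at the cost of workloads competing with etcd and the control plane for resources.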
Once I saw a path forward with RKE2, I really have not looked back. I have put considerable time into my provisioning-projects scripts, as well as creating a new API wrapper for Windows DNS management (post to follow).
“But Matt, you haven’t considered Kubernetes X or Y?”
I know. There are a number of flavors of Kubernetes that can run on your bare metal servers. I spent a lot of time and energy learning RKE1, and I have gotten very good at managing those clusters. RKE2 is familiar, with improvements in all the right places. I can see automating not only machine provisioning, but the entire process of node replacement. I would love nothing more than to come downstairs on a Monday morning and see newly provisioned cluster nodes humming away after my automated process ran.
So, yes, maybe I skipped a good portion of that “research” step, but I am ok with it. After all, it is my home lab: I am more interested in re-visiting Gravitee.io for API management and starting to put some real code services out in the world.