Tag: RKE2

  • Tech Tip – Interacting with ETCD in Rancher Kubernetes Engine 2

    Since cycling my cluster nodes is a “fire script and wait” operation, I kicked one off today. I ended up running into an issue that required me to dig a bit into ETCD in RKE2, and could not find direct help, so this is as much my own reference as it is a guide for others.

    I broke it…

    When provisioning new machines, I still have some odd behaviors when it comes to IP address assignment. I do not set the IP address manually: I use a static MAC address on the VM and then create a fixed IP for that MAC address. About 90% of the time, that works great. Every so often, though, during provisioning, the VM picks up an IP address from the general DHCP pool instead of the fixed IP, and that wrecks stuff, especially around ETCD.

    This happened today: while standing up a replacement, the new machine picked up a DHCP IP. Unfortunately, I did not remove the machine properly, which left my ETCD cluster still seeing the node as a member. When I deleted the node and tried to re-provision, I got ETCD errors because I was trying to add a node name that already existed.

    Getting into ETCD

    RKE2’s docs are a little quiet on actually viewing what’s in ETCD. Through some googling, I figured out that I could use etcdctl to show and manipulate members, but I couldn’t figure out how to actually run the command.

    As it turns out, the easiest way to run etcdctl is to execute it inside one of the ETCD pods. I came across a bug report in RKE2 that indirectly showed me how to run etcdctl commands from my machine through the ETCD pods. The member list command is

    kubectl -n kube-system exec <etcd_pod_name> -- sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl member list"

    Note all the credential setting via environment variables. In theory, I could “jump in” to the etcd pod using a simple sh command and run a session, but keeping it like this forces me to be judicious in my execution of etcdctl commands.

    I found the offending entry and removed it from the list, and was able to run my cycle script again and complete my updates.
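
    For reference, removing the stale member follows the same pattern as the list command. A minimal sketch, where the pod name and member ID are placeholders pulled from the member list output above:

    # Remove the stale member by its hex ID (shown in the first column of "member list").
    kubectl -n kube-system exec <etcd_pod_name> -- sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl member remove <member_id>"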

  • Tech Tip – Configuring RKE2 Nginx Ingress using a HelmChartConfig Resource

    The RKE2 documentation is there, but, well, it is not quite as detailed as documentation I have seen elsewhere. This is a quick tip for customizing the Nginx Ingress controller when using RKE2.

    Using Nginx Ingress in RKE2

    By default, an RKE2 cluster deploys the nginx-ingress Helm chart. That’s great, except that you may need to customize that chart. This is where the HelmChartConfig resource is used.

    RKE2 uses HelmChartConfig custom resource definitions (CRDs) to allow you to set configuration options for its default Helm deployments. This is pretty useful, and seemed straightforward, except I had a hard time figuring out HOW to set the options.

    Always easier than I expect

    The RKE2 documentation points you to the nginx-ingress chart, but it took me a bit to realize that the connection was as simple as setting valuesContent in the HelmChartConfig spec to whatever values I wanted to pass to Nginx.

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          config:
            use-forwarded-headers: "true"
            proxy-buffer-size: "256k"
            proxy-buffers-number: "4"
            large-client-header-buffers: "4 16k"
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
              additionalLabels:
                cluster: nonproduction

    The above sets some configuration values in the controller AND enables metrics collection using the ServiceMonitor object. For Nginx, valid values for valuesContent are the same as values in the chart’s values.yaml file.
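
    Applying the resource is just a kubectl apply; RKE2's Helm controller then re-runs the bundled install job to pick up the overrides. A quick sketch, assuming the manifest above is saved locally and the job keeps RKE2's usual helm-install-rke2-ingress-nginx naming:

    # Apply the override, then watch the chart reconcile.
    kubectl apply -f rke2-ingress-nginx-config.yaml
    kubectl -n kube-system get jobs | grep helm-install-rke2-ingress-nginx
    kubectl -n kube-system logs -f job/helm-install-rke2-ingress-nginx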

    Works with other charts

    RKE2 provides additional charts that can be deployed and customized with similar methods. Some charts are deployed by default, and the documentation provides instructions for disabling them. The same HelmChartConfig method above can be used to customize those chart installs as well.
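
    For example, the same pattern can target another bundled chart. A hedged sketch, assuming the rke2-coredns chart mirrors the upstream CoreDNS chart's values (the replicaCount key may differ in your version):

    # Hypothetical example: customize the bundled rke2-coredns chart the same way.
    kubectl apply -f - <<'EOF'
    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-coredns
      namespace: kube-system
    spec:
      valuesContent: |-
        replicaCount: 3
    EOF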

  • Speeding up Packer Hyper-V Provisioning

    I spent a considerable amount of time working through the provisioning scripts for my RKE2 nodes. Each node took 25 to 30 minutes to provision. I felt like I could do better.

    Check the tires

    A quick evaluation of the process made me realize that most of the time was spent in the full install of Ubuntu. Using the hyperv-iso builder plugin from Packer, each machine was provisioned from scratch. The installer took about 18 to 20 minutes to fully provision the VM. After that, the RKE2 install took about 1 to 2 minutes.

    Speaking with my colleague Justin, it occurred to me that I could probably get away with building out a base image using the hyperv-iso builder and then using the hyperv-vmcx builder to copy that base and create a new machine. In theory, that would cut the 18 to 20 minutes down to a copy job.
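
    In rough terms, that turns one long packer build into two. A sketch with hypothetical template and variable names, since the real templates live in my provisioning-projects repository:

    # Stage 1: build the reusable Ubuntu base image from the ISO (slow, run occasionally).
    packer build ubuntu-base.pkr.hcl

    # Stage 2: clone the exported base with the hyperv-vmcx builder for each new node (fast).
    packer build -var "vm_name=rke2-server-01" rke2-node.pkr.hcl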

    Test Flight Alpha: Initial Cluster Provisioning

    A quick copy of my existing full build template and some judicious editing got me to the point where the hyperv-vmcx builder was running great and producing a VM. I had successfully cut my provisioning time down to under 5 minutes!

    I started editing my Rke2-Provisioning Powershell module to use the quick provisioning rather than the full provisioning. Then I spun up a test cluster with four nodes (three servers and one agent) to make sure everything came up correctly. Within about 30 minutes, that four-node cluster was humming along, in a quarter of the time it had taken me before.

    Test Flight Beta: Node Replacement

    The next bit of testing was to ensure that as I ran the replacement script, new machines were provisioned correctly and old machines were torn down. This is where I ran into a snag, but it was a bit difficult to detect at first.

    During the replacement, the first new node would come up fine, and the old node was properly removed and deleted. So, after the first cycle, I had one new node and one old node removed. However, I was getting somewhat random problems with the second, third, and fourth cycles. Most of the time, it was that the new ETCD server, during Rancher provisioning, was picking up an IP address from the DHCP range instead of using the fixed IP tied to the MAC address.

    Quick Explanation

    I use the Unifi Controller to run my home network (Unifi Security Gateway and several access points). Through the Unifi APIs, and a wrapper API I wrote, I am able to generate a valid Hyper-V MAC address and associate it with a fixed IP on the Unifi before the Hyper-V VM is ever configured. When I create a new machine, I assign it the MAC address that was generated, and my DHCP server always assigns it the same address. This IP is outside of the DHCP range allocated to normal clients. I am working on publishing the Unifi IP Wrapper in a public repository for consumption.

    Back to it…

    As I was saying, even though I was assigning a MAC address that had an associated fixed IP, VMs provisioned after the first one seemed to be failing to pick up that IP. What was different?

    Well, deleting a node returns its IP to the pool, so the process looks something like this:

    • First new node provisioned (IP .45 assigned)
    • First old node deleted (return IP .25 to the pool)
    • Second new node provisioned (IP .25 assigned)

    My assumption is that the Unifi does not like such a quick reassignment of a static IP to a new MAC Address. To test this, I modified the provisioner to first create ALL the new nodes before deleting nodes.

    In that instance, the nodes provisioned correctly using their newly assigned IPs. However, from a resource perspective, I hate the thought of having to run 2n nodes during provisioning when really all I need is n + 1.

    Test Flight Charlie: Changing IP assignments

    I modified my Unifi Wrapper API to cycle through the IP block I have assigned to my VMs instead of simply always using the lowest available IP. This allows me to go back to replacement one by one, without worrying about IP/MAC Address conflicts on the Unifi.

    Landing it…

    With this improvement, I have fewer qualms about configuring provisioning to run in the evenings. Most likely, I will build the base Ubuntu image weekly or bi-weekly to ensure I have the latest updates. From there, I can use the replacement scripts to replace old nodes with new nodes in the cluster.

    I have not decided if I’m going to use a simple task scheduler in Windows, or use an Azure DevOps build agent on my provisioner… Given my recent miscue when installing the Azure DevOps Build Agent, I may opt for the former.

  • Automated RKE2 Cluster Management

    One of the things I like about cloud-hosted Kubernetes solutions is that they take the pain out of node management. My latest home lab goal was to replicate some of that functionality with RKE2.

    Did I do it? Yes. Is there room for improvement? Of course, it’s a software project.

    The Problem

    With RKE1, I have a documented and very manual process for replacing nodes in my clusters. It shapes up like this:

    1. Provision a new VM.
    2. Add a DNS entry for the new VM.
    3. Edit the cluster.yml file for that cluster, adding the new VM with the appropriate roles to match the outgoing node.
    4. Run rke up.
    5. Edit the cluster.yml file for that cluster to remove the old VM.
    6. Run rke up.
    7. Modify the cluster’s ingress-nginx settings, adding the new external IP and removing the old one.
    8. Modify my reverse proxy to reflect the IP changes.
    9. Delete the old VM and its DNS entries.

    Repeat the above process for every node in the cluster. Additionally, because the nodes could have slightly different Docker versions or updates, I often found myself provisioning a whole set of VMs at a time and going through this process for all the existing nodes at once. The process was fraught with problems, not the least of which was my having to remember every step.

    A DNS Solution

    I wrote a wrapper API to manage Windows DNS settings, and built calls to that wrapper into my Unifi Controller API so that, when I provision a new machine or remove an old one, it will add or remove the fixed IP from Unifi AND add or remove the appropriate DNS record for the machine.

    Since I made DNS entries easier to manage, I also came up with a DNS naming scheme to help manage cluster traffic:

    1. Every control plane node gets an A record for cp-<cluster name>.gerega.net. This lets my kubeconfig files remain unchanged, and traffic is distributed across the control plane nodes via round robin DNS (see the sketch just after this list).
    2. Every node gets an A record for tfx-<cluster name>.gerega.net. This allows me to configure my external reverse proxy to use this hostname instead of an individual IP list. See below for more on this from a reverse proxy perspective.
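
    To illustrate the kubeconfig side of the first item, the cluster entry can point at the round robin name rather than any single node. A sketch, assuming the default Kubernetes API port of 6443 and a hypothetical cluster name:

    # Point the kubeconfig cluster entry at the round robin record instead of a node IP.
    kubectl config set-cluster cluster1 --server=https://cp-cluster1.gerega.net:6443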

    That solved most of my DNS problems, but I still had issues with the various rke up runs and compatibility worries.

    Automating with RKE2

    The provisioning process for RKE2 is much simpler than that for RKE1. I was able to shift the cluster configuration into the Packer provisioning scripts, which allowed me to do more within the associated Powershell scripts. This, coupled with the DNS standards above, means that I can run one script and end up with a completely provisioned RKE2 cluster.

    I quickly realized that adding and removing nodes to/from the RKE2 clusters was equally easy. Adding a node to the cluster simply meant provisioning a new VM with the appropriate scripting to install RKE2 and add it to the existing control plane. Removing a node from the cluster was simple:

    1. Drain the node (kubectl drain).
    2. Delete the node from the cluster (kubectl delete node/<node name>).
    3. Delete the VM (and its associated DNS).
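
    In practice, the first two steps boil down to a couple of kubectl commands. A minimal sketch (the node name and drain flags are illustrative):

    # Evict workloads from the node, then remove the node object from the cluster.
    kubectl drain rke2-agent-01 --ignore-daemonsets --delete-emptydir-data
    kubectl delete node rke2-agent-01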

    As long as I had at least one node with the server role running at all times, things worked fine.

    With RKE2, though, I decided to abandon my ingress-nginx installations in favor of using RKE2’s built-in Nginx Ingress. This allows me to skip managing the cluster’s external IPs, as the RKE2 installer handles that for me.

    Proxying with Nginx

    A little over a year ago I posted my updated network diagram, which introduced a hardware proxy in the form of a Raspberry Pi running Nginx. That little box is a workhorse, and plans are in place for a much-needed upgrade. In the meantime, however, it works.

    My configuration was heavily IP-based: I would configure upstream blocks containing each cluster node’s IP, and then my sites would be configured to proxy to those IPs. Think something like this:

    upstream cluster1 {
      server 10.1.2.50:80;
      server 10.1.2.51:80;
      server 10.1.2.52:80;
    }
    
    server {
       ## server settings
    
       location / {
         proxy_pass http://cluster1;
         # proxy settings
       }
    }

    The issue here is that every time I add or remove a cluster node, I have to mess with this file. My DNS server is set up for round robin DNS, which means I should be able to add new A records with the same host name, and the DNS will cycle through the different servers.
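
    A quick way to sanity-check the DNS side is to query the record directly; the hostname follows the tfx-<cluster name> scheme above and should return one A record per node:

    # Each query returns every A record for the name; clients rotate through them.
    dig +short tfx-cluster1.gerega.net A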

    My worry, though, was the Nginx reverse proxy: if I point the reverse proxy at a single DNS name, will it just cache one IP? Nothing to do but test, right? So I changed my configuration as follows:

    upstream cluster1 {
      server tfx-cluster1.gerega.net:80;
    }
    
    server {
       ## server settings
    
       location / {
         proxy_pass http://cluster1;
         # proxy settings
       }
    }

    Everything seemed to work, but how could I know it worked? For that, I dug into my Prometheus metrics.

    Finding where my traffic is going

    I spent a bit of time trying to figure out which metrics made the most sense to see the number of requests coming through each Nginx controller. As luck would have it, I always put a ServiceMonitor on my Nginx applications to make sure Prometheus is collecting data.

    I dug around in the Nginx metrics and found nginx_ingress_controller_requests. With some experimentation, I found this query:

    sum(rate(nginx_ingress_controller_requests{cluster="internal"}[2m])) by (instance)

    Looks easy, right? Basically, look at the sum of the rate of incoming requests by instance over a given time window. Now, I could clean this up a little and add some rounding and such, but I really did not care about the exact number: I wanted to make sure that the requests across the instances were balanced effectively. I was not disappointed:

    [Chart: Rate of Incoming Requests]

    Each line is an Nginx controller pod in my internal cluster. Visually, things look to be balanced quite nicely!

    Yet Another Migration

    With the move to RKE2, I made more work for myself: I need to migrate my clusters from RKE1 to RKE2. With Argo, the migration should be pretty easy, but still, more home lab work.

    I also came out of this with a laundry list of tech tips and other long form posts… I will be busy over the next few weeks.

  • Moving On: Testing RKE2 Clusters in the Home Lab

    I spent the better part of the weekend recovering from crashing my RKE clusters last Friday. That sent me down the path of researching new Kubernetes options and determining the best way forward for my home lab.

    Intentionally Myopic

    Let me be clear: this is a home lab, whose purpose is not to help me build bulletproof, corporate production-ready clusters. I also do not want to run Minikube on a box somewhere. So, when I approached my “research” (you will see later why I put that term in quotes), I wanted to make sure I did not get bogged down in the minutiae of different Kubernetes installs or details. I stuck with Rancher Kubernetes Engine (RKE1) for a long time because it was quick to stand up, relatively stable, and easy to manage.

    So, when I started looking for alternatives, my first research was into whether Rancher had any updated offerings. And, with that, I found RKE2.

    RKE2, aka RKE Government

    I already feel safer knowing that RKE2’s alter ego is RKE Government. All joking aside, as I dug into RKE2, it seemed a good mix of RKE1, which I am used to, and K3s, a lightweight implementation of Kubernetes. The RKE2 documentation was, frankly, much more intuitive and easier to navigate than the RKE1 documentation. I am not sure if that is because the documentation is that much better or because RKE2 is that much easier to configure.

    I could spend pages upon pages explaining the experiments I ran over the last few evenings, but the proof is in the pudding, as they say. My provisioning-projects repository has a new Powershell script (Create-Rke2Cluster.ps1) that outlines the steps needed to get a cluster configured. My work, then, came down to how I wanted to configure the cluster.

    RKE1 Roles vs RKE2 Server/Agent

    RKE1 had a notion of node roles which were divided into three categories:

    • controlplane – Nodes with this role host the Kubernetes API components.
    • etcd – Nodes with this role host the etcd storage containers. There should be an odd number of these; at least three is a good choice.
    • worker – Nodes with this role can run workloads within the cluster.

    My RKE1 clusters typically have the following setup:

    • One node with controlplane, etcd, and worker roles.
    • Two nodes with etcd and worker roles.
    • If needed, additional nodes with just the worker role.

    This seemed to work well: I had proper redundancy with etcd and enough workers to host all of my workloads. Sure, I only had one control plane node, so if that node went down, well, the cluster would be in trouble. However, I usually did not have much problem keeping the nodes running, so I left it as it stood.

    With RKE2, there is simply a notion of server and agent. Server nodes run etcd and the control plane components, while agents run only user-defined workloads. So, when I started planning my RKE2 clusters, I figured I would run one server and two agents. The lack of etcd redundancy would not have me losing sleep at night, but I really did not want to run three servers and then more agents for my workloads.

    As I started down this road, I wondered how I would be able to cycle nodes. I asked the #rke2 channel on rancher-users.slack.com, and got an answer from Brad Davidson: I should always have at least 2 available servers, even when cycling. However, he did mention something that was not immediately apparent: the server can and will run user-defined workloads unless the appropriate taints have been applied. So, in that sense, an RKE2 server acts similarly to my “all roles” node, where it functions as a control plane, etcd, and worker node.
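
    For completeness, keeping user workloads off the servers is a matter of adding the documented CriticalAddonsOnly taint to each server’s RKE2 config. A sketch, assuming the default /etc/rancher/rke2/config.yaml location:

    # Taint the server so only pods that tolerate CriticalAddonsOnly (the system charts) schedule there.
    cat <<'EOF' >> /etc/rancher/rke2/config.yaml
    node-taint:
      - "CriticalAddonsOnly=true:NoExecute"
    EOF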

    The Verdict?

    Once I saw a path forward with RKE2, I have not really looked back. I have put considerable time into my provisioning-projects scripts, as well as creating a new API wrapper for Windows DNS management (post to follow).

    “But Matt, you haven’t considered Kubernetes X or Y?”

    I know. There are a number of flavors of Kubernetes that can run on your bare metal servers. I spent a lot of time and energy learning RKE1, and I have gotten very good at managing those clusters. RKE2 is familiar, with improvements in all the right places. I can see automating not only machine provisioning, but the entire process of node replacement. I would love nothing more than to come downstairs on a Monday morning and see newly provisioned cluster nodes humming away after my automated process ran.

    So, yes, maybe I skipped a good portion of that “research” step, but I am OK with it. After all, it is my home lab: I am more interested in revisiting Gravitee.io for API management and starting to put some real code services out into the world.