Tag: Kubernetes

  • Hitting for the cycle…

    Well, I may not be hitting for the cycle, but I am certainly cycling Kubernetes nodes like it is my job. The recent OpenSSL security patches got me thinking that I need to cycle my cluster nodes.

    A Quick Primer

    In Kubernetes, a “node” is, well, a machine performing some bit of work. It could be a VM or a physical machine, but, at its core, it is a machine with a container runtime of some kind that runs Kubernetes workloads.

    Using the Rancher Kubernetes Engine (RKE), each node can have one or more of the following roles:

    • worker – The node can host user workloads
    • etcd – The node is a member of the etcd storage cluster
    • controlplane – The node is a member of the control plane

    Cycle? What and why

    When I say “cycling” the nodes, I am actually referring to the process of provisioning a new node, adding the new node to the cluster, and removing an old node from the cluster. I use the term “cycling” because, when the process is complete, my cluster is “back where it started” in terms of resourcing, but with a fresh and updated node.

    But, why cycle? Why not just run the necessary security patches on my existing nodes? In my view, even at the home lab level, there are a few reasons for this method.

    A Clean and Consistent Node

    As nodes get older, they incur the costs of age. They may have some old container images from previous workloads, or some cached copies of various system packages. In general, they collect stuff, and a fresh node has none of that cost. By provisioning new nodes, we can ensure that the latest provisioning is run and all the necessary updates are installed.

    Using newly provisioned nodes each time also prevents me from applying special, one-off configuration to nodes. If I need a particular configuration on a node, well, it has to be done in the provisioning scripts. All my cluster nodes are provisioned the same way, which makes them much more like cattle than pets.

    No Downtime or Capacity Loss

    Running node maintenance can potentially require a system reboot or service restarts, which can trigger downtime. In order to prevent downtime, it is recommended that nodes be “cordoned” (prevent new workload from being scheduled on that node) and “drained” (remove workloads from the node).
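
    For anyone following along, the cordon and drain steps are plain kubectl commands. A minimal sketch, with a placeholder node name:

      # Mark the node unschedulable so no new pods land on it
      kubectl cordon worker-node-01

      # Evict existing pods from the node (DaemonSet pods are skipped; emptyDir data is discarded)
      kubectl drain worker-node-01 --ignore-daemonsets --delete-emptydir-data

      # If the node is staying in the cluster after maintenance, let it take work again
      kubectl uncordon worker-node-01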

    Kubernetes is very good at scheduling and moving workloads between nodes, and there is built-in functionality for cordon and draining of nodes. However, to prevent downtime, I have to remove a node for maintenance, which means I lose some cluster capacity during maintenance.

    When you think about it, to do a zero-downtime update, your choices are:

    1. Take a node out of the cluster, upgrade it, then add it back
    2. Add a new node to the cluster (already upgraded) then remove the old node.

    So, applying the “cattle” mentality to our infrastructure, I prefer disposable assets over precisely manicured ones.

    RKE performs this process for you when you remove a node from the cluster: running rke up will remove any nodes that have been dropped from your cluster.yml.
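
    For context, the nodes section of an RKE cluster.yml looks roughly like this (addresses and users are placeholders, not my actual hosts); dropping an entry and re-running rke up is what triggers the removal:

      nodes:
        - address: 172.16.10.21
          user: ubuntu
          role: [controlplane, etcd, worker]
        - address: 172.16.10.22
          user: ubuntu
          role: [worker]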

    To Do: Can I automate this?

    As of right now, this process is pretty manual, and goes something like this:

    1. Provision a new node – This part is automated with Packer, but I really only like doing 1 at a time. I am already running 20+ VMs on my hypervisor, I do not like the thought of spiking to 25+ just to cycle a node.
    2. Remove nodes, one role at a time – I have found that RKE is most stable when you only remove one role at a time. What does that mean? It means that if I have a node running all three roles, I need to remove the control plane role, then run rke up. Then remove the etcd role, and run rke up again. Then remove the node completely. I have not had good luck simply removing a node with all three roles at once.
    3. Ingress Changes – I need to change the IPs on my cluster in two places:
      • In my external load balancer, which is a Raspberry Pi running nginx.
      • In my nginx-ingress installation on the cluster. This is done via my GitOps repository.
    4. DNS Changes – I have aliases set up for the control plane nodes so that I can swap them in and out easily without changing other configurations. When I cycle a control plane node, I need to update the DNS.
    5. Delete Old Node – I have a small PowerShell script for this, but it is another step.

    It would be wonderful if I could automate this into an Azure DevOps pipeline, but there are some problems with that approach.

    1. Packer’s Hyper-V builder has to run on the host machine, which means I need to be able to execute the Packer build commands on my server. I’m not about to put the Azure DevOps agent directly on my server.
    2. I have not found a good way to automate DNS changes, outside of using the PowerShell module.
    3. I need to automate the IP changes in the external proxy and the cluster ingress. Both of which are possible but would require some research on my end.
    4. I would need to automate the RKE actions, specifically, adding new nodes and deleting roles/nodes from the cluster, and then running rke up as needed.

    None of the above is impossible, however, it would require some effort on my part to research some techniques and roll them up into a proper pipeline. For the time being, though, I have set my “age limit” for nodes at 60 days, and will continue the manual process. Perhaps, after a round or two, I will get frustrated enough to start the automation process.

  • What’s in a (Release) Name?

    With Rancher gone, one of my clusters was dedicated to running Argo and my standard cluster tools. Another cluster has now become home for a majority of the monitoring tools, including the Grafana/Loki/Mimir/Tempo stack. That second cluster was running a little hot in terms of memory and CPU. I had 6 machines running what 4-5 machines could be doing, so a consolidation effort was in order. The process went fairly smoothly, but a small hiccup threw me for a frustrating loop.

    Migrating Argo

    Having configured Argo to manage itself, I was hoping that a move to a new cluster would be relatively easy. I was not disappointed.

    For reference:

    • The ops cluster is the cluster that was running Rancher and is now just running Argo and some standard cluster tools.
    • The internal cluster is the cluster that is running my monitoring stack and is the target for Argo in this migration.

    Migration Process

    • I provisioned two new nodes for my internal cluster, and added them as workers to the cluster using the rke command line tool.
    • To prevent problems, I updated Argo CD to the latest version and disabled auto-sync on all apps.
    • Within my ops-argo repository, I changed the target cluster of my internal applications to https://kubernetes.default.svc (an example destination follows this list). Why? Argo treats the cluster it is installed on a bit differently than external clusters. Since I was moving Argo onto my internal cluster, the references to my internal cluster needed to change.
    • I exported my Argo CD resources using the Argo CD CLI. I was not planning on using the import, however, because I wanted to see how the rest of this process would work without it.
    • I modified my external Nginx proxy to point my Argo address from the ops cluster to the internal cluster. This essentially locked everything out of the old Argo instance, but let it run on the cluster should I need to fall back.
    • Locally, I ran helm install argocd . -n argo --create-namespace from my ops-argo repository. This installed Argo on my internal cluster in the argo namespace. I grabbed the newly generated admin password and saved it in my password store.
    • Locally, I ran helm install argocd-apps . -n argo from my ops-argo repository. This installs the Argo Application and Project resources which serve as the “app of apps” to bootstrap the rest of the applications.
    • I re-added my nonproduction and production clusters to be managed by Argo using the argocd cluster add command. As the cluster URLs were the same as in the old Argo instance, the apps matched up nicely.
    • I re-added the necessary labels to each cluster’s ArgoCD secret. This allows the cluster generator to create applications for my external cluster tools. I detailed some of this in a previous article, and the ops-argo repository has the ApplicationSet definitions to help.
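
    For illustration, here is roughly what the destination change to https://kubernetes.default.svc looks like on an Argo CD Application; the application name, repository URL, and paths are placeholders, not my actual definitions:

      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: kube-prometheus            # hypothetical application name
        namespace: argo
      spec:
        project: default
        source:
          repoURL: https://github.com/example/ops-argo.git   # placeholder
          path: cluster-tools/kube-prometheus
          targetRevision: main
        destination:
          # The internal cluster is now the cluster Argo runs on, so it is addressed
          # as the in-cluster server rather than by its external API URL.
          server: https://kubernetes.default.svc
          namespace: monitoring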

    And that was it… Argo found the existing applications on the other clusters, and after some re-sync to fix some labels and annotations, I was ready to go. Well, almost…

    What happened to my metrics?

    Some of my metrics went missing. Specifically, many of those that were coming from the various exporters in my Prometheus install. When I looked at Mimir, the metrics just stopped right around the time of my Argo upgrade and move. I checked the local Prometheus install, and noticed that those targets were not part of the service discovery page.

    I did not expect Argo’s upgrade to change much, so I did not take notice of the changes. So, I was left digging around in my Prometheus and ServiceMonitor instances to figure out why those targets were not showing up.

    Sadly, this took way longer than I anticipated. Why? There were a few reasons:

    1. I have a home improvement project running in parallel, which means I have almost no time to devote to researching these problems.
    2. I did not have Rancher! One of the things I did use Rancher for was to easily view the different resources in the cluster and compare them. I finally remembered that I could use Lens for a GUI into my clusters, but sadly, that realization came a few days into the ordeal.
    3. Everything else was working; I was just missing a few metrics. On my own personal severity scale, this was low. On the other hand, it drove my OCD crazy.

    Names are important

    After a few days of hunting around, I realized that the ServiceMonitor’s matchLabels did not match the labels on the Service objects themselves. That was odd, because I had not changed anything in those charts, and they are all part of the Bitnami kube-prometheus Helm chart that I am using.
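
    To illustrate the mismatch (with hypothetical names and label keys, not my exact resources), the rendered objects ended up looking something like this, so Prometheus had no reason to associate the two:

      # Service rendered with one instance name…
      apiVersion: v1
      kind: Service
      metadata:
        name: node-exporter              # hypothetical
        labels:
          app.kubernetes.io/instance: kube-prometheus
      ---
      # …while the ServiceMonitor selects on a different one
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: node-exporter              # hypothetical
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/instance: monitoring-kube-prometheus
        endpoints:
          - port: metrics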

    As I poked around, I realized that I was using the releaseName property on the Argo Application spec. After some searching, I found the disclaimer on Argo’s website about using releaseName. As it turns out, this disclaimer describes exactly the issue that I was experiencing.

    I spent about a minute seeing if I could fix it while keeping the releaseName properties on my cluster tools, then realized that the easiest path was simply to remove the releaseName property from the cluster tools that used it. That follows the guidance that Argo suggests, and keeps my configuration files much cleaner.

    After removing that override, the ServiceMonitor resources were able to find their associated services, and showed up in Prometheus’ service discovery. With that, now my OCD has to deal with a gap in metrics…

    Missing Node Exporter Metrics
  • A Lesson in Occam’s Razor: Configuring Mimir Ruler with Grafana

    Occam’s Razor posits “Of two competing theories, the simpler explanation is to be preferred.” I believe my high school biology teacher taught the “KISS” method (Keep It Simple, Stupid) to convey a similar principle. As I was trying to get alerts set up in Mimir using the Grafana UI, I came across an issue that was only finally solved by going back to the simplest answer.

    The Original Problem

    One of the reasons I leaned towards Mimir as a single source of metrics is that it has its own ruler component. This would allow me to store alerts in, well, two places: the Mimir ruler for metrics rules, and the Loki ruler for log rules. Sure, I could use Grafana alerts, but I like being able to run the alerts in the native system.

    So, after getting Mimir running, I went to add a new alert in Grafana. Neither my Mimir nor Loki data source showed up.

    Now, as it turns out, the Loki one was easy: I had not enabled the Loki ruler component in the Helm chart. So, a quick update of the Loki configuration in my GitOps repo, and the Loki ruler was up and running. However, my Mimir data source was still missing.
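
    For reference, enabling the ruler was (roughly) a one-line change in my loki-distributed values; the exact keys may differ between chart versions, and the ruler also needs rule storage configured, which I am glossing over here:

      ruler:
        enabled: true
        # plus a rule storage backend (local or object storage) for the rule groups themselves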

    Asking for help

    I looked for some time at possible solutions, and posted to the GitHub discussion looking for assistance. Dimitar Dimitrov was kind enough to step in and give me some pointers, including looking at the Chrome network tab to figure out if there were any errors.

    At that point, I first wanted to bury my head in the sand, as, of all the things I looked at, that was not one of the things I had done. But, after getting over my embarrassment, I went about debugging. There were some errors around the Access-Control-Allow-Origin header not being present, and there was also a 405 when attempting the preflight OPTIONS request.

    Dimitar suggested that I add that Access-Control-Allow-Origin header via the Mimir Helm chart’s nginx.nginxConfig section. So I did, but I still had the same problems.

    I had previously tried this in Microsoft Edge, where it worked, so I had been assuming it was just Chrome being overly strict. However, just for fun, I tried a different Chrome profile, and it worked…

    Applying Occam’s Razor

    It was at this point that I thought “There is no way that something like this would not have come up with Grafana/Mimir on Chrome.” I would have expected a quick note around the Access-Control-Allow-Origin or something similar when hosting Grafana in a different subdomain than Mimir. So I took a longer look at the network log in my instance of Chrome that was still throwing errors.

    There was a redirect in there that was listed as “cached.” I thought, well, that’s odd, why would it cache that redirect? So, as a test, I disabled the cache in the Chrome debugging tools, refreshed the page, and voilà! Mimir showed up as a data source for alerts.

    Looking at the successful calls, I noted that ALL of them were proxied through grafana.mydomain.local, which made me think, “Do I really even need the Access-Control-Allow-Origin headers?” So, I removed those headers from my Mimir configuration, re-deployed, and tested with caching disabled. It worked like a champ.

    What happened?

    The best answer I can come up with is that, at some point in my clicking around Grafana with Mimir as a data source, Chrome got a 301 redirect response from https://grafana.mydomain.local/api/datasources/proxy/14/api/v1/status/buildinfo, cached it, and used it in perpetuity. Disabling the cached response fixed everything, without the need to further configure Mimir’s Nginx proxy to return special CORS headers.

    So, with many thanks to Dimitar for being my rubber duck, I am now able to start adding new rules for monitoring my cluster metrics.

  • Walking away from Rancher

    I like to keep my home lab running both for hosting sites (like this one) and for experimenting with different tools and techniques. It is pretty nice to be able to spin up some containers in a cluster without fear of disturbing others’ work. And, since I do not hold myself to any SLA commitments, it is typically stress-free.

    When I started my Kubernetes learning, having the Rancher UI at my side made it a little easier to see what was going on in the cluster. Sure, I could use kubectl for that, but having a user interface just made things a little nicer while I learned the tricks of kubectl.

    The past few upgrades, though, have not gone smoothly. I believe they changed the way they are handling their CRDs, and that has been messing with some of my Argo management of Rancher. But, as I went about researching a fix, I started to think…

    Do I really need Rancher?

    I came to realize that, well, I do not use Rancher’s UI nearly as often as I used to. I rely quite heavily on the Rancher Kubernetes Engine (RKE) command line tool to manage my clusters, but I do not really use any of the features that Rancher provides. Which features?

    • Rancher’s RBAC is pretty useless for me considering that I am the only one connecting to the clusters, and I use direct connections (not through Rancher’s APIs).
    • Sure, I have single sign-on configured using Azure AD. But I do not think I have logged in to the UI in the last 6 months.
    • With my Argo CD setup, I manage resources by storing the definitions in a Git repository. That means I do not use Rancher for the add-on applications for observability and logging.

    This last upgrade cycle took the cake. I did my normal steps: upgrade the Rancher Helm version, commit, and let Argo do its thing. That’s when things blew up.

    First, the new Rancher images are so big that they caused image pull errors due to timeouts. So, after manually pulling the images to the nodes, the containers started up. The logs showed them working on the CRDs. The pods would get through to the “m’s” of the CRD list before timing out and cycling.

    This wreaked havoc on ALL my clusters. I do not know if it was the constant disk thrashing or the server CPU, but things just started timing out. Once I realized Rancher was thrashing, and, well, that I did not feel like fixing it, I started the process of deleting it.

    Removing Rancher

    After uninstalling the Rancher Helm chart, I ran the Rancher cleanup job that they provide. Even though I am using the Bitnami kube-prometheus chart to install Prometheus on each cluster, the cleanup job deleted all of my ServiceMonitor instances in each cluster. I had to go into Argo and re-sync the applications that were not set to auto-sync in order to recreate the ServiceMonitor instances.

    Benchmarking Before and After

    With Mimir in place, I had collected metrics from those clusters in a single place from both before and after the change. That made creating a few Grafana dashboard panels a lot easier.

    Cluster Memory and CPU – 10/18/2022 09:30 EDT to 10/21/2022 09:30 EDT

    The absolute craziness that started around 10:00 EDT on 10/19 and ended around 10:00 EDT on 10/20 can be attributed to me trying to upgrade Rancher, failing, and then starting to delete Rancher from the clusters. We will ignore that time period, as it is caused by human error.

    Looking at the CPU usage first, there were some unexpected behaviors. I expected a drop in the ops cluster, which was running the Rancher UI, but the drop in CPU in the production cluster was larger than that of the internal cluster. Meanwhile, nonproduction went up, which is really strange. I may dig into those numbers a bit more by individual pod later.

    What struck me more was that, across all clusters, memory usage went up. I have no immediate answers to that one, but you can be sure I will be digging in a bit more to figure that out.

    So, while Rancher was certainly using some CPU, it would seem that I am seeing increased memory usage without it. When I have answers to that, you’ll have answers to that.

  • Kubernetes Observability, Part 5 – Using Mimir for long-term metric storage

    This post is part of a series on observability in Kubernetes clusters:

    For anyone who actually reads through my ramblings, I am sure they are asking themselves this question: “Didn’t he say Thanos for long-term metric storage?” Yes, yes I did say Thanos. Based on some blog posts from VMware that I had come across, my initial thought was to utilize Thanos as my long-term metric storage solution. Having worked with Loki first, though, I became familiar with the distributed deployment and storage configuration. So, when I found that Mimir has similar configuration and compatibility, I thought I’d give it a try.

    Additionally, remember, this is for my home lab: I do not have a whole lot of time for configuration and management. With that in mind, using the “Grafana Stack” seemed like the expedient solution.

    Getting Started with Mimir

    As with Loki, I started with the mimir-distributed Helm chart that Grafana provides. The charts are well documented, including a Getting Started guide on their website. The Helm chart includes a MinIO dependency, but, as I had already set up MinIO, I disabled the included chart and configured a new bucket for Mimir.

    As Mimir has all the APIs to stand in for Prometheus, the changes were pretty easy:

    1. Get an instance of Mimir installed in my internal cluster
    2. Configure my current Prometheus instances to remote-write to Mimir
    3. Add Mimir as a data source in Grafana.
    4. Change my Grafana dashboards to use Mimir, and modify those dashboards to filter based on the cluster label that is added.

    As my GitOps repositories are public, have a look at my Mimir-based chart for details on my configuration and deployment.

    Labels? What labels?

    I am using the Bitnami kube-prometheus Helm chart. That particular chart allows you to define external labels using the prometheus.externalLabels value. In my case, I created a cluster label with unique values for each of my clusters. This allows me to create dashboards with a single data source which can be filtered based on each cluster using dashboard variables.
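
    For reference, a rough sketch of the relevant values in my kube-prometheus configuration. The externalLabels key is the one mentioned above; the remote-write key and the Mimir URL are approximations from memory, so verify them against the chart’s values and your own Mimir gateway address:

      prometheus:
        externalLabels:
          cluster: internal        # unique per cluster: ops, internal, nonproduction, production
        remoteWrite:
          - url: http://mimir-nginx.mimir.svc.cluster.local/api/v1/push   # Mimir's remote-write endpoint, via its gateway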

    Well…. that was easy

    All in all, it took me probably two hours to get Mimir running and collecting data from Prometheus. It took far less time than I anticipated, and opened up some new doors to reduce my observability footprint, such as:

    1. I immediately reduced the data retention on my local Prometheus installations to three days. This reduced my total disk usage in persistent volumes by about 40 GB.
    2. I started researching using the Grafana Agent as a replacement for a full Prometheus instance in each cluster. Generally, the agent should use less CPU and storage on each cluster.
    3. Grafana Agent is also a replacement for Promtail, meaning I could remove both my kube-prometheus and promtail tools and replace them with an instance of the Grafana Agent. This greatly simplifies the configuration of observability within the cluster.

    So, what’s the catch? Well, I’m tying myself to an opinionated stack. Sure, it’s based on open standards and has the support of the Prometheus community, but, it remains to be seen what features will become “pay to play” within the stack. Near the bottom of this blog post is some indication of what I am worried about: while the main features are licensed under AGPLv3, there are additional features that come with proprietary licensing. In my case, for my home lab, these features are of no consequence, but, when it comes to making decisions for long term Kubernetes Observability at work, I wonder what proprietary features we will require and how much it will cost us in the long run.

  • I’ll take ‘Compatibility Matrices’ for $1000, Alex…

    I have posted a lot on Kubernetes over the past weeks. I have covered a lot around tools for observability, monitoring, networking, and general usefulness. But what I ran into over the past few weeks is both infuriating and completely avoidable, with the right knowledge and time to keep up to speed.

    The problem

    I do my best to try and keep things up to date in the lab. Not necessarily bleeding edge, but I make an effort to check for new versions of my tools and update them. This includes the Kubernetes versions of my clusters.

    Using RKE, I am limited to the Kubernetes versions that RKE supports. So I have been running 1.23.6 (or, more specifically, v1.23.6-rancher1-1) for at least a month or so.

    About two weeks ago, I updated my RKE command line and noticed that v1.23.8-rancher1-1 was available. So I changed the cluster.yml in my non-production environment and ran rke up. No errors, no problems, right?? So I made the change to the internal cluster and started the rke up for that cluster. As that was processing, however, I noticed that my non-production cluster was down. Like, the Kube API was running, so I could get the pod list. But every pod was erroring out. I did not notice anything in the pod events that was even remotely helpful, other than that the pods could not be scheduled. As I didn’t have time to properly diagnose, I attempted to roll the cluster back to 1.23.6. That worked, so I downgraded and left it alone.
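
    For anyone unfamiliar with RKE, the version bump itself is a one-line change in cluster.yml; the string has to be one of the versions your RKE release supports (rke config --list-version --all will list them, if memory serves):

      kubernetes_version: v1.23.8-rancher1-1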

    I will not let the machines win, so I stepped back into this problem today. I tried upgrading again (both to 1.23.8 and 1.24.2), with the same problems. In digging into the docker container logs on the hosts, I found the smoking gun:

    Unit kubepods-burstable.slice not found.

    Eh? I can say I have never seen that phrase before. But a quick Google search pointed me towards cgroup and, more generally, the docker runtime.

    Compatibility, you fickle fiend

    As it turns out, there is quite a large compatibility matrix between Rancher, RKE, Kubernetes, and Docker itself. My move to 1.23.8 clearly pushed my Kubernetes version past what my Docker version supported (which, if you care, was Docker 20.10.12 on Ubuntu 22.04).

    I downgraded the non-production cluster once again, then upgraded Docker on those nodes (a simple sudo apt upgrade docker). Then I tried the upgrade, first to v1.23.8 (baby steps, folks).

    Success! Kubernetes version was upgraded, and all the pods restarted as expected. Throwing caution to the wind, I upgraded the cluster to 1.24.2. And, this time, no problems.

    Lessons Learned

    Kubernetes is a great orchestration tool. Coupled with the host of other CNCF tooling available, one can design and build robust application frameworks which let developers focus more on business code than deployment and scaling.

    But that robustness comes with a side of complexity. The variety of container runtimes, Kubernetes versions, and third-party hosts means cluster administration is, and will continue to be, a full-time job. Just look at the Rancher Manager matrix at SUSE. Thinking about keeping production systems running while juggling the compatibility of all these different pieces makes me glad that I do not have to manage these beasts daily.

    Thankfully, cloud providers like Azure, GCP, and AWS, provide some respite by simplifying some of this. Their images almost force compatibility, so that one doesn’t run into the issues that I ran into on my home cluster. I am much more confident in their ability to run a production cluster than I am in my own skills as a Kubernetes admin.

  • Kubernetes Observability, Part 4 – Using Linkerd for Service Observability

    This post is part of a series on observability in Kubernetes clusters:

    As we start to look at traffic within our Kubernetes clusters, the notion of adding a service mesh crept into our discussions. I will not pretend to be an expert in service meshes, but the folks at buoyant.io (the creators of Linkerd) have done a pretty good job of explaining service meshes for engineers.

    My exercise to install Linkerd as a cluster tool was more a case of “can I do it” than a real need for a service mesh. However, Linkerd’s architecture is such that I can have Linkerd installed in the cluster, but only active on services that need it. This is accomplished via pod annotations, and makes the system very configurable.

    Installing Linkerd

    With my ArgoCD setup, adding Linkerd as a cluster tool was pretty simple: I added the chart definition to the repository, then added a corresponding ApplicationSet definition. The ApplicationSet defined a cluster generator with a label match, meaning Linkerd would only be installed to clusters where I added spydersoft.io/linkerd=true as a label on the ArgoCD cluster secret.
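
    A rough sketch of that pattern, with placeholder names, repository URL, and paths rather than my actual definitions:

      apiVersion: argoproj.io/v1alpha1
      kind: ApplicationSet
      metadata:
        name: linkerd
        namespace: argo
      spec:
        generators:
          - clusters:
              selector:
                matchLabels:
                  spydersoft.io/linkerd: "true"    # only labeled cluster secrets generate an Application
        template:
          metadata:
            name: 'linkerd-{{name}}'
          spec:
            project: cluster-tools               # hypothetical project
            source:
              repoURL: https://github.com/example/ops-argo.git   # placeholder
              path: cluster-tools/linkerd
              targetRevision: main
            destination:
              server: '{{server}}'
              namespace: linkerd

    Adding or removing the spydersoft.io/linkerd label on a cluster’s Argo CD secret is then all it takes to roll Linkerd out to (or off of) a given cluster.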

    The most troublesome part of the installation process was figuring out how to manage Linkerd via GitOps. The folks at Linkerd, however, have a LOT of guides to help. You can review my chart definition for my installation methods; it was built from a handful of articles in Linkerd’s own documentation.

    Many kudos to the Linkerd team, as their documentation was thorough and easy to follow.

    Adding Linkerd-viz

    Linkerd-viz is an add-on to Linkerd that has its own helm chart. As such, I manage it as a separate cluster tool. The visualization add-on has a dashboard that can be exposed via ingress and provide an overview of Linkerd and the metrics it is collecting. In my case, I tried to expose Linkerd-viz on a subpath (using my cluster’s internal domain name as the host). I ran into some issues (more on that below), but overall it works well.

    I broke it…

    As I started adding podAnnotations to inject Linkerd into my pods, things seemed to be “just working.” I even decorated my Nginx ingress controllers following the Linkerd guide, which meant traffic within my cluster was all going through Linkerd. This seemed to work well, until I tried to access my installation of Home Assistant. I spent a good while trying to debug, but as soon as I removed the pod annotations from Nginx, Home Assistant started working. While I am sure there is a way to fix that, I have not had much time to devote to the home lab recently, so that is on my to-do list.
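
    For reference, the injection itself is just an annotation on the workload’s pod template; in charts that expose a podAnnotations value, it looks something like this:

      podAnnotations:
        linkerd.io/inject: enabled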

    I also noticed that the Linkerd-viz dashboard does not, at all, like to be hosted at a non-root URL. This has been documented as a bug in Linkerd, but is currently marked with the “help wanted” tag, so I am not expecting it to be fixed anytime soon. However, that bug identifies an ingress configuration snippet that can be added to the ingress definition to provide some basic rewrite functionality. It is a dirty workaround, and does not fix everything, but it is serviceable.

    Benefits?

    For the pods that I have marked up, I can glance at the network traffic and latency between the services. I have started to create Grafana dashboards in my external instance to pull those metrics into easy-to-read graphs of network performance.

    I have a lot more learning to do when it comes to Linkerd. While it is installed and running, I am certainly not using it for any heavy tasking. I hope to make some more time to investigate, but for now, I have some additional network metrics that help me understand what is going on in my clusters.

  • Kubernetes Observability, Part 3 – Dashboards with Grafana

    This post is part of a series on observability in Kubernetes clusters:

    What good is Loki’s log collection or Prometheus’ metrics scraping without a way to see it all? Loki and Grafana both come from Grafana Labs, which is building itself into an observability stack similar to Elastic’s. I have used both stacks, and Grafana’s is much easier to get started with than Elastic. For my home lab, it is perfect: simple start and config, fairly easy monitoring, and no real administrative work to do.

    Installing Grafana

    The Helm chart provided for Grafana was very easy to use, and the “out-of-box” configuration was sufficient to get started. I configured a few more features to get the most out of my instance (a rough values sketch follows the list):

    • Using an existing secret for admin credentials: this secret is created by an ExternalSecret resource that pulls secrets from my local Hashicorp Vault instance.
    • Configuration for Azure AD users. Grafana’s documentation details additional actions that need to be done in your Azure AD instance. Note that envFromSecret points at the Kubernetes Secret that gets expanded into the environment, storing my Azure AD Client ID and Client Secret.
    • Added the grafana-piechart-panel plugin, as some of the dashboards I downloaded referenced that.
    • Enabled a Prometheus ServiceMonitor resource to scrape Grafana metrics.
    • Annotated the services and pods for Linkerd.
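
    A rough sketch of what those tweaks look like in the Grafana chart’s values; the secret names are placeholders, and the keys are from memory, so double-check them against the chart:

      admin:
        existingSecret: grafana-admin-credentials   # created by an ExternalSecret backed by Vault
      envFromSecret: grafana-azuread                # surfaces GF_AUTH_AZUREAD_CLIENT_ID / _CLIENT_SECRET
      grafana.ini:
        auth.azuread:
          enabled: true
          # client_id / client_secret are supplied by the environment variables above
      plugins:
        - grafana-piechart-panel
      serviceMonitor:
        enabled: true
      podAnnotations:
        linkerd.io/inject: enabled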

    Adding Data Sources

    With Grafana up and running, I added 5 data sources: each of my clusters has its own Prometheus instance, and I added my Loki instance for logs. Eventually, Thanos will aggregate my Prometheus metrics into a single data source, but that is a topic for another day.

    Exploring

    The Grafana interface is fairly intuitive. The Explore section lets you poke around your data sources to review the data you are collecting. There are “builders” available to help you construct your PromQL (Prometheus Query Language) or LogQL (Log Query Language) queries. Querying Metrics automatically displays a line chart with your values, making it pretty easy to review your query results and prepare the query for inclusion in a dashboard.

    When debugging my applications, I use the Explore section almost exclusively to review incoming logs. A live log view and search with context makes it very easy to find warnings and errors within the log entries and determine issues.

    Building Dashboards

    In my limited use of the ELK stack, one thing that always got me was the barrier to entry into Kibana. I have always found that having examples to tinker with is a much easier way for me to learn, and I could never find good examples of Kibana dashboards that I could make my own.

    Grafana, however, has a pretty extensive list of community-built dashboards from which I could begin my learning. I started with some of the basics, like Cluster Monitoring for Kubernetes. And, well, even in that one, I ran into some issues. The current version in the community uses kubernetes_io_hostname as a label for the node name. It would seem that label has changed to node in kube-state-metrics, so I had to import that dashboard and make changes to the queries in order for the data to show.

    I found a few other dashboards that illustrated what I could do with Grafana:

    • ArgoCD – If you add a ServiceMonitor to ArgoCD, you can collect a number of metrics around application status and synchronizations, and this dashboard gives a great view of how ArgoCD is running.
    • Unifi Poller Dashboards – Unifi Poller is an application that polls the Unifi controller, pulling metrics and exposing them for Prometheus to scrape. It includes dashboards for a number of different metrics.
    • NGINX Ingress – If you use NGINX for your Ingress controller, you can add a ServiceMonitor to scrape metrics on incoming network traffic.

    With these examples in hand, I have started to build out my own dashboards for some of my internal applications.

  • Kubernetes Observability, Part 2 – Collecting Metrics with Prometheus

    This post is part of a series on observability in Kubernetes clusters:

    Part 1 – Collecting Logs with Loki
    Part 2 – Collecting Metrics with Prometheus (this post)
    Part 3 – Building Grafana Dashboards
    Part 4 – Using Linkerd for Service Observability
    Part 5 – Using Mimir for long-term metric storage

    “Prometheus” appears in many Kubernetes blogs the same way that people whisper a famous person’s name as they enter a crowded room. Throughout a lot of my initial research, particularly with the k8s-at-home project, I kept seeing references to Prometheus in various Helm charts. Not wanting to distract myself, I usually just made sure it was disabled and skipped over it.

    As I found the time to invest in Kubernetes observability, gathering and viewing metrics became a requirement. So I did a bit of a deep dive into Prometheus and how I could use it to view metrics.

    The Short, Short Version

    By no means am I going to regurgitate the documentation around Prometheus. There is a lot to ingest around settings and configuration, and I have not even begun to scratch the surface of what I can do with Prometheus. My simple goal was to get Prometheus installed and running on each of my clusters, gathering cluster metrics. And I did this by installing Prometheus on each of my clusters using Bitnami’s kube-prometheus Helm chart. Check out the cluster-tools folder in my Argo repository for an example.

    The Bitnami chart includes kube-state-metrics and node-exporter, in addition to settings for Thanos as a sidecar. So, with the kube-prometheus install up and running, I was able to create a new data source in Grafana and view the collected metrics. These metrics were fairly basic, consisting mainly of the metrics being reported by kube-state-metrics.

    Enabling ServiceMonitors

    Many of the charts I use to install cluster tools come with values sections that enable metrics collection via a ServiceMonitor resource. A ServiceMonitor instance instructs Prometheus how to discover and scrape services within the cluster.
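
    A minimal example of what one looks like, with hypothetical names; the selector labels must match the labels on the Service you want scraped:

      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: my-app                     # hypothetical
        namespace: monitoring
      spec:
        selector:
          matchLabels:
            app.kubernetes.io/name: my-app   # labels on the target Service
        namespaceSelector:
          matchNames:
            - my-app
        endpoints:
          - port: metrics                # named port on the Service
            path: /metrics
            interval: 30s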

    For example, several of the Helm charts I use ship with ServiceMonitor definitions.

    So, in order to get metrics on these applications, I simply edited my values.yaml file and enabled the creation of a ServiceMonitor resource for each.

    Prometheus as a piece of the puzzle

    I know I skipped a LOT. In a single cluster environment, installing Prometheus should be as simple as choosing your preferred installation method and getting started scraping. As always, my multi-cluster home lab presents some questions that mirror questions I get at work. In this case, how do we scale out to manage multiple clusters?

    Prometheus allows for federation, meaning one Prometheus server can scrape other Prometheus servers to gather metrics. Through a lot of web searching, I came across an article from Vikram Vaswani and Juan Ariza centered on creating a multi-cluster monitoring dashboard using Prometheus, Grafana, and Thanos. The described solution is close to what I would like to see in my home lab. While I will touch on Grafana and Thanos in later articles, the key piece of this article was installing Prometheus in each cluster and, as described, creating a Thanos sidecar to aid in the implementation of Thanos.

    Why Bitnami?

    I have something of a love/hate relationship with Bitnami. The VMware-owned property is a collection of various images and Helm charts meant to be the “app store” for your Kubernetes cluster. Well, more specifically, it is meant to be the app store for your VMware Tanzu environments, but the Helm charts are published so that others can use them.

    Generally, I prefer using “official” charts for applications. This usually ensures that the versions are the most current, and the chart is typically free from the “bloat” that can sometimes happen in Bitnami charts, where they package additional sub-charts to make things easy.

    That is not to say that I do not use Bitnami charts at all. My WordPress installation uses the Bitnami chart, and it serves its purpose. However, being community-driven, the versions can lag a bit. I know the 6.x release of WordPress took a few weeks to make it from WordPress to Bitnami.

    For Prometheus, there are a few community-driven charts, but you may notice that Helm is not in the list of installation methods. This, coupled with the desire to implement Thanos later, led me to the kube-prometheus chart by Bitnami. You may have better mileage with one of the Prometheus Community charts, but for now I am sticking to the Bitnami chart.

  • Kubernetes Observability, Part 1 – Collecting Logs with Loki

    This post is part of a series on observability in Kubernetes clusters:

    I have been spending an inordinate amount of time wrapping my head around Kubernetes observability for my home lab. Rather than consolidate all of this into a single post, I am going to break up my learnings into bite-sized chunks. We’ll start with collecting cluster logs.

    The Problem

    Good containers generate a lot of logs. Outside of getting into the containers via kubectl, logging is the primary mechanism for identifying what is happening within a particular container. We need a way to collect the logs from various containers and consolidate them in a single place.

    My goal was to find a log aggregation solution that gives me insights into all the logs in the cluster, without needing special instrumentation.

    The Candidates – ELK versus Loki

    For a while now, I have been running an ELK (Elasticsearch/Logstash/Kibana) stack locally. My hobby projects utilized an ElasticSearch sink for Serilog to ship logs directly from those images to ElasticSearch. I found that I could install Filebeats into the cluster and ship all container logs to Elasticsearch, which allowed me to gather the logs across containers.

    ELK

    Elasticsearch is a beast. Its capabilities are quite impressive as a document and index solution. But those capabilities make it really heavy for what I wanted, which was “a way to view logs across containers.”

    For a while, I have been running an ELK instance on my internal tools cluster. It has served its purpose: I am able to report logs from Nginx via Filebeats, and my home containers are built with a Serilog sink that reports logs directly to elastic. I recently discovered how to install Filebeats onto my K8 clusters, which allows it to pull logs from the containers. This, however, exploded my Elastic instance.

    Full disclosure: I’m no Elasticsearch administrator. Perhaps, with proper experience, I could make that work. But Elastic felt heavy, and I didn’t feel like I was getting value out of the data I was collecting.

    Loki

    A few of my colleagues pointed to Grafana Loki as a potential solution for log aggregation. I attempted an installation to compare the two solutions.

    Loki is a log aggregation system which provides log storage and querying. It is not limited to Kubernetes: there are a number of official clients for sending logs, as well as some unofficial third-party ones. Loki stores your incoming logs (see Storage below), creates indices on some of the log metadata, and provides a custom query language (LogQL) to allow you to explore your logs. Loki integrates with Grafana for visual log exploration, LogCLI for command line searches, and Prometheus AlertManager for routing alerts based on logs.

    One of the clients, Promtail, can be installed on a cluster to scrape logs and report them back to Loki. My colleague suggested a Loki instance on each cluster. I found a few discussions in Grafana’s GitHub issues section around this, but it led to a pivotal question.

    Loki per Cluster or One Loki for All?

    I laughed a little as I typed this, because the notion of “multiple Lokis” is explored in a decidedly different context in the Marvel series. My options were less exciting: do I have one instance of Loki that collects data from different clients across the clusters, or do I allow each cluster to have its own instance of Loki, and use Grafana to attach to multiple data sources?

    Why consider Loki on every cluster?

    Advantages

    Decreased network chatter – If every cluster has a local Loki instance, then logs for that cluster do not have far to go, which means minimal external network traffic.

    Localized Logs – Each cluster would be responsible for storing its own log information, so finding logs for a particular cluster is as simple as going to the cluster itself

    Disadvantages

    Non-centralized – There is no way to query logs across clusters without some additional aggregator (like another Loki instance), which would mean duplicative data storage.

    Maintenance Overhead – Each cluster’s Loki instance must be managed individually. This can be automated using ArgoCD and the cluster generator, but it still means that every cluster has to run Loki.

    Final Decision?

    For my home lab, Loki fits the bill. The installation was easy-ish (if you are familiar with Helm and not afraid of community forums), and once configured, it gave me the flexibility I needed with easy, declarative maintenance. But, which deployment method?

    Honestly, I was leaning a bit towards the individual Loki instances. So much so that I configured Loki as a cluster tool and deployed it to all of my instances. And that worked, although swapping around Grafana data sources for various logs started to get tedious. And, when I thought about where I should report logs for other systems (like my Raspberry PI-based Nginx proxy), doubts crept in.

    Thinking about using an ELK stack, I certainly would not put an instance of Elasticsearch on every cluster. While Loki is a little lighter than elastic, it’s still heavy enough that it’s worthy of a single, properly configured instance. So I removed the cluster-specific Loki instances and configured one instance.

    With promtail deployed via an ArgoCD ApplicationSet with a cluster generator, I was off to the races.

    A Note on Storage

    Loki has a few storage options, with the majority being cloud-based. At work, with storage options in both Azure and GCP, this is a non-issue. But for my home lab, well, I didn’t want to burn cash storing logs when I have a perfectly good SAN sitting at home.

    My solution there was to stand up an instance of MinIO to act as S3 storage for Loki. Now, could I have run MinIO on Kubernetes? Sure. But, in all honesty, it got pretty confusing pretty quickly, and I was more interested in getting Loki running. So I spun up a Hyper-V machine with Ubuntu 22.04 and started running MinIO. Maybe one day I’ll work on getting MinIO running on one of my K8 clusters, but for now, the single machine works just fine.
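
    For the curious, pointing Loki at MinIO is just a matter of treating it as S3 in the storage configuration. A rough sketch of what that section looks like (credentials and host are placeholders, and the keys are for the 2.x config format, so check your Loki version):

      storage_config:
        aws:
          s3: http://ACCESS_KEY:SECRET_KEY@minio.lab.local:9000/loki   # MinIO standing in for S3
          s3forcepathstyle: true
      schema_config:
        configs:
          - from: 2022-01-01
            store: boltdb-shipper
            object_store: aws
            schema: v12
            index:
              prefix: loki_index_
              period: 24h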