• A little maintenance task

    As I mentioned in my previous post, I ran into what I believe are some GPU issues with my Dell 7510. So, like any self-respecting nerd (is that an oxymoron?), I ordered a replacement part from parts-people.com and got to work.

    Prep Work

    As with most things these days, you can almost always find some instructions on the internet. I found a tutorial from UFixTek on YouTube that covers a full cleanup and re-paste. The only additional step in my repair was to replace the GPU with a new one.

With that, I set up a workstation and got to it. Over the past few years I have acquired some tools that make this type of work much easier:

    • Precision Screwdriver Set – I have an older version of this Husky set. I cannot tell you how many times it’s saved me when doing small electronics work.
    • Pry Tool Set – I ordered this set about 4 years ago, but it hasn’t changed much. The rollup case is nice.
• Small slotted screwdriver – The Husky set is great, but for deep-set screws, sometimes you need a standard screwdriver. I honestly don’t remember where I got mine; it’s the red-handled one in the photos below.
• X-Acto knife – Always handy.
• Cutting/Work mat – I currently use a Fiskars cutting mat from Michaels. It protects the desktop and the piece, and the grid pattern is nice for parts organization.
    • Compressed Air Duster – I LOVE this thing. I use it for any number of electronics cleaning tasks, including my keyboard. It’s also powerful enough to use as an inflator tool for small inflatables.
    • Rubbing alcohol for cleaning
    • Lint free paper towels
• New GPU
    • Thermal Paste

    Teardown

    Following the tutorial, I started disassembly, being careful to organize the screws as I went along. I used a few small Post-It tags to label the screws in case I forgot. I removed the M.2 drive, although, in retrospect, I do not think it was necessary.

    Battery, hard drive, and cover removed

    Laptops have pretty tight tolerances and a number of ribbon cables to connect everything together. The tutorial breaks down where they are, but it’s important to keep those in mind as you tear down. If you miss disconnecting one, you run the risk of tearing it.

    Keyboard and palm rest removed

    Clean up!

    Once I got down to the heatsink assembly, I removed it (and the fans) from the laptop. I gave it the same scrubbing as shown in the tutorial, except I did not have to clean my old GPU since I was installing a new one. I cannot tell you how much thermal paste was on this. It was obscene.

    I took care to really clean out the fan assemblies, including the fins. There was about 5 years of dust built up in there, and there was a noticeable reduction in airflow. I’m sure this didn’t help my thermal issues.

    Re-paste and Re-assemble!

    With everything sufficiently cleaned up and blown out, I applied an appropriate amount of thermal paste to the GPU and CPU, put the heatsink assembly back, and reversed the process to re-assemble. Again, it’s important to make note of all the connections: missing a ribbon cable or connection here will lead to unnecessary disassembly just to get it hooked back up.

    And now, we test…

“But does it work?” There’s only one way to find out. I turned it back on, and, well, the screen came up, so that is a victory. Then again, there is an integrated graphics chip… so maybe not as big a victory as it might seem.

Windows 10 booted fine, and before I plugged in additional displays, I checked Device Manager. The card was detected and reported that it was functioning correctly. I plugged in my external displays (one HDMI, one mini DisplayPort -> HDMI), and they were detected normally and switched over.

I fired up FurMark to run some GPU tests and see if the new AMD card works. Now, I had used FurMark on the old GPU and was unable to lock up my laptop the way Fusion 360 was doing. So, running FurMark again is not a conclusive test, but it was worth running anyway.

    One thing I immediately noticed is that FurMark was reporting GPU temperatures, something that was not happening with my old GPU. That’s a good sign, right? After letting FurMark run the stress test for a while, I figured it was time to fire up Fusion 360 and try to hang my laptop.

As with FurMark, Fusion 360 didn’t always hang the laptop. There was no one action that caused it, although orbiting objects quickly seemed to trigger problems more often than not. So I opened a few images and orbited them. No issues.

    Victory?

    I hesitate to declare total victory here: the GPU issue was not consistent, which means all I really know is I am back to where I started. Without some “time under tension,” I’m going to be very wary as I dig into modeling and make sure I save often. But there is promise that the change was for the better. If nothing else, the laptop got a good cleaning that it desperately needed.

  • Tech Tip – Interacting with ETCD in Rancher Kubernetes Engine 2

    Since cycling my cluster nodes is a “fire script and wait” operation, I kicked one off today. I ended up running into an issue that required me to dig a bit into ETCD in RKE2, and could not find direct help, so this is as much my own reference as it is a guide for others.

    I broke it…

When provisioning new machines, I still have some odd behaviors when it comes to IP address assignment. I do not set the IP address manually: I use a static MAC address on the VM and then create a fixed IP reservation for that MAC address. About 90% of the time, that works great. Every so often, though, during provisioning, the VM picks up an IP address from the DHCP pool instead of the fixed IP, and that wrecks stuff, especially around ETCD.

This happened today: in standing up a replacement, the new machine picked up a DHCP IP. Unfortunately, I didn’t remove the machine properly, which caused my ETCD cluster to still see the node as a member. When I deleted the node and tried to re-provision, I got ETCD errors because I was trying to add a node name that already existed.

Getting into ETCD

    RKE2’s docs are a little quiet on actually viewing what’s in ETCD. Through some googling, I figured out that I could use etcdctl to show and manipulate members, but I couldn’t figure out how to actually run the command.

As it turns out, the easiest way to run it is to run it on one of the ETCD pods itself. I came across this bug report in RKE2 that indirectly showed me how to run etcdctl commands from my machine through the ETCD pods. The member list command is:

    kubectl -n kube-system exec <etcd_pod_name> -- sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl member list"

    Note all the credential setting via environment variables. In theory, I could “jump in” to the etcd pod using a simple sh command and run a session, but keeping it like this forces me to be judicious in my execution of etcdctl commands.

    I found the offending entry and removed it from the list, and was able to run my cycle script again and complete my updates.
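
For reference, removing the stale member uses the same wrapper: run member list to grab the hex ID of the bad entry, then remove it. The pod name and member ID below are placeholders.

kubectl -n kube-system exec <etcd_pod_name> -- sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl member remove <member_id>"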

  • Jumping in to 3D Design and Printing

    As I’ve been progressing through various projects at home, I have a few 3D printing projects that I would like to tackle. With that, I needed to learn how to design models to print. This has led me down a bit of a long road, and I’m still not done….

    What to use?

A colleague of mine who does a fair amount of 3D printing suggested the personal use version of Autodesk’s Fusion 360. With my home laptop running a pretty fresh version of Windows 11, I figured it would be worth a shot. He pointed me to a set of YouTube tutorials for learning Fusion 360 in 30 Days, so I got started.

I pretty promptly locked up my machine… In doing some simple rendering (no materials, no complex build/print patterns), my machine simply locked up into a lovely pinstripe pattern. This persisted even after following all of Autodesk’s recommendations, including a fresh install of Fusion 360.

    Downgrading!

    Knowing that some or all of the devices on the laptop may not have appropriate Windows 11 drivers, I made the decision to re-install Windows 10 and stay there. It’s a little painful, as I have gotten somewhat used to the quirks of 11, but I want to be able to draw!

So I installed Windows 10 fresh, got all the latest updates (including the AMD Pro software), and tried Fusion 360 again. I got past where I locked up in Windows 11 and actually got to Day 8 of the tutorials. And then the lockups came back.

    A small hiccup on my part

    I may have gotten a little impatient, and simultaneously uninstalled the AMD drivers while installing some other drivers, and I pretty much made my machine unbootable… So, I am in the process of re-installing Windows 10 and applying all the latest updates.

As part of this, however, I am going to take things a TOUCH slower. I had Fusion 360 running pretty smoothly up until my Day 8 lesson, but I also installed Windows Subsystem for Linux (WSL) between my Day 7 and Day 8 lessons. And while I truly hope this isn’t the case, I am wondering if something in WSL is causing issues with Fusion 360…

So I’m going to take my machine back to the same state it was in, minus the WSL install, to see if I get the same lockups in Fusion 360. I’ll let you know how it turns out!

    Update – 9/18/2023

    I got everything re-installed, including drivers for my GPU, but it is still locking up. However, there is some added information: I got it to lock up outside of Fusion360 in the same way.

    I searched a number of online forums, and the suggestions seem to center around a dying GPU… Doh! So, I have a few options:

    1. Build a new system….
    2. Fix this one.

I do not like the idea of time/money spent on a new system, especially when the specs on this laptop are more than sufficient for what I need. I found a replacement GPU today for under $100, so it is on its way. I took a peek at the installation video and I am not looking forward to a full disassembly, but it will allow me to clean out the drives, reset the heat sinks, and hopefully solve the GPU issue.

  • Hackintosh – Windows 11 Edition

    About a year ago, I upgraded my home laptop to Windows 11. I swapped out my old system drive (which was a spinner) for a new SSD, so I had to go through the process again. I ran into a few different issues this time around that are worth putting to paper.

    Windows 10 Install – Easy

    After installing the new drive, I followed the instructions to create Windows 10 bootable media. Booted from the USB, installed Windows 10. Nothing remarkable, just enough to get into the machine. After a few rounds of Windows updates, I felt like I was ready to go.

    Windows 11 Upgrade – Not so Easy

My home laptop is running a CPU that isn’t compatible with Windows 11. That doesn’t mean I can’t run it; it just means I have to hack Windows a bit.

    In the past, I followed this guide to set the appropriate registry entry and get things installed. This time around should be no different, right?

Wrong. I made the change, but the installer continued to crash. A little Googling took me to this post, which led to this article about resetting Windows Update in Windows 10. After downloading and running the batch file from the instructions, I was able to install Windows 11 again.
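
For my own future reference, the registry tweak that Microsoft documents for upgrading on unsupported hardware is the value below. The guide I linked may set it slightly differently, so treat this as a sketch rather than my exact steps:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\Setup\MoSetup" /v AllowUpgradesWithUnsupportedTPMOrCPU /t REG_DWORD /d 1 /f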

    Done!

After a bit of waiting, I have a Windows 11 machine running. Now time to rebuild it to my liking… Thank goodness for Chocolatey.

  • What’s in a home lab?

    A colleague asked today about my home lab configuration, and I came to the realization that I have never published a good inventory of the different software and hardware that I run as part of my home lab / home automation setup. While I have documented bits and pieces, I never pushed a full update. I will do my best to hit the highlights without boring everyone.

    Hardware

    I have a small cabinet in my basement mechanical room which contains the majority of my hardware, with some other devices sprinkled around.

This is all a good mix of new and used stuff: eBay was a big help. Most of it was procured over several years, including a number of partial updates to the NAS disks.

    • NAS – Synology Diskstation 1517+. This is the 5-bay model. I added the M2D18 expansion card, and I currently have 5 x 4TB WD Red Drives and 2 x 1GB WD SSDs for cache. Total storage in my configuration is 14TB.
    • Server – HP ProLiant DL380p Gen8. Two Xeon E5-2660 processors, 288 GB of RAM, and two separate RAID arrays. The system array is 136GB, while the storage array is 1TB.
    • Network
  • HP ProCurve Switch 2810-24G – A 24-port gigabit switch that serves most of my switching needs.
      • Unifi Security Gateway – Handles all of my incoming/outgoing traffic through the modem and provides most of my high-level network capabilities.
      • Unifi Access Points – Three in total, 2 are the UAP-AC-LR models, the other is the UAP-AC-M outdoor antenna.
      • Motorola Modem – I did not need the features of the Comcast/Xfinity modem, nor did I want to lease it, so I bought a compatible modem.
    • Miscellaneous Items
      • BananaPi M5 – Runs Nginx as a reverse proxy into my network.
      • RaspberryPi 4B+ – Runs Home Assistant. This was a recent move, documented pretty heavily in a series of posts that starts here.
      • RaspberryPi Model B – That’s right, an O.G. Pi that runs my monitoring scripts to check for system status and reports to statuspage.io.
      • RaspberryPi 4B+ – Mounted behind the television in my office, this one runs a copy of MagicMirror to give me some important information at a glance.
      • RaspberryPi 3B+ – Currently dormant.

    Software

    This one is lengthy, so I broke it down into what I hope are logical and manageable categories.

    The server is running Windows Hyper-V Server 2019. Everything else, unless noted, is running on a VM on that server.

    Server VMs

    • Domain Controllers – Two Windows domain controllers (primary and secondary).
    • SQL Servers – Two SQL servers (non-production and production). It’s a home lab, so the express editions suffice.

    Kubernetes

    My activities around Kubernetes are probably the most well-documented of the bunch, but, to be complete: Three RKE2 Kubernetes clusters. Two three-node clusters and one four-node cluster to run internal, non-production, and production workloads. The nodes are Ubuntu 22.04 images with RKE2 installed.

    Management and Monitoring Tools

    For some management and observability into this system, I have a few different software suites running.

    • Unifi Controller – This makes management of the USG and Access points much easier. It is currently running in the production cluster using the jacobalberty image.
    • ArgoCD – Argo is my current GitOps operator and is used to make sure what I want deployed on my clusters is out there.
• LGTM Stack – I have instances of Loki, Grafana, Tempo, and Mimir running in my internal cluster, acting as the central target for logs, traces, and metrics.
    • Grafana Agent – For my VMs and other hardware that supports it, I installed Grafana Agent and configured them to report metrics and logs to Mimir/Loki.
    • Hashicorp Vault – I am running an instance of Hashicorp Vault in my clusters to provide secret management, using the External Secrets operator to provide cached secret management in Kubernetes.
    • Minio – In order to provide a local storage instance with S3 compatible APIs, I’m running Minio as a docker image directly on the Synology.
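
For the curious, the Minio setup from the list above boils down to a single docker run. The data path and ports here are assumptions about my Synology layout, not my exact command:

docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -v /volume1/docker/minio:/data \
  minio/minio server /data --console-address ":9001"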

    Cluster Tools

    Using Application Sets and the Cluster generator, I configured a number of “cluster tools” which allow me to install different tools to clusters using labels and annotations on the Argo cluster Secret resource.

    This allows me to install multiple tools using the same configuration, which improves consistency. The following are configured for each cluster.

    • kube-prometheus – I use Bitnami’s kube-prometheus Helm chart to install an instance of Prometheus on each cluster. They are configured to remote-write to Mimir.
• promtail – I use the promtail Helm chart to install an instance of Promtail on each cluster. They are configured to ship logs to Loki.
• External Secrets – The External Secrets operator helps bootstrap connections to a variety of external vaults and creates Kubernetes Secret resources from the ExternalSecret / ClusterExternalSecret custom resources.
• nfs-subdir-external-provisioner – For PersistentVolumes, I use the nfs-subdir-external-provisioner and configure it to point to dedicated NFS shares on the Synology NAS. Each cluster has its own folder, making it easy to back up through the various NAS tools.
    • cert-manager – While I currently have cert-manager installed as a cluster tool, if I remember correctly, this was for my testing of Linkerd, which I’ve since removed. Right now, my SSL traffic is offloaded at the reverse proxy. This has multiple benefits, not the least of which is that I was able to automate my certificate renewals in one place. Still, cert-manager is available but no certificate stores are currently configured.
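
The ApplicationSet pattern described above looks roughly like the sketch below, using kube-prometheus as the example. The label name, repo URL, and paths are illustrative, not my actual values:

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: kube-prometheus
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            cluster-tools/kube-prometheus: "true"   # label set on the Argo cluster Secret
  template:
    metadata:
      name: "kube-prometheus-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/example/ops-repo   # hypothetical ops repository
        targetRevision: main
        path: cluster-tools/kube-prometheus
      destination:
        server: "{{server}}"
        namespace: monitoring
      syncPolicy:
        automated: {}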

    Development Tools

    It is a lab, after all.

    • Proget – I am running the free version of Proget for private Nuget and container image feeds. As I move to open source my projects, I may migrate to Github artifact storage, but for now, it is stored locally.
    • SonarQube Community – I am running an instance of SonarQube community for quality control. However, as with Proget, I have begun moving some of my open source projects to Sonarcloud.io, so this instance may fall away.

    Custom Code

    I have a few projects, mostly small APIs that allow me to automate some of my tasks. My largest “project” is my instance of Identity Server, which I use primarily to lock down my other APIs.

    And of course…

    WordPress. This site runs in my production cluster, using the Bitnami chart, which includes the database.

    And there you go…

    So that is what makes up my home lab these days. As with most good labs, things are constantly changing, but hopefully this snapshot presents a high level picture into my lab.

  • Tech Tip – Options Pattern in ASP.NET Core

    I have looked this up at least twice this year. Maybe if I write about it, it will stick with me. If it doesn’t, well, at least I can look here.

    Options Pattern

The Options pattern is a set of interfaces that allow you to read configuration into classes in your ASP.NET Core application. It lets you define strongly typed options classes with default values and attributes for validation. It also removes most of the “magic strings” that can come along with reading configuration settings. I will do you all a favor and not regurgitate the documentation, but rather leave a link so you can read all about the pattern.

    A Small Sample

    Let’s assume I have a small class called HostSettings to store my options:

     public class HostSettings
     {
         public const string SectionName = "HostSettings";
         public string Host { get; set; } = string.Empty;
         public int Port { get; set; } = 5000;
     }

    And my appsettings.json file looks like this:

    {
      "HostSettings": {
        "Host": "http://0.0.0.0",
        "Port": 5000
      },
  // More settings here
    }

    Using Dependency Injection

    For whatever reason, I always seem to remember how to configure options using the dependency injector. Assuming the above, adding options to the store looks something like this:

var builder = WebApplication.CreateBuilder(args);
    builder.Services.Configure<HostSettings>(builder.Configuration.GetSection(HostSettings.SectionName));

From here, to get HostSettings into your class, add an IOptions<HostSettings> parameter to your class’s constructor and access the options through the Value property.

    public class MyService
    {
       private readonly HostSettings _settings;
    
   public MyService(IOptions<HostSettings> options)
       {
          _settings = options.Value;
       }
    }

    Options without Dependency Injection

    What I always, always forget about is how to get options without using the DI pattern. Every time I look it up, I have that “oh, that’s right” moment.

    var hostSettings = new HostSettings();
builder.Configuration.GetSection(HostSettings.SectionName).Bind(hostSettings);

    Yup. That’s it. Seems silly that I forget that, but I do. Pretty much every time I need to use it.

    A Note on SectionName

    You may notice the SectionName constant that I add to the class that holds the settings. This allows me to keep the name/location of the settings in the appsettings.json file within the class itself.

    Since I only have a few classes which house these options, I load them manually. It would not be a stretch, however, to create a simple interface and use reflection to load options classes dynamically. It could even be encapsulated into a small package for distribution across applications… Perhaps an idea for an open source package.
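
As a thought experiment, a minimal sketch of that reflection-based loader might look like the code below. Everything here is hypothetical — the AddConventionOptions name and the “scan for a SectionName constant” convention — and not something I have actually packaged up:

using System.Linq;
using System.Reflection;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

public static class OptionsLoader
{
    public static IServiceCollection AddConventionOptions(
        this IServiceCollection services, IConfiguration configuration, Assembly assembly)
    {
        foreach (var type in assembly.GetTypes().Where(t => t.IsClass && !t.IsAbstract))
        {
            // Convention: a public const string named SectionName marks an options class.
            var field = type.GetField("SectionName", BindingFlags.Public | BindingFlags.Static);
            if (field is null || !field.IsLiteral || field.FieldType != typeof(string))
                continue;

            var sectionName = (string)field.GetRawConstantValue();

            // Equivalent of services.Configure<T>(configuration.GetSection(sectionName)),
            // invoked through reflection because T is only known at runtime.
            var configure = typeof(OptionsConfigurationServiceCollectionExtensions)
                .GetMethods()
                .First(m => m.Name == "Configure" && m.GetParameters().Length == 2)
                .MakeGenericMethod(type);
            configure.Invoke(null, new object[] { services, configuration.GetSection(sectionName) });
        }

        return services;
    }
}

Registration would then collapse to a single call, something like builder.Services.AddConventionOptions(builder.Configuration, typeof(Program).Assembly);.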

  • SonarCloud has become my Frank’s Red Hot…

    … I put that $h!t on everything!

    A lot has been made in recent weeks about open source and its effects on all that we do in software. And while we all debate the ethics of Hashicorp’s decision to turn to a “more closed” licensing model and question the subsequent fork of their open source code, we should remember that there are companies who offer their cloud solutions free for open source projects.

    But first, Github

    Github has long been the mecca for open source developers, and even under Microsoft’s umbrella, that does not look to be slowing down. Things like CI/CD through Github Actions and Package Storage are free for public repositories. So, without paying a dime, you can store your open source code, get automatic security and version updates, build your code, and store build artifacts all in Github. All of this built on the back of a great ecosystem for pull request reviews and checks. For my open source projects, it provides great visibility into my code and puts MOST of what I want in one place.

    And then SonarQube/Cloud

    SonarSource’s SonarQube offering is a great way to get static code analysis on your code. While their community edition is missing features that require an enterprise license, their cloud offering provides free analysis of open source projects.

    With that in mind, I have started to add my open source projects to SonarCloud.io. Why? Well, first, it does give me some insight into where my code could be better, which keeps me honest. Second, on the off chance that anyone wants to contribute to my projects, the Sonar analysis will help me quickly determine the quality of the incoming code before I accept the PR.

Configuring the SonarCloud integration with Github even provides a sonarcloud bot that reports on the quality gate for pull requests. What does that mean? It means I get a great picture of the quality of the incoming code.

    What Next?

    I have been spending a great deal of time on the Static Code Analysis side of the house, and I have been reasonably impressed with SonarQube. I have a few more public projects which will receive a SonarCloud instance, but at work, it is more about identifying the value that can come from this type of scanning.

So, what is that value, you may ask? Enhancing and automating your quality gates is always beneficial, as it streamlines your developer workflow. It also sets expectations: engineers know that bad/smelly code will be caught well before a pull request is merged.

If NOTHING else, SonarQube allows you to track your test coverage and ensure it does not trend backwards. If we did nothing else, we should at least ensure that we continue to cover the code we write today, even if those before us did not.

  • Taking my MagicMirror modules to Typescript

    It came as a bit of a shock that I have been running MagicMirror in my home office for almost two years now. I even wrote two modules, one to display Prometheus alerts and one to show Status Page status.

In the past few years I have become more and more comfortable with Typescript, so I wanted to see if I could convert my modules to it.

    Finding an example

    As is the case with most development, the first step was to see if someone else had done it. As it turns out, a few folks have done it.

    I stumbled across Michael Scharl’s post on dev.to which covered his Typescript MagicMirror module. In the same search, I ran across a forum post by Jalibu that focused a little more on the nitty-gritty, including his contribution of the magicmirror-module in DefinitelyTyped.

    Migrating to Typescript

Ultimately, the goal was to generate the necessary module files for MagicMirror through transpilation using Rollup (see below), but first I needed to move my code and convert it to Typescript. I created a src folder, moved my module file and node_helper in there, and changed the extensions to .ts.

From there, I split things up into a more logical configuration, utilizing Typescript as well as ESNext-based module imports. Since it would all be transpiled into Javascript, I could take advantage of Typescript’s module options to clean up my code.

    My modules already had a good amount of development packages around linting and formatting, so I updated all of those and added packages necessary for Typescript linting.

    A Note on Typing

    Originally, following Michael Scharl’s sample code, I had essentially copied the module-types.ts file from the MagicMirror repo and renamed it ModuleTypes.d.ts in my own code. I did not particularly like that method, as it required me to have extra code in my module, and I would have to update it as the MagicMirror types changed.

    Jalibu‘s addition of the @types/magicmirror-module package simplified things greatly. I installed the package and imported what I needed.

    import * as Log from "logger";
    import * as NodeHelper from "node_helper";

    The package includes a Module namespace that makes registering your module easy:

Module.register<Config>("MMM-PrometheusAlerts", {
  // Module implementation
});

    A few tweaks to the tsconfig.json file, and the tsc command was running!
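
For completeness, a tsconfig.json along these lines supports the ESNext imports and the magicmirror-module types; the exact compiler options are assumptions on my part, not a copy of my file:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "ESNext",
    "moduleResolution": "node",
    "strict": true,
    "esModuleInterop": true,
    "types": ["magicmirror-module"]
  },
  "include": ["src"]
}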

    Using Rollup

    The way that MagicMirror is set up, the modules generally need the following:

    • Core Module File, named after the module (<modulename>.js)
    • Node Helper (node_helper.js) that represents a Node.js backend task. It is optional, but I always seem to have one.
• CSS file, if needed, containing any custom styling for the HTML generated in the core module file.

Michael Scharl’s post detailed his use of Rollup to create these files; however, as the post is a few years old, it required a few updates. Most of the work was installing the scoped Rollup packages (@rollup), but I also removed the banner plugin.

    I configured my Rollup in a ‘one to one’ fashion, mapping my core module file (src/MMM-PrometheusAlerts.ts) to its output file (MMM-PrometheusAlerts.js) and my node helper (src/node_helper.ts) to its output file (node_helper.js). Rollup would use the Typescript transpiler to generate the necessary Javascript files, bringing in any of the necessary imports.

    Taking a cue from Jalibu, I used the umd output format for node_helper, since it will be running on the backend, but iife for the core module, since it will be included in the browser.
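
Putting that together, the Rollup config ends up roughly like the sketch below. The inputs, outputs, and formats are the ones described above; the use of @rollup/plugin-typescript is an assumption about the plugin wiring:

// rollup.config.mjs — a sketch, not my exact config
import typescript from "@rollup/plugin-typescript";

export default [
  {
    // Core module: loaded by the browser, so bundle as an IIFE
    input: "src/MMM-PrometheusAlerts.ts",
    output: { file: "MMM-PrometheusAlerts.js", format: "iife" },
    plugins: [typescript()],
  },
  {
    // Node helper: runs on the backend; UMD needs a name when the entry has exports
    input: "src/node_helper.ts",
    output: { file: "node_helper.js", format: "umd", name: "node_helper" },
    plugins: [typescript()],
  },
];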

    Miscellaneous Updates

    As I was looking at code that had not been touched in almost two years, I took the opportunity to update libraries. I also switched over to Jest for testing, as I am certainly more familiar with it, and I need the ability to mock to complete some of my tests. I also figured out how to implement a SASS compiler as part of rollup, so that I could generate my module CSS as well.

    To make things easier on anyone who might use this module, I added a postinstall script that performs the build task. This generates the necessary Javascript files for MagicMirror using Rollup.
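
In package.json terms, that wiring is tiny; the script names here are assumptions:

"scripts": {
  "build": "rollup -c",
  "postinstall": "npm run build"
}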

    One down, one to go

    I converted MMM-PrometheusAlerts, but I need to convert MMM-StatusPageIo. Sadly, the latter may require some additional changes, since StatusPage added paging to their APIs and I am not yet in full compliance…. I’ve never had enough incidents that I needed to page. But it has been on my task list for a bit now, and moving to Typescript might give me the excuse I need to drop back in.

  • Tech Tip – You should probably lock that up…

I have been running into some odd issues with ArgoCD not updating some of my charts, despite the Git repository having an updated chart version. As it turns out, my configuration and lack of Chart.lock files seem to have been contributing to this inconsistency.

    My GitOps Setup

I have a few repositories that I use as source repositories for Argo. They contain a mix of my own resource definition files, which are raw manifest files, and external Helm charts. The external Helm charts use an umbrella chart so that I can add supporting resources (like secrets). My Grafana chart is a great example of this.
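
For context, one of these umbrella charts is little more than a Chart.yaml that pins the upstream chart as a dependency. The version below is illustrative; the repository URL is the public Grafana Helm repo:

apiVersion: v2
name: grafana
description: Umbrella chart wrapping the upstream Grafana chart
version: 1.0.0
dependencies:
  - name: grafana
    version: 7.0.0
    repository: https://grafana.github.io/helm-charts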

Prior to this, I was not including the Chart.lock file in the repository. This made it easier to update the version in the Chart.yaml file without having to run helm dependency update to refresh the lock file. I have been running this setup for at least a year, and I never really noticed much of a problem until recently. There were a few times where things would not update, but nothing systemic.

    And then it got worse

    More recently, however, I noticed that the updates weren’t taking. I saw the issue with both the Loki and Grafana charts: The version was updated, but Argo was looking at the old version.

    I tried hard refreshes on the Applications in Argo, but nothing seemed to clear that cache. I poked around in the logs and noticed that Argo runs helm dependency build, not helm dependency update. That got me thinking “What’s the difference?”

As it turns out, build operates using the Chart.lock file if it exists; otherwise it acts like update. update ignores any existing lock file, resolves the latest versions allowed by Chart.yaml, and rewrites Chart.lock.

    Since I was not committing my Chart.lock file, it stands to reason that somewhere in Argo there is a cached copy of a Chart.lock file that was generated by helm dependency build. Even though my Chart.yaml was updated, Argo was using the old lock file.

    Testing my hypothesis

    I committed a lock file 😂! Seriously, I ran helm dependency update locally to generate a new lock file for my Loki installation and committed it to the repository. And, even though that’s the only file that changed, like magic, Loki determined it needed an update.
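
The fix itself is only a couple of commands; the chart path here is illustrative of my repo layout:

helm dependency update charts/loki   # re-resolves Chart.yaml and writes Chart.lock
git add charts/loki/Chart.lock
git commit -m "Commit Chart.lock for Loki"
git push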

    So I need to lock it up. But, why? Well, the lock file exists to ensure that subsequent builds use the exact version you specify, similar to npm and yarn. Just like npm and yarn, helm requires a command to be run to update libraries or dependencies.

    By not committing my lock file, the possibility exists that I could get a different version than I intended or, even worse, get a spoofed version of my package. The lock file maintains a level of supply chain security.

    Now what?

    Step 1 is to commit the missing lock files.

    At both work and home I have Powershell scripts and pipelines that look for potential updates to external packages and create pull requests to get those updates applied. So step 2 is to alter those scripts to run helm dependency update when the Chart.yaml is updated, which will update the Chart.lock and alleviate the issue.

    I am also going to dig into ArgoCD a little bit to see where these generated Chart.lock values could be cached. In testing, the only way around it was to delete the entire ApplicationSet, so I’m thinking that the ApplicationSet controller may be hiding some data.

  • Rollback saved my blog!

    As I was upgrading WordPress from 6.2.2 to 6.3.0, I ran into a spot of trouble. Thankfully, ArgoCD rollback was there to save me.

    It’s a minor upgrade…

    I use the Bitnami WordPress chart as the template source for Argo to deploy my blog to one of my Kubernetes clusters. Usually, an upgrade is literally 1, 2, 3:

    1. Get the latest chart version for the WordPress Bitnami chart. I have a Powershell script for that.
    2. Commit the change to my ops repo.
3. Go into ArgoCD and hit Sync.

    That last one caused some problems. Everything seemed to synchronize, but the WordPress pod stopped at the connect to database section. I tried restarting the pod, but nothing.

Now, the old pod was still running. So, rather than mess with it, I used Argo’s rollback functionality to roll the WordPress application back to its previous commit.

    What happened?

I’m not sure. You are able to upgrade WordPress from the admin panel, but that comes at a potential cost: if you upgrade the database as part of the WordPress upgrade and then you “lose” the pod, you lose the application upgrade but not the database upgrade, and you are left in a weird state.

    So, first, I took a backup. Then, I started poking around in trying to run an upgrade. That’s when I ran into this error:

    Unknown command "FLUSHDB"

    I use the WordPress Redis Object Cache to get that little “spring” in my step. It seemed to be failing on the FLUSHDB command. At that point, I was stuck in a state where the application code was upgraded but the database was not. So I restarted the deployment and got back to 6.2.2 for both application code and database.

    Disabling the Redis Cache

    I tried to disable the Redis plugin, and got the same FLUSHDB error. As it turns out, the default Bitnami Redis chart disables these commands, but it would seem that the WordPress plugin still wants them.

So, I enabled the commands in my Redis instance (a quick change in the values files) and then disabled the Redis Cache plugin. After that, I was able to upgrade to WordPress 6.3 through the UI.
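
The values change was along these lines. The Bitnami Redis chart ships with FLUSHDB and FLUSHALL in its disableCommands lists, so emptying them re-enables the commands; double-check the exact keys against the chart version you run:

master:
  disableCommands: []
replica:
  disableCommands: []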

    From THERE, I clicked Sync in ArgoCD, which brought my application pods up to 6.3 to match my database. Then I re-enabled the Redis Plugin.

    Some research ahead

    I am going to check with the maintainers of the Redis Object Cache plugin. If they are relying on commands that are disabled by default, it most likely caused some issues in my WordPress upgrade.

For now, however, I can sleep under the warm blanket of Argo rollbacks!