• ArgoCD panicked a little…

    I ran into an odd situation last week with ArgoCD, and it took a bit of digging to figure it out. Hopefully this helps someone else along the way.

    Whatever you do, don’t panic!

    Well, unless of course you are ArgoCD.

    I have a small Azure DevOps job that runs nightly and attempts to upgrade some of the Helm charts that I use to deploy external tools. This includes things like Grafana, Loki, Mimir, Tempo, ArgoCD, External Secrets, and many more. The job commits the changes to my GitOps repositories, and if there are changes, I can manually sync them.
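
    For the curious, the nightly job itself is nothing fancy: it is a scheduled Azure DevOps pipeline that runs a version-bump script against the GitOps repositories. The sketch below is illustrative rather than my exact pipeline; the script name and schedule are placeholders.

    # Illustrative only: a scheduled pipeline that bumps chart versions
    # and commits the result to the GitOps repo. Names are placeholders.
    schedules:
      - cron: "0 3 * * *"
        displayName: Nightly chart version bump
        branches:
          include:
            - main
        always: true

    steps:
      - checkout: self
        persistCredentials: true
      - pwsh: ./scripts/Update-ChartVersions.ps1   # placeholder script name
        displayName: Bump Helm chart versions and commit to the GitOps repo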

    Why not auto-sync, you might ask? Visibility, mostly. I like to see what changes are being applied, in case there is something bigger in the changes that needs my attention. I also like to “be there” if something breaks, so I can roll back quickly.

    Last week, while upgrading Grafana and Tempo, ArgoCD started throwing the following error on sync:

    Recovered from panic: runtime error: invalid memory address or nil pointer

    A quick trip to Google produced a few different results, but no obvious answer. One particular issue mentioned a problem with out-of-date resources (an old apiVersion). Let’s put a pin in that.

    Nothing was jumping out, and my deployments were still working. I had a number of other things on my plate, so I let this slide for a few days.

    Versioning….

    When I finally got some time to dig into this, I figured I would pull at that apiVersion thread and see what shook loose. Unfortunately, since the error gives no real indication of which resource is causing it, finding the offender was the luck of the draw. This time, I was lucky.

    My ExternalSecret resources were using some alpha versions, so my first thought was to update to the v1 version. Lo and behold, that fixed the two charts that were failing.
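
    For illustration, here is roughly what the change looked like. The manifest below is trimmed down and the names are placeholders, not one of my actual secrets.

    # Old manifest, using an alpha apiVersion:
    apiVersion: external-secrets.io/v1alpha1
    kind: ExternalSecret
    metadata:
      name: example-secret   # placeholder name
    # (spec omitted for brevity)
    ---
    # Updated manifest, after moving to the v1 apiVersion:
    apiVersion: external-secrets.io/v1
    kind: ExternalSecret
    metadata:
      name: example-secret
    # (spec omitted for brevity)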

    This, however, leads to a bigger issue: if ArgoCD is not going to inform me when I have out-of-date apiVersion values for a resource, I am going to have to figure out how to validate these resources sometime before I commit the changes. I’ll put this on my ever-growing to-do list.
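
    I have not picked a tool yet, but one option I am considering is running something like Fairwinds’ pluto over the rendered manifests as part of the nightly job, since it is built to flag deprecated and removed apiVersions. A rough sketch of what that pipeline step might look like is below; the flags are from memory, so check the tool’s documentation before lifting this.

    # Sketch of a validation step for the nightly pipeline. The directory
    # and flags are placeholders and may need adjusting.
    steps:
      - script: |
          pluto detect-files -d ./manifests
        displayName: Check manifests for deprecated apiVersions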

  • Upgrading the Home Network – A New Gateway

    I have run a Unifi Security Gateway (USG) for a while now. In conjunction with three wireless access points, the setup has been pretty robust. The only area I have had some trouble in is the controller software.

    I run the controller software on one of my K8s clusters. The deployment is fairly simple, but if the pod dies unexpectedly, it can cause the MongoDB database to become corrupted. It’s happened enough that I religiously back up the controller, and restoring isn’t too terribly painful.

    Additionally, the server and cluster are part of my home lab. If they die, well, I will be inconvenienced, but not down and out. Except, of course, for the Unifi controller software.

    Enter the Unifi Cloud Gateways

    Unifi has had a number of entries in the cloud gateway space, including the Dream Machine. The price point was a barrier to entry, especially since I do not really need everything that the Dream Machine line has to offer.

    Recently, they released gateways in a compact form factor. The Cloud Gateway Ultra and Cloud Gateway Max are more reasonably priced, and the Gateway Max allows for the full Unifi application suite in that package. I have been stashing away some cash for network upgrades, and the Cloud Gateway Max seemed like a good first step.

    Network Downtime

    It has become a disturbing fact that I have to schedule network downtime in my own home. With about 85 network-connected devices, if someone is home, they are probably on the network. Luckily, I found some time to squeeze it in while people were not home.

    The process was longer than expected: the short version is, I was not able to successfully restore a backup of my old controller on the new gateway. My network configuration is not that complex, though, so I just recreated the necessary networks and WiFi SSIDs, and things were back up.

    I did face the long and arduous process of making sure all of my static IP assignments were moved from the old system to the new one. I had all the information; it was just tedious copy and paste.

    All in all, it took me about 90 minutes to get everything set up… Thankfully, no one complained.

    Unexpected Bonus

    The UCG-Max has 4 ports plus a WAN port, whereas the USG only had 2 ports plus a WAN port. I never utilized the extra port on the USG: everything went through my switch.

    However, with 3 open ports on the UCG-Max, I can move my APs onto their own port, effectively splitting wireless traffic from wired traffic until it hits the gateway. I don’t know how much of a performance effect this will have, but it will be nice to see the difference between wireless and wired internet traffic.

    More To Come…. but not soon

    I have longer term plans for upgrades to my switch and wireless APs, but I am back to zero when it comes to “money saved for network upgrades.” I’ll have to be deliberate in my next upgrades, but hopefully the time won’t be measured in years.

  • Migrating to Github Packages

    I have been running a free version of Proget locally for years now. It served as a home for Nuget packages, Docker images, and Helm charts for my home lab projects. But, in an effort to slim down the apps that are running in my home lab, I took a look at some alternatives.

    Where can I put my stuff?

    When I logged in to my Proget instance and looked around, it occurred to me that I only had 3 types of feeds: Nuget packages, Docker images, and Helm charts. So to move off of Proget, I needed to find replacements for all of these.

    Helm Charts

    Back in the heady days of using Octopus Deploy for my home lab, I used published Helm charts to deploy my applications. However, since I switched to a GitOps workflow with ArgoCD, I haven’t published a Helm chart in a few years. I deleted that feed in Proget. One down, two to go.

    Nuget Packages

    I have made a few different attempts to create Nuget packages for public consumption. A number of years ago, I tried publishing a data layer that was designed to be used across platforms (think APIs and mobile applications), but even I stopped using that in favor of Entity Framework Core and good old fashioned data models. More recently, I created some “platform” libraries to encapsulate some of the common code that I use in my APIs and other projects. They serve as utility libraries as well as a reference architecture for my professional work.

    There are a number of options for hosting Nuget feeds, with varying costs depending on structure. I considered the following options:

    • Azure DevOps Artifacts
    • Github Packages
    • Nuget.org

    I use Azure DevOps for my builds, and I briefly considered using its Artifacts feeds. However, none of my libraries are private: everything I am writing lives in a public repository on Github. With that in mind, the free offerings from Github and Nuget.org seemed more appropriate.

    I published the data layer packages to Nuget previously, so I have some experience with that. However, with these platform libraries, while they are public, I do not expect them to be heavily used. For that reason, I decided that publishing the packages to Github Packages made a little more sense. If these platform libraries get to the point where they are heavily used, I can always publish stable packages to Nuget.org.

    Container Images

    Container images take up the bulk of my Proget storage. Now, I only have 5 container images, but I never clean anything up, so those 5 images are taking up about 7 GB of data. When I was investigating alternatives, I wanted to make sure I had some way to clean up old pre-release tags and manifests to keep my usage down.

    I considered two alternatives:

    • Azure Container Registry
    • Github Container Registry

    An Azure Container Registry instance would cost me about $5 a month and provide me with 10 GB of storage. Github Container Registry provides 500 MB of storage and 1 GB of data transfer per month, but those limits only apply to private packages.

    As with my Nuget packages, nothing that I have is private, and Github Packages is free for public packages. Additionally, I found a Github task that will clean up the old images. As this was one of my “new” requirements, I decided to take a run at Github Packages.
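
    For what it is worth, the workflow below is a sketch of the kind of cleanup I have in mind rather than the exact task I am using; it assumes GitHub’s actions/delete-package-versions action, and the package name and retention numbers are placeholders.

    # Sketch: prune old pre-release container versions on a schedule.
    name: prune-old-images
    on:
      schedule:
        - cron: "0 4 * * 0"   # weekly
    permissions:
      packages: write
    jobs:
      prune:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/delete-package-versions@v5
            with:
              package-name: my-api               # placeholder
              package-type: container
              min-versions-to-keep: 10
              delete-only-pre-release-versions: "true"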

    Making the switch

    With my current setup, the switch was fairly simple. Nuget publishing is controlled by my Azure DevOps service connections, so I created a new service connection for my Github feed. The biggest change was some housekeeping to add appropriate information to the Nuget package itself. This included adding the RepositoryUrl property to the .csproj files, which tells Github which repository to associate the package with.
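
    The property itself is a one-liner in each project file; the URL below is a placeholder, not one of my repositories.

    <PropertyGroup>
      <!-- Tells Github which repository to associate the published package with -->
      <RepositoryUrl>https://github.com/my-org/my-library</RepositoryUrl>
      <RepositoryType>git</RepositoryType>
    </PropertyGroup>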

    The container registry switch wasn’t much different: again, some housekeeping to add the appropriate labels to the images. From there, a few template changes and the images were in the Github container registry.

    Overall, the changes were pretty minimal. I have a few projects left to convert, and once that is done, I can decommission my Proget instance.

    Next on the chopping block…

    I am in the beginning stages of evaluating Azure Key Vault as a replacement for my Hashicorp Vault instance. Although it comes at a cost, for my usage it is most likely under $3 a month, and getting away from self-hosted secrets management would make me a whole lot happier.

  • Platform Engineering

    As I continue to build out some reference architecture applications, I realized that there was a great deal of boilerplate code that I add to my APIs to get things running. Time for a library!

    Enter the “Platform”

    I am generally terrible at naming things, but Spydersoft.Platform seemed like a good base namespace for this one. The intent is to put the majority of my boilerplate code into a set of libraries that can be referenced to make adding stuff easier.

    But what kind of “stuff”? Well, for starters:

    • Support for OpenTelemetry trace, metrics, and logging
    • Serilog for console logging
    • Simple JWT identity authentication (for my APIs)
    • Default Health Check endpoints

    Going deep with Health Checks

    The first three were pretty easy: just some POCOs for options and then startup extensions to add the necessary items with the proper configuration. With health checks, however, I went a little overboard.

    My goal was to be able to implement IHealthCheck anywhere and decorate it in such a way that it would be added to the health check framework and could be tagged. Furthermore, I wanted to use tags to drive standard endpoints.

    In the end, I used a custom attribute and some reflection to add the checks that are found in the loaded AppDomain. I won’t bore you: the documentation should do that just fine.

    But can we test it?

    Testing startup extensions is, well, interesting. Technically, it is an integration test, but I did not want to set up Playwright tests to execute the API tests. Why? Well, usually API integration tests are run against a particular configuration, but in this case, I needed to run the reference application with a lot of different configurations in order to fully test the extensions. Enter WebApplicationFactory.

    With WebApplicationFactory, I was able to configure tests to stand up a copy of the reference application with different configurations. I could then verify the configuration using some custom health checks.

    I am on the fence as to whether this is a “unit” test or an “integration” test. I’m not calling out to any other application, which is usually what pushes a test into integration territory. But I did have to configure a reference application in order to get things tested.

    Whatever you call it, I have coverage on my startup extensions, and even caught a few bugs while I was writing the tests.

    Make it truly public?

    Right now, the build publishes the Nuget package to my private Nuget feed. I am debating moving it to Nuget.org (or maybe Github’s package feeds). Since the code is open source, I want to make the library openly available as well. But until I make the decision on where to put it, I will keep it in my private feed. If you have any interest in it, watch or star the repo in GitHub: it will help me gauge the level of interest.

  • Supporting a GitHub Release Flow with Azure DevOps Builds

    It has been a busy few months, and with the weather changing, I have a little more time in front of the computer for hobby work. Some of my public projects were in need of a few package updates, so I started down that road. Most of the updates were pretty simple: a few package updates and some Azure DevOps step template updates and I was ready to go. However, I had been delaying my upgrade to GitVersion 6, and in taking that leap, I changed my deployment process slightly.

    Original State

    My current development process supports three environments: test, stage, and production. Commits to feature/* branches are automatically deployed to the test environment, and any builds from main are first deployed to stage and then can be deployed to production.

    For me, this works: I am usually only working on one branch at a time, so publishing feature branches to the test environment works. When I am done with a branch, I merge it into main and get it deployed.

    New State

    As I have been working through some processes at work, it occurred to me that versions are about release, not necessarily commits. While commits can help us number releases, they shouldn’t be the driving force. GitVersion 6 and its new workflow defaults drive this home.

    So my new state would be pretty similar: feature/* branches get deployed to the test environment automatically. The difference lies in main: I no longer want to release with every commit to main. I want to be able to control releases through the use of tags (and GitHub releases, which generate tags).

    So I flipped over to GitVersion 6 and modified my GitVersion.yml file:

    workflow: GitHubFlow/v1
    merge-message-formats:
      pull-request: 'Merge pull request \#(?<PullRequestNumber>\d+) from'
    branches:
      feature:
        mode: ContinuousDelivery

    I modified my build pipeline to always build, but only trigger a release for feature/* branch builds and builds from a tag. I figured this would work fine, but Azure DevOps threw me a curveball.
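
    Before getting to the curveball, here is the simplified shape of that pipeline. My real builds use step templates, so treat this as a sketch: build every feature branch, main, and tag, but gate the release stage on the source branch.

    # Sketch: build everything, but only release feature branches and tags.
    trigger:
      branches:
        include:
          - main
          - feature/*
      tags:
        include:
          - '*'

    stages:
      - stage: Build
        jobs:
          - job: Build
            steps:
              - script: echo "build + GitVersion here"   # placeholder

      - stage: Release
        condition: and(succeeded(), or(startsWith(variables['Build.SourceBranch'], 'refs/heads/feature/'), startsWith(variables['Build.SourceBranch'], 'refs/tags/')))
        jobs:
          - job: Deploy
            steps:
              - script: echo "deploy here"   # placeholder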

    Azure DevOps Checkouts

    When you build from a tag, Azure DevOps checks that tag out directly, using the refs/tags/<tagname> branch reference. When I tried to run GitVersion on this, I got a weird version number: a build on tag 1.3.0 resulted in 1.3.1-tags-1-3-0.1.

    I dug into GitVersion’s default configuration and noticed this corresponded with the unknown branch configuration. To get around the Azure DevOps checkout behavior, I had to configure the tags/ branches:

    workflow: GitHubFlow/v1
    merge-message-formats:
      pull-request: 'Merge pull request \#(?<PullRequestNumber>\d+) from'
    branches:
      feature:
        mode: ContinuousDelivery
      tags:
        mode: ManualDeployment
        label: ''
        increment: Inherit
        prevent-increment:
          when-current-commit-tagged: true
        source-branches:
        - main
        track-merge-message: true
        regex: ^tags?[/-](?<BranchName>.+)
        is-main-branch: true

    This treats tags as main branches when calculating the version.

    Caveat Emptor

    This works if you ONLY tag your main branch. If you are in the habit of tagging other branches, this will not work for you. However, I only ever release from main branches, and I am in a fix-forward scenario, so this works for me. If you use release/* branches and need builds from there, you may need additional GitVersion configuration to get the correct version numbers to generate.

  • When “as code” makes a difference

    I spent a considerable amount of time setting up my home lab with a high degree of infrastructure and deployment “as code.” Googling “Infrastructure as Code” or “Declarative GitOps” will highlight the breadth of this topic, and I have no less than 10 different posts on my current setup. So what did all this effort get me?

    Effortless Updates

    A quick PowerShell script lets me update my GitOps repositories with the latest versions of the applications I am running. With the configurability of ArgoCD, however, those updates are not immediately rolled out. My ArgoCD configurations are set up for manual sync, which gives me the ability to compare changes before they are applied.
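
    Manual sync in ArgoCD is really just the absence of an automated sync policy on the Application. A trimmed-down example is below; the application, repo, and namespace names are placeholders.

    # Illustrative Application with no syncPolicy.automated block, so changes
    # sit in an OutOfSync state until I sync them by hand.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: grafana                 # placeholder
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://example.com/gitops/cluster-tools.git   # placeholder
        path: charts/grafana
        targetRevision: main
      destination:
        server: https://kubernetes.default.svc
        namespace: monitoring
      syncPolicy:
        syncOptions:
          - CreateNamespace=true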

    Could I automatically sync? Well, sure, and 9 times out of 10, it would work just fine. But more than once, I ran into updates which required some additional preparation or conversion, so I still have the ability to hold off on upgrades until I am ready.

    Helpful Rollbacks

    Even after synchronization, sometimes things do not go according to plan. Recently, as an example, an upgrade to Argo 2.12 broke my application sets because of a templating issue. Had I been manually managing my applications, that would have meant a manual downgrade or a hacky workaround. Now, well, I just rolled back to the previous version that I had deployed and will patiently await a fix.

    Disaster Recovery

    My impatience caused me to wreck my non-production cluster beyond repair. With my declarative GitOps setup, restoring that cluster was pretty simple:

    • Create a new cluster
    • Add the new cluster to ArgoCD
    • Modify the cluster secret in Argo with labels to install my cluster tools
    • Modify the applications to use the new cluster URL

    As it was my non-production instance, I did not have any volumes/data that needed to be transferred over, so I have not yet tested that particular bit. However, since my volumes are mounted with consistent name generation, I believe data transfers should work equally well.

    Conclusion

    Even in my home lab, a level of “as code” helps keep things running smoothly. You should try it!

  • A Quick WSL Swap

    I have been using WSL and Ubuntu 22.04 a lot more in recent weeks. From virtual environments for Python development to the ability to use Podman to run container images, the tooling supports some of the work I do much better than Windows does.

    But Ubuntu 22.04 is old! I love the predictable LTS releases, but two years is an eternity in software, and I was looking forward to the 24.04 release.

    Upgrade or Fresh Start?

    I looked at a few options for upgrading my existing Ubuntu 22.04 WSL instance, but I really did not like what I read. The guidance basically suggested it was a “try at your own risk” scenario.

    I took a quick inventory of what was actually on my WSL image. As it turns out, not too much. Aside from some of my standard profile settings, I only have a few files that were not available in some of my Github repositories. Additionally, since you can have multiple instances of WSL running, the easiest solution I could find was to stand up a new 24.04 image and copy my settings and files over.

    Is that it?

    Shockingly, yes. Installing 24.04 is as simple as opening it in the Microsoft Store and downloading it. Once that was done, I ran through the quick provisioning to set up the basics, and then copied my profile and files.

    I was able to utilize scp for most of the copying, although I also realized that I could copy files from Windows using the \\wsl.localhost paths. Either way, it didn’t take very long before I had Ubuntu 24.04 up and running.

    I still have 22.04 installed, and I haven’t deleted that image just yet. I figure I’ll keep it around for another month and, if I don’t have to turn it back on, I probably don’t need anything on it.

  • My Very Own Ship of Theseus

    A while back, I wrote a little about how the “Ship of Theseus” thought experiment has parallels to software design. What I did not realize is that I would end up running into a physical “Ship of Theseus” of my own.

    Just another day

    On a day when I woke up to stories of how a Crowdstrike update wreaked havoc on thousands of systems, I was overly content with my small home lab setup. No Crowdstrike installed, primarily Ubuntu nodes… Nothing to worry about, right?

    Confident that I was in the clear, I continued the process of cycling my Kubernetes nodes to use Ubuntu 24.04. I have been pretty methodical about this, just to make sure I am not going to run into anything odd. Having converted my non-production cluster last week, I started work on my internal cluster. I got the control plane nodes updated, but the first agent I tried was not spinning up correctly.

    Sometimes my server gets a little busy, and a quick reset helps clear some of the background work. So I reset… And it never booted again.

    What Happened?

    The server would boot to a certain point (right after the Thermal Calibration step), hang for about 10-15 minutes, and then report a drive array failure. Uh oh…

    I dug through some logs on the Integrated Lights-Out (iLO) system and did some Google sleuthing on the errors I was seeing. The conclusion I came to was that the on-board drive controller went kaput. At this point, I was dead in the water. And then I remembered I had another server…

    Complete Swap

    The other server was much lighter on spec: a single 8 core CPU, 64 GB of RAM, and nowhere near the disk space. Not to mention, with a failed drive controller, I wasn’t getting any data off of those RAID disks.

    But the servers themselves are both HP ProLiant DL380P Gen 8 servers. So I started thinking: could I just transfer everything except the system board to the backup server?

    The short answer: Yes.

    I pulled all the RAM modules and installed them in the backup. I pulled both CPUs from the old server and installed them in the backup. I pulled all of the hard drives out and installed them in the backup. I even transferred both power backplanes so that I would have dual plugs.

    The Moment of Truth

    After all that was done, I plugged it back in and logged in to the backup server’s ILO. It started up, but pointed me to the RAID utilities, because one of the arrays needed to be rebuilt. A few hours later, the drives were rebuilt, and I restarted. Much to my shock, it booted up as if it were the old server.

    Is it a new server? Or just a new system board in the old server? All I know is, it is running again.

    Now, however, I’m down on replacement parts, so I’m going to have to start thinking about either stocking up on some replacements or looking into a different lab setup.

  • Moving to Ubuntu 24.04

    I have a small home lab running a few Kubernetes clusters, and a good bit of automation to deal with provisioning servers for the K8s clusters. All of my Linux VMs are based on Ubuntu 22.04. I prefer to stick with LTS for stability and compatibility.

    As April turns into July (missed some time there), I figured Ubuntu’s latest LTS (24.04) had matured to the point that I could start the process of updating my VMs to the new version.

    Easier than Expected

    In my previous move from 20.04 to 22.04, there were some changes to the automated installers for 22.04 that forced me down the path of testing my Packer provisioning with the 22.04 ISOs. I expected similar changes with 24.04. I was pleasantly surprised when I realized that my existing scripts should work well with the 24.04 ISOs.

    I did spend a little time updating the Azure DevOps pipeline that builds a base image so that it supports building both a 22.04 and a 24.04 image. I want to make sure I have the option to use the 22.04 images, should I find a problem with 24.04.
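
    The pipeline change amounted to building the image once per version. A sketch of the idea, using a job matrix, is below; the template and variable names are placeholders, not my actual pipeline.

    # Sketch: build a base image for each Ubuntu version via a matrix.
    jobs:
      - job: BuildBaseImage
        strategy:
          matrix:
            ubuntu_22_04:
              ubuntuVersion: "22.04"
            ubuntu_24_04:
              ubuntuVersion: "24.04"
        steps:
          - script: packer build -var "ubuntu_version=$(ubuntuVersion)" ./ubuntu-base.pkr.hcl
            displayName: Build Ubuntu $(ubuntuVersion) base image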

    Migrating Cluster Nodes

    With a base image provisioned, I followed my normal process for upgrading cluster nodes on my non-production cluster. There were a few hiccups, mostly around some of my automated scripts that needed to have the appropriate settings to set hostnames correctly.

    Again, other than some script debugging, the process worked with minimal changes to my automation scripts and my provisioning projects.

    Azure DevOps Build Agent?

    Perhaps in a few months. I use the GitHub runner images as a base for my self-hosted agents, but there are some changes that need manual review. I destroy my Azure DevOps build agent weekly and generate a new one, and that’s a process that I need to make sure continues to work through any changes.

    The issue is typically time: the build agents take a few hours to provision because of all the tools that are installed. Testing that takes time, so I have to plan ahead. Plus, well, it is summertime, and I’d much rather be in the pool than behind the desk.

  • Drop that zero…

    I ran into a very weird issue with Nuget packages and the old packages.config reference style.

    Nuget vs Semantic Versioning

    Nuget grew up in Windows, where assembly version numbers support four numbers: major.minor.build.revision. Therefore, Nuget versions support all four segments. Semantic versioning, on the other hand, supports three numbers plus additional labels.

    As part of Nuget’s version normalization, in an effort to better support semantic versioning, the fourth segment is dropped if it is zero. So 1.2.3.0 becomes 1.2.3. In general, this does not present any problems, since the package manager tools retrieve version numbers from the feed and update references accordingly.

    Always use the tools provided

    When you ignore the tooling, well, stuff can get weird. This is particularly true in the old packages.config reference style.

    In that style, packages are listed in a packages.config file, and the .Net project file adds a reference to the DLL with a HintPath. That HintPath includes the folder where the package is installed, something like this:

    <ItemGroup>
        <Reference Include="MyCustomLibrary, Version=1.2.3.4, Culture=neutral, processorArchitecture=MSIL">
          <HintPath>..\packages\MyCustomLibrary.1.2.3.4\lib\net472\MyCustomLibrary.dll</HintPath>
        </Reference>
    </ItemGroup>

    But, for argument’s sake, let us assume we publish a new version of MyCustomLibrary, version 1.2.4. Even though the AssemblyVersion might be 1.2.4.0, the Nuget version will be normalized to 1.2.4. Now suppose that, instead of upgrading the package using one of the package manager tools, you just update the project file manually, like this:

    <ItemGroup>
        <Reference Include="MyCustomLibrary, Version=1.2.4.0, Culture=neutral, processorArchitecture=MSIL">
          <HintPath>..\packages\MyCustomLibrary.1.2.4.0\lib\net472\MyCustomLibrary.dll</HintPath>
        </Reference>
    </ItemGroup>

    This can cause weird issues. Because the package actually restores to a MyCustomLibrary.1.2.4 folder, the HintPath now points at a folder that does not exist. It will most likely build with a warning about not being able to find the DLL. Depending on how the package is used or referenced, you may not get a build error (I didn’t get one). But the build did not include the required library.

    Moving on…

    The “fix” is easy: use the Nuget tools (either the CLI or the Visual Studio Package Manager) to update the packages. They will generate the appropriate HintPath for the package that is installed. An even better solution is to migrate to the PackageReference style, where the project file lists the Nuget references directly and packages.config is not used. That style produces immediate errors if an incorrect version is referenced.
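
    For reference, the same dependency in the PackageReference style is a single line in the project file, with no HintPath to get out of sync:

    <ItemGroup>
        <!-- Restore resolves the assembly path for you, so a manual edit of
             the version either works or fails loudly at restore time. -->
        <PackageReference Include="MyCustomLibrary" Version="1.2.4" />
    </ItemGroup>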