Category: Technology

  • ArgoCD panicked a little…

    I ran into an odd situation last week with ArgoCD, and it took a bit of digging to figure it out. Hopefully this helps someone else along the way.

    Whatever you do, don’t panic!

    Well, unless of course you are ArgoCD.

    I have a small Azure DevOps job that runs nightly and attempts to upgrade some of the Helm charts that I use to deploy external tools. This includes things like Grafana, Loki, Mimir, Tempo, ArgoCD, External Secrets, and many more. The job commits the changes to my GitOps repositories, and if there are changes, I can manually sync.
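
    The job itself is specific to my setup, but conceptually it boils down to checking each chart repository for a newer version than the one pinned in my GitOps repo; something along these lines (the chart and repository here are just examples):

    # Example only: check the latest published versions of the Grafana chart.
    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    helm search repo grafana/grafana --versions | head -n 3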

    Why not auto-sync, you might ask? Visibility, mostly. I like to see what changes are being applied, in case there is something bigger in the changes that needs my attention. I also like to “be there” if something breaks, so I can rollback quickly.

    Last week, while upgrading Grafana and Tempo, ArgoCD started throwing the following error on sync:

    Recovered from panic: runtime error: invalid memory address or nil pointer

    A quick trip to Google produced a few different results, but nothing immediately apparent. One particular issue mentioned a problem with out-of-date resources (an old apiVersion). Let’s put a pin in that.

    Nothing was jumping out, and my deployments were still working. I had a number of other things on my plate, so I let this slide for a few days.

    Versioning….

    When I finally got some time to dig into this, I figured I would pull on that apiVersion thread and see what shook loose. Unfortunately, since there is no good error indicating which resource is causing the panic, it was luck of the draw whether I would find the offender. This time, I was lucky.

    My ExternalSecret resources were using an alpha apiVersion, so my first thought was to update them to v1. Lo and behold, that fixed the two charts that were failing.

    This, however, points to a bigger issue: if ArgoCD is not going to tell me when a resource has an out-of-date apiVersion, I need to figure out how to validate these resources before I commit the changes. I’ll put this on my ever-growing to-do list.
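
    As a first pass at that validation, a server-side dry-run will reject any manifest whose apiVersion the cluster no longer serves. This is only a rough sketch of what I have in mind, and the manifest path is a placeholder:

    # Placeholder path for wherever the rendered manifests live in the GitOps repo.
    # A server-side dry-run fails fast if an apiVersion is no longer served.
    kubectl apply --dry-run=server -f rendered-manifests/

    # Alternatively, kubeconform validates against schemas without touching the
    # cluster (CRDs such as ExternalSecret need their schemas supplied separately).
    kubeconform -summary rendered-manifests/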

  • Upgrading the Home Network – A New Gateway

    I have run a Unifi Security Gateway (USG) for a while now. In conjunction with three wireless access points, the setup has been pretty robust. The only area I have had some trouble in is the controller software.

    I run the controller software on one of my K8s clusters. The deployment is fairly simple, but if the pod dies unexpectedly, the underlying MongoDB can become corrupted. It has happened often enough that I religiously back up the controller, and restoring isn’t too terribly painful.

    Additionally, the server and cluster are part of my home lab. If they die, well, I will be inconvenienced, but not down and out. Except, of course, for the Unifi controller software.

    Enter the Unifi Cloud Gateways

    Unifi has released a number of cloud gateways over the years, including the Dream Machine. The price point was a barrier to entry, especially since I do not really need everything that the Dream Machine line has to offer.

    Recently, they released gateways in a compact form factor. The Cloud Gateway Ultra and Cloud Gateway Max are more reasonably priced, and the Gateway Max allows for the full Unifi application suite in that package. I have been stashing away some cash for network upgrades, and the Cloud Gateway Max seemed like a good first step.

    Network Downtime

    It is a disturbing fact that I have to schedule network downtime in my own home. With about 85 network-connected devices, if someone is home, they are probably on the network. Luckily, I found time to squeeze the upgrade in while no one was home.

    The process took longer than expected: the short version is, I was not able to successfully restore a backup of my old controller onto the new gateway. My network configuration is not that complex, though, so I just recreated the necessary networks and WiFi SSIDs, and things were back up.

    I did face the long and arduous process of making sure all of my static IP assignments were moved from the old system to the new one. I had all the information, it was just tedious copy and paste.

    All in all, it took me about 90 minutes to get everything set up… Thankfully, no one complained.

    Unexpected Bonus

    The UCG-Max has 4 ports plus a WAN port, whereas the USG only had 2 ports plus a WAN port. I never utilized the extra port on the USG: everything went through my switch.

    However, with 3 open ports on the UCG-Max, I can move my APs onto their own port, effectively splitting wireless traffic from wired traffic until it hits the gateway. I don’t know how much of a performance effect this will have, but it will be nice to see the difference between wireless and wired internet traffic.

    More To Come…. but not soon

    I have longer term plans for upgrades to my switch and wireless APs, but I am back to zero when it comes to “money saved for network upgrades.” I’ll have to be deliberate in my next upgrades, but hopefully the wait won’t be measured in years.

  • Migrating to Github Packages

    I have been running a free version of Proget locally for years now. It served as a home for Nuget packages, Docker images, and Helm charts for my home lab projects. But, in an effort to slim down the apps that are running in my home lab, I took a look at some alternatives.

    Where can I put my stuff?

    When I logged in to my Proget instance and looked around, it occurred to me that I only had 3 types of feeds: Nuget packages, Docker images, and Helm charts. So to move off of Proget, I needed to find replacements for all three.

    Helm Charts

    Back in the heady days of using Octopus Deploy for my home lab, I used published Helm charts to deploy my applications. However, since I switched to a GitOps workflow with ArgoCD, I haven’t published a Helm chart in a few years. I deleted that feed in Proget. One down, two to go.

    Nuget Packages

    I have made a few different attempts to create Nuget packages for public consumption. A number of years ago, I tried publishing a data layer that was designed to be used across platforms (think APIs and mobile applications), but even I stopped using that in favor of Entity Framework Core and good old fashioned data models. More recently, I created some “platform” libraries to encapsulate some of the common code that I use in my APIs and other projects. They serve as utility libraries as well as a reference architecture for my professional work.

    There are a number of options for hosting Nuget feeds, with varying costs depending on structure. I considered the following options:

    • Azure DevOps Artifacts
    • Github Packages
    • Nuget.org

    I use Azure DevOps for my builds, and briefly considered using the artifacts feeds. However, none of my libraries are private. Everything I am writing is a public repository in Github. With that in mind, it seemed that the free offerings from Github and Nuget were more appropriate.

    I published the data layer packages to Nuget previously, so I have some experience with that. However, with these platform libraries, while they are public, I do not expect them to be heavily used. For that reason, I decided that publishing the packages to Github Packages made a little more sense. If these platform libraries get to the point where they are heavily used, I can always publish stable packages to Nuget.org.

    Container Images

    In terms of storage, container images take up the bulk of my Proget usage. Now, I only have 5 container images, but I never clean anything up, so those 5 images account for about 7 GB of data. When I was investigating alternatives, I wanted to make sure I had some way to clean up old pre-release tags and manifests to keep my usage down.

    I considered two alternatives:

    • Azure Container Registry
    • Github Container Registry

    An Azure Container Registry instance would cost me about $5 a month and provide me with 10 GB of storage. Github Container Registry provides 500 MB of storage and 1 GB of data transfer per month, but those limits only apply to private packages.

    As with my Nuget packages, nothing that I have is private, and Github Packages is free for public packages. Additionally, I found a Github Actions task that will clean up old images. As cleanup was one of my “new” requirements, I decided to take a run at Github Packages.

    Making the switch

    With my current setup, the switch was fairly simple. Nuget publishing is controlled by my Azure DevOps service connections, so I created a new service connection for my Github feed. The biggest change was some housekeeping to add appropriate information to the Nuget package itself. This included adding the RepositoryUrl property to the .csproj files, which tells Github which repository to associate the package with.
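
    My packages flow through an Azure DevOps service connection, but the equivalent from a local shell looks roughly like this (the owner, repository, and token are placeholders):

    # RepositoryUrl can live in the .csproj or be passed at pack time.
    dotnet pack -c Release -p:RepositoryUrl=https://github.com/OWNER/REPO

    # One-time: register the Github Packages feed as a Nuget source.
    dotnet nuget add source "https://nuget.pkg.github.com/OWNER/index.json" \
      --name github --username OWNER --password "$GITHUB_TOKEN" \
      --store-password-in-clear-text

    dotnet nuget push "bin/Release/*.nupkg" --source github --api-key "$GITHUB_TOKEN"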

    The container registry switch wasn’t much different: again, some housekeeping to add the appropriate labels to the images. From there, a few template changes and the images were in the Github container registry.
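
    For anyone doing the same by hand, the label Github uses to associate an image with a repository is org.opencontainers.image.source. A rough sketch of building and pushing, with placeholder names:

    # Log in to the Github container registry with a token that has write:packages.
    echo "$GITHUB_TOKEN" | docker login ghcr.io -u OWNER --password-stdin

    # The source label ties the image back to its repository on Github.
    docker build \
      --label "org.opencontainers.image.source=https://github.com/OWNER/REPO" \
      -t ghcr.io/OWNER/my-image:1.0.0 .

    docker push ghcr.io/OWNER/my-image:1.0.0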

    Overall, the changes were pretty minimal. I have a few projects left to convert, and once that is done, I can decommission my Proget instance.

    Next on the chopping block…

    I am in the beginning stages of evaluating Azure Key Vault as a replacement for my Hashicorp Vault instance. Although it comes at a cost, for my usage it is most likely under $3 a month, and getting away from self-hosted secrets management would make me a whole lot happier.

  • My Very Own Ship of Theseus

    A while back, I wrote a little about how the “Ship of Theseus” thought experiment has parallels to software design. What I did not realize is that I would end up running into a physical “Ship of Theseus” of my own.

    Just another day

    On a day where I woke up to stories of how a Crowdstrike update wreaked havoc with thousands of systems, I was overly content with my small home lab setup. No Crowdstrike installed, primarily Ubuntu nodes… Nothing to worry about, right?

    Confident that I was in the clear, I continued the process of cycling my Kubernetes nodes to use Ubuntu 24.04. I have been pretty methodical about this, just to make sure I am not going to run into anything odd. Having converted my non-production cluster last week, I started work on my internal cluster. I got the control plane nodes updated, but the first agent I tried was not spinning up correctly.

    Sometimes my server gets a little busy, and a quick reset helps clear some of the background work. So I reset… And it never booted again.

    What Happened?

    The server would boot to a certain point (right after the Thermal Calibration step), hang for about 10-15 minutes, and then report a drive array failure. Uh oh…

    I dug through some logs on the Integrated Lights-Out (iLO) system and did some Google sleuthing on the errors I was seeing. The conclusion I came to was that the on-board drive controller went kaput. At this point, I was dead in the water. And then I remembered I had another server…

    Complete Swap

    The other server was much lighter on spec: a single 8 core CPU, 64 GB of RAM, and nowhere near the disk space. Not to mention, with a failed drive controller, I wasn’t getting any data off of those RAID disks.

    But the servers themselves are both HP ProLiant DL380p Gen8 servers. So I started thinking: could I just transfer everything except the system board to the backup server?

    The short answer: Yes.

    I pulled all the RAM modules and installed them in the backup. I pulled both CPUs from the old server and installed them in the backup. I pulled all of the hard drives out and installed them in the backup. I even transferred both power backplanes so that I would have dual plugs.

    The Moment of Truth

    After all that was done, I plugged it back in and logged in to the backup server’s iLO. It started up but pointed me to the RAID utilities, because one of the arrays needed to be rebuilt. A few hours later, the array was rebuilt, and I restarted. Much to my shock, it booted up as if it were the old server.

    Is it a new server, or just a new system board in the old server? All I know is, it is running again.

    Now, however, I’m out of replacement parts, so I’m going to have to start thinking about either stocking up on spares or looking into a different lab setup.

  • Moving to Ubuntu 24.04

    I have a small home lab running a few Kubernetes clusters, and a good bit of automation to deal with provisioning servers for those clusters. All of my Linux VMs are based on Ubuntu 22.04, as I prefer to stick with LTS releases for stability and compatibility.

    As April turned into July (missed some time there), I figured Ubuntu’s latest LTS (24.04) had matured to the point that I could start updating my VMs to the new version.

    Easier than Expected

    In my previous move from 20.04 to 22.04, there were some changes to the automated installers for 22.04 that forced me down the path of testing my packer provisioning with the 22.04 ISOs. I expected similar changes with 24.04. I was pleasantly surprised when I realized that my existing scripts should work well with the 24.04 ISOs.

    I did spend a little time updating the Azure DevOps pipeline that builds a base image so that it supports building both a 22.04 and a 24.04 image. I want to make sure I have the option to use the 22.04 images, should I find a problem with 24.04.

    Migrating Cluster Nodes

    With a base image provisioned, I followed my normal process for upgrading cluster nodes on my non-production cluster. There were a few hiccups, mostly around some of my automated scripts that needed to have the appropriate settings to set hostnames correctly.

    Again, other than some script debugging, the process worked with minimal changes to my automation scripts and my provisioning projects.

    Azure DevOps Build Agent?

    Perhaps in a few months. I use the GitHub runner images as a base for my self-hosted agents, but there are some changes that need manual review. I destroy my Azure DevOps build agent weekly and generate a new one, and that’s a process that I need to make sure continues to work through any changes.

    The issue is typically time: the build agents take a few hours to provision because of all the tools that are installed. Testing that takes time, so I have to plan ahead. Plus, well, it is summertime, and I’d much rather be in the pool than behind the desk.

  • Cleaning out the cupboard

    I have been spending a little time in my server cabinet downstairs, trying to organize some things. I took what I thought would be a quick step in consolidation. It was not as quick as I had hoped.

    POE Troubles

    When I got into the cabinet, I realized I had three PoE injectors in there, powering my three Unifi access points. Two of them are UAP-AC-LRs, and the third is a UAP-AC-M. My desire was simple: replace three PoE injectors with a 5-port PoE switch.

    So, I did what I thought would be a pretty simple process:

    1. Order the switch
    2. Using the MAC, assign it a static IP in my current Unifi Gateway DHCP.
    3. Plug in the switch.
    4. Take the cable coming out of the PoE injector and plug it into the switch.

    And that SHOULD be it: devices boot up and I remove the PoE injectors. And, for two of the three devices, it worked fine.

    There’s always one

    One of the UAP-AC-LR access points simply would not turn on. I thought maybe it was the cable, so I swapped out the cables, but nothing changed: one UAP-AC-LR and the UAP-AC-M worked, but the other UAP-AC-LR did not.

    I consulted the Oracle and came to realize that I had an early UAP-AC-LR, which only supports 24V passive PoE, not the 48V standard that my switch provides. The newer UAP-AC-LR and the UAP-AC-M apparently support 802.3at (or at least accept 48V), but my oldest UAP-AC-LR simply doesn’t turn on.

    The Choice

    There are two solutions, one more expensive than the other:

    1. Find an indoor PoE converter (INS-3AF-I-G) that can convert the 48V coming from my new switch to the 24V that the device needs.
    2. Upgrade! Buy a U6 Pro to replace my old long range access point.

    I like the latter, as it would give me WiFi 6 support and start my upgrade in that area. However, I’m not ready for the price tag at the moment. I was able to find the converter for about $25, and that includes shipping and tax. So I opted for the more economical route in order to get rid of that last PoE injector.

  • Building a new home for my proxy server

    With my BananaPi up and running again, it’s time to put it back in the server cabinet. But it’s a little bit of a mess down there, and I decided my new 3D modeling skills could help me build a new home for the proxy.

    Find the Model

    When creating a case for something, having a 3D model of that thing is crucial. Sometimes you have to model it yourself, but I have found that grabcad.com has a plethora of models available.

    A quick search yielded a great model of the Banana Pi, one so detailed that all of the individual components are modeled. All I really needed was the mounting hole locations and the external ports, but it is useful for much more. It was so detailed, in fact, that I may have added a little extra detail just because I could.

    General Design

    This case is extremely simple. The Banana Pi M5 (BPi from here on out) serves as my reverse proxy server, so all it really needs is power and a network cable. However, to make the case more generally useful, I added openings for most of the components. I say most because I fully enclosed the side with the GPIO header; I never use the GPIO pins on this board, so there was really no need to open those up.

    For this particular case, the BPi will be mounted on the left rack, so I oriented the tabs and the board in such a way that the power/HDMI ports were facing inside the rack, not outside. This also means that the network and USB ports are in the back, which works for my use case.

    A right-mount case with power to the left would put the USB ports at the front of the rack. However, I only have one BPi, and it is going on the left, so I will not be putting that one together.

    Two Tops

    With the basic design in place, I exported the simple top, and got a little creative.

    Cool It Down…

    My BPi kit came with a few heatsinks and a 24mm fan. Considering the proxy is a 24×7 machine, and it is handling a good bit of traffic, I figured it best to keep that fan in place. So I threw a cut-out in for the fan and its mounting screws.

    Light it up!

    On the side where the SD card goes, I closed off everything except the SD card slot, including the LEDs. As I was going through the design, I thought that it might be nice to be able to peek into the server rack and see the power/activity LEDs. And, I mean, that rack already looks like a weird Christmas tree; what are a few more lights?

    I had to do a bit of research to find the actual name for the little plastic pieces that carry LED light over a distance: they are called “light pipes.” I found some 3mm light pipes on Amazon and thought they would be a good addition to the build.

    The detail of the BPi model I found made this task REALLY easy: I was able to locate the center of each LED and project it onto the case top. A few 3mm holes later, the top was ready to accept light pipes.

    Put it all together

    I sent my design over to Pittsburgh3DPrints.com, which happens to be about two miles from my house. A couple of days later, I had a PLA print of the model. As the case will pretty much sit in my server cabinet all day, PLA is perfect for this print.

    Oddly enough, the trick to this one was finding a window to turn off the BPi and install it. I had previously set up a temporary reverse proxy while I was messing with the BPi, so I routed all the traffic from the BPi to the temp proxy and then shut down the BPi.

    Some Trimming Required

    As I was designing this case, I went with a best guess for tolerances, and I was a little off. The USB and audio jack cutouts needed to be taller to allow the BPi to be installed in the case. Additionally, the stands were too thick and the fan screw holes too small. I corrected these in the design files; for the already-printed model, I just enlarged them a little with an X-Acto blade.

    I heat-set a female M3 insert into the case body. I removed the fan from the old case top and attached it to my new case top. After putting the BPi into place in the case bottom, I attached the fan wires to the GPIO pins for power. I put the case top on, placing the tabs near the USB ports first, screwed in an M3 bolt, and dropped three light pipes into the case top. They protruded a little, so I cut them to sit flush while still transmitting light.

    Finished Product

    BPi in assembled case
    Case Components

    Overall, I am happy with the print. From a design perspective, having a printer at home would have alleviated some of the trimming, as I could have test-printed some smaller parts before committing.

    I posted this print to Makerworld and Printables.com. Check out the full build there!

  • An epic journey…

    I got all the things I needed to diagnose my BananaPi M5 issues, and I took a very long, winding road to a very simple solution. But I learned an awful lot in the process.

    Reconstructing the BananaPi M5

    I got tired of poking around the BananaPi M5, and decided I wanted to start from scratch. The boot order of the BananaPi means that, in order to format the eMMC and start from scratch, I needed some hardware.

    I ordered a USB-to-serial debug cable so that I could connect to the BananaPi (BPi from here on out), interrupt the boot sequence, and use U-Boot to wipe the disk (or at least the MBR). That would force the BPi to boot from the SD card. From there, I would follow the same steps I used to provision the BPi the first time around.

    For reference, with the cable I bought, I was able to connect to the debug console using PuTTY with the following settings:

    Your COM port will probably be different: open the Device Manager to find yours.

    I also had to be a little careful about wiring: when I first hooked it up, I connected the transmit wire (white) to the Tx pin and the receive wire (green) to the Rx pin. That gave me nothing. Then I realized the connections had to be crossed: the transmit wire (white) goes to the Rx pin, and the receive wire (green) goes to the Tx pin. Once swapped, the terminal lit up.

    I hit the reset button on the BPi and, as soon as I could, hit Ctrl-C. This took me into the U-Boot console. I then followed these steps to erase the first 1000 blocks. From there, I had a “clean-ish” BPi. To fully wipe the eMMC, I booted an SD card that had the BPI Ubuntu image and wiped the entire disk:

    dd if=/dev/zero of=/dev/mmcblk0 bs=1M

    Here, /dev/mmcblk0 is the device node for the eMMC. This writes zeros across the entire eMMC and cleaned it up nicely.
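
    For reference, the U-Boot side of that earlier erase amounts to something like the following; the device index is an assumption and may differ on your board:

    # From the U-Boot console. Assumes the eMMC is mmc device 0.
    mmc dev 0
    mmc erase 0 0x3E8      # 0x3E8 hex = 1000 blocks, wipes the start of the eMMC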

    New install, same problem

    After following the steps to install Ubuntu 20.04 to the eMMC, I ran an apt upgrade and a do-release-upgrade to get up to 22.04.3. And the SAME network issue reared its ugly head. Back at it with fresh eyes, I determined that something had changed in the network configuration, and the cloud-init setup that had worked for this particular BPI image was no longer valid.

    What were the symptoms? I combed through logs, but the easiest indicator was that, when running networkctl, eth0 was reported as unmanaged.

    So, I did two things. First, I disabled the network configuration in cloud-init by changing /etc/cloud/cloud.cfg.d/99-fake_cloud.cfg to the following:

    datasource_list: [ NoCloud, None ]
    datasource:
      NoCloud:
        fs_label: BPI-BOOT
    network: {config: disabled}

    Second, I configured netplan by editing /etc/netplan/50-cloud-init.yaml:

    network:
        ethernets:
            eth0:
                dhcp4: true
                dhcp-identifier: mac
        version: 2

    After that, I ran netplan generate and netplan apply, and the interface now showed as managed when executing networkctl. More importantly, after a reboot, the BPi initialized the network and everything is up and running.
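
    For anyone retracing this, the commands from that step, plus the check, are just:

    sudo netplan generate    # render the YAML above into systemd-networkd config
    sudo netplan apply       # apply it to the running system
    networkctl               # eth0 should no longer report as unmanaged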

    Backup and Scripting

    This will be the second proxy I’ve configured in under 2 months, so, well, now is the time to write the steps down and automate if possible.

    Before I did anything, I created a bash script to copy important files off of the proxy and onto my NAS. This includes:

    • Nginx configuration files
    • Custom rsyslog file for sending logs to Loki
    • Grafana Agent configuration file
    • Files for certbot/cloudflare certificate generation
    • The backup script itself.
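
    The script itself is nothing fancy; a trimmed-down sketch looks something like this (the NAS target and file paths are placeholders, not my actual layout):

    #!/usr/bin/env bash
    # Sketch of the proxy backup script -- the NAS target and paths are placeholders.
    set -euo pipefail

    DEST="nas.local:/backups/proxy"

    rsync -avz /etc/nginx/                 "$DEST/nginx/"          # Nginx configuration
    rsync -avz /etc/rsyslog.d/50-loki.conf "$DEST/rsyslog/"        # rsyslog -> Loki forwarding
    rsync -avz /etc/grafana-agent.yaml     "$DEST/grafana-agent/"  # Grafana Agent config
    rsync -avz /etc/letsencrypt/           "$DEST/certs/"          # certbot/cloudflare material
    rsync -avz "$0"                        "$DEST/scripts/"        # this script itself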

    With those files on the NAS, I scripted out restoration of the proxy to the fresh BPi. I will plan a little downtime to make the switch: while the switchover won’t be noticeable to the outside world, some of the internal networking takes a few minutes to swap over, and I would hate to have a streaming show go down in the middle of viewing…. I would certainly take flak for that.

  • Terraform Azure DevOps

    As a continuation of my efforts to use Terraform to manage my Azure Active Directory instance, I moved my Azure DevOps instance to a Terraform project, and cleaned a lot up in the process.

    New Project, same pattern

    As I mentioned in my last post, I set up my repository to support multiple Terraform projects, so starting an Azure DevOps Terraform project was as simple as creating a new folder under the terraform directory and setting up the basics.

    As with my Azure AD project, I’m using the S3 backend. For providers, this project only needs the Azure DevOps and Hashicorp Vault providers.

    The process was very similar to Azure AD: create resources in the project, and use terraform import to import existing resources to be managed by the project. In this case, I tried to be as methodical as possible, following this pattern:

    1. Import a project.
    2. Import the project’s service connections.
    3. Import the project’s variable libraries.
    4. Import the project’s build pipelines.

    This order ensured that I brought objects into the Terraform project in a sequence where each could be referenced by the resources that depend on it.

    Handling Secrets

    When I got to service connections and libraries, it occurred to me that I needed to pull secrets out of my Hashicorp Vault instance to make this work smoothly. This is where the Vault provider came in handy: using Terraform data sources, I could pull secrets out of Vault and have them available to my project.

    Not only does this keep secrets out of the files (which is why I can share them all in Github), but it also means that cycling these secrets is as simple as changing the secret in Vault and then re-running the Terraform apply. While I am not yet using this to its fullest extent, I have some ambitions to cycle these secrets automatically on a weekly basis.
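
    In practice, rotating one of those secrets is a two-step affair; the Vault mount, path, and key below are placeholders for wherever a given secret actually lives:

    # Write the new value into Vault (placeholder mount/path/key)...
    vault kv put secret/azure-devops/github token="<new-token>"

    # ...then re-apply so the data source picks it up and Terraform updates
    # whatever service connection or variable library uses it.
    terraform apply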

    Github Authentication

    One thing I ran into was the authentication between Azure DevOps and Github. The ADO UI likes to use the built-in “Github app” authentication. Meaning, when you click the Edit button on a pipeline, ADO defaults to asking Github for “app” permissions. The same thing happens if you manually create a new pipeline in the user interface, and either action automatically creates a service connection in the project.

    You cannot create this service connection in a Terraform project, but you can let Terraform see it as a managed resource. To do that:

    1. Find the created service connection in your Azure DevOps project.
    2. Create a new azuredevops_serviceendpoint_github resource in your Terraform Project with no authentication block. Here is mine for reference.
    3. Import the service connection to the newly created Terraform Resource.
    4. Make sure description is explicitly set to a blank string: ""

    That last step got me: if you don’t explicitly set that value to blank, the provider tries to set the description to “Managed by Terraform”. When doing that, it attempts to validate the change, and since there is no authentication block, the validation fails.
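
    For step 3, the import ID is (as far as I can tell from the provider docs) the project ID and the service endpoint ID joined by a slash, so the command looks roughly like:

    # Both IDs are GUIDs from Azure DevOps; the resource name matches the empty
    # azuredevops_serviceendpoint_github block created in step 2.
    terraform import azuredevops_serviceendpoint_github.github_app <project-id>/<service-endpoint-id>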

    What are those?!?

    An interesting side effect of this effort was seeing all the junk that exists in my Azure DevOps projects. I say “junk,” but I mean unused variable libraries and service connections. This triggered my need for digital tidiness, so rather than importing, I deleted.

    I even went so far as to review some of the areas where service connections were passed into a pipeline, but never actually used. I ended up modifying a number of my Azure DevOps pipeline templates (and documenting them) to stop requiring connections that they ultimately were not using.

    It’s not done until it is automated!

    This is all great, but the point of Terraform is to keep my infrastructure in the state I intend it to be. This means automating the application of this project. I created a template pipeline in my repository that I could easily extend for new projects.

    I have a task on my to-do list to automate the execution of the Terraform plan on a daily basis and notify me if there are unexpected changes. This will serve as an alert that my infrastructure has changed, potentially unintentionally. For now, though, I will execute the Terraform plan/apply manually on a weekly basis.

  • Terraform Azure AD

    Over the last week or so, I realized that while I bang the drum of infrastructure as code very loudly, I have not been practicing it at home. I took some steps to rectify that over the weekend.

    The Goal

    I have a fairly meager home presence in Azure. Primarily, I use a free version of Azure Active Directory (now Entra ID) to allow for some single sign-on capabilities in external applications like Grafana, MinIO, and ArgoCD. The setup for this differs greatly among the applications, but common to all of these is the need to create applications in Azure AD.

    My goal was simple: automate provisioning of this Azure AD tenant so that I can manage these applications in code. My stretch goal was to get any secrets created as part of this process into my Hashicorp Vault instance.

    Getting Started

    The plan, in one word, is Terraform. Terraform has a number of providers, including both the azuread and vault providers. Additionally, since I have some experience in Terraform, I figured it would be a quick trip.

    I started by installing all the necessary tools (specifically, the Vault CLI, the Azure CLI, and the Terraform CLI) in my WSL instance of Ubuntu. Why there instead of PowerShell? Most of the tutorials and such lean towards the bash syntax, so it was a bit easier to roll through the tutorials without having to convert bash into PowerShell.

    I used my ops-automation repository as the source for this, and started by creating a new folder structure to hold my projects. As I anticipated more Terraform projects to come up, I created a base terraform directory, and then an azuread directory under that.

    Picking a Backend

    Terraform relies on state storage, which it calls a backend. By default, Terraform uses a local file backend. That is great for development, but knowing that I wanted to get things running in Azure DevOps immediately, I decided to configure a backend that I could use from my machine as well as from my pipelines.

    As I have been using MinIO pretty heavily for storage, it made the most sense to configure MinIO as the backend, using Terraform’s S3 backend to do so. It was “fairly” straightforward, as soon as I turned off all the AWS-specific nonsense:

    terraform {
      backend "s3" {
        skip_requesting_account_id  = true
        skip_credentials_validation = true
        skip_metadata_api_check     = true
        skip_region_validation      = true
        use_path_style              = true
        bucket                      = "terraform"
        key                         = "azuread/terraform.tfstate"
        region                      = "us-east-1"
      }
    }

    There are some obvious things missing: I set environment variables for values that I would like to treat as secret, or at least not public.

    • MinIO Endpoint -> AWS_ENDPOINT_URL_S3 environment variable instead of endpoints.s3
    • Access Key -> AWS_ACCESS_KEY_ID environment variable instead of access_key
    • Secret Key -> AWS_SECRET_ACCESS_KEY environment variable instead of secret_key

    These settings allow me to use the same storage for both my local machine and the Azure Pipeline.
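
    For reference, setting those variables locally looks something like this (the endpoint and credentials are placeholders for my MinIO instance):

    # Placeholder MinIO endpoint and credentials -- set these before running
    # terraform init/plan/apply so the S3 backend can reach MinIO.
    export AWS_ENDPOINT_URL_S3="https://minio.example.internal:9000"
    export AWS_ACCESS_KEY_ID="terraform-access-key"
    export AWS_SECRET_ACCESS_KEY="terraform-secret-key"

    terraform init    # the backend block above picks the rest up from the environment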

    Configuring Azure AD

    Likewise, I needed to configure the azuread provider. I followed the steps in the documentation, choosing the environment variable route again. I configured a service principal in Azure and gave it the necessary access to manage my directory.

    Using environment variables allows me to set these from variables in Azure DevOps, meaning my secrets are stored in ADO (or Vault, or both…. more on that in another post).
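
    For the record, the azuread provider picks up service principal credentials from the ARM_ environment variables, so locally that looks like the following (values are placeholders):

    # Service principal credentials for the azuread provider -- placeholder values.
    export ARM_TENANT_ID="<tenant-guid>"
    export ARM_CLIENT_ID="<app-registration-client-id>"
    export ARM_CLIENT_SECRET="<client-secret>"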

    Importing Existing Resources

    I have a few resources that already exist in my Azure AD instance, enough that I didn’t want to re-create them and then re-configure everything that uses them. Luckily, most Terraform providers allow importing existing resources, and thankfully, most of the resources I have support this feature.

    Importing is fairly simple: you create the simplest definition of a resource that you can, and then run a terraform import variant to import that resource into your project’s state. Importing an Azure AD Application, for example, looks like this:

    terraform import azuread_application.myapp /applications/<object-id>

    It is worth noting that the provider expects the object ID, not the client ID. The provider documentation lists which ID each resource uses for import.

    More importantly, Applications and Service Principals are different resources in Azure AD, even though they map pretty much one-to-one. To import a Service Principal, you run a similar command:

    terraform import azuread_service_principal.myprincipal <sp-id>

    But where is the service principal’s ID? I had to go to the Azure CLI to get that info:

    az ad sp list --display-name myappname

    From this JSON, I grabbed the id value and used that to import.
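
    A JMESPath query can also pull the ID directly, if you would rather skip reading the JSON (the display name is the same placeholder as above):

    az ad sp list --display-name myappname --query "[].id" --output tsv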

    From here, I ran a terraform plan to see what was going to be changed. I took a look at the differences, and even added some properties to the terraform files to maintain consistency between the app and the existing state. I ended up with a solid project full of Terraform files that reflected my current state.

    Automating with Azure DevOps

    There are a few extensions available to add Terraform tasks to Azure DevOps. Sadly, most rely on “standard” configurations for authentication against the backends. Since I’m using an S3-compatible backend, but not S3 itself, I had difficulty getting those extensions to function correctly.

    As the Terraform CLI is installed on my build agent, though, I only needed to run my commands from a script. I created an ADO template pipeline (planning for expansion) and extended it to create the pipeline.

    All of the environment variables in the template are populated from the variable groups defined in the extending pipeline. If a variable is not defined, it is simply blank; that’s why you will see the AZDO_ environment variables in the template, but not in the variable groups for the Azure AD provisioning.
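
    Stripped of the ADO templating, the script portion is just the plain Terraform CLI; it boils down to something like:

    # Run inside the project folder; the backend and provider credentials come in
    # through the pipeline's variable groups as environment variables.
    terraform init -input=false
    terraform plan -input=false -out=tfplan
    terraform apply -input=false tfplan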

    Stretch: Adding Hashicorp Vault

    Adding HC Vault support was somewhat trivial, but another exercise in authentication. I wanted to use AppRole authentication for this, so I followed the vault provider’s instructions and added additional configuration to my provider. Note that this setup requires additional variables that now need to be set whenever I do a plan or import.
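
    Concretely, that means a few more values to export before every run; the variable names here are my own placeholders rather than anything the provider mandates:

    # Vault address plus the AppRole credentials, passed to Terraform as input
    # variables (placeholder names).
    export VAULT_ADDR="https://vault.example.internal:8200"
    export TF_VAR_vault_role_id="<approle-role-id>"
    export TF_VAR_vault_secret_id="<approle-secret-id>"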

    Once that was done, I had access to read and write values in Vault. I started by storing my application passwords in a new KV path. This allows me to have application passwords that rotate weekly, which is a nice security feature. Unfortunately, the rest of my infrastructure isn’t quite set up to handle that kind of change. At least, not yet.