Category: Open Source

  • Migrating from MinIO to Garage

    When Open Source Isn’t So Open Anymore

    Sometimes migrations aren’t about chasing the newest technology—they’re about abandoning ship before it sinks. In December 2025, MinIO officially entered “maintenance mode” for its open-source edition, effectively ending active development. Combined with earlier moves like removing the admin UI, discontinuing Docker images, and pushing users toward their $96,000+ AIStor paid product, the writing was on the wall: MinIO’s open-source days were over.

    Time to find a replacement.

    Why I Had to Leave MinIO

    Let’s be clear: MinIO used to be excellent open-source software. Past tense. Over the course of 2025, the company systematically dismantled what made it valuable for home lab and small-scale deployments:

    June 2025: Removed the web admin console from the Community Edition. Features like bucket configuration, lifecycle policies, and account management became CLI-only—or you could pay for AIStor.

    October 2025: Stopped publishing Docker images to Docker Hub. Want to run MinIO? Build it from source yourself.

    December 2025: Placed the GitHub repository in “maintenance mode.” No new features, no enhancements, no pull request reviews. Only “critical security fixes…evaluated on a case-by-case basis.”

    The pattern was obvious: push users toward AIStor, a proprietary product starting at nearly $100k, by making the open-source version progressively less usable. The community called it what it was—a lock-in strategy disguised as “streamlining.”

    I’m not paying six figures for object storage in my home lab. Time to migrate.

    Enter Garage

    I needed S3-compatible storage that was:

    • Actually open source, not “open source until we change our minds”
    • Lightweight, suitable for single-node deployments
    • Actively maintained by a community that won’t pull the rug out

    Garage checked all the boxes. Built in Rust by the Deuxfleurs collective, it’s designed for geo-distributed deployments but scales down beautifully to single-node setups. More importantly, it’s genuinely open source—developed by a collective, not a company with a paid product to upsell.

    The Migration Process

    Vault: The Critical Path

    Vault was the highest-stakes piece of this migration. It’s the backbone of my secrets management, and getting this wrong meant potentially losing access to everything. I followed the proper migration path:

    1. Stopped the Vault pod in my Kubernetes cluster—no live migrations, no shortcuts
    2. Used vault operator migrate to transfer the storage backend from MinIO to Garage—this is the officially supported method that ensures data integrity
    3. Updated the vault-storage-config Kubernetes secret to point at the new Garage endpoint
    4. Restarted Vault and unsealed it with my existing keys

    The vault operator migrate command handled the heavy lifting, ensuring every key-value pair transferred correctly. While I could have theoretically just mirrored S3 buckets and updated configs, using the official migration tool gave me confidence nothing would break in subtle ways later.
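
    For anyone following the same path, the migration config is just a small HCL file with a source and a destination storage stanza. Here is a minimal sketch assuming S3-compatible backends on both sides; the bucket names, URL schemes, and credentials are placeholders, not my actual values:

    # migrate.hcl (values are placeholders)
    storage_source "s3" {
      bucket              = "vault"
      endpoint            = "http://cloud.gerega.net:39000"  # old MinIO endpoint
      access_key          = "MINIO_ACCESS_KEY"
      secret_key          = "MINIO_SECRET_KEY"
      region              = "us-east-1"
      s3_force_path_style = "true"
    }

    storage_destination "s3" {
      bucket              = "vault"
      endpoint            = "http://cloud.gerega.net:3900"   # new Garage endpoint
      access_key          = "GARAGE_ACCESS_KEY"
      secret_key          = "GARAGE_SECRET_KEY"
      region              = "garage"                         # Garage's default region name
      s3_force_path_style = "true"
    }

    # with Vault stopped:
    # vault operator migrate -config=migrate.hcl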

    Monitoring Stack: Configuration Updates

    With Vault successfully migrated, the rest was straightforward. I updated S3 endpoint configurations across my monitoring stack in ops-internal-cluster:

    Loki, Mimir, and Tempo all had their storage backends updated:

    • Old: cloud.gerega.net:39000 (MinIO)
    • New: cloud.gerega.net:3900 (Garage)

    I intentionally didn’t migrate historical metrics and logs. This is a lab environment—losing a few weeks of time-series data just means starting fresh with cleaner retention policies. In production, you’d migrate this data. Here? Not worth the effort.

    Monitoring Garage Itself

    I added a Grafana Alloy scrape job to collect Garage’s Prometheus metrics from its /metrics endpoint. No blind spots from day one—if Garage has issues, I’ll know immediately.

    Deployment Architecture

    One deliberate choice: Garage runs as a single Docker container on bare metal, not in Kubernetes. Object storage is foundational infrastructure. If my Kubernetes clusters have problems, I don’t want my storage backend tied to that failure domain.

    Running Garage outside the cluster means:

    • Vault stores data independently of cluster state
    • Monitoring storage (Loki, Mimir, Tempo) persists during cluster maintenance
    • One less workload competing for cluster resources
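
    The deployment itself is about as minimal as it gets. A sketch of the kind of docker run invocation involved, assuming the official dxflrs/garage image and default ports; the tag, host paths, and config location are illustrative:

    # ports: 3900 = S3 API, 3901 = inter-node RPC, 3903 = admin API / metrics
    docker run -d --name garage --restart unless-stopped \
      -p 3900:3900 -p 3901:3901 -p 3903:3903 \
      -v /etc/garage.toml:/etc/garage.toml \
      -v /srv/garage/meta:/var/lib/garage/meta \
      -v /srv/garage/data:/var/lib/garage/data \
      dxflrs/garage:v1.0.1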

    Verification and Cleanup

    Before decommissioning MinIO, I verified nothing was still pointing at the old endpoints:

    # Searched across GitOps repos
    grep -r "39000" .        # Old MinIO port
    grep -r "192.168.1.30" . # Old MinIO IP
    grep -r "s3.mattgerega.net" .
    

    Clean sweep—everything migrated successfully.

    Current Status

    Garage has been running for about a week now. Resource usage is lower than MinIO ever was, and everything works:

    • Vault sealed/unsealed multiple times without issues
    • Loki ingesting logs from multiple clusters
    • Mimir storing metrics from Grafana Alloy
    • Tempo collecting distributed traces

    The old MinIO instance is still running but idle. I’ll give it another week before decommissioning entirely—old habits die hard, and having a fallback during initial burn-in feels prudent.

    Port 3900 is the new standard. Port 39000 is legacy. And my infrastructure is no longer dependent on a company actively sabotaging its open-source product.

    Lessons for the Homelab Community

    If you’re still running MinIO Community Edition, now’s the time to plan your exit strategy. The maintenance-mode announcement wasn’t a surprise—it was the inevitable conclusion of a year-long strategy to push users toward paid products.

    Alternatives worth considering:

    • Garage: What I chose. Lightweight, Rust-based, genuinely open source.
    • SeaweedFS: Go-based, active development, designed for large-scale deployments but works at small scale.
    • Ceph RGW: If you’re already running Ceph, the RADOS Gateway provides S3 compatibility.

    The MinIO I deployed years ago was a solid piece of open-source infrastructure. The MinIO of 2025 is a bait-and-switch. Learn from my migration—don’t wait until you’re forced to scramble.


    Technical Details:

    • Garage deployment: Single Docker container on bare metal
    • Migration window: ~30 minutes for Vault migration
    • Vault migration method: vault operator migrate CLI command
    • Affected services: Vault, Loki, Mimir, Tempo, Grafana Alloy
    • Data retained: All Vault secrets, new metrics/logs only
    • Repositories: ops-argo, ops-internal-cluster
    • Garage version: Latest stable release as of December 2025


  • Modernizing the Gateway

    From NGINX Ingress to Envoy Gateway

    As with any good engineer, I cannot leave well enough alone. Over the past week, I’ve been working through a significant infrastructure modernization across my home lab clusters – migrating from NGINX Ingress to Envoy Gateway and implementing the Kubernetes Gateway API. This also involved some necessary housekeeping with chart updates and a shift to Server-Side Apply for all ArgoCD-managed resources.

    Why Change?

    The timing couldn’t have been better. In November 2025, the Kubernetes SIG Network and Security Response Committee announced that Ingress NGINX will be retired in March 2026. The project has struggled with insufficient maintainer support, security concerns around configuration snippets, and accumulated technical debt. After March 2026, there will be no further releases, security patches, or bug fixes.

    The announcement strongly recommends migrating to the Gateway API, described as “the modern replacement for Ingress.” This validated what I’d already been considering – the Gateway API provides a more standardized, vendor-neutral approach with better separation of concerns between infrastructure operators and application developers.

    Envoy Gateway, being a CNCF project built on the battle-tested Envoy proxy, seemed like a natural choice for this migration. Plus, it gave me an excuse to finally move off Traefik, which was… well, let’s just say it was time for a change.

    The Migration Journey

    The migration happened in phases across my ops-argo, ops-prod-cluster, and ops-nonprod-cluster repositories. Here’s what changed:

    Phase 1: Adding Envoy Gateway

    I started by adding Envoy Gateway as a cluster tool, complete with its own ApplicationSet that deploys to clusters labeled with spydersoft.io/envoy-gateway: "true". The deployment includes:

    • GatewayClass and Gateway resources: Defined a main gateway that handles traffic routing
    • EnvoyProxy configuration: Set up with a static NodePort service for consistent external access
    • ClientTrafficPolicy: Configured to properly handle forwarded headers – crucial for preserving client IP information through the proxy chain

    The Envoy Gateway deployment lives in the envoy-gateway-system namespace and exposes services via NodePort 30080 and 30443, making it easy to integrate with my existing network setup.
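
    For context, the Gateway resources themselves are short. A trimmed-down sketch of what the GatewayClass and Gateway pair can look like, reduced to a single HTTPS listener; the certificate secret name is illustrative:

    apiVersion: gateway.networking.k8s.io/v1
    kind: GatewayClass
    metadata:
      name: envoy-gateway
    spec:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: main
      namespace: envoy-gateway-system
    spec:
      gatewayClassName: envoy-gateway
      listeners:
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - name: wildcard-cert   # illustrative secret name
          allowedRoutes:
            namespaces:
              from: All   # let HTTPRoutes in app namespaces (like sites) attach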

    Phase 2: Migrating Applications to HTTPRoute

    This was the bulk of the work. Each application needed its Ingress resource replaced with an HTTPRoute. The new Gateway API resources are much cleaner. For example, my blog (www.mattgerega.com) went from an Ingress definition to this:

    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: wp-mattgerega
      namespace: sites
    spec:
      parentRefs:
        - name: main
          namespace: envoy-gateway-system
      hostnames:
        - www.mattgerega.com
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: wp-mattgerega-wordpress
              port: 80
    

    Much more declarative and expressive than the old Ingress syntax.

    I migrated several applications across both production and non-production clusters:

    • Gravitee API Management
    • ProGet (my package management system)
    • n8n and Node-RED instances
    • Linkerd-viz dashboard
    • ArgoCD (which also got a GRPCRoute for its gRPC services)
    • Identity Server (across test and stage environments)
    • Tech Radar
    • Home automation services (UniFi client and IP manager)

    Phase 3: Removing the Old Guard

    Once everything was migrated and tested, I removed the old ingress controller configurations. This cleanup happened across all three repositories:

    ops-prod-cluster:

    • Removed all Traefik configuration files
    • Cleaned up traefik-gateway.yaml and traefik-middlewares.yaml

    ops-nonprod-cluster:

    • Removed Traefik configurations
    • Deleted the RKE2 ingress NGINX HelmChartConfig (rke2-ingress-nginx-config.yaml)

    The cluster-resources directories got significantly cleaner with this cleanup. Good riddance to configuration sprawl.

    Phase 4: Chart Maintenance and Server-Side Apply

    While I was in there making changes, I also:

    • Bumped several Helm charts to their latest versions:
      • ArgoCD: 9.1.5 → 9.1.7
      • External Secrets: 1.1.0 → 1.1.1
      • Linkerd components: 2025.11.3 → 2025.12.1
      • Grafana Alloy: 1.4.0 → 1.5.0
      • Common chart dependency: 4.4.0 → 4.5.0
      • Redis deployments updated across production and non-production
    • Migrated all clusters to use Server-Side Apply (ServerSideApply=true in the syncOptions):
      • All cluster tools in ops-argo
      • Production application sets (external-apps, production-apps, cluster-resources)
      • Non-production application sets (external-apps, cluster-resources)

    This is a better practice for ArgoCD: with Server-Side Apply, the Kubernetes API server performs the merge and tracks field ownership, instead of ArgoCD doing a client-side strategic merge, which reduces conflicts and improves sync reliability.
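
    The change itself is tiny; in each Application or ApplicationSet template it amounts to something like this excerpt:

    # excerpt from an ArgoCD Application (or ApplicationSet template)
    spec:
      syncPolicy:
        syncOptions:
          - ServerSideApply=true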

    Lessons Learned

    Gateway API is ready for production: The migration was surprisingly smooth. The Gateway API resources are well-documented and intuitive. With NGINX Ingress being retired, now’s the time to make the jump.

    HTTPRoute vs. Ingress: HTTPRoute is more expressive and allows for more sophisticated routing rules. The explicit parentRefs concept makes it clear which gateway handles which routes.

    Server-Side Apply everywhere: Should have done this sooner. The improved conflict handling makes ArgoCD much more reliable, especially when multiple controllers touch the same resources.

    Envoy’s configurability: The EnvoyProxy custom resource gives incredible control over the proxy configuration without needing to edit ConfigMaps or deal with annotations.

    Multi-cluster consistency: Making these changes across production and non-production environments simultaneously kept everything aligned and reduced cognitive overhead when switching between environments.

    Current Status

    All applications across all clusters are now running through Envoy Gateway with the Gateway API. Traffic is flowing correctly, TLS is terminating properly, and I’ve removed all the old ingress-related configuration from both production and non-production environments.

    The clusters are more standardized, the configuration is cleaner, and I’m positioned to take advantage of future Gateway API features like traffic splitting and more advanced routing capabilities. More importantly, I’m ahead of the March 2026 retirement deadline with plenty of time to spare.

    Now, the real question: what am I going to tinker with next?

  • Summer Project – Home Lab Refactor

    As with any good engineer, I cannot leave well enough alone. My current rainy day project is reconfiguring my home lab for some much needed updates and simplification.

    What’s Wrong?

    My home lab is, well, still going strong. My automation scripts work well, and I don’t spend a ton of time doing what I need to do to keep things up to date, at least when it comes to my Kubernetes clusters.

    The other servers, however, are in a scary spot. Everything is running on top of the free version of Windows Hyper-V Server from 2019, so general updates are a concern. I would LOVE to move to Windows Server 2025, but I do not have the money for that kind of endeavor.

    The other issue with running Windows Server is that, well, it usually expects a Windows domain (or at least, my version does). This requirement has forced me to run my own domain controllers for a number of years now. Earlier iterations of my lab included a lot of Windows VMs, so the domain helped me manage authentication across them all. But with RKE2 and Kubernetes running the bulk of my workloads, the domain controllers are more hassle than anything right now.

    The Plan

    My current plan is to migrate my home server to Proxmox. It seems a pretty solid replacement for Hyper-V, and has a few features in it that I may use in the future, like using cloud-init for creating new cluster nodes and better management of storage.

    Obviously, this is going to require some testing, and luckily, my old laptop is free for some experimentation. So I installed Proxmox there and messed around, and I came up with an interesting plan.

    • Migrate my VMs to my laptop instance of Proxmox, reducing the workload as much as I can.
    • Install Proxmox on my server
    • Create a Proxmox cluster with my laptop and server as the nodes.
    • Transfer my VMs from the laptop node to the server node.

    Cutting my Workload

    My laptop has a paltry 32GB of RAM, compared to 288GB in my server. While I need to get everything “over” to the laptop, it doesn’t all have to be running at the same time.

    For the Windows VMs, my current plan is as follows:

    • Move my primary domain controller to the laptop, but run at a reduced capacity (1 CPU/2GB).
    • Move my backup DC to the laptop, shut it down.
    • Move and shut down both SQL Server instances: they are only running lab DBs, nothing really vital.

    For my clusters, I’m not actually going to “move” the VMs. I’m going to create new nodes on the laptop’s Proxmox instance, add them to the clusters, and then deprovision the old ones. This gives me some control over what’s there.

    • Non-Production Cluster -> 1 control plane server, 2 agents, but shut them down.
    • Internal Cluster -> 1 control plane server (down from 3), 3 agents, all shut down.
    • Production Cluster -> 1 control plane (down from 3), 2 agents, running vital software. I may need to migrate my HC Vault instance to the production cluster just to ensure secrets stay up and running.

    With this setup, I should really only have 4 VMs running on my laptop, which it should be able to handle. Once that’s done, I’ll have time to install and configure Proxmox on the server, and then move VMs from the laptop to the server.

    Lots to do

    I have a lot of learning to do. Proxmox seems pretty simple to start, but I find I’m having to read a lot about the cloning and cloud-init pieces to really make use of the power of the tool.
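
    From what I’ve read so far, the cloud-init workflow boils down to building a template once and cloning it per node. A rough sketch I’m working from, with illustrative VM IDs, storage names, and image file (I have not battle-tested this yet):

    # build a reusable cloud-init template from an Ubuntu cloud image
    qm create 9000 --name ubuntu-cloud --memory 4096 --cores 2 --net0 virtio,bridge=vmbr0
    qm importdisk 9000 noble-server-cloudimg-amd64.img local-lvm
    qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
    qm set 9000 --ide2 local-lvm:cloudinit --boot order=scsi0 --serial0 socket --vga serial0
    qm template 9000

    # clone it for a new cluster node and inject settings via cloud-init
    qm clone 9000 201 --name rke2-agent-01 --full
    qm set 201 --ipconfig0 ip=dhcp --sshkeys ~/.ssh/id_ed25519.pub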

    Once I feel comfortable with Proxmox, the actual move will need to be scheduled… So, maybe by Christmas I’ll actually have this done.

  • ArgoCD panicked a little…

    I ran into an odd situation last week with ArgoCD, and it took a bit of digging to figure it out. Hopefully this helps someone else along the way.

    Whatever you do, don’t panic!

    Well, unless of course you are ArgoCD.

    I have a small Azure DevOps job that runs nightly and attempts to upgrade some of the Helm charts that I use to deploy external tools. This includes things like Grafana, Loki, Mimir, Tempo, ArgoCD, External Secrets, and many more. This job deploys the changes to my GitOps repositories, and if there are changes, I can manually sync.

    Why not auto-sync, you might ask? Visibility, mostly. I like to see what changes are being applied, in case there is something bigger in the changes that needs my attention. I also like to “be there” if something breaks, so I can rollback quickly.

    Last week, while upgrading Grafana and Tempo, ArgoCD started throwing the following error on sync:

    Recovered from panic: runtime error: invalid memory address or nil pointer

    A quick trip to Google produced a few different results, but nothing immediately apparent. One particular issue mentioned a problem with out-of-date resources (an old apiVersion). Let’s put a pin in that.

    Nothing was jumping out, and my deployments were still working. I had a number of other things on my plate, so I let this slide for a few days.

    Versioning….

    When I finally got some time to dig into this, I figured I would pull at that apiVersion string and see what shook loose. Unfortunately, since there is no good error indicating which resource is causing the panic, it was luck of the draw as to whether or not I found the offender. This time, I was lucky.

    My ExternalSecret resources were using some alpha versions, so my first thought was to update them to the v1 version. Lo and behold, that fixed the two charts which were failing.
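
    For anyone hitting the same thing, the fix was nothing more than bumping the apiVersion on the ExternalSecret resources, shown here in isolation:

    # before
    apiVersion: external-secrets.io/v1alpha1
    kind: ExternalSecret

    # after
    apiVersion: external-secrets.io/v1
    kind: ExternalSecret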

    This, however, leads to a bigger issue: if ArgoCD is not going to tell me when I have out-of-date apiVersion values for a resource, I am going to have to figure out how to validate these resources before I commit the changes. I’ll put this on my ever-growing to-do list.
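
    One option I’m considering is rendering the charts locally and running the output through a schema validator like kubeconform before committing. A rough sketch (the chart path is illustrative, and CRD-backed kinds need extra schemas or get skipped):

    # render the chart and validate the resulting manifests
    helm template my-release ./charts/my-chart \
      | kubeconform -strict -summary -ignore-missing-schemas -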

  • Platform Engineering

    As I continue to build out some reference architecture applications, I realized that there was a great deal of boilerplate code that I add to my APIs to get things running. Time for a library!

    Enter the “Platform”

    I am generally terrible at naming things, but Spydersoft.Platform seemed like a good base namespace for this one. The intent is to put the majority of my boilerplate code into a set of libraries that can be referenced to make adding stuff easier.

    But, what kind of “stuff?” Well, for starters:

    • Support for OpenTelemetry trace, metrics, and logging
    • Serilog for console logging
    • Simple JWT identity authentication (for my APIs)
    • Default Health Check endpoints

    Going deep with Health Checks

    The first three were pretty easy: just some POCOs for options and then startup extensions to add the necessary items with the proper configuration. With health checks, however, I went a little overboard.

    My goal was to be able to implement IHealthCheck anywhere and decorate it in such a way that it would be added to the health check framework and could be tagged. Furthermore, I wanted to use tags to drive standard endpoints.

    In the end, I used a custom attribute and some reflection to add the checks that are found in the loaded AppDomain. I won’t bore you: the documentation should do that just fine.

    But can we test it?

    Testing startup extensions is, well, interesting. Technically, it is an integration test, but I did not want to set up Playwright tests to execute the API tests. Why? Well, usually API integration tests are run against a particular configuration, but in this case, I needed to run the reference application with a lot of different configurations in order to fully test the extensions. Enter WebApplicationFactory.

    With WebApplicationFactory, I was able to configure tests to stand up a copy of the reference application with different configurations. I could then verify the configuration using some custom health checks.

    I am on the fence as to whether or not this is a “unit” test or an “integration” test. I’m not calling out to any other application, which is usually the definition of an integration test. But I did have to configure a reference application in order to get things tested.

    Whatever you call it, I have coverage on my startup extensions, and even caught a few bugs while I was writing the tests.

    Make it truly public?

    Right now, the build publishes the NuGet package to my private NuGet feed. I am debating moving it to NuGet.org (or maybe GitHub’s package feeds). The code is already open source, and I want to make the library openly available as well. But until I make the decision on where to put it, I will keep it in my private feed. If you have any interest in it, watch or star the repo in GitHub: it will help me gauge the level of interest.

  • Tech Tip – Formatting External Secrets in Helm

    This has tripped me up a lot, so I figure it is worth a quick note.

    The Problem

    I use Helm charts to define the state of my cluster in a Git repository, and ArgoCD to deploy those charts. This allows a lot of flexibility in my deployments and configuration.

    For secrets management, I use External Secrets to populate secrets from Hashicorp Vault. In many of those cases, I need to use the templating functionality of External Secrets to build secrets that can be used from external charts. A great case of this is populating user secrets for the RabbitMQ chart.

    In the link above, you will notice the templates/default-user-secrets.yaml file. This file is meant to generate a Kubernetes Secret resource which is then referenced by the RabbitMqCluster resource (templates/cluster.yaml). This secret is mounted as a file and, therefore, needs some custom formatting. So I used the template property to format the secret:

    template:
      type: Opaque
      engineVersion: v2
      data:
        default_user.conf: |
            default_user={{ `{{ .username  }}` }}
            default_pass={{ `{{ .password  }}` }}
        host: {{ .Release.Name }}.rabbitmq.svc
        password: {{`"{{ .password }}"`}}
        port: "5672"
        provider: rabbitmq
        type: rabbitmq
        username: {{`"{{ .username }}"`}}

    Notice in the code above the duplicated {{ and }} around the username/password values. These are necessary to ensure that the template is properly set in the ExternalSecret resource.

    But, Why?

    It has to do with templating. Helm uses golang templates to process the templates and create resources. Similarly, the ExternalSecrets template engine uses golang templates. When you have a “template in a template”, you have to somehow tell the processor to put the literal value in.

    Let’s look at one part of this file.

      default_user={{ `{{ .username  }}` }}

    What we want to end up in the ExternalSecret template is this:

    default_user={{ .username  }}

    So, in order to do that, we have to tell the Helm template to write {{ .username }} as written, not process it as a golang template. In this case, we wrap the inner template in backticks (`), which golang treats as a raw string literal, so Helm emits the contents verbatim and the backticks themselves never appear in the output. Notice that other areas wrap the raw string in double quotes (") so the quotes do end up in the rendered template.

    password: {{`"{{ .password }}"`}}

    This will generate the quotes in the resulting template:

    password: "{{ .password }}"

    If you need a single quote, use the same pattern, but replace the double quote with a single quote (').

    username: {{`'{{ .username }}'`}}

    For whatever it is worth, VS Code’s YAML parser did not like that version at all. Since I have not run into a situation where I need a single quote, I use double quotes if quotes are required, and backticks if they are not.

  • Spoolman for Filament Management

    “You don’t know what you’ve got ’til it’s gone” is a great song line, but a terrible inventory management approach. As I started to stock up on filament for the 3D printer, it occurred to me that I need a way to track my inventory.

    The Community Comes Through

    I searched around for different filament management solutions and landed on Spoolman. It seemed a pretty solid fit for what I needed. The owner also configured builds for container images, so it was fairly easy to configure a custom chart to run an instance on my internal tools cluster.

    The client UI is pretty easy to use, and the ability to add extra fields to the different modules makes the solution very extensible. I was immediately impressed and started entering information about vendors, filaments, and spools.

    Enhancing the Solution

    Since I am using a Bambu Lab printer and Bambu Studio, I do not have the ability to integrate Bambu into Spoolman to report filament usage. I searched around, but it does not seem that the Bambu ecosystem reports such usage.

    My current plan for managing filament is to weigh the spool when I open it, and then weigh it again after each use. That difference is the amount of filament I have used. But, to calculate the amount remaining, I need to know the weight of an empty spool. Assuming most manufacturers use the same spools, that shouldn’t be too hard to figure out long term.

    Spoolman is not quite set up for that type of usage. Weight and spool weight are set at the filament level and cannot be overridden at the spool level. Most spools will not hold exactly 1000g of filament, so the ability to track initial weight at the spool level is critical. Additionally, I want to support partial spools, including re-spooling.

    So, using all the Python I have learned recently, I took a crack at updating the API and UI to support this very scenario. In a “do no harm” type of situation, I made sure that I had all the integration tests running correctly, then went about adding the new fields and some of the new default functionality. After I had the updated functionality in place, I added a few new integration tests to verify my work.

    Oddly, as I started working on it, I found four existing feature requests that were related to the changes I was suggesting. It took me a few nights, but I generated a pull request for the changes.

    And Now, We Wait…

    With my PR in place, I wait. The beauty of open source is that anyone can contribute, but the owners have the final say. This also means the owners need to respond, and most owners aren’t doing this as a full time job. So sometimes, there isn’t anything to do but wait.

    I’m hopeful that my changes will be accepted, but for now, I’m using Spoolman as-is, and just doing some of the “math” myself. It is definitely helping me keep track of my filament, and I’m keeping an eye on possible integrations with the Bambu ecosystem.

  • Updated Site Monitoring

    What seemed like forever ago, I put together a small project for simple site monitoring. My md-to-conf work enhanced my Python skills, and I thought it would be a good time to update the monitoring project.

    Housekeeping!

    First things first: I transferred the repository from my personal GitHub account to the spydersoft-consulting organization. Why? Separation of concerns, mostly. Since I fork open source repositories into my personal account, I do not want the open source projects I am publishing to be mixed in with those forks.

    After that, I went through the process of converting my source to a package with GitHub Actions to build and publish to PyPI.org. I also added testing, formatting, and linting, copying settings and actions from the md_to_conf project.

    Oh, SonarQube

    Adding the linting with SonarQube added a LOT of new warnings and errors. Everything from long lines to bad variable names. Since my build process does not succeed if those types of things are found, I went through the process of fixing all those warnings.

    The variable naming ones were a little difficult, as some of my classes mapped to the configuration file serialization. That meant that I had to change my configuration files as well as the code. I went through a few iterations, as I missed some.

    I also had to add a few tests, just so that the tests and coverage scripts get run. Could I have omitted the tests entirely? Sure. But a few tests to read some sample configuration files never hurt anyone.

    Complete!

    I got everything renamed and building pretty quickly, and added my PyPI.org API token to the repository for the actions. I quickly provisioned a new analysis project in SonarCloud and merged everything into main. I then created a new GitHub release, which triggered a new publish to PyPI.org.

    Setting up the Raspberry Pi

    The last step was to get rid of the code on the Raspberry Pi, and use pip to install the package. This was relatively easy, with a few caveats.

    1. Use pip3 install instead of pip – I forgot the old Pi has both Python 2 and 3 installed.
    2. Fix the config files – I had to change my configuration file to reflect the variable name changes.
    3. Change the cron job – This one needs a little more explanation.

    For the last one, when changing the cron job, I had to point specifically to /usr/local/bin/pi-monitor, since that’s where pip installed it. My new cron job looks like this:

    SHELL=/bin/bash
    
    */5 * * * * pi cd /home/pi && /usr/local/bin/pi-monitor -c monitor.config.json 2>&1 | /usr/bin/logger -t PIMONITOR

    That runs the application and logs everything to syslog with the PIMONITOR tag.

    Did this take longer than I expected? Yeah, a little. Is it nice to have another open source project in my portfolio? Absolutely. Check it out if you are interested!

  • Terraform Azure AD

    Over the last week or so, I realized that while I bang the drum of infrastructure as code very loudly, I have not been practicing it at home. I took some steps to remedy that over the weekend.

    The Goal

    I have a fairly meager home presence in Azure. Primarily, I use a free version of Azure Active Directory (now Entra ID) to allow for some single sign-on capabilities in external applications like Grafana, MinIO, and ArgoCD. The setup for this differs greatly among the applications, but common to all of these is the need to create applications in Azure AD.

    My goal is simple: automate provisioning of this Azure AD account so that I can manage these applications in code. My stretch goal was to get any secrets created as part of this process into my Hashicorp Vault instance.

    Getting Started

    The plan, in one word, is Terraform. Terraform has a number of providers, including both the azuread and vault providers. Additionally, since I have some experience in Terraform, I figured it would be a quick trip.

    I started by installing all the necessary tools (specifically, the Vault CLI, the Azure CLI, and the Terraform CLI) in my WSL instance of Ubuntu. Why there instead of PowerShell? Most of the tutorials and such lean towards bash syntax, so it was a bit easier to roll through them without having to convert bash into PowerShell.

    I used my ops-automation repository as the source for this, and started by creating a new folder structure to hold my projects. As I anticipated more Terraform projects to come up, I created a base terraform directory, and then an azuread directory under that.

    Picking a Backend

    Terraform relies on state storage, and it uses the term backend to describe this storage. By default, Terraform uses a local file backend. This is great for development, but knowing that I wanted to get things running in Azure DevOps immediately, I decided that I should configure a backend that I can use from my machine as well as from my pipelines.

    As I have been using MinIO pretty heavily for storage, it made the most sense to configure MinIO as the backend, using the S3 backend to do this. It was “fairly” straightforward, as soon as I turned off all the nonsense:

    terraform {
      backend "s3" {
        skip_requesting_account_id  = true
        skip_credentials_validation = true
        skip_metadata_api_check     = true
        skip_region_validation      = true
        use_path_style              = true
        bucket                      = "terraform"
        key                         = "azuread/terraform.tfstate"
        region                      = "us-east-1"
      }
    }

    There are some obvious things missing: I am setting environment variables for values I would like to treat as secret, or at least not public.

    • MinIO Endpoint -> AWS_ENDPOINT_URL_S3 environment variable instead of endpoints.s3
    • Access Key -> AWS_ACCESS_KEY_ID environment variable instead of access_key
    • Secret Key -> AWS_SECRET_ACCESS_KEY environment variable instead of secret_key

    These settings allow me to use the same storage for both my local machine and the Azure Pipeline.

    Configuring Azure AD

    Likewise, I needed to configure the azuread provider. I followed the steps in the documentation, choosing the environment variable route again. I configured a service principal in Azure and gave it the necessary access to manage my directory.

    Using environment variables allows me to set these from variables in Azure DevOps, meaning my secrets are stored in ADO (or Vault, or both…. more on that in another post).
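
    For the curious, the azuread provider picks its service principal credentials up from the standard ARM_* environment variables, so locally it’s just a matter of exporting them (values redacted and illustrative here):

    export ARM_TENANT_ID="00000000-0000-0000-0000-000000000000"
    export ARM_CLIENT_ID="00000000-0000-0000-0000-000000000000"
    export ARM_CLIENT_SECRET="<service principal secret>"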

    Importing Existing Resources

    I have a few resources that already exist in my Azure AD instance, enough that I didn’t want to re-create them and then re-configure everything that uses them. Luckily, most Terraform providers allow for importing existing resources, and most of the resources I have support this feature.

    Importing is fairly simple: you create the simplest definition of a resource that you can, and then run a terraform import variant to import that resource into your project’s state. Importing an Azure AD Application, for example, looks like this:

    terraform import azuread_application.myapp /applications/<object-id>

    It is worth noting that the provider is looking for the object-id, not the client ID. The provider documentation has information as to which ID each resource uses for import.

    More importantly, Applications and Service Principals are different resources in Azure AD, even though they map pretty much one-to-one. To import a Service Principal, you run a similar command:

    terraform import azuread_service_principal.myprincipal <sp-id>

    But where is the service principal’s ID? I had to go to the Azure CLI to get that info:

    az ad sp list --display-name myappname

    From this JSON, I grabbed the id value and used that to import.

    From here, I ran a terraform plan to see what was going to be changed. I took a look at the differences, and even added some properties to the terraform files to maintain consistency between the app and the existing state. I ended up with a solid project full of Terraform files that reflected my current state.

    Automating with Azure DevOps

    There are a few extensions available to add Terraform tasks to Azure DevOps. Sadly, most rely on “standard” configurations for authentication against the backends. Since I’m using an S3 compatible backend, but not S3, I had difficulty getting those extensions to function correctly.

    As the Terraform CLI is installed on my build agent, though, I only needed to run my commands from a script. I created an ADO template pipeline (planning for expansion) and extended it to create the pipeline.

    All of the environment variables in the template are reflected in the variable groups defined in the extension. If a variable is not defined, it’s simply blank. That’s why you will see the AZDO_ environment variables in the template, but not in the variable groups for the Azure AD provisioning.

    Stretch: Adding Hashicorp Vault

    Adding HC Vault support was somewhat trivial, but another exercise in authentication. I wanted to use AppRole authentication for this, so I followed the vault provider’s instructions and added additional configuration to my provider. Note that this setup requires additional variables that now need to be set whenever I do a plan or import.
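
    The provider block ended up looking roughly like this; the address and variable names are illustrative, and the role/secret IDs come in as variables (or environment variables) declared elsewhere:

    provider "vault" {
      address = "https://vault.example.net"   # illustrative

      auth_login {
        path = "auth/approle/login"

        parameters = {
          role_id   = var.approle_role_id
          secret_id = var.approle_secret_id
        }
      }
    }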

    Once that was done, I had access to read and write values in Vault. I started by storing my application passwords in a new key vault. This allows me to have application passwords that rotate weekly, which is a nice security feature. Unfortunately, the rest of my infrastructure isn’t quite set up to handle that kind of change. At least, not yet.

  • Automating Grafana Backups

    After a few data loss events, I took the time to automate my Grafana backups.

    A bit of instability

    It has been almost a year since I moved to a MySQL backend for Grafana. In that year, I’ve gotten a corrupted MySQL database twice now, forcing me to restore from a backup. I’m not sure if it is due to my setup or bad luck, but twice in less than a year is too much.

    In my previous post, I mentioned the Grafana backup utility as a way to preserve this data. My short-sightedness prevented me from automating those backups, however, so I suffered some data loss. After the most recent event, I revisited the backup tool.

    Keep your friends close…

    My first thought was to simply write a quick Azure DevOps pipeline to pull the tool down, run a backup, and copy it to my SAN. I would also have had to include some scripting to clean up old backups.

    As I read through the grafana-backup-tool documents, though, I came across examples of running the tool as a Job in Kubernetes via a CronJob. This presented a very unique opportunity: configure the backup job as part of the Helm chart.

    What would that look like? Well, I do not install any external charts directly. They are configured as dependencies for charts of my own. Now, usually, that just means a simple values file that sets the properties on the dependency. In the case of Grafana, though, I’ve already used this functionality to add two dependent charts (Grafana and MySQL) to create one larger application.

    This setup also allows me to add additional templates to the Helm chart to create my own resources. I added two new resources to this chart:

    1. grafana-backup-cron – A definition for the cronjob, using the ysde/grafana-backup-tool image.
    2. grafana-backup-secret-es – An ExternalSecret definition to pull secrets from Hashicorp Vault and create a Secret for the job.

    Since this is all built as part of the Grafana application, the secrets for Grafana were already available. I went so far as to add a section in the values file for the backup. This allowed me to enable/disable the backup and update the image tag easily.
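
    Trimmed down, the CronJob template looks something like this; the schedule, image tag, and secret name are illustrative, and the real thing pulls those from the values file:

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: grafana-backup-cron
    spec:
      schedule: "0 3 * * *"          # daily backup
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: grafana-backup
                  image: ysde/docker-grafana-backup-tool:latest   # pin a real tag in practice
                  envFrom:
                    - secretRef:
                        name: grafana-backup-secret   # created by the ExternalSecret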

    Where to store it?

    The other enhancement I noticed in the backup tool was the ability to store files in S3 compatible storage. In fact, their example showed how to connect to a MinIO instance. As fate would have it, I have a MinIO instance running on my SAN already.

    So I configured a new bucket in my MinIO instance, added a new access key, and configured those secrets in Vault. After committing those changes and synchronizing in ArgoCD, the new resources were there and ready.

    Can I test it?

    Yes I can. Google, once again, pointed me to a way to create a Job from a CronJob:

    kubectl create job --from=cronjob/<cronjob-name> <job-name> -n <namespace-name>

    I ran the above command to create a test job. And, voilà, I have backup files in MinIO!

    Cleaning up

    Unfortunately, there doesn’t seem to be a retention setting in the backup tool. It looks like I’m going to have to write some code to clean up my Grafana backups bucket, especially since I have daily backups scheduled. Either that, or look at this issue and see if I can add it to the tool. Maybe I’ll dust off my Python skills…