Author: Matt

  • Building a Radar, Part 1 – A New Pattern

    This is the first in a series highlighting my work to build a non-trivial application that can serve as a test platform and reference. When evaluating designs, it helps to have an application with enough complexity to properly exercise the features and functionality of proposed solutions.

    Backend For Frontend

    Lately, I have been somewhat obsessed with the Backend for Frontend pattern, or BFF. There are a number of benefits to the pattern, articulated well all across the internet, so I will avoid a recap. I wanted an application that took advantage of this pattern so that I could start to demonstrate the benefits.

    I had previously done some work in putting a simple backend on the Zalando tech radar. It is a pretty simple Create/Retrieve/Update/Delete (CRUD) application, but complex enough that it would work in this case.

    Configuring the BFF

    At first, I looked at converting the existing project, but quickly realized that this was a good time for a clean slate. I followed the MSDN tutorial to the letter to get a working sample application. From there, I moved my existing SPA into the working sample.

    With that in place, I walked through Auth0’s tutorial on implementing Backend for Frontend authentication in ASP.NET Core. In this case, I substituted my Duende Identity Server for the OAuth/Okta instance used in the tutorial. This all worked great, with the notable exception that I had to ensure all my proxies were in order.
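
    For context, here is a minimal sketch of what the BFF wiring can look like in Program.cs. I am using the Duende.BFF package for illustration (not necessarily what the tutorials use), and the authority, client ID, and scopes are placeholders rather than my actual configuration.

    // Program.cs - minimal sketch of a BFF setup using Duende.BFF (values are placeholders)
    var builder = WebApplication.CreateBuilder(args);

    builder.Services.AddBff();

    builder.Services.AddAuthentication(options =>
    {
        options.DefaultScheme = "cookie";
        options.DefaultChallengeScheme = "oidc";
    })
    .AddCookie("cookie")
    .AddOpenIdConnect("oidc", options =>
    {
        options.Authority = "https://identity.example.com"; // placeholder for my Duende Identity Server
        options.ClientId = "radar-bff";                      // placeholder client id
        options.ResponseType = "code";
        options.Scope.Add("openid");
        options.Scope.Add("profile");
        options.SaveTokens = true;
    });

    var app = builder.Build();

    app.UseStaticFiles();
    app.UseAuthentication();
    app.UseBff();
    app.UseAuthorization();

    // Exposes the /bff/login, /bff/logout, and /bff/user management endpoints
    app.MapBffManagementEndpoints();

    app.MapFallbackToFile("index.html"); // serve the SPA
    app.Run();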

    Show Your Work!

    Now, admittedly, my blogging is well behind my actual work, so if you go browsing the repository, it is a little farther ahead than this post. Next in this series, I’ll discuss configuring the BFF to proxy calls to a backend service.

    While the work is ahead of the post, the documentation is WAY behind, so please ignore the README.md file for now. I’ll get proper documentation completed as soon as I can.

  • A Tale of Two Proxies

    I am working on building a set of small reference applications to demonstrate some of the patterns and practices to help modernize cloud applications. In configuring all of this in my home lab, I spent at least 3 hours fighting a problem that turned out to be a configuration issue.

    Backend-for-Frontend Pattern

    I will get into more details when I post the full application, but I am building out a SPA with a dedicated backend API that hosts the SPA and takes care of authentication. As is typically the case, I was able to get all of this working on my local machine, including the necessary proxying of calls via the SPA’s development server (again, more on this later).

    At some point, I had two containers ready to go: a BFF container hosting the SPA and the dedicated backend, and an API container hosting a data service. I felt ready to deploy to the Kubernetes cluster in my lab.
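
    As a rough sketch (purely illustrative names), the pieces looked like this:

    flowchart LR
        A["Browser / SPA"] --> B["BFF Container (SPA host + auth)"]
        B --> C["API Container (data service)"]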

    Let the pain begin!

    I have enough samples within Helm/Helmfile that getting the items deployed was fairly simple. After fiddling with the settings of the containers, things were running well in the non-authenticated mode.

    However, when I clicked login, the following happened:

    1. I was redirected to my OAuth 2.0/OIDC provider.
    2. I entered my username and password.
    3. I was redirected back to my application.
    4. I got a 502 Bad Gateway screen.

    502! But why? I consulted Google and found any number of articles indicating that, in the authentication flow, Nginx’s default header size limits are too small to accommodate what comes back from the redirect. So, consulting the Nginx configuration documentation, I changed the Nginx configuration in my reverse proxy to allow for larger headers.

    No luck. Weird. In the spirit of true experimentation (change one thing at a time), I backed those changes out and tried changing the configuration of my Nginx Ingress controller. No luck. So what’s going on?

    Too Many Cooks

    My current implementation looks like this:

    flowchart TB
        A[UI] --UI Request--> B(Nginx Reverse Proxy)
        B --> C("Kubernetes Ingress (Nginx)")
        C --> D[UI Pod]
    

    There are two Nginx instances between all of my traffic: an instance outside of the cluster that serves as my reverse proxy, and an Nginx ingress controller that serves as the reverse proxy within the cluster.

    I tried changing each separately. Then I tried changing both at the same time. Still the same error. As it turns out, I was also working from some bad information.

    Be careful what you read on the Internet

    The issue was the difference in configuration between the two Nginx instances, combined with some bad configuration values that I had picked up from old internet articles.

    Reverse Proxy Configuration

    For the Nginx instance running on Ubuntu, I added the following to my nginx.conf file under the http section:

            proxy_buffers 4 512k;
            proxy_buffer_size 256k;
            proxy_busy_buffers_size 512k;
            client_header_buffer_size 32k;
            large_client_header_buffers 4 32k;

    Nginx Ingress Configuration

    I am running RKE2 clusters, so configuring Nginx involves a HelmChartConfig resource being created in the kube-system namespace. My cluster configuration looks like this:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          kind: DaemonSet
          daemonset:
            useHostPort: true
          config:
            use-forwarded-headers: "true"
            proxy-buffer-size: "256k"
            proxy-buffers-number: "4"
            client-header-buffer-size: "256k"
            large-client-header-buffers: "4 16k"
            proxy-body-size: "10m"

    The combination of both of these settings got my redirects to work without the 502 errors.

    Better living through logging

    One of the things I fought with on this was finding the appropriate logs to see where the errors were occurring. I’m exporting my reverse proxy logs into Loki using a Promtail instance that listens on a syslog port. So I am “getting” the logs into Loki, but I couldn’t FIND them.

    I forgot about the facility setting in syslog: I have the access logs sending as local5, but I had configured the error logs without pointing them at local5. I learned that, by default, they go to local7.
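
    For reference, the relevant Nginx directives look something like this (the Promtail address and port here are placeholders, not my actual values):

    # access logs were already tagged with local5
    access_log syslog:server=promtail.example.local:1514,facility=local5 combined;
    # error logs default to facility local7 unless told otherwise
    error_log  syslog:server=promtail.example.local:1514,facility=local5 error;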

    Once I found the logs I was able to diagnose the issue, but I spent a lot of time browsing in Loki looking for those logs.

  • Tech Tip – Chiseled Images from Microsoft

    I have been spending a considerable amount of time in .NET 8 lately. In addition to some POC work, I have been transitioning some of my personal projects to .NET 8. While the details of that work will be the topic of a future post (or posts), Microsoft’s chiseled containers are worth a quick note.

    In November, Microsoft released .NET Chiseled Containers into GA. These containers are slimmed-down versions of the .NET Linux containers, focused on providing a “bare bones” image that can be used as a base for a variety of applications.

    If you are building containers from Microsoft’s .NET container images, chiseled containers are worth a look!

    A Quick Note on Globalization

    I tried moving two of my containers to the 8.0-jammy-chiseled base image. The frontend, with no database connection, worked fine. However, the API with the database connection ran into a globalization issue.

    Apparently, Microsoft.Data.SqlClient requires a few OS libraries that are not part of the chiseled image. Specifically, the International Components for Unicode (ICU) libraries are not included by default. Ubuntu-rocks demonstrates how they can be added, but, for now, I am leaving that image on the standard 8.0-jammy base.

  • Re-configuring Grafana Secrets

    I recently fixed some synchronization issues that had been silently plaguing some of the monitoring applications I had installed, including my Loki/Grafana/Tempo/Mimir stack. Now that the applications are being updated, I ran into an issue with the latest Helm chart’s handling of secrets.

    Sync Error?

    After I made the change to fix synchronization of the Helm charts, I went to sync my Grafana chart, but received a sync error:

    Error: execution error at (grafana/charts/grafana/templates/deployment.yaml:36:28): Sensitive key 'database.password' should not be defined explicitly in values. Use variable expansion instead.

    I certainly didn’t change anything in those files, and I am already using variable expansion in the values.yaml file anyway. What does that mean? Basically, in the values.yaml file, I used ${ENV_NAME} in areas where I had a secret value, and told Grafana to expand environment variables into the configuration.

    The latest version of the Grafana Helm chart doesn’t seem to like this. It treats ANY explicit value in a sensitive field as bad. A search of the Grafana Helm chart repo’s issues list yielded someone with a similar issue and a comment linking to another comment with the recommended solution.

    Same Secret, New Name

    After reading through the comment’s suggestion and Grafana’s documentation on overriding configuration with environment variables, I realized the fix was pretty easy.

    I already had a Kubernetes secret being populated from Hashicorp Vault with my secret values. I also already had envFromSecret set in the values.yaml to instruct the chart to use my secret. And, through some dumb luck, two of the three values were already named using the standards in Grafana’s documentation.

    So the “fix” was to simply remove the secret expansions from the values.yaml file, and rename one of the secretKey values so that it matched Grafana’s environment variable template. You can see the diff of the change in my Github repository.
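
    For illustration (names here are examples, not my actual configuration), the end state looks roughly like this: envFromSecret points the chart at the secret, and the secret keys follow Grafana’s GF_<SECTION>_<KEY> environment variable convention, so nothing sensitive appears in grafana.ini.

    # values.yaml - no password in grafana.ini; the chart injects the secret as env vars
    envFromSecret: grafana-env
    grafana.ini:
      database:
        type: postgres
        host: postgres.example.local:5432
        # password omitted - Grafana reads GF_DATABASE_PASSWORD from the environment
    ---
    # Illustrative secret - mine is populated from Hashicorp Vault
    apiVersion: v1
    kind: Secret
    metadata:
      name: grafana-env
    stringData:
      GF_DATABASE_PASSWORD: not-a-real-password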

    With that change, the Helm chart generated correctly, and once Argo had the changes in place, everything was up and running.

  • Synced, But Not: ArgoCD Differencing Configuration

    Some of the charts in my Loki/Grafana/Tempo/Mimir stack have an odd habit of not updating correctly in ArgoCD. I finally got tired of it and fixed it… I’m just not 100% sure how.

    Ignoring Differences

    At some point in the past, I had customized a few of my Application objects with ignoreDifferences settings. The intent was to tell ArgoCD to ignore fields that are managed by other controllers and may drift from the chart definition.

    Like what, you might ask? Well, the external-secrets chart generates its own caBundle and sets properties on a ValidatingWebhookConfiguration object. Obviously, that’s managed by the controller, and I don’t want to mess with it. However, I also don’t want ArgoCD to report that chart as Out of Sync all the time.

    So, as an example, my external-secrets application looks like this:

    project: cluster-tools
    source:
      repoURL: 'https://github.com/spydersoft-consulting/ops-argo'
      path: cluster-tools/tools/external-secrets
      targetRevision: main
    destination:
      server: 'https://kubernetes.default.svc'
      namespace: external-secrets
    syncPolicy:
      syncOptions:
        - CreateNamespace=true
        - RespectIgnoreDifferences=true
    ignoreDifferences:
      - group: '*'
        kind: '*'
        managedFieldsManagers:
          - external-secrets

    And that worked just fine. But, with my monitoring stack, well, I think I made a boo-boo.

    Ignoring too much

    When I looked at the application differences for some of my Grafana resources, I noticed that the live vs desired image was wrong. My live image was older than the desired one, and yet, the application wasn’t showing as out of sync.

    At this point, I suspected ignoreDifferences was the issue, so I looked at the Application manifest. For some reason, my monitoring applications had an Application manifest that looked like this:

    project: external-apps
    source:
      repoURL: 'https://github.com/spydersoft-consulting/ops-internal-cluster'
      path: external/monitor/grafana
      targetRevision: main
      helm:
        valueFiles:
          - values.yaml
        version: v3
    destination:
      server: 'https://kubernetes.default.svc'
      namespace: monitoring
    syncPolicy:
      syncOptions:
        - RespectIgnoreDifferences=true
    ignoreDifferences:
      - group: "*"
        kind: "*"
        managedFieldsManagers:
        - argocd-controller
      - group: '*'
        kind: StatefulSet
        jsonPointers:
          - /spec/persistentVolumeClaimRetentionPolicy
          - /spec/template/metadata/annotations/'kubectl.kubernetes.io/restartedAt'

    Notice the part where I am ignoring managed fields from argocd-controller. I have no idea why I added that, but it looks a little “all inclusive” for my tastes, and it was ONLY present in the ApplicationSet for my LGTM stack. So I commented it out.

    Now We’re Cooking!

    Lo and behold, ArgoCD looked at my monitoring stack and said “well, you have some updates, don’t you!” I spent the next few minutes syncing those applications individually. Why? There are a lot of hard-working pods in those applications, and I don’t like to cycle them all at once.

    I searched through my posts and some of my notes, and I honestly have no idea why I decided I should ignore all fields managed by argocd-controller. Needless to say, I will not be doing that again.

  • Proxy Down!

    A simple package update seems to have adversely affected the network on my Banana Pi proxy server. I wish this were a post on how I solved it, but, alas, I haven’t solved it.

    It was a simple upgrade…

    That statement has been uttered prior to massive outages a time or two. I logged in to my proxy server, ran apt update and apt upgrade, and restarted the device.

    Now, this is a headless device, so I tried re-connecting via SSH a few times. Nothing. I grabbed the device out of the server closet and connected it to a small “workstation” I have in my office. I booted it up, and, after a pretty long boot cycle, it came up, but with the network adapter down.

    Why’s the network down?

    Running ip addr show displayed eth0 as down, but I had no idea why. And, since this is something of a production device, I manually brought the interface back up as follows:

    # enable the link
    sudo ip link set eth0 up
    # restart networking
    sudo systemctl restart systemd-networkd
    # restart nginx
    sudo systemctl restart nginx

    This all worked and the network was back up and running. But what happened?

    Digging through the syslog file, I came across this:

    Jan  8 14:09:28 m5proxy systemd-udevd[2204]: eth0: Failed to query device driver: Device or resource busy

    A little research yielded nothing really fruitful. I made sure that everything was set up correctly, including cloud-init and netplan, but nothing worked.

    What Now?

    I am in a weird spot. The proxy is working, and it’s such a vital piece of my current infrastructure that I do not currently have the ability to “play with it.”

    What I should probably do is create a new proxy machine, either with one of my spare Raspberry Pis or just a VM on the server. Then I can transfer traffic to the new proxy and diagnose this one without fear of downtime.

    Before I do that, though, I’m going to do some more research into the error above and see if I can tweak some of the configuration to get things working on a reboot. Regardless, I will post a solution when one comes about.

  • Mermaids!

    Whether it’s software, hardware, or real-world construction, an architect’s life is about drawings. I am always on the lookout for new tools that make keeping diagrams and drawings up-to-date easier, and, well, I found a mermaid.

    Mermaid.js

    Mermaid is, essentially, a system to render diagrams and visualizations using text and code. According to their site, it is Javascript-based and Markdown-inspired, and allows developers to spend less time managing diagrams and more time writing code.

    It currently supports a number of different diagram types, including flow charts, sequence diagrams, and state diagrams. In addition, many providers (including GitHub and Atlassian Confluence Cloud) support Mermaid charts, either free of charge (thanks, GitHub!) or via paid add-on applications (not surprised, Atlassian). I’m sure other providers have support, but those are the two I am using.

    Mermaid in Action

    As of right now, I have only had the opportunity to use Mermaid charts at work, so my examples are not publicly available. You will have to settle for my anecdotes until I get some charts and visualization into some of my open source projects.

    At work, though, I have been using the Gitgraph diagrams to visualize some of our current and proposed workflows for our development teams. Being able to visualize the Git workflow makes the documentation much easier for our teams to understand.

    Additionally, I created a few sequence diagrams to illustrate a proposed flow for authentication across multiple services and applications. I could have absolutely created these diagrams in Miro (which is our current illustrating tool), but aligning the different boxes and lines would take a tremendous amount of time. By comparison, my Mermaid diagrams were around 20 lines and fully illustrated my scenarios.
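
    To give a flavor of how compact these are, here is a simplified, purely hypothetical version of that kind of authentication sequence:

    sequenceDiagram
        participant SPA
        participant BFF
        participant IdP as Identity Provider
        SPA->>BFF: GET /login
        BFF->>IdP: Redirect to /authorize
        IdP-->>SPA: Login form
        SPA->>IdP: Credentials
        IdP->>BFF: Authorization code
        BFF->>IdP: Exchange code for tokens
        BFF-->>SPA: Auth cookie + redirect to app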

    In WordPress?

    Obviously, I would really like to be able to use Mermaid charts in my blog to add visualizations to posts. Since Mermaid is Javascript-based, I figured there would be a plugin to render Mermaid code in a blog post.

    WP-Mermaid should, in theory, make this work. However… well, it doesn’t. I’m not the only person with the issue. A quick bit of research shows that the issue is how WordPress “cleans up” the code that is entered, since it isn’t tagged as preformatted (using the pre tag). I hacked in a test to see whether adding pre and then changing the rendering in the plugin would work. It works just fine…

    And so my to-do list grows. I would like to use Mermaid charts in WordPress, but I have to fix it first.

  • Taking Stock

    My wife has a go-to birthday question: Name the best thing and the worst thing about the past year. We realized that she has a habit of asking this for the New Year as well.

    Many of our family and friends have birthdays that fall around the holiday season, so the question becomes somewhat repetitive. As we sat at what will end up being our New Year’s celebration (she’s out of town over the holiday), we came up with a new question.

    What are you most excited about in the coming year?

    She and I broke this down into “personally and professionally,” which gave us some interesting talking points. Personally, we both identified some new travel destinations. After all, I have some new scuba gear I need to take for a test dive.

    Professionally, well, I had a pretty lengthy reflection on my years at Four Rivers and Accruent back in October. That was, however, primarily a gaze into the past. As I considered what’s in store in 2024, I realized that there are some pretty exciting things happening at my new company. These things give me a chance to apply what I have learned in my years of software engineering, all while learning new and exciting technologies.

    Growing my “brand”

    I really don’t like to use the term “brand” to describe what I am doing here. I use it, though, because it pretty accurately describes what I would ultimately like to do.

    My goal with this publication is to make what I do fun. I mean, it is fun to me, and I know it is fun to others, so why not give that excitement a voice? Do I refer back to this blog as a reference for things I have done? Absolutely! It has become a great reference for me on its own, and hopefully others find some use in its pages.

    Through the last year, I have done a better job of being consistent in terms of generating content. I had, oddly, 35 posts in both 2021 and 2022, and I am up to 76 posts in 2023 (this one makes 77). This equates to a post about every 5 days. My goal is a post every 4 days, so I’m pretty close.

    As to content quality, well, that’s harder to judge. My most popular pages, by far, are the write-ups with detailed instructions. Sadly, my anecdotes do not fare so well. That said, it is far easier to write the anecdotes than the write-ups, but I guess you get out what you put in.

    So my goals for this blog in the coming year are two-fold:

    1. Get closer to my “post every 4 days” metric
    2. Increase the number of technical write-ups

    That last one gets difficult, as I cannot divulge work I do that remains IP for my company. Most of my technical write-ups end up being small, general pieces of a larger puzzle. So I’ll have to figure out how to get more creative in that regard.

    Happy New Year!

    It’s good to have goals. I’ve shared some of mine, and hopefully you have some of your own. In the meantime, I wish anyone reading this the best of luck in the new year!

  • More GitOps Fun!

    I have been curating some scripts that help me manage version updates in my GitOps repositories… It’s about time they get shared with the world.

    What’s Going On?

    I manage the applications in my Kubernetes clusters using Argo CD and a number of Git repositories. Most of the ops- repositories act as “desired state” repositories.

    As part of this management, I have a number of external tools running in my clusters that are installed using their Helm charts. Since I want to keep my installs up to date, I needed a way to update the Helm chart versions as new releases came out.

    However… some external tools do not have their own Helm charts. For those, I have been using a Helm library chart from bjw-s. In that case, I have had to manually find new releases and update my values.yaml file.

    While I have had the Helm chart version updates automated for some time, I just recently got around to updating the values.yaml file from external sources. Now is a good time to share!

    The Scripts

    I put the scripts in the ops-automation repository in the Spydersoft organization. I’ll outline the basics of each script, but if you are interested in the details, check out the scripts themselves.

    It is worth noting that these scripts require the git and helm command line tools to be installed, in addition to the PowerShell Yaml module.

    Also, since I manage more than one repository, all of these scripts are designed to be given a basePath and then a list of directory names for the folders that are the Git repositories I want to update.

    Update-HelmRepositoryList

    This script iterates through the given folders to find the Chart.yaml files in them. For every dependency in the found chart files, it adds the dependency’s repository to the local Helm repository list if the URL is not already present.

    Since I have been running this on my local machine, I only have to do this once. But, on a build agent, this script should be run every time to make sure the repository list contains all the necessary repositories for an update.
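
    To give a sense of the approach, here is a trimmed-down sketch of the core loop (not the actual script), assuming the powershell-yaml module is installed:

    # Sketch: register any Helm repositories referenced by Chart.yaml dependencies
    param(
        [string] $BasePath,
        [string[]] $Repositories
    )

    Import-Module powershell-yaml

    # Repositories helm already knows about
    $existing = @(helm repo list -o json 2>$null | ConvertFrom-Json)

    foreach ($repoFolder in $Repositories) {
        $chartFiles = Get-ChildItem -Path (Join-Path $BasePath $repoFolder) -Recurse -Filter 'Chart.yaml'
        foreach ($chartFile in $chartFiles) {
            $chart = Get-Content $chartFile.FullName -Raw | ConvertFrom-Yaml
            foreach ($dependency in $chart.dependencies) {
                # Skip local/file dependencies and URLs helm already has
                if ($dependency.repository -like 'http*' -and $existing.url -notcontains $dependency.repository) {
                    $name = ([uri]$dependency.repository).Host.Replace('.', '-')
                    helm repo add $name $dependency.repository
                }
            }
        }
    }

    helm repo update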

    Update-HelmCharts

    This script iterates through the given folders to find the Chart.yaml files in them. For every dependency, the script determines whether an updated version is available.

    If there is an update available, the Chart.yaml file is updated, and helm dependency update is run to update the Chart.lock file. Additionally, commit comments are created to note the version changes.

    For each Chart.yaml file, Update-FromAutoUpdate is also called to make additional updates if necessary.
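
    Reduced to a sketch that handles a single Chart.yaml (the parameter name and the naive version comparison are my own simplifications), the version bump looks something like this:

    # Sketch: bump Chart.yaml dependencies to the latest published versions
    param([string] $ChartFile)

    Import-Module powershell-yaml

    $chart = Get-Content $ChartFile -Raw | ConvertFrom-Yaml
    $updated = $false
    $commitComments = @()

    foreach ($dependency in $chart.dependencies) {
        # helm search returns the newest published version of each matching chart
        $latest = helm search repo $dependency.name -o json | ConvertFrom-Json |
            Where-Object { $_.name -like "*/$($dependency.name)" } |
            Select-Object -First 1

        if ($latest -and $latest.version -ne $dependency.version) {
            $commitComments += "$($dependency.name): $($dependency.version) -> $($latest.version)"
            $dependency.version = $latest.version
            $updated = $true
        }
    }

    if ($updated) {
        # Note: round-tripping through ConvertTo-Yaml reformats the file
        $chart | ConvertTo-Yaml | Set-Content $ChartFile
        helm dependency update (Split-Path $ChartFile)
        $commitComments
    }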

    Update-FromAutoUpdate

    This script looks for a file called auto-update.json in the path given. The file has the following format:

    {
        "repository": "redis-stack/redis-stack",
        "stripVFromVersion": false,
        "tagPath": "redis.image.tag"
    }

    The script looks up the latest release of the repository on GitHub, using GitHub’s tag_name as the version. If the latest release is newer than the version currently at tagPath in values.yaml, the script updates that value in the values.yaml file to the new version. The script returns an object indicating whether or not an update was made, as well as a commit comment noting the version jump.
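
    Boiled down (and assuming tagPath is always a nested key like redis.image.tag), the logic looks something like this:

    # Sketch: bump an image tag in values.yaml based on the latest GitHub release
    param([string] $ChartPath)

    Import-Module powershell-yaml

    $config = Get-Content (Join-Path $ChartPath 'auto-update.json') -Raw | ConvertFrom-Json

    # tag_name of the latest release is used as the version
    $release = Invoke-RestMethod "https://api.github.com/repos/$($config.repository)/releases/latest"
    $newVersion = if ($config.stripVFromVersion) { $release.tag_name.TrimStart('v') } else { $release.tag_name }

    $valuesFile = Join-Path $ChartPath 'values.yaml'
    $values = Get-Content $valuesFile -Raw | ConvertFrom-Yaml

    # Walk the dotted tagPath (e.g. redis.image.tag) down to the parent node
    $segments = $config.tagPath.Split('.')
    $node = $values
    foreach ($segment in $segments[0..($segments.Count - 2)]) { $node = $node[$segment] }

    if ($node[$segments[-1]] -ne $newVersion) {
        $node[$segments[-1]] = $newVersion
        $values | ConvertTo-Yaml | Set-Content $valuesFile
        [pscustomobject]@{ Updated = $true; CommitComment = "$($config.tagPath): $newVersion" }
    }
    else {
        [pscustomobject]@{ Updated = $false; CommitComment = '' }
    }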

    Right now, the auto-update only works for images whose releases come from GitHub. I have one item (Proget) that needs to query a Docker registry API directly, but that will be a future enhancement.

    Future Tasks

    Now that these are automated tasks, I will most likely create an Azure Pipeline that runs weekly to get these changes made and committed to Git.

    I have Argo configured to not auto-sync these applications, so even though the changes are made in Git, I still have to manually apply the updates. And I am ok with that. I like to stagger application updates, and, in some cases, make sure I have the appropriate backups before running an update. But this gets me to a place where I can log in to Argo and sync apps as I desire.

  • Adding a Little Style

    I have never really liked the default code blocks in WordPress. So I went looking to find a better plugin… Then I went looking again.

    Round 1

    I literally went into WordPress’ Plugins section, clicked on Add New, and searched for syntax. For whatever reason, I landed on SyntaxHighlighter Evolved, and it seemed like it fit the bill.

    So, I installed it and went through the process of finding all my wp-code blocks so that I could convert them. That was a slightly laborious task, as I had to get the list from the database and then click through and edit each post. Sure, I could probably have written a small application to make the replacement, but I only have 35 posts with code blocks, and I figured I’d only do it once, so I did it by hand.

    6 hours later…

    I hated it. I like the dark theme of my site, so the giant white blocks of goo on the page were very intrusive. Language support was decent, but missing some of my favorites (including Dockerfile and HCL).

    A more detailed Google search led me to Code Block Pro. Before I went through 35 posts replacing code blocks, though, I decided to test this one a bit more. The language list is extensive, and there are a bunch of included themes. There are a few dark themes, and GitHub Dark seems to fit fairly well into the site style.

    A bit more confident of my new choice, I went back through those 34 posts and replaced the SyntaxHighlighter Evolved blocks with Code Block Pro blocks. I am starting the stopwatch to see if this one makes it longer than 6 hours.