Tag: nginx

  • Modernizing the Gateway

    From NGINX Ingress to Envoy Gateway

    As with any good engineer, I cannot leave well enough alone. Over the past week, I’ve been working through a significant infrastructure modernization across my home lab clusters – migrating from NGINX Ingress to Envoy Gateway and implementing the Kubernetes Gateway API. This also involved some necessary housekeeping with chart updates and a shift to Server-Side Apply for all ArgoCD-managed resources.

    Why Change?

    The timing couldn’t have been better. In November 2025, the Kubernetes SIG Network and Security Response Committee announced that Ingress NGINX will be retired in March 2026. The project has struggled with insufficient maintainer support, security concerns around configuration snippets, and accumulated technical debt. After March 2026, there will be no further releases, security patches, or bug fixes.

    The announcement strongly recommends migrating to the Gateway API, described as “the modern replacement for Ingress.” This validated what I’d already been considering – the Gateway API provides a more standardized, vendor-neutral approach with better separation of concerns between infrastructure operators and application developers.

    Envoy Gateway, being a CNCF project built on the battle-tested Envoy proxy, seemed like a natural choice for this migration. Plus, it gave me an excuse to finally move off Traefik, which was… well, let’s just say it was time for a change.

    The Migration Journey

    The migration happened in phases across my ops-argo, ops-prod-cluster, and ops-nonprod-cluster repositories. Here’s what changed:

    Phase 1: Adding Envoy Gateway

    I started by adding Envoy Gateway as a cluster tool, complete with its own ApplicationSet that deploys to clusters labeled with spydersoft.io/envoy-gateway: "true". The deployment includes:

    • GatewayClass and Gateway resources: Defined a main gateway that handles traffic routing (sketched just after this list)
    • EnvoyProxy configuration: Set up with a static NodePort service for consistent external access
    • ClientTrafficPolicy: Configured to properly handle forwarded headers – crucial for preserving client IP information through the proxy chain
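
    For reference, the GatewayClass and Gateway pair boils down to something like the following. This is a trimmed-down sketch rather than my exact manifests – the TLS secret name is a placeholder, and the NodePort wiring itself lives in the EnvoyProxy resource:

    apiVersion: gateway.networking.k8s.io/v1
    kind: GatewayClass
    metadata:
      name: envoy-gateway
    spec:
      controllerName: gateway.envoyproxy.io/gatewayclass-controller
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: main
      namespace: envoy-gateway-system
    spec:
      gatewayClassName: envoy-gateway
      listeners:
        - name: http
          protocol: HTTP
          port: 80
          allowedRoutes:
            namespaces:
              from: All               # let HTTPRoutes in app namespaces attach
        - name: https
          protocol: HTTPS
          port: 443
          tls:
            mode: Terminate
            certificateRefs:
              - name: wildcard-tls    # placeholder certificate secret
          allowedRoutes:
            namespaces:
              from: All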

    The Envoy Gateway deployment lives in the envoy-gateway-system namespace and exposes services via NodePort 30080 and 30443, making it easy to integrate with my existing network setup.
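
    The ApplicationSet targeting mentioned above boils down to a cluster generator with a label selector. Roughly – the project, repo URL, and path here are placeholders rather than my actual layout:

    apiVersion: argoproj.io/v1alpha1
    kind: ApplicationSet
    metadata:
      name: envoy-gateway
      namespace: argocd
    spec:
      generators:
        - clusters:
            selector:
              matchLabels:
                spydersoft.io/envoy-gateway: "true"
      template:
        metadata:
          name: 'envoy-gateway-{{name}}'
        spec:
          project: default                              # placeholder project
          source:
            repoURL: https://example.com/ops-argo.git   # placeholder repo URL
            targetRevision: main
            path: cluster-tools/envoy-gateway           # placeholder path
          destination:
            server: '{{server}}'
            namespace: envoy-gateway-system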

    Phase 2: Migrating Applications to HTTPRoute

    This was the bulk of the work. Each application needed its Ingress resource replaced with an HTTPRoute. The new Gateway API resources are much cleaner. For example, my blog (www.mattgerega.com) went from an Ingress definition to this:

    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: wp-mattgerega
      namespace: sites
    spec:
      parentRefs:
        - name: main
          namespace: envoy-gateway-system
      hostnames:
        - www.mattgerega.com
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: wp-mattgerega-wordpress
              port: 80
    

    Much more declarative and expressive than the old Ingress syntax.

    I migrated several applications across both production and non-production clusters:

    • Gravitee API Management
    • ProGet (my package management system)
    • n8n and Node-RED instances
    • Linkerd-viz dashboard
    • ArgoCD (which also got a GRPCRoute for its gRPC services – see the sketch after this list)
    • Identity Server (across test and stage environments)
    • Tech Radar
    • Home automation services (UniFi client and IP manager)
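
    Since GRPCRoute is less commonly seen than HTTPRoute, here is roughly what the ArgoCD one looks like – the hostname and backend port are illustrative, not my actual values:

    apiVersion: gateway.networking.k8s.io/v1
    kind: GRPCRoute
    metadata:
      name: argocd-grpc
      namespace: argocd
    spec:
      parentRefs:
        - name: main
          namespace: envoy-gateway-system
      hostnames:
        - argocd.example.com          # illustrative hostname
      rules:
        - backendRefs:
            - name: argocd-server
              port: 443               # illustrative port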

    Phase 3: Removing the Old Guard

    Once everything was migrated and tested, I removed the old ingress controller configurations. This cleanup happened across all three repositories:

    ops-prod-cluster:

    • Removed all Traefik configuration files
    • Cleaned up traefik-gateway.yaml and traefik-middlewares.yaml

    ops-nonprod-cluster:

    • Removed Traefik configurations
    • Deleted the RKE2 ingress NGINX HelmChartConfig (rke2-ingress-nginx-config.yaml)

    The cluster-resources directories got significantly cleaner with this cleanup. Good riddance to configuration sprawl.

    Phase 4: Chart Maintenance and Server-Side Apply

    While I was in there making changes, I also:

    • Bumped several Helm charts to their latest versions:
      • ArgoCD: 9.1.5 → 9.1.7
      • External Secrets: 1.1.0 → 1.1.1
      • Linkerd components: 2025.11.3 → 2025.12.1
      • Grafana Alloy: 1.4.0 → 1.5.0
      • Common chart dependency: 4.4.0 → 4.5.0
      • Redis deployments updated across production and non-production
    • Migrated all clusters to use Server-Side Apply (ServerSideApply=true in the syncOptions):
      • All cluster tools in ops-argo
      • Production application sets (external-apps, production-apps, cluster-resources)
      • Non-production application sets (external-apps, cluster-resources)

    This is a better practice for ArgoCD: the Kubernetes API server performs the merge and tracks field ownership, rather than ArgoCD computing client-side three-way strategic-merge patches, which reduces conflicts and improves sync reliability.
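
    Enabling it is a one-line change in each Application (or ApplicationSet template) – a minimal sketch of the relevant fragment:

    spec:
      syncPolicy:
        syncOptions:
          - ServerSideApply=true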

    Lessons Learned

    Gateway API is ready for production: The migration was surprisingly smooth. The Gateway API resources are well-documented and intuitive. With NGINX Ingress being retired, now’s the time to make the jump.

    HTTPRoute vs. Ingress: HTTPRoute is more expressive and allows for more sophisticated routing rules. The explicit parentRefs concept makes it clear which gateway handles which routes.
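
    As one example of that expressiveness, traffic splitting is a first-class concept via weighted backendRefs rather than an annotation hack. A hypothetical 90/10 canary split inside a rule:

    rules:
      - backendRefs:
          - name: my-app-stable       # hypothetical Services
            port: 80
            weight: 90
          - name: my-app-canary
            port: 80
            weight: 10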

    Server-Side Apply everywhere: Should have done this sooner. The improved conflict handling makes ArgoCD much more reliable, especially when multiple controllers touch the same resources.

    Envoy’s configurability: The EnvoyProxy custom resource gives incredible control over the proxy configuration without needing to edit ConfigMaps or deal with annotations.

    Multi-cluster consistency: Making these changes across production and non-production environments simultaneously kept everything aligned and reduced cognitive overhead when switching between environments.

    Current Status

    All applications across all clusters are now running through Envoy Gateway with the Gateway API. Traffic is flowing correctly, TLS is terminating properly, and I’ve removed all the old ingress-related configuration from both production and non-production environments.

    The clusters are more standardized, the configuration is cleaner, and I’m positioned to take advantage of future Gateway API features like traffic splitting and more advanced routing capabilities. More importantly, I’m ahead of the March 2026 retirement deadline with plenty of time to spare.

    Now, the real question: what am I going to tinker with next?

  • An epic journey…

    I got all the things I needed to diagnose my BananaPi M5 issues. And I took a very long, winding road to a very simple solution. But I learned an awful lot in the process.

    Reconstructing the BananaPi M5

    I got tired of poking around the BananaPi M5, and decided I wanted to start from scratch. The boot order of the BananaPi means that, in order to format the EMMC and start from scratch, I needed some hardware.

    I ordered a USB to Serial debug cable so that I could connect to the BananaPi (BPi from here on out), interrupt the boot sequence, and use U-Boot to wipe the disk (or at least the MBR). That would force the BPi to use the SD card as a boot drive. From there, I would follow the same steps I did in provisioning the BPi the first time around.

    For reference, with the cable I bought, I was able to connect to the debug console using PuTTY over a serial connection.

    Your COM port will probably be different: open the Device Manager to find yours.

    I also had to be a little careful about wiring: When I first hooked it up, I connected the transmit cable (white) to the Tx pin, and the receive cable (green) to the Rx pin. That gave me nothing. Then I realized that I had to swap the pins: The transmit cable (white) goes to the Rx pin, and the receive cable (green) goes to the Tx pin. Once swapped, the terminal lit up.

    I hit the reset button on the BPi, and as soon as I could, I hit Ctrl-C. This took me into the U-Boot console. I then followed these steps to erase the first 1000 blocks. From there, I had a “cleanish” BPi. To fully wipe the eMMC, I booted an SD card that had the BPi Ubuntu image, and wiped the entire disk:

    dd if=/dev/zero of=/dev/mmcblk0 bs=1M

    Where /dev/mmcblk0 is the device path of the eMMC drive. This writes zeros across the entire eMMC, which cleaned it up nicely.

    New install, same problem

    After following the steps to install Ubuntu 20.04 to the eMMC, I did an apt upgrade and a do-release-upgrade to get up to 22.04.3. And the SAME network issue reared its ugly head. Back at it with fresh eyes, I determined that something had changed in the network configuration, and the cloud-init setup that had worked for this particular BPi image was no longer valid.

    What were the symptoms? I combed through logs, but the easiest identifier was that, when running networkctl, eth0 was reported as unmanaged.

    So I did two things. First, I disabled the network configuration in cloud-init by changing /etc/cloud/cloud.cfg.d/99-fake_cloud.cfg to the following:

    datasource_list: [ NoCloud, None ]
    datasource:
      NoCloud:
        fs_label: BPI-BOOT
    network: { config : disable }

    Second, I configured netplan by editing /etc/netplan/50-cloud-init.yaml:

    network:
        ethernets:
            eth0:
                dhcp4: true
                dhcp-identifier: mac
        version: 2

    After that, I ran netplan generate and netplan apply, and the interface now showed as managed when executing networkctl. More importantly, after a reboot, the BPi initialized the network and everything was up and running.

    Backup and Scripting

    This will be the second proxy I’ve configured in under 2 months, so, well, now is the time to write the steps down and automate if possible.

    Before I did anything, I created a bash script to copy important files off of the proxy and onto my NAS. This includes:

    • Nginx configuration files
    • Custom rsyslog file for sending logs to loki
    • Grafana Agent configuration file
    • Files for certbot/cloudflare certificate generation
    • The backup script itself.

    With those files on the NAS, I scripted out restoration of the proxy to the fresh BPi. I will plan a little downtime to make the switch: while the switchover won’t be noticeable to the outside world, some of the internal networking takes a few minutes to swap over, and I would hate to have a streaming show go down in the middle of viewing…. I would certainly take flak for that.

  • A Tale of Two Proxies

    I am working on building a set of small reference applications to demonstrate some of the patterns and practices to help modernize cloud applications. In configuring all of this in my home lab, I spent at least 3 hours fighting a problem that turned out to be a configuration issue.

    Backend-for-Frontend Pattern

    I will get into more details when I post the full application, but I am trying to build out a SPA with a dedicated backend API that would host the SPA and take care of authentication. As is typically the case, I was able to get all of this working on my local machine, including the necessary proxying of calls via the SPA’s development server (again, more on this later).

    At some point, I had two containers ready to go: a BFF container hosting the SPA and the dedicated backend, and an API container hosting a data service. I felt ready to deploy to the Kubernetes cluster in my lab.

    Let the pain begin!

    I have enough samples within Helm/Helmfile that getting the items deployed was fairly simple. After fiddling with the settings of the containers, things were running well in the non-authenticated mode.

    However, when I clicked login, the following happened:

    1. I was redirected to my OAuth 2.0/OIDC provider.
    2. I entered my username/password
    3. I was redirected back to my application
    4. I got a 502 Bad Gateway screen

    502! But why? I consulted Google and found any number of articles indicating that, in the authentication flow, Nginx’s default header size limits are too small for what comes back on the redirect. So, consulting the Nginx configuration documentation, I changed the Nginx configuration in my reverse proxy to allow for larger headers.

    No luck. Weird. In the spirit of true experimentation (change one thing at a time), I backed those changes out and tried changing the configuration of my Nginx Ingress controller. No luck. So what’s going on?

    Too Many Cooks

    My current implementation looks like this:

    flowchart TB
        A[UI] --UI Request--> B(Nginx Reverse Proxy)
        B --> C("Kubernetes Ingress (Nginx)")
        C --> D[UI Pod]
    

    There are two Nginx instances between all of my traffic: an instance outside of the cluster that serves as my reverse proxy, and an Nginx ingress controller that serves as the reverse proxy within the cluster.

    I tried changing both separately. Then I tried changing both at the same time. And I was still seeing the error. Something else was clearly going on.

    Be careful what you read on the Internet

    As it turns out, the issue was the difference in configuration between the two Nginx instances and some bad configuration values that I got from old internet articles.

    Reverse Proxy Configuration

    For the Nginx instance running on Ubuntu, I added the following to my nginx.conf file under the http section:

            proxy_buffers 4 512k;
            proxy_buffer_size 256k;
            proxy_busy_buffers_size 512k;
            client_header_buffer_size 32k;
            large_client_header_buffers 4 32k;

    Nginx Ingress Configuration

    I am running RKE2 clusters, so configuring Nginx involves a HelmChartConfig resource being created in the kube-system namespace. My cluster configuration looks like this:

    apiVersion: helm.cattle.io/v1
    kind: HelmChartConfig
    metadata:
      name: rke2-ingress-nginx
      namespace: kube-system
    spec:
      valuesContent: |-
        controller:
          kind: DaemonSet
          daemonset:
            useHostPort: true
          config:
            use-forwarded-headers: "true"
            proxy-buffer-size: "256k"
            proxy-buffers-number: "4"
            client-header-buffer-size: "256k"
            large-client-header-buffers: "4 16k"
            proxy-body-size: "10m"

    The combination of both of these settings got my redirects to work without the 502 errors.

    Better living through logging

    One of the things I fought with on this was finding the appropriate logs to see where the errors were occurring. I’m exporting my reverse proxy logs into Loki using a Promtail instance that listens on a syslog port. So I am “getting” the logs into Loki, but I couldn’t FIND them.

    I forgot about facilities in syslog: I have the access logs sending as local5, but I configured the error logs without pointing them at local5. I learned that, by default, they go to local7.
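
    For anyone hunting the same thing: if I am reading the Promtail docs right, the syslog target exposes the facility and severity as internal labels, so they can be promoted to queryable labels in Loki. A rough sketch of the scrape config (the listener port is illustrative):

    scrape_configs:
      - job_name: syslog
        syslog:
          listen_address: 0.0.0.0:1514     # illustrative listener port
          labels:
            job: syslog
        relabel_configs:
          - source_labels: ['__syslog_message_facility']
            target_label: facility
          - source_labels: ['__syslog_message_severity']
            target_label: severity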

    Once I found the logs I was able to diagnose the issue, but I spent a lot of time browsing in Loki looking for those logs.

  • Going Banana

    That’s right… just one banana. I have been looking to upgrade the Raspberry Pi 3 that has been operating as my home lab’s reverse proxy. While it would have been more familiar to find another Raspberry Pi 4 to use, their availability is, well, terrible. I found a workable, potentially more appropriate, solution in the Banana Pi M5.

    If imitation is the sincerest form of flattery…

    Then the Raspberry Pi Foundation should be blushing so much they may pass out. A Google search of “Raspberry Pi Alternatives 2023” leads to a trove of reviews on various substitutes. Orange Pis, Rock Pis, Banana Pis… where do I begin?

    Suffice it to say that Single Board Computers (SBCs) have taken a huge step forward in the past few years, and many companies are trying to get in the game. It became clear that, well, I needed a requirements list.

    Replacing the Pi3 Proxy

    I took a few minutes to come up with my requirements list:

    • Ubuntu – My Pi3 Proxy has been running Nginx on Ubuntu for over a year. I’m extremely comfortable with the setup that I have, including certbot for automating SSL and the Grafana Agent to report statistics to Mimir. My replacement needs to run Ubuntu, since I have no desire to learn another distro.
    • Gigabit Ethernet – The Pi3 does not support true Gigabit ethernet because of the USB throughput. I want to upgrade, since the proxy is handling all of my home lab traffic. Of note, though: I do not need Wifi or Bluetooth support.
    • Processor/Memory – The Pi3 runs the 1.4 GHz Quad Core Cortex A53 processor with a whopping 1GB of RAM. Truthfully, the Pi3 handles the traffic well, but an upgrade would be nice. Do I need 8GB of RAM? Again, nice to have: my minimum is 4 GB.
    • eMMC – Nginx does a lot of logging, and I worry a bit about the read/write limits on SD cards. As I did my research, a few of the Pi alternatives have eMMC flash memory onboard. This would be a bit more resilient than an SD card, and should be faster. There are also some HATs that support NVMe drives. So, yes, I want some solid onboard storage.

    Taking this list of requirements, I started looking around, and one board stood out: the Banana Pi M5.

    Not the latest Banana in the bunch

    The Banana Pi M5 is not the newest model from Banana Pi. The M6 is their latest offering, and sports a much stronger chipset. However, I had zero luck finding one in stock for a reasonable price. I found a full M5 kit on Amazon for about $120 USD.

    The M5’s Cortex-A55 is a small step up from the RPi3 and sports 4GB of RAM, so my processor/memory requirements were met.

    Gigabit ethernet? Check. The M5 has no built-in Wifi, but, for what I need it for, I frankly do not care.

    Ubuntu? This one was tough to source: their site shows downloads for Ubuntu 20.04 images, but I had to dig around the Internet to verify that someone was able to run a release upgrade to get it to 22.04.

    eMMC? A huge 16GB eMMC flash chip. Based on my current usage, this will more than cover my needs.

    The M5 looked to be a great upgrade to my existing setup without breaking the bank or requiring me to learn something new. Would it be that easy?

    Making the switch

    After receiving the M5 (in standard Amazon 2-day fashion), I got to work. The kit included a “build it yourself” case, heatsinks, and a small fan. After a few minutes of trying to figure out how the case went together, I had everything assembled.

    Making my way over to the M5 Wiki, I followed the steps on the page. Surprisingly, it really was that simple. I imaged an SD card so I could boot Ubuntu, then followed their instructions for installing the Linux image to the eMMC. I ejected the SD card, rebooted, and I was up and running.

    A quick round of apt upgrade and do-release-upgrade later, and I was running Ubuntu 22.04. Installed nginx, certbot, and grafana-agent, copied my configuration files over from the old Pi (changing the hostnames, of course), and I was re-configured in easily under 30 minutes.

    The most satisfying portion of this project was, oddly enough, changing the DNS entries and port forwarding rules to hit the M5 and watching the log entries switch places:

    The green line is log entries from the M5, the yellow line is log entries from the Pi3. You can see I had some stragglers hitting the old pi, but once everything flushed out, the Pi was no longer in use. I shut it down to give it a little break for now, as I contemplate what to do with it next.

    Initial Impressions

    The M5 is certainly snappier, although load levels are about the same as were reported by the RPi3. The RPi3 was a rock, always on, always working. I hope for the same with the M5, but, unfortunately, only time will tell.

  • My 15 pieces of flair… Cloudflare

    With parts of my home lab exposed to the internet for my own convenience, it is always good to add layers of protection to incoming traffic. At a colleague’s suggestion, I took Cloudflare up on their free WAF offering to help add some protection to my setup. As a bonus, I have a much better DNS for my domains, which made automating my SSL certificate renewals a snap.

    What’s a WAF?

    A Web Application Firewall, or WAF, protects your web applications by processing requests and monitoring for attacks. While not a comprehensive solution, it adds a layer of protection.

    Cloudflare offers cloud-based services at a variety of price points. For my hobby/home lab sites, well, free as in beer is the best price point for me. Now, you may notice on the price sheet that “WAF” is not actually included in the free version. The free edition does not let you define custom firewall rules to block, challenge, or log requests that match them.

    Well, that’s not 100% accurate: you get 5 active firewall rules with the free plan. Not enough to go crazy, but enough to test if you need it.

    And, well, I do not particularly care: For my home lab, the features of most interest to me are the DNS, Caching, DDoS Protection, and the Managed Ruleset.

    Basic Content Caching

    Cloudflare provides some basic caching on my proxied sites, which definitely helps with sites like WordPress. My PageSpeed Insights load times are almost 100 ms faster on mobile devices (down from 310 ms), which is pretty good. While I have never paid too much attention to page load speeds, it is good to know that I can improve some things while adding a layer of protection.

    DDoS and Managed Rulesets

    Truthfully, I have not read up on much of this, and have left the Cloudflare defaults pretty much intact. Cloudflare’s blog does a good job of explaining the Managed Rules, and their documentation covers the DDoS rulesets.

    Perhaps if I get bored, or need something to put me to sleep at night, I will start reading up on those rulesets. For now, they are in place, which gives me a little more protection than I had without them.

    Cloudflare DNS

    Truthfully, if Cloudflare did nothing more than manage my DNS in a way that allowed certbot to automatically renew my Let’s Encrypt certificates, I would have still moved everything over. Prior to the cutover, I was using GoDaddy’s DNS management and, well, it’s a pain. GoDaddy is very good at selling websites, but DNS management is clearly very low on their list. Cloudflare’s DNS, meanwhile, is simple to manage both through their portal and through the APIs.

    With my DNS moved over, I revisited the certificates on my internal reverse proxy. Following the instructions from Vineet Choudhary over at developerinsider.co, I updated certbot to renew using the Cloudflare plugin.

    Automagic Renewals?

    In the past, with certbot-auto, you had to schedule a cron job for automatic renewals. The new certbot snap, however, uses systemd timers to achieve the same. So, with my certificates renewed using the correct plugin, I ran a quick test:

    sudo certbot renew --dry-run

    The dry run succeeded without issue. So I checked the timers with the following command:

    systemctl list-timers

    Lo and behold, the certbot timer is scheduled to run in the middle of the night.

    Restarting Nginx on Certbot renewals

    There is one small issue: even though I am using certbot’s certonly mode to only obtain or renew certificates and not edit Nginx, I AM using Nginx as a reverse proxy. Therefore, I need a way to restart Nginx after certbot has done its thing. I found this short article and followed the instructions to edit the /etc/letsencrypt/cli.ini file with a deploy hook.

    The article above noted that, to test, you can run the following:

    certbot renew --dry-run

    However, for me, this did NOT trigger the deploy hook. To force triggering the deploy hook, I needed to run this command:

    sudo certbot renew --dry-run --run-deploy-hooks

    This command executed the renewal dry run and successfully reloaded Nginx.

    Minimal Pieces of Cloudflare

    Sure, I have only scratched the surface of Cloudflare’s offerings by adding some free websites and proxying some content. But, as I mentioned, it adds a layer of protection that I did not have before. And, in this day and age, the wire coming into the house presents a bigger security threat than the front door.

  • Creating a simple Nginx-based web server image

    One of the hardest parts of blogging is identifying topics. I sometimes struggle with identifying things that I have done that would be interesting or helpful to others. In trying to establish a “rule of thumb” for such decisions, I think things that I have done at least twice qualify as potential topics. As it so happens, I have had to construct simple web server containers twice in the last few weeks.

    The Problem

    Very simply, I wanted to be able to build a quick and painless container to host some static web sites. They are mostly demo sites for some of the UI libraries that we have been building. One is raw HTML, the other is built using Storybook.js, but both end up being a set of HTML/CSS/JS files to be hosted.

    Requirements

    The requirements for this one are pretty easy:

    • Host a static website
    • Do not run as root

    There was no requirement to be able to change the content outside of the image: changes would be handled by building a new image.

    My Solution

    I have become generally familiar with Nginx for a variety of uses. It serves as a reverse proxy for my home lab and is my go-to ingress controller for Kubernetes. Since I am familiar with its configuration, I figured it would be a good place to start.

    Quick But Partial Success

    The “do not run as root” requirement led me to the Nginx unprivileged image. With that as a base, I tried something pretty quick and easy:

    # Dockerfile
    FROM nginxinc/nginx-unprivileged:1.20 as runtime
    
    
    COPY output/ /usr/share/nginx/html

    Where output contains the generated HTML files that I wanted to host.

    This worked great for the first page that loaded. However, links to other pages within the site kept coming back from Nginx with :8080 as the port. Our networking configuration offloads SSL outside of the cluster and uses ingress within the cluster, so I did not want any port numbers showing up in the redirects at all.

    Custom Configuration Completes the Set

    At that point, I realized that I needed to configure Nginx to disable the port in redirects, and then include the new configuration in my container. So I traipsed through the documentation for the Nginx containers. As it turns out, the easiest way to configure these images is to replace the default.conf file in the /etc/nginx/conf.d folder.

    So I went about creating a new Nginx config file with the appropriate settings:

    server {
      listen 8080;
      server_name localhost;
      port_in_redirect off;

      location / {
        root /usr/share/nginx/html;
        index index.html index.htm;
      }

      error_page 500 502 503 504 /50x.html;
      location = /50x.html {
        root /usr/share/nginx/html;
      }
    }

    From there, my Dockerfile changed only slightly:

    # Dockerfile
    FROM nginxinc/nginx-unprivileged:1.20 as runtime
    COPY nginx/default.conf /etc/nginx/conf.d/default.conf
    COPY output/ /usr/share/nginx/html

    Success!

    With those changes, the image built with the appropriate files and the links no longer had the port redirect. Additionally, my containers are not running as root, so I do not run afoul of our cluster’s policy management rules.

    Hope this helps!

  • Tech Tip – Turn on forwarded headers in Nginx

    I have been using Nginx as a reverse proxy for some time. In the very first iteration of my home lab, it lived on a VM and allowed me to point my firewall rules to a single target, and then route traffic from there. It has since been promoted to a dedicated Raspberry Pi in my fight with the network gnomes.

    My foray into Kubernetes in the home lab has brought Nginx in as an ingress controller. While there are many options for ingress, Nginx seems the most prevalent and, in my experience, the easiest to standardize on across a multitude of Kubernetes providers. As we drive to define what a standard K8s cluster looks like across our data centers and public cloud providers, Nginx seemed like a natural choice for our ingress provider.

    Configurable to a fault

    The Nginx Ingress controller is HIGHLY configurable. There are cluster-wide configuration settings that can be controlled through ConfigMap entries. Additionally, annotations can be used on specific ingress objects to control behavior for individual ingresses.

    As I worked with one team to set up Duende’s Identity Server, we started running into issues with the identity server using http instead of https in its discovery endpoints (such as /.well-known/openid-configuration). Most of our research suggested that the X-Forwarded-* headers needed to be configured (which we did), but we were still seeing the wrong scheme in those endpoints.

    It was a weird problem: I had never run into this issue in my own Identity Server instance, which is running in my home Kubernetes environment. I figured it had to do with an Nginx setting, but had a hard time figuring out which one.

    One blog post pointed me in the right direction. Our Nginx ingress install did not have the use-forwarded-headers setting configured in the ConfigMap, which meant the X-Forwarded-* headers were not being passed to the pod. A quick change of our deployment project, and the openid-configuration endpoint returned the appropriate schemes.

    For reference, we are using the ingress-nginx helm chart. Adding the following to our values file solved the issue:

    controller:
      replicaCount: 2
      
      service:
        ... several service settings
      config:
        use-forwarded-headers: "true"

    Investigation required

    What I do not yet know is, whether or not I randomly configured this at home and just forgot about it, or if it is a default of the Rancher Kubernetes Engine (RKE) installer. I use RKE at home to stand up my clusters, and one of the add-ons I have it configure is ingress with Nginx. Either I have settings in my RKE configuration to forward headers or it’s a default of RKE…. Unfortunately, I am at a soccer tournament this weekend, so the investigation will have to wait until I get home.

    Update:

    Apparently I did know about use-forwarded-headers earlier: it was part of the options I had set in my home Kubernetes clusters. One of many things I have forgotten.

  • ISY and the magic network gnomes

    For nearly 2 years, I struggled mightily with communication issues between my ISY 994i and some of my Docker containers and servers. So much so, in fact, that I had a fairly long-running post in the Universal Devices forums dedicated to the topic.

    I figure it is worth a bit of a rehash here, if only to raise the issue in the hopes that some of my more network-experienced contacts can suggest a fix.

    The Beginning

    The initial post was essentially about my ASP.NET Core API (.NET Core 2.2 at the time) not being able to communicate with the ISY’s REST API. You can read through the initial post for details, but, basically, it would hit it once, then time out on subsequent requests.

    It would seem that some time between my original post and the administrator’s reply, I set the container’s networking to host and the problem went away.

    In retrospect, I had not been heavily using that API anyway, so it may have just been hidden a bit better by the host network. In any case, I ignored it for a year.

    The Return

    About twenty (that’s right, 20) months later, I started moving my stuff to Kubernetes, and the issue reared its ugly head. I spent a lot of time trying to get some debug information from the ISY, which only confused me more.

    As I dug more into when it was happening, it occurred to me that I could not reliably communicate with the ISY from any of the VMs on my HP ProLiant server. Also, and more puzzling, I could not do a port 80 retrieval from the server itself to the ISY. Oddly, though, I was able to communicate with other hardware devices on the network (such as my MiLight Gateway) from the server and its VMs. Additionally, the ISY responded to pings, so it was reachable.

    Time for a new proxy

    Now, one of the VMs on my server was an Ubuntu VM serving as an NGINX reverse proxy. For various reasons, I wanted to move that from a virtual machine to a physical box. This, it seemed, would be a good time to see if a new proxy would lead to different results.

    I had an old Raspberry Pi 3B+ lying around, and that seemed like the perfect candidate for a standalone proxy. So I quickly imaged an SD card with Ubuntu 20, copied my Nginx configuration files from the VM to the Pi, and re-routed my firewall traffic to the new proxy.

    Not only did that work, but it solved the issue of ISY connectivity. Routing traffic through the Pi, I am able to communicate with the ISY reliably from my server, all of my VMs, and other PCs on the network.

    But, why?

    Well, that is the million dollar question, and, frankly, I have no clue. Perhaps it has to do with the NIC teaming on the server, or some oddity in the network configuration on the server. But I burned way too many hours on it to want to dig more into it.

    You may be asking, why a hardware proxy? I liked the reliability and smaller footprint of a dedicated Raspberry Pi proxy, external to the server and any VMs. It made the networking diagram much simpler, as traffic now flows neatly from my gateway to the proxy and then to the target machine. It also allows me to control traffic to the server in a more granular fashion, rather than having ALL traffic pointed to a VM on the server, and then routed via proxy from there.