Kubernetes Observability, Part 4 – Using Linkerd for Service Observability

This post is part of a series on observability in Kubernetes clusters:

Part 1 – Collecting Logs with Loki
Part 2 – Collecting Metrics with Prometheus
Part 3 – Dashboards with Grafana
Part 4 – Using Linkerd for Service Observability (this post)
Part 5 – Using Mimir for long-term metric storage

As we started to look at traffic within our Kubernetes clusters, the notion of adding a service mesh crept into our discussions. I will not pretend to be an expert in service meshes, but the folks at buoyant.io (the creators of Linkerd) have done a pretty good job of explaining service meshes for engineers.

Installing Linkerd as a cluster tool was more an exercise in “can I do it” than a response to any real need for a service mesh. However, Linkerd’s architecture is such that I can have Linkerd installed in the cluster, but only active on services that need it. This is accomplished via pod annotations, which makes the system very configurable.
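As a rough illustration of the opt-in model, here is a minimal sketch of a Deployment whose pods ask for the Linkerd proxy. The linkerd.io/inject annotation is Linkerd's documented injection switch; the Deployment name, namespace, labels, and image are placeholders and not taken from my cluster.

```yaml
# Hypothetical workload: only the linkerd.io/inject annotation matters here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-api            # placeholder name
  namespace: example           # placeholder namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-api
  template:
    metadata:
      labels:
        app: example-api
      annotations:
        # Must sit on the pod template (not the Deployment's own metadata)
        # so the proxy injector sees it when pods are created.
        linkerd.io/inject: enabled
    spec:
      containers:
        - name: api
          image: nginx:1.27    # placeholder image
          ports:
            - containerPort: 80
```

Pods without the annotation are left alone, which is what lets the mesh sit idle in the cluster until a service actually needs it. Pods that were already running only pick up the proxy after they are recreated.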

Installing Linkerd

With my ArgoCD setup, adding Linkerd as a cluster tool was pretty simple: I added the chart definition to the repository, then added a corresponding ApplicationSet definition. The ApplicationSet defined a cluster generator with a label match, meaning Linkerd would only be installed to clusters where I added spydersoft.io/linkerd=true as a label on the ArgoCD cluster secret.
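For readers who have not used the cluster generator, the ApplicationSet looks roughly like the sketch below. This is not my actual definition: the repository URL, path, project, and resource names are assumptions for illustration. Only the spydersoft.io/linkerd label and the cluster generator's selector mechanism come from the setup described above.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: linkerd                # placeholder name
  namespace: argocd            # assumes ArgoCD runs in the "argocd" namespace
spec:
  generators:
    # The cluster generator yields one set of parameters per registered
    # cluster whose ArgoCD cluster secret carries the matching label.
    - clusters:
        selector:
          matchLabels:
            spydersoft.io/linkerd: "true"
  template:
    metadata:
      name: 'linkerd-{{name}}'          # {{name}} is the cluster name
    spec:
      project: default                  # assumed project
      source:
        repoURL: https://example.com/your-org/cluster-tools.git  # hypothetical repo
        targetRevision: main
        path: linkerd                   # hypothetical path to the chart definition
      destination:
        server: '{{server}}'            # {{server}} is the cluster API URL
        namespace: linkerd
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

With something like this in place, opting a cluster in or out is just a matter of adding or removing the label on its cluster secret; ArgoCD creates or removes the generated Application accordingly.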

The most troublesome part of the installation process was figuring out how to manage Linkerd via GitOps. The folks at Linkerd, however, have a LOT of guides to help. You can review my chart definition for my installation methods; it was built from the following Linkerd articles:

Many kudos to the Linkerd team, as their documentation was thorough and easy to follow.

Adding Linkerd-viz

Linkerd-viz is an add-on to Linkerd that has its own Helm chart. As such, I manage it as a separate cluster tool. The visualization add-on has a dashboard that can be exposed via ingress and provides an overview of Linkerd and the metrics it is collecting. In my case, I tried to expose Linkerd-viz on a subpath (using my cluster’s internal domain name as the host). I ran into some issues (more on that below), but overall it works well.
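For reference, a minimal sketch of exposing the dashboard through an Nginx ingress is shown here. It uses a dedicated hostname at the root path rather than the subpath I attempted, and the ingress name, host, and ingress class are assumptions. The backend (the web service on port 8084 in the linkerd-viz namespace) and the upstream-vhost annotation follow Linkerd's dashboard documentation as I understand it, which also lists further annotations that this sketch leaves out.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: linkerd-viz-dashboard      # hypothetical name
  namespace: linkerd-viz
  annotations:
    # The dashboard checks the Host header it receives, so the ingress
    # presents the in-cluster service name instead of the public hostname.
    nginx.ingress.kubernetes.io/upstream-vhost: $service_name.$namespace.svc.cluster.local:8084
spec:
  ingressClassName: nginx          # assumes an ingress class named "nginx"
  rules:
    - host: linkerd.example.internal   # hypothetical internal hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web          # the linkerd-viz dashboard service
                port:
                  number: 8084
```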

I broke it…

As I started adding podAnnotations to inject Linkerd into my pods, things seemed to be “just working.” I even decorated my Nginx ingress controllers following the Linkerd guide, which meant traffic within my cluster was all going through Linkerd. This seemed to work well, until I tried to access my installation of Home Assistant. I spent a good while trying to debug, but as soon as I removed the pod annotations from Nginx, Home Assistant started working. While I am sure there is a way to fix that, I have not had much time to devote to the home lab recently, so that is on my to-do list.

I also noticed that the Linkerd-viz dashboard does not, at all, like to be hosted at a non-root URL. This has been documented as a bug in Linkerd, but is currently marked with the “help wanted” tag, so I am not expecting it to be fixed anytime soon. However, that bug identifies an ingress configuration snippet that can be added to the ingress definition to provide some basic rewrite functionality. It is a dirty workaround, and does not fix everything, but it is serviceable.

Benefits?

For the pods that I have marked up, I can glance at the network traffic and latency between the services. I have started to create Grafana dashboards in my external instance to pull those metrics into easy-to-read graphs for network performance.

I have a lot more learning to do when it comes to Linkerd. While it is installed and running, I am certainly not using it for any heavy tasking. I hope to make some more time to investigate, but for now, I have some additional network metrics that help me understand what is going on in my clusters.