Category: Open Source

  • Snakes… Why Did It Have To Be Snakes?

    What seems like ages ago, I wrote some Python scripts to keep an eye on my home lab. What I did not realize was that this little introduction to Python would help me dive into the wonderful world of ETL (or ELT, more on that later).

    Manage By Numbers

    I love numbers. Pick your favorite personality profile, and I come out as the cold, calculated, patient person who needs all the information to make a decision. As I ramp up and sharpen my management skills after a brief hiatus as an individual contributor, I have identified a few blind spots that I want to address with my team. But first, I need the data, and that data currently lives in our Jira Cloud instance and in an add-on called Tempo Timesheets.

    Now, our data teams have started to build out an internal data warehouse where our various departments can collect and analyze data from our sales systems. They established an ELT flow for this warehouse with the following toolsets:

    • Singer.io – data extraction and loading
    • dbt – data transformations
    • Prefect – orchestration of the flows, running Singer taps and targets and dbt Core transformations

    Don’t you mean ETL?

    There are two primary data integration methods:

    • ETL – Extract, Transform, and Load
    • ELT – Extract, Load, Transform

    At their core, they have the same basic task: get data from one place to another. The difference lies in where data transformations are processed.

    In ETL, data is extracted from the source system, transformed, and then loaded into the destination system (data warehouse) where it can be analyzed. In this case, the raw data is not stored within the data warehouse: only the transformed data is available.

    ELT, on the other hand, loads raw data directly into the data warehouse, where transformations can be exercised within the data warehouse itself. Since the raw data is stored in the warehouse, multiple transformations can be run without accessing the source system. This allows for data activities such as cleansing, enrichment, and transformation to occur on the same data set with less strain on the source system.
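    To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the warehouse. The worklog rows and the table and view names are hypothetical. In our real flow, Singer handles the load into Snowflake and dbt owns the in-warehouse transformations.

```python
import sqlite3

# Hypothetical raw worklog rows (worker, issue, seconds) as a source system might return them.
RAW_WORKLOGS = [
    ("alice", "PROJ-1", 3600),
    ("bob", "PROJ-1", 1800),
    ("alice", "PROJ-2", 7200),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the result."""
    hours_by_worker = {}
    for worker, _issue, seconds in RAW_WORKLOGS:
        hours_by_worker[worker] = hours_by_worker.get(worker, 0.0) + seconds / 3600
    conn.execute("CREATE TABLE etl_hours (worker TEXT, hours REAL)")
    conn.executemany("INSERT INTO etl_hours VALUES (?, ?)", hours_by_worker.items())
    # The raw rows never reach the warehouse; only the aggregate does.


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows untouched, then transform inside the warehouse."""
    conn.execute("CREATE TABLE raw_worklogs (worker TEXT, issue TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO raw_worklogs VALUES (?, ?, ?)", RAW_WORKLOGS)
    # Different consumers each get their own transformation of the same raw data,
    # without going back to the source system.
    conn.execute(
        "CREATE VIEW hours_by_worker AS "
        "SELECT worker, SUM(seconds) / 3600.0 AS hours FROM raw_worklogs GROUP BY worker"
    )
    conn.execute(
        "CREATE VIEW hours_by_issue AS "
        "SELECT issue, SUM(seconds) / 3600.0 AS hours FROM raw_worklogs GROUP BY issue"
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM etl_hours ORDER BY worker").fetchall())
    print(conn.execute("SELECT * FROM hours_by_issue ORDER BY issue").fetchall())
```

    The ETL path can only ever answer "hours per worker". The ELT path keeps raw_worklogs around, so adding a new view later costs nothing at the source.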

    In our case, ELT made the most sense: we will have different transformations for different departments, including point-in-time transformations for auditing purposes.

    Getting Jira Data

    Jira data was, well, the easier part. Singer.io maintains a Jira tap that pulls data out of Jira Cloud using Jira’s APIs. “Taps” are connectors to external systems, and Singer has a lot of them. “Targets” load the data from tap streams into other systems. Singer does not have as many official targets; however, a number of open source contributors maintain additional ones. We are using the Snowflake target to load data into a Snowflake instance.
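    For anyone who has not looked inside a tap: per the Singer specification, a tap writes newline-delimited JSON messages to stdout, and the target reads them from stdin. There are three kinds of message. A SCHEMA message describes a stream, RECORD messages carry the rows, and STATE messages hold bookmarks for the next run. Singer's Python library wraps these in helpers. The sketch below writes them by hand so the format is visible. The stream name, the updatedAt bookmark field, and the bookmarks layout inside STATE are hypothetical choices, not what tap-jira actually emits.

```python
import json
import sys
from typing import Any, Dict, Iterable, List, TextIO


def write_message(message: Dict[str, Any], out: TextIO = sys.stdout) -> None:
    """Singer messages are one JSON object per line on stdout."""
    out.write(json.dumps(message) + "\n")
    out.flush()


def emit_stream(stream: str, schema: Dict[str, Any], key_properties: List[str],
                records: Iterable[Dict[str, Any]], bookmark_field: str,
                out: TextIO = sys.stdout) -> None:
    # 1. SCHEMA describes the shape of the records that follow.
    write_message({"type": "SCHEMA", "stream": stream, "schema": schema,
                   "key_properties": key_properties}, out)
    latest = None
    # 2. One RECORD message per row extracted from the source API.
    for record in records:
        write_message({"type": "RECORD", "stream": stream, "record": record}, out)
        value = record.get(bookmark_field)
        if value is not None and (latest is None or value > latest):
            latest = value
    # 3. STATE lets the next run resume from the newest bookmark seen (hypothetical layout).
    write_message({"type": "STATE",
                   "value": {"bookmarks": {stream: {bookmark_field: latest}}}}, out)


if __name__ == "__main__":
    example_schema = {"type": "object",
                      "properties": {"id": {"type": "integer"},
                                     "updatedAt": {"type": "string", "format": "date-time"}}}
    emit_stream("worklogs", example_schema, ["id"],
                [{"id": 1, "updatedAt": "2022-10-01T00:00:00Z"}], "updatedAt")
```

    Because everything flows over stdout and stdin, connecting a tap to a target is just a shell pipe, tap | target. That is why taps and targets from different authors can be mixed.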

    Our team built out the data flow, but we were missing Tempo data. Tempo exposes REST APIs, so I figured I could use the tap-jira code as a model for a custom Tempo tap. And that got me back into Python.
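    Much of the work in a tap for a REST API is walking paginated results. Here is a minimal standard-library sketch. Several details are my assumptions about a Tempo-style API: the URL, the bearer-token header, and the response shape ({"results": [...], "metadata": {"next": ...}}). Check Tempo's REST API documentation before relying on any of them. The fetch function is injectable so the pagination logic can be tested without a network.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical starting URL; check Tempo's REST API docs for the current version and path.
WORKLOGS_URL = "https://api.tempo.io/4/worklogs"

Fetch = Callable[[str, str], Dict[str, Any]]


def http_get_json(url: str, token: str) -> Dict[str, Any]:
    """GET one page of results, authenticating with a bearer token (assumed scheme)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def paginate(url: str, token: str, fetch: Fetch = http_get_json) -> Iterator[Dict[str, Any]]:
    """Yield every result, following metadata.next until no next page is returned.

    Assumes each page looks like {"results": [...], "metadata": {"next": "<url>"}}.
    """
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url, token)
        yield from page.get("results", [])
        next_url = page.get("metadata", {}).get("next")
```

    Each yielded worklog would then become a Singer RECORD message. Because paginate is a generator, the tap can stream results without holding every page in memory.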

    Environment Setup

    I’m running WSL2 on Windows 11 with Ubuntu 22.04. I had finished developing and testing my Tempo tap, using virtualenv for isolation, and everything seemed to be working fine. Then I wanted to test pushing the data into Snowflake, and when I tried to load the target-snowflake library, I got a number of version-incompatibility errors.

    Well, hell. As it turns out, most of the engineers at work are using Windows 10/WSL2 with Ubuntu 20.04, and with that, Python 3.8. I was running 3.10. A few quick Google searches later, I realized that I needed a better way to isolate my environments than virtualenv. Along comes a bigger snake…

    Anaconda

    My Google searches led me to Anaconda. First, I’m extremely impressed that they got that domain name. Second, Anaconda is way more than what I’m using it for: I’m using the environment management, but there is so much more.

    I installed Anaconda according to the Linuxhint.com guide and, within about 10 minutes, I had a virtual environment with Python 3.8 that I could use to build and test my taps and targets. The environment management is, in my mind, much easier than using virtualenv: the conda command can be used to list environments and search for available packages, rather than remembering where you stored your virtual environment files and how you activate them.
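    The root cause was an interpreter mismatch, so it can also help to make a tap fail fast when it is started from the wrong environment. Here is a minimal sketch. The 3.8 requirement mirrors our team's setup, and check_python is a hypothetical helper of my own, not part of Singer or conda. In practice you would create and activate the environment first, for example with conda create -n tap-tempo python=3.8 followed by conda activate tap-tempo, where tap-tempo is just a name I picked.

```python
import sys
from typing import Optional, Sequence

# Assumption: the taps and targets shared at work expect Python 3.8.
REQUIRED_VERSION = (3, 8)


def check_python(required: Sequence[int] = REQUIRED_VERSION,
                 actual: Optional[Sequence[int]] = None) -> None:
    """Exit with a clear message if the running interpreter is not the expected major.minor."""
    if actual is None:
        actual = sys.version_info[:2]
    if tuple(actual[:2]) != tuple(required[:2]):
        raise SystemExit(
            f"Expected Python {required[0]}.{required[1]} but found "
            f"{actual[0]}.{actual[1]}; activate the right conda environment first."
        )
```

    Calling check_python() at the top of the tap's entry point turns a confusing dependency error into a one-line explanation.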

    Not changing careers yet

    Sure, I wrote a tap for Tempo data. Am I going to change over to a data architect? Probably not. But at least I can say I succeeded in simple tap development.

    A Note on Open Source

    I’m a HUGE fan of open source and look to contribute where I can. However, while this tap “works,” it’s definitely not ready for the world. I am only extracting a few of the many objects available in the Tempo APIs. I have no tests to ensure correctness. And, most importantly, I have no documentation to guide users.

    Until those things happen, this tap will be locked up behind closed doors. When I find the time to complete it, I will make it public.

  • Installing Minio on a Synology Diskstation with Nginx SSL

    In an effort to get rid of a virtual machine on my hypervisor, I wanted to move my Minio instance to my Synology. Keeping the storage interface close to the storage container helps with latency and is, well, one less thing I have to worry about in my home lab.

    There are a few guides out there for installing Minio on a Synology. Jaroensak Yodkantha walks you through the full process of setting up the Synology and Minio using a docker command line. The folks over at BackupAssist show you how to configure Minio through the Diskstation Manager web portal. I used the BackupAssist article to get started, but found myself tweaking the setup because I wanted SSL communication available through my Nginx reverse proxy.

    The Basics

    Prep Work

    I went into the Shared Folder section of the DSM control panel and created a new shared folder called minio. The settings on this share are pretty much up to you, but I did this so that all of my Minio data was in a known location.

    Within the minio folder, I created a data folder and a blank text file called minio. Inside the minio file, I set up my Minio configuration:

    # MINIO_ROOT_USER and MINIO_ROOT_PASSWORD sets the root account for the MinIO server.
    # This user has unrestricted permissions to perform S3 and administrative API operations on any resource in the deployment.
    # Omit to use the default values 'minioadmin:minioadmin'.
    # MinIO recommends setting non-default values as a best practice, regardless of environment
    
    MINIO_ROOT_USER=myadmin
    MINIO_ROOT_PASSWORD=myadminpassword
    
    # MINIO_VOLUMES sets the storage volume or path to use for the MinIO server.
    
    MINIO_VOLUMES="/mnt/data"
    
    # MINIO_SERVER_URL sets the hostname of the local machine for use with the MinIO Server
    # MinIO assumes your network control plane can correctly resolve this hostname to the local machine
    
    # Uncomment the following line and replace the value with the correct hostname for the local machine.
    
    MINIO_SERVER_URL="https://s3.mattsdatacenter.net"
    MINIO_BROWSER_REDIRECT_URL="https://storage.mattsdatacenter.net"

    It is worth noting the URLs: I want to put this system behind my Nginx reverse proxy and let it do SSL termination, and in order to do that, I found it easiest to use two domains: one for the API and one for the Console. I will get into more details on that later.

    Also, as always, change your admin username and password!

    Set Up the Container

    Following the BackupAssist article, I installed the Docker package onto my Synology and opened it up. From the Registry menu, I searched for minio and found the minio/minio image.

    Click the row to highlight it, then click the Download button. You will be prompted for the label to download; I chose latest. Once the image is downloaded (you can check the Image tab for progress), go to the Container tab and click Create. This opens the Create Wizard and gets you started.

    • On the Image screen, select the minio/minio:latest image.
    • On the Network screen, select the bridge network that is defaulted. If you have a custom network configuration, you may have some work here.
    • On the General Settings screen, you can name the container whatever you like. I enabled the auto-restart option to keep it running. On this screen, click the Advanced Settings button:
      • In the Environment tab, change MINIO_CONFIG_ENV_FILE to /etc/config.env
      • In the Execution Command tab, change the execution command to minio server --console-address :9090
      • Click Save to close Advanced Settings
    • On the Port Settings screen, add the following mappings:
      • Local Port 39000 -> Container Port 9000 – Type TCP
      • Local Port 39090 -> Container Port 9090 – Type TCP
    • On the Volume Settings screen, add the following mappings:
      • Click Add File, select the minio file created above, and set the mount path to /etc/config.env
      • Click Add Folder, select the data folder created above, and set the mount path to /mnt/data

    At that point, you can review the Summary and create the container. Once the container starts, you can access your Minio instance at http://<synology_ip_or_hostname>:39090 and log in with the root user and password saved in your config file.

    What Just Happened?

    The steps above should leave you with Minio running in a Docker container on your Synology. Minio listens on two separate ports: one for the API and one for the Console. According to Minio’s documentation, you now need to pass the --console-address parameter when starting the server. It pins the console to a fixed container port. Without it, the console listens on a dynamically chosen port that you cannot reliably map. In our case, we set it to 9090. The API port defaults to 9000.

    However, I wanted to run on non-standard ports, so I mapped host ports 39090 and 39000 to container ports 9090 and 9000, respectively. Traffic arriving at the Synology on 39090 and 39000 gets routed to the Minio container on 9090 and 9000.

    Securing traffic with Nginx

    I like the ability to have SSL communication whenever possible, even if it is just within my home network. Most systems today default to expecting SSL, and sometimes it can be hard to find that switch to let them work with insecure connections.

    I was hoping to get the console and the API behind the same domain, but with SSL, that just isn’t in the cards. So, I chose s3.mattsdatacenter.net as the domain for the API, and storage.mattsdatacenter.net as the domain for the Console. No, those aren’t the real domain names.

    With that, I added the following sites to my Nginx configuration:

    storage.mattsdatacenter.net
      map $http_upgrade $connection_upgrade {
          default Upgrade;
          ''      close;
      }
    
      server {
          server_name storage.mattsdatacenter.net;
          client_max_body_size 0;
          ignore_invalid_headers off;
          location / {
              proxy_pass http://10.0.0.23:39090;
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-proto $scheme;
              proxy_set_header X-Forwarded-port $server_port;
              proxy_set_header X-Forwarded-for $proxy_add_x_forwarded_for;
    
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection $connection_upgrade;
    
              proxy_http_version 1.1;
              proxy_read_timeout 900s;
              proxy_buffering off;
          }
    
          listen 443 ssl; # managed by Certbot
          allow 10.0.0.0/24;
          deny all;
    
          ssl_certificate /etc/letsencrypt/live/mattsdatacenter.net/fullchain.pem; # managed by Certbot
          ssl_certificate_key /etc/letsencrypt/live/mattsdatacenter.net/privkey.pem; # managed by Certbot
      }

    s3.mattsdatacenter.net
      map $http_upgrade $connection_upgrade {
          default Upgrade;
          ''      close;
      }
    
      server {
          server_name s3.mattsdatacenter.net;
          client_max_body_size 0;
          ignore_invalid_headers off;
          location / {
              proxy_pass http://10.0.0.23:39000;
              proxy_set_header Host $host;
              proxy_set_header X-Real-IP $remote_addr;
              proxy_set_header X-Forwarded-proto $scheme;
              proxy_set_header X-Forwarded-port $server_port;
              proxy_set_header X-Forwarded-for $proxy_add_x_forwarded_for;
    
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection $connection_upgrade;
    
              proxy_http_version 1.1;
              proxy_read_timeout 900s;
              proxy_buffering off;
          }
    
          listen 443 ssl; # managed by Certbot
          allow 10.0.0.0/24;
          deny all;
    
          ssl_certificate /etc/letsencrypt/live/mattsdatacenter.net/fullchain.pem; # managed by Certbot
          ssl_certificate_key /etc/letsencrypt/live/mattsdatacenter.net/privkey.pem; # managed by Certbot
      }

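    This configuration allows me to access the API and Console through their own domains, with SSL terminated on the proxy. Configuring Minio is pretty easy. Set MINIO_BROWSER_REDIRECT_URL to the public URL of your console: in my case, https://storage.mattsdatacenter.net, which the proxy forwards to port 39090. Set MINIO_SERVER_URL to the public URL of your API: https://s3.mattsdatacenter.net, which the proxy forwards to port 39000.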

    This configuration allows me to address Minio for S3 in two ways:

    1. Use https://s3.mattsdatacenter.net for secure connectivity through the reverse proxy.
    2. Use http://<synology_ip_or_hostname>:39000 for insecure connectivity directly to the instance.
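    To confirm that both paths actually reach the instance, you can hit Minio’s liveness endpoint. It lives at /minio/health/live on the API port and returns HTTP 200 while the server is up, and the same check works for either URL. This is a minimal standard-library sketch. The direct hostname is a placeholder, and the check covers reachability only, not performance.

```python
import urllib.request
from typing import Any, Callable, Dict

# The two ways of reaching the S3 API described above; the direct hostname is a placeholder.
ENDPOINTS: Dict[str, str] = {
    "proxy (https)": "https://s3.mattsdatacenter.net",
    "direct (http)": "http://synology.local:39000",
}


def health_url(base_url: str) -> str:
    """Minio answers liveness checks on /minio/health/live on its API port."""
    return base_url.rstrip("/") + "/minio/health/live"


def is_live(base_url: str, opener: Callable[..., Any] = urllib.request.urlopen,
            timeout: float = 5.0) -> bool:
    """Return True only if the liveness endpoint responds with HTTP 200."""
    try:
        with opener(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx responses) and socket timeouts all derive from OSError.
        return False
```

    Looping over ENDPOINTS and calling is_live on each gives a quick up/down answer for both the proxied and the direct path. The opener parameter exists only so the check can be exercised without a live server.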

    I have not had the opportunity to test the performance difference between option 1 and option 2, but it is nice to have both available. For now, I will most likely lean towards the SSL path until I notice degradation in connection quality or speed.

    And, with that, my Minio instance is now running on my Diskstation, which means fewer VMs to manage and back up on my hypervisor.

  • Using SonarCloud for Open Source

    My last few posts have centered around adding some code linting and analysis to C# projects. Most of this has been to identify some standards and best practices for my current position.

    During this research, I came across SonarCloud, SonarSource’s hosted version of SonarQube. SonarCloud is free for open source projects, and given the breadth of languages it supports, I have decided to start adding my open source projects to it. This will give me extra visibility into my open source code and a great sandbox for evaluating SonarQube for corporate use.

    I added Sonar analysis to a GitHub Actions pipeline for my Hyper-V Info API. You can see the results on SonarCloud.io.

    The great part? All the code is public, including the GitHub Actions pipeline. So, feel free to poke around and see how I made it work!

  • Tech Tips – Adding Linting to C# Projects

    In the JavaScript/TypeScript community, ESLint and Prettier are very popular ways to enforce standards and formatting in your code. In trying to find similar functionality for C#, I did not find anything as ubiquitous as ESLint/Prettier, but there are some front runners.

    Roslyn Analyzers and Dotnet Format

    John Reilly has a great post on enabling Roslyn Analyzers in your .NET applications. He also posted instructions on using the dotnet format tool as a “Prettier for C#.”

    I will not bore you by rehashing his posts, but following them allowed me to apply some basic formatting and linting rules to my projects. Additionally, the Roslyn Analyzers can be configured to generate build warnings and errors, so any build worth its salt (one that fails on warnings) will keep undesirable code out.

    SonarLint

    I was not really content to stop there, and a quick Google search led me to an interesting article around linting options for C#. One of those was SonarLint. While SonarLint bills itself as an IDE plugin, it has a Roslyn Analyzer package (SonarAnalyzer.CSharp) that can be added and configured in a similar fashion to the built-in Roslyn Analyzers.

    Following the instructions in the article, I installed SonarAnalyzer and configured it alongside the base Roslyn Analyzers. It produced a few more warnings, particularly around Sonar best practices that go beyond the Microsoft rules.

    SonarQube, my old friend

    Getting into SonarLint brought me back to SonarQube. What seems like forever ago, but was really only a few years ago, SonarQube was something of a go-to tool in my position. We had hoped to gather a portfolio-wide view of our bugs, vulnerabilities, and code smells. For one reason or another, we abandoned that particular tool set.

    After putting SonarLint in place, I was interested in jumping back in, at least in my home lab, to see what kind of information I could get out of Sonar. I found the Kubernetes installation instructions and got to work setting up a quick instance in my production cluster, alongside my ProGet instance.

    Once it was installed, I have to say, the application has done well to improve its user experience. Tying into my Azure DevOps instance was quick and easy, with very good in-application tutorials for that configuration. I set up a project based on the pipeline for my test application, made my pipeline changes, and waited for results…

    Failed! I kept getting errors about not being allowed to set the branch name in the Community edition. That is fair, and for my projects I only really need analysis on the main branch, so I set up analysis to run only on builds of main. Failed again!

    There seems to be a known issue around this, but thanks to the SonarSource community, I found a workaround for my pipeline. With that, my code analysis was finally running, but, well, what do I do with it? I can add quality gates to fail builds based on missing code coverage, tweak my rule sets, and get a “portfolio wide” view of my private projects.

    Setting the Standard

    For any open source C# projects, simply building the linting/formatting into the build/commit process might be enough. If project maintainers are so inclined, they can add their projects to SonarCloud and get the benefits of SonarQube (including adding quality gates).

    For enterprise customers, the move to a paid tier depends on how much visibility you want in your code base. Sonar can be an expensive endeavor, but provides a lot of quality and tech debt tracking that you may find useful. My suggestion? Start with a trial or the community version, and see if you like it before you start requesting budget.

    Either way, setting standards for formatting and analysis on your C# projects makes contributions across teams much easier and safer. I suggest you try it!

  • Deprecating Microsoft Teams Notifications

    My first “owned” open source project was a TeamCity plugin that sends notifications to Microsoft Teams based on build events in TeamCity. It was based on a similar TeamCity plugin for Slack.

    Why? Well, out of necessity. Professionally, we were migrating to MS Teams, and we wanted to post messages when builds failed or succeeded. So I copied the Slack notifier, made the requisite changes, and it worked well enough to publish. I even went the extra mile of adding some GitHub Actions to build and deploy, so that I could fix Dependabot security issues quickly.

    The plugin is currently published in JetBrains’ plugin repository.

    The Sun Always Sets

    Fast-forward five years: both professionally and personally, I have moved toward Azure DevOps and GitHub Actions for builds. Why? Well, at their core they are essentially the same, as Microsoft has melded them together. For open source projects on GitHub, Actions is the de facto standard, and for my lab instance of Azure DevOps, well, it makes translating lab work into professional recommendations much easier. But none of this uses TeamCity.

    Additionally, I have spent the majority of my professional career in C/C++/C#. Java is not incredibly different at its core, but add in Maven, Spring, and the other tag-alongs that come with TeamCity plugin development, and I was well out of my league. And while I have expanded into the various JavaScript languages and frameworks, I have never had a reason to dive into Java.

    So, with that, I am officially deprecating this plugin. Truthfully, I have not done much in the repository recently, so this should not be a surprise. However, I wanted to formally do this so that anyone who may want to take it over (or start over, if they so desire) can do so. I will gladly turn over ownership of the code to someone willing to spend their time to improve it.

    To those who use the plugin: I appreciate all of the support from the community, and I apologize for not doing this sooner: perhaps someone will take the reins and bring the plugin up to the state it deserves.

    Thanks!

  • Pulling metrics from Home Assistant into Prometheus

    I have set up an instance of Home Assistant as the easiest front end for interacting with my home automation setup. While I am using the Universal Devices ISY994 as the primary communication hub for my Insteon devices, Home Assistant provides a much nicer interface for my family, including a great mobile app for them to use the system.

    With my foray into monitoring, I started looking around to see if I could get some device metrics from Home Assistant into my Grafana Mimir instance. Turns out, there is a Prometheus integration built right into Home Assistant.

    Read the Manual

    Most of my blog posts are “how to” style: I find a problem that maybe I could not find an exact solution for online, and walk you through the steps. In this case, though, it was as simple as reading the configuration instructions for the Prometheus integration.

    ServiceMonitor?

    Well, almost that easy. I have been using ServiceMonitor resources within my clusters, rather than setting up explicit scrape configs. Generally, this is easier to manage, since I just install the Prometheus operator, and then create ServiceMonitor instances when I want Prometheus to scrape an endpoint.

    The Home Assistant Prometheus endpoint requires a token, however, and I did not have the desire to dig into configuring a ServiceMonitor with an appropriate secret. For now, it is a to-do on my ever-growing list.
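    Whatever ends up doing the scraping, the request itself is simple. Home Assistant serves metrics in the Prometheus text format at /api/prometheus, and the request needs a long-lived access token sent as a bearer token. Here is a minimal sketch of that scrape plus a naive parser and a low-battery filter with a 30% threshold. The host name is a placeholder. The metric name homeassistant_sensor_battery_percent and the entity/friendly_name labels are assumptions based on what the integration typically exports, so check your own /api/prometheus output. The parser only handles simple lines; use a proper client library for anything serious.

```python
import re
import urllib.request
from typing import Dict, List, Tuple

# Placeholder host; the integration serves metrics at /api/prometheus.
HA_URL = "http://homeassistant.local:8123/api/prometheus"

# Matches simple exposition lines such as: name{label="value",...} 42.5
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def fetch_metrics(url: str, token: str) -> str:
    """Scrape the endpoint, sending a long-lived access token as a bearer token."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def parse(text: str) -> List[Sample]:
    """Turn exposition text into (metric name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def low_batteries(samples: List[Sample], threshold: float = 30.0) -> List[str]:
    """Names of battery sensors below the threshold (assumed metric and label names)."""
    return [labels.get("friendly_name", labels.get("entity", "?"))
            for name, labels, value in samples
            if name == "homeassistant_sensor_battery_percent" and value < threshold]
```

    In a real setup, Prometheus does the scraping and the threshold lives in an alerting rule. The sketch just shows what the scrape sends and what comes back.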

    What can I do now?

    This integration has opened up a LOT of new alerting possibilities on my end. Home Assistant talks to many of the devices in my home, including lights and garage doors. This means I can write alerts for when lights go on or off, when the garage door goes up or down, and, probably the best, when devices are reporting low battery.

    The first alert I wrote notifies me when my Ring Doorbell battery drops below 30%. Couple that with my Prometheus Alerts module for Magic Mirror, and my mirror now displays a notice when the battery needs to be changed.

    What’s Next?

    I am giving back to the community. The Prometheus integration for Home Assistant does not currently report cover statuses. Covers are things like shades or, in my case, garage doors. Since I would like to be able to alert when the garage door is open, I am working on a pull request to add cover support to the Prometheus integration.

    It also means I would LOVE to get my hands on some automated shades/blinds… but that sounds really expensive.

  • Bruce Lee to the Rescue! Health Checks for .NET Worker Services

    As we develop more containers to run in Kubernetes, we are encountering non-HTTP workloads. One of them is a processor for queued events with no HTTP surface at all. In .NET, I used the IHostedService offerings to run it as a simple service in a container.

    However, when it came time to deploy to Kubernetes, I quickly realized that my standard liveness/health checks would not work for this container. I searched around and found that the HealthChecks libraries are tied to ASP.NET Core. Not wanting to bloat my image, I looked for some alternatives. My Google searches led me to Bruce Lee.

    No, not Bruce Lee the actor, but Bruce Lee Harrison. Bruce published a library called TinyHealthCheck, which adds lightweight health-check endpoints without dragging in the entire ASP.NET Core stack.

    While it seems a pretty simple concept, it solved an immediate need of mine with minimal effort. Additionally, there was a sample and documentation!
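    TinyHealthCheck is a C# library, so what follows is not its API. It is a Python analog of the underlying idea: a tiny HTTP listener that runs alongside a non-HTTP worker so Kubernetes probes have something to hit. The /healthz path, the default port, and the heartbeat rule are my own hypothetical choices. The endpoint returns 503 if the worker has not checked in recently, so it reflects the worker's health rather than just "the process is up".

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class WorkerState:
    """Heartbeat shared between the worker loop and the health endpoint."""

    def __init__(self, max_silence_seconds: float = 30.0) -> None:
        self.max_silence_seconds = max_silence_seconds
        self.last_heartbeat = time.monotonic()
        self._lock = threading.Lock()

    def beat(self) -> None:
        """Call after each processed message (or each idle poll of the queue)."""
        with self._lock:
            self.last_heartbeat = time.monotonic()

    def healthy(self) -> bool:
        with self._lock:
            return time.monotonic() - self.last_heartbeat <= self.max_silence_seconds


def start_health_server(state: WorkerState, port: int = 8080) -> HTTPServer:
    """Serve GET /healthz on a daemon thread: 200 while the worker beats, 503 once it stalls."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/healthz":
                self.send_response(404)
                self.end_headers()
                return
            ok = state.healthy()
            body = b"Healthy" if ok else b"Unhealthy"
            self.send_response(200 if ok else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format: str, *args) -> None:
            pass  # keep probe traffic out of the worker's logs

    server = HTTPServer(("0.0.0.0", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

    In the worker, you would create a WorkerState, call start_health_server once at startup, and call beat() as the queue loop makes progress. The Kubernetes liveness probe would then point an httpGet at /healthz on that port.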

    Why call this out? Many developers use open source software to solve these types of problems, and I feel the people behind it deserve a little publicity for their efforts. So, thanks to the contributors to TinyHealthCheck! I will certainly watch the repository and contribute as I can.

  • MMM-PrometheusAlerts: Display Alerts in Magic Mirror

    I have had MagicMirror running for about a year now, and I love having it in my office. A quick glance gives my family and me information that is relevant for the days ahead. As I continue my dive into Prometheus for monitoring, it occurred to me that I might be able to create a new module for displaying Prometheus Alerts.

    Current State

    Presently, my Magic Mirror configuration uses the following modules:

    Creating the Prometheus Alerts module

    In recent weeks, my experimentation with Mimir has led me to write some alerts to keep tabs on things in my Kubernetes cluster and, well, the overall health of my systems. Currently, I have a personal Slack team with an alerts channel, and that has been working nicely. However, as I stared at my office panel, it occurred to me that there should be a way to gather these alerts and show them in Magic Mirror.

    Since Grafana Mimir is Prometheus-compatible, I should be able to use the Prometheus APIs to get alert data. A quick Google search yielded the HTTP API for Prometheus.
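    Magic Mirror modules are JavaScript, but the request and response handling is easy to show in a few lines of Python. GET /api/v1/alerts returns {"status": "success", "data": {"alerts": [...]}}, where each alert carries labels, annotations, a state, and an activeAt timestamp. The Mimir URL below is a placeholder. Mimir serves the Prometheus API under a /prometheus prefix by default, and the X-Scope-OrgID header is only needed if multi-tenancy is enabled. The severity label and summary annotation are common conventions, not guarantees.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Placeholder Mimir address; Mimir exposes the Prometheus HTTP API under /prometheus by default.
ALERTS_URL = "http://mimir.local/prometheus/api/v1/alerts"


def fetch_alerts(url: str, tenant: Optional[str] = None) -> Dict[str, Any]:
    """GET the alerts endpoint and decode the JSON envelope."""
    headers = {"Accept": "application/json"}
    if tenant:
        headers["X-Scope-OrgID"] = tenant  # only needed when Mimir multi-tenancy is enabled
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def firing_alerts(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten the API response into the fields a display module needs."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus API error: {payload.get('error')}")
    rows = []
    for alert in payload["data"]["alerts"]:
        if alert.get("state") != "firing":
            continue  # skip pending alerts that have not satisfied their 'for' duration yet
        labels = alert.get("labels", {})
        rows.append({
            "name": labels.get("alertname", "unknown"),
            "severity": labels.get("severity", "none"),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "since": alert.get("activeAt", ""),
        })
    return rows
```

    Compared with Status Page, there is no component hierarchy to walk. There is just one flat list of alerts, which is why the module ended up simpler.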

    With that in hand, I copied the StatusPage IO module’s code and got to work. In many ways, Prometheus alerts are simpler than Status Page data, since they are a single collection of alerts with labels and annotations. So I stripped out some of the extra handling for Status Page components, renamed a few things, and after some debugging, I have a pretty good MVP.

    What’s next?

    It’s pretty good, but not perfect. I started adding some issues to the GitHub repository for things like message templating and authentication, and when I get around to adding authentication to Grafana Mimir and Loki, well, I’ll probably need to update the module.

    Watch the GitHub repository for changes!