Tag: powershell

  • A big mistake and a bit of bad luck…

    In the Home Lab, things were going well. Perhaps a little too well. A bonehead mistake on my part and a hardware failure combined to make another ridiculous weekend. I am beginning to think this blog is becoming “Matt messed up again.”

    Permissions are a dangerous thing

    I wanted to install the Azure DevOps agent on my hypervisor to allow me to automate and schedule provisioning of new machines. That would allow the provisioning to occur overnight and be overall less impactful. And it is always a bonus when things just take care of themselves.

    I installed the agent, but it would not start. It was complaining that it needed permissions to basically the entire drive where it was installed. Before really researching or thinking too much about it, I set about giving the service group access to the root of the drive.

    Now, in retrospect, I could have opened the share on my laptop (\\machinename\c$), right-clicked in the blank area, and chosen Properties from there, which would have gotten me into the security menu. I did not realize that at the time, and I used the Set-Acl Powershell command instead.

    What I did not realize is that Set-Acl performs a full replacement; it is not additive. So, while I thought I was adding permissions for a group, what I was really doing was REMOVING EVERYONE ELSE’S PRIVILEGES from the drive and replacing them with access for that one group. I realized my error when I simply had no access to the C: drive…
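
    What I should have done is read the existing ACL, append the new rule, and write the combined result back. A minimal sketch of that additive approach (the group name is a placeholder):

    # Read the existing ACL, add the new rule, then write the *combined* ACL back
    $acl = Get-Acl -Path "C:\"
    $rule = [System.Security.AccessControl.FileSystemAccessRule]::new(
        "LAB\AgentService",               # placeholder service group
        "Modify",
        "ContainerInherit,ObjectInherit",
        "None",
        "Allow")
    $acl.AddAccessRule($rule)
    Set-Acl -Path "C:\" -AclObject $acl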

    I thought I got it back…

    After panicking a bit, I realized that what I had added wasn’t a user, but a group. I was able to get into the Group Policy editor for the server and add the Domain Admins group to that service group, which got my user account access. From there, I started rebuilding permissions on the C drive. Things were looking up.

    I was wrong…

    Then, I decided it would be a good idea to install Windows updates on the server and reboot. That was a huge mistake. The server got into a boot loop, where it would boot, attempt to do the updates, fail, and reboot, starting the process over again. It got worse…

    I stopped the server completely during one of the boot cycles for a hard shutdown/restart. When the server POSTed again, the message said, well, basically, that the cache module in the server was no longer there, so it had shut off access to my logical drives… all of them.

    What does that mean, exactly? Long story short, my HP Gen8 server has a Smart Array that had a 1GB write cache card in it. That card is, as best I can tell, dead. However, there was a 512MB write cache card in my backup server. I tried a swap, and it was not recognized either. So, potentially, the cache port itself is dead. Either way, my drives were gone.

    Now what?

    I was pretty much out of options. While my data was safe and secure on the Synology, all of my VMs were down for the count. My only real option was to see if I could get the server to re-mount the drives without the cache and start rebuilding.

    I set up the drives in the same configuration I had previously: I have two 146GB drives and two 1TB drives, so I paired them up into two RAID1 arrays. I caught a break: the machine recognized the previous drives and I did not lose any data. Now, the C drive was, well, toast: I believe my Set-Acl snafu had put that Windows install out of commission. But all of my VMs were on the D drive.

    So I re-installed Hyper-V Server 2019 on the server and got to work attempting to import and start VMs. Once I connected to the server, I was able to re-import all of my Ubuntu VMs, which are my RKE2 nodes. They started up, and everything was good to go.

    There was a catch…

    Not everything came back. Specifically, ALL of my Windows VMs would not boot. They imported fine, but when it came time to boot, I got a “File not found” exception. I honestly have no idea why. I even had a backup of my Domain Controller, taken using Active Backup for Business on the Synology. I was able to restore it; however, it would not start, throwing the same error.

    My shot in the dark is that it comes down to the way the machines were built: I had provisioned the Windows machines manually, while the Ubuntu machines use Packer. I’m wondering if the export/import steps that are part of the Packer process leave the VM files in a state that survives re-importing, something my manually provisioned machines never went through.

    At this point, I’m rebuilding my Windows machines (domain controllers and SQL servers). Once that is done, I will spend some time experimenting on a few test machines to make sure my backups are working… I suppose that’s what disaster recovery tests are for.

  • Managing Hyper-V VM Startup Times with .Net Minimal APIs

    In a previous post, I had a to-do list that included managing my Hyper-V VMs so that they did not all start at once. I realized today that I never explained what I was able to do or post the code for my solution. So today, you get both.

    And, for the impatient among you, my repository for this API is on Github.

    Managing VMs with Powershell

    My plan of attack was something like this:

    • Organize Virtual Machines into “Startup Groups” which can be used to set Automatic Start Delays
    • Using the group and an offset within the group, calculate a start delay and set that value on the VM.

    Powershell’s Hyper-V Module is a powerful and pretty easy way to interact with the Hyper-V services on a particular machine. The module itself had all of the functionality I needed to implement my plan, including the ability to modify the Notes of a VM. I am storing JSON in the Notes field to denote the start group and offset within the group. Powershell has the built-in JSON conversion necessary to make quick work of retrieving this data from the VM’s Notes field and converting it into an object.
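
    As a quick sketch of that round trip (the VM name is a placeholder):

    # Read the start settings stored as JSON in a VM's Notes field
    $settings = (Get-VM -Name "dc01").Notes | ConvertFrom-Json
    $settings.startGroup    # e.g. 1
    $settings.delayOffset   # e.g. 120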

    Creating the API

    For the API, this seemed an appropriate time to try out the Minimal APIs in ASP.NET Core 6. Minimal APIs are Microsoft’s approach to building APIs fast, without all the boilerplate code that sometimes comes with .Net projects. This project only had three endpoints (and maybe some test/debug ones) and a few services, so it seemed a good candidate.

    Without getting into the details, I was pleased with the approach, although scaling it requires implementing standards that, in the end, would have you re-designing the notion of Controllers as it exists in a typical API project. So, while it is great for small, agile APIs, if you expect your API to grow, stick with Controller-structured APIs.

    Hosting the API

    The server I am using as a Hyper-V hypervisor is running a version of Windows Hyper-V Server, which means it has a limited feature set that does not include Internet Information Services (IIS). Even if it did, I want to keep the hypervisor focused on running VMs. However, in order to manage the VMs, the easiest path is to put the API on the hypervisor.

    With that in mind, I went about configuring this API to run within a Windows Service. That allowed me to ensure the API was running through standard service management (instead of as a console application) but still avoid the need for a heavy IIS install.

    I installed the service using one of the methods described in How to: Install and uninstall Windows services.  However, for proper access, the service needs to run as a user with Powershell access and rights to modify the VMs.  

    I created a new domain user and granted it the ability to perform a service log on via local security policy.  See Enable Service Logon for details.
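
    As one concrete example of wiring this up from Powershell, New-Service can register the executable under that account. This is a hypothetical sketch: the service name, path, and user are assumptions for my lab, not values from the repository:

    # Register the API executable as a Windows service running under the new domain user
    New-Service -Name "VmStartupApi" `
        -BinaryPathName "C:\Services\VmStartupApi\VmStartupApi.exe" `
        -DisplayName "Hyper-V VM Startup API" `
        -StartupType Automatic `
        -Credential (Get-Credential "LAB\svc-vmapi")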

    Prepping the VMs

    The API does not, at the moment, pre-populate the Notes field with JSON settings. So I went through my VM List and added the following JSON snippet:

    {
        "startGroup": 0,
        "delayOffset": 0
    }

    I chose a startGroup value based on the VM’s importance (Domain Controllers first, then data servers, then Kubernetes nodes, etc.), and then used the delayOffset to further stagger the start times within each group. Populating the field is a single Set-VM call, as sketched below.
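
    One way to populate the field from Powershell (the VM name is a placeholder):

    # Store the startup settings as JSON in the VM's Notes field
    Set-VM -Name "dc01" -Notes '{ "startGroup": 1, "delayOffset": 0 }'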

    All this for an API call

    Once each VM has the initialization data, I made a call to /vm/refreshdelay and voilà! The AutomaticStartDelay gets set based on each VM’s startGroup and delayOffset.
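
    From Powershell, that call is a one-liner; the host name and port here are placeholders for my lab, and I’m assuming a POST:

    # Trigger the API to recalculate and set AutomaticStartDelay on every VM
    Invoke-RestMethod -Method Post -Uri "http://hyperv01:5000/vm/refreshdelay"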

    There is more to do (see my to-do list in my previous post for other next steps), but since I do not typically provision many machines, this one usually falls to a lower spot on the priority list. So, well, I apologize in advance if you do not see more work on this for another six months.

  • An Impromptu Home Lab Disaster Recovery Session

    It has been a rough 90 days for my home lab. We have had a few unexpected power outages which took everything down. And, after those unexpected outages, things came back up. Over the weekend, I was doing some electrical work outside, wiring up outlets and lighting. Being safety conscious, I killed the power to the breaker I was tying into inside, not realizing it was the same breaker that the server was on. My internal dialog went something like this:

    • “Turn off breaker Basement 2”
    • ** clicks breaker **
    • ** Hears abrupt stop of server fans **
    • Expletive….

    When trying to recover from that last sequence, I ran into a number of issues.

    • I’m CRUSHING that server when it comes back up: having 20 VMs attempting to start simultaneously is causing a lot of resource contention.
    • I had to run fsck manually on a few machines to get them back up and running.
    • Even after getting the machines running, ETCD was broken on two of my four clusters.

    Fixing my Resource Contention Issues

    I should have done this from the start, but all of my VMs had their Automatic Start Action set to Restart if previously running. That is great in theory but, in practice, starting 20 or so VMs simultaneously on the same hypervisor is not recommended.

    Part of Hyper-V’s Automatic Start Action panel is an Automatic Startup Delay. In Powershell, it is the AutomaticStartDelay property on the VirtualMachine object (what is returned from a Get-VM call). My ultimate goal is to set that property to stagger-start my VMs. I could have set it manually and been done in a few minutes (see the sketch below). But how do I manage that when I spin up new machines? And can I store some information on the VM to reset that value as I play around with how long each VM needs to start up?
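
    For reference, the manual version is a single Set-VM call per machine; the names and delays below are placeholders:

    # Stagger three VMs by hand: 0 seconds, 2 minutes, 8 minutes
    Set-VM -Name "dc01" -AutomaticStartDelay 0
    Set-VM -Name "dc02" -AutomaticStartDelay 120
    Set-VM -Name "sql01" -AutomaticStartDelay 480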

    Groups and Offsets

    All of my VMs can be grouped based on importance. And it would have been easy enough to start 2-3 VMs in group 1, wait a few minutes, then do group 2, etc. But I wanted to be able to assign offsets within the groups to better address contention. In an ideal world, the machines would come up sequentially to a point, and then 2 or 3 at a time after the main VMs have started. So I created a very simple JSON object to track this:

    {
      "startGroup": 1,
      "delayOffset": 120
    }

    There is a free-text Notes field on the VirtualMachine object, so I used that to set a startGroup and delayOffset for each of my VMs. Using a pipeline of Powershell commands, I was able to get a tabular output of my custom properties:

    Get-VM |
        Select Name, State, AutomaticStartDelay,
            @{n='ASDMin';e={$_.AutomaticStartDelay / 60}},
            @{n='startGroup';e={(ConvertFrom-Json $_.Notes).startGroup}},
            @{n='delayOffset';e={(ConvertFrom-Json $_.Notes).delayOffset}} |
        Sort-Object AutomaticStartDelay |
        Format-Table
    • Get-VM – Get a list of all the VMs on the machine
    • Select Name, ... – The Select statement (an alias for Select-Object) pulls values from the object. There are two calculated properties that pull values from the Notes field as a JSON object.
    • Sort-Object – Sort the list by the AutomaticStartDelay property
    • Format-Table – Format the response as a table.

    At that point, each VM had its startGroup and delayOffset, but how do I set the AutomaticStartDelay based on those? More Powershell!!

    Get-VM |
        Select Name, State, AutomaticStartDelay,
            @{n='startGroup';e={(ConvertFrom-Json $_.Notes).startGroup}},
            @{n='delayOffset';e={(ConvertFrom-Json $_.Notes).delayOffset}} |
        ? {$_.startGroup -gt 0} |
        % { Set-VM -Name $_.Name -AutomaticStartDelay ((($_.startGroup - 1) * 480) + $_.delayOffset) }

    The first two commands are the same as the above, but after that:

    • ? {$_.startGroup -gt 0} – Use Where-Object (? alias) to select VMs with a startGroup value
    • % { Set-VM -Name ... } – Use ForEach-Object (% alias) to set the AutomaticStartDelay on each VM in that group.

    In the command above, I hard-coded the AutomaticStartDelay to the following formula:

    ((startGroup - 1) * 480) + delayOffset

    With this formula, the server will wait 480 seconds (8 minutes) between groups and add a delay within the group should I choose. As an example, my domain controllers carry the following values:

    # Primary DC
    {
      "startGroup": 1,
      "delayOffset": 0
    }
    # Secondary DC
    {
      "startGroup": 1,
      "delayOffset": 120
    }

    The calculated delays for my domain controllers are 0 and 120 seconds, respectively. The next group won’t start until the 480-second (8-minute) mark, which gives my DCs 8 minutes on their own to boot up.

    Now, there will most likely be some tuning involved in this process, which is where my complexity becomes helpful: say I can boot 2-3 machines every 3 minutes… I can just re-run the population command with a new formula.

    Did I over-engineer this? Probably. But the point is, use AutomaticStartDelay if you are running a lot of VMs on a Hypervisor.

    Restoring ETCD

    Call it fate, but that last power outage ended up causing ETCD issues. I had to run fsck manually on a few of my servers to repair the file system, and even when the servers were up and running, two of my clusters had problems with their ETCD services.

    In the past, my solution to this was “nuke the cluster and rebuild,” but I am trying to be a better Kubernetes administrator, so this time, I took the opportunity to actually read the troubleshooting documentation that Rancher provides.

    Unfortunately, I could not get past “step one”: ETCD was not running. Knowing that it was most likely corruption of some kind and that I had a relatively up-to-date ETCD snapshot, I did not burn too much time before going to the restore.

    rke etcd snapshot-restore --name snapshot_name_etcd --config cluster.yml

    That command worked like a charm, and my clusters were back up and running.

    To Do List

    I have a few things on my to-do list following this adventure:

    1. Move ETCD snapshots off of the VMs and onto the SAN. I would have had a lot of trouble bringing ETCD back up if those snapshots were not available because the node they were on went down.
    2. Update my Packer provisioning scripts to include writing the startup configuration to the VM notes.
    3. Build an API wrapper that I can run on the server to manage the notes field.

    I am somewhat interested in testing how the AutomaticStartDelay changes will affect my server boot time. However, I am planning on doing that on a weekend morning during a planned maintenance, not on a random Thursday afternoon.

  • Packer.io: Making excess too easy

    As I was chatting with a former colleague the other day, I realized that I have been doing some pretty diverse work as part of my home lab. A quick scan of my posts in this category reveals a host of topics ranging from home automation to Python monitoring to Kubernetes administration. One of my colleague’s questions was something to the effect of “How do you have time to do all of this?”

    As I thought about it for a minute, I realized that all of my Kubernetes research would not have been possible if I had not first taken the opportunity to automate the process of provisioning Hyper-V VMs. In my Kubernetes experimentation, I have easily provisioned 35-40 Ubuntu VMs, and then promptly broken two-thirds of them through experimentation. Thinking about taking the time to install Ubuntu and provision it before I can start work, well, that would have been a non-starter.

    It started with a build…

    In my corporate career, we have been moving more towards Azure DevOps and away from TeamCity. To date, I am impressed with Azure DevOps. Pipelines-as-code appeals to my inner geek, and not having to maintain a server and build agents has its perks. I had visions of migrating from TeamCity to Azure DevOps, hoping I could take advantage of Microsoft’s generosity with small teams. Alas, Azure DevOps is free for small teams ONLY if you self-host your build agents, which meant a small amount of machine maintenance. I wanted to be able to self-host agents with the same software that Microsoft uses for their Github Actions/Azure DevOps agents. After reading through the Github Virtual Environments repository, I determined it was time to learn Packer.

    The build agents for Github/Azure DevOps are provisioned using Packer. My initial hope was that I would just be able to clone that repository, run Packer, and voilà! It’s never that easy. The Packer projects in that repository are designed to provision VM images that run in Azure, not on Hyper-V. Provisioning Hyper-V machines is possible through Packer, but it requires different template files and some tweaking of the provisioning scripts. Without getting too far into the weeds, Packer uses different builders for Azure and Hyper-V, so I had to grab the provisioning scripts I wanted from the template files in the Virtual Environments repository and configure a builder for Hyper-V. Thankfully, Nick Charlton provided a great starting point for automating Ubuntu 20.04 installs with Packer. From there, I was off to the races.

    Enabling my excess

    Through probably 40 hours of trial and error, I got to the point where I was building my own build agents and hooking them up to my Azure DevOps account. It should be noted that fully provisioning a build agent takes six to eight hours, so most of that 40 hours was “fire and forget.” With that success, I started to think: “Could I provision simpler Ubuntu servers and use those to experiment with Kubernetes?”

    The answer, in short, is “Of course!” I went about creating some Powershell scripts and Packer templates so that I could provision various levels of Ubuntu servers. I have shared those scripts, along with my build agent provisioning scripts, in my provisioning-projects repository on Github. With those scripts in hand, I can provision new machines at will. It is remarkable the risks you will take in a lab environment, knowing that you are only 20-30 minutes away from a clean machine should you mess something up.
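
    Kicking off a build from Powershell ends up being a single packer call; the template and variable names below are illustrative, not the exact ones from my repository:

    # Build an Ubuntu server image using a Hyper-V builder template
    packer build `
        -var "vm_name=ubuntu-server-test" `
        -var "switch_name=LabSwitch" `
        .\ubuntu-server.json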

    A note on IP management

    If you dig into the repository above, you may notice some scripts and code around provisioning a MAC address from a “Unifi IP Manager.” I created a small API wrapper that utilizes the Unifi Controller APIs to create clients with fixed IP addresses. The API generates a random, but valid, MAC Address for Hyper-V, then uses the Unifi Controller API to assign a fixed IP.

    That project isn’t quite ready for public consumption, but if you are interested, drop a comment on this post.

  • Who says the command line can’t be pretty?

    The computer, in many ways, is my digital office space. Just like the fern in your office, your digital space needs tending. What better way to water your digital fern than to revamp the look and feel of your command line?

    I extolled the virtues of the command line in my Windows Terminal post, and today, as I was catching up on my “Hanselman reading,” I came across an update to his “My Ultimate PowerShell prompt with Oh My Posh and the Windows Terminal” post that included new updates to make my command line shine.

    What’s New?

    Oh-My-Posh v3

    What started as a prompt theme engine for Powershell has grown into a theme engine for multiple shells, including ZSH and Bash. The v3 documentation was all I needed to upgrade from v2 and modify the powerline segments to personalize my prompt.
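
    On the Powershell side, the v3-era setup boiled down to a module install and a theme selection; a sketch (the theme name is just an example, and the commands may have changed since):

    # Install the Oh-My-Posh module and select a prompt theme
    Install-Module oh-my-posh -Scope CurrentUser
    Set-PoshPrompt -Theme jandedobbeleer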

    Nerd Fonts

    That’s right, Nerd Fonts. Nerd Fonts are “iconic fonts” which build hundreds of popular icons into the font for use on the command line. As I was using Cascadia Code PL (Cascadia Code with Powerline glyphs), it felt only right to upgrade to the CaskaydiaCove NF Nerd Font.

    It should be noted that the Oh-My-Posh prompt is configured as part of your Powershell profile, meaning it shows up anywhere Powershell runs. For me, this is three applications: Microsoft Windows Terminal, Visual Studio Code, and the Powershell Core command line application. It is important to set the font family correctly in all of these places.

    Microsoft Windows Terminal

    Follow Oh-My-Posh’s instructions for setting the default font face in Windows Terminal.

    Visual Studio Code

    For Visual Studio Code, you need to change the fontFamily property of the integrated terminal. The easiest way to do this is to open the settings JSON (Ctrl+Shift+P and search for Open Settings (JSON)) and make sure you have the following line:

    {
      "terminal.integrated.fontFamily": "CaskaydiaCove NF"
    }

    When I was editing my Oh-My-Posh profile, I realized it might be helpful to be able to see the icons I was using in the prompt, so I also changed my editor font.

    {
      "editor.fontFamily": "'CaskaydiaCove NF', Consolas, 'Courier New', monospace"
    }

    You can use the Nerd Font cheat sheet to search for icons to use and copy/paste the icon value into your profile.

    Powershell Application

    With Windows Terminal in place, I rarely use the Windows Powershell application, but it soothes my digital OCD to have it looking right. To change that window’s font, right-click the window’s title bar and select Properties. Go to the Font tab and choose CaskaydiaCove NF (or your installed Nerd Font) from the list. This only changes the properties for the current window. If you want to change the font for all new windows, right-click the window’s title bar and select Defaults, then follow the same steps to set the default font.

    Terminal Icons

    This one is fun: the Terminal-Icons Powershell Module displays icons next to different file types in directory listings. First, install the module using the following Powershell command:

    Install-Module -Name Terminal-Icons -Repository PSGallery

    Then, add the Import-Module command to your Powershell Profile:

    Import-Module -Name Terminal-Icons
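
    To make that import stick across sessions, you can append it to your profile script; a minimal sketch:

    # Append the import to your Powershell profile so it loads in every new session
    Add-Content -Path $PROFILE -Value "Import-Module -Name Terminal-Icons"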

    Too Much?

    It could be said that spending about an hour installing and configuring my machine prompts is, well, a bit much. However, as I mentioned above, sometimes you need to refresh your digital work space.