A while back, I wrote a little about how the “Ship of Theseus” thought experiment has parallels to software design. What I did not realize is that I would end up running into a physical “Ship of Theseus” of my own.
Just another day
On a day when I woke up to stories of how a CrowdStrike update wreaked havoc with thousands of systems, I was overly content with my small home lab setup. No CrowdStrike installed, primarily Ubuntu nodes… Nothing to worry about, right?
Confident that I was in the clear, I continued the process of cycling my Kubernetes nodes to use Ubuntu 24.04. I have been pretty methodical about this, just to make sure I am not going to run into anything odd. Having converted my non-production cluster last week, I started work on my internal cluster. I got the control plane nodes updated, but the first agent I tried was not spinning up correctly.
Sometimes my server gets a little busy, and a quick reset helps clear some of the background work. So I reset… And it never booted again.
What Happened?
The server would boot to a certain point (right after the Thermal Calibration step), hang for about 10-15 minutes, and then report a drive array failure. Uh oh…
I dug through some logs on the Integrated Lights Out system and did some Google sleuthing on the errors I was seeing. The conclusion I came to was that the on-board drive controller went kaput. At this point, I was dead in the water. And then I remembered I had another server…
Complete Swap
The other server was much lighter on spec: a single 8 core CPU, 64 GB of RAM, and nowhere near the disk space. Not to mention, with a failed drive controller, I wasn’t getting any data off of those RAID disks.
But both machines are HP ProLiant DL380P Gen 8 servers. So I started thinking: could I just transfer everything except the system board to the backup server?
The short answer: Yes.
I pulled all the RAM modules and installed them in the backup. I pulled both CPUs from the old server and installed them in the backup. I pulled all of the hard drives out and installed them in the backup. I even transferred both power backplanes so that I would have dual plugs.
The Moment of Truth
After all that was done, I plugged it back in and logged in to the backup server’s iLO. It started up, but pointed me to the RAID utilities, because one of the arrays needed to be rebuilt. A few hours later, the drives were rebuilt, and I restarted. Much to my shock, it booted up as if it were the old server.
Is it a new server, or just a new system board in the old server? All I know is, it is running again.
Now, however, I’m down on replacement parts, so I’m going to have to start thinking about either stocking up on some replacements or looking into a different lab setup.