The wrong way to reboot a server

Last Updated on November 27, 2018 by Dave Farquhar

In my day, I did plenty of hardware maintenance in the field. In fact, the only time one of my bosses ever saw me at work, I was swapping out failed memory in a server.

“How’d you know it needed to be done?” he asked.

“It told me.” That’s why I always loved HP ProLiant servers. My boss looked confused at my answer but didn’t ask me to elaborate.

But not all of my field maintenance went quite so smoothly. One day I had to do some very minor maintenance on a Dell 1U or 2U server, probably something like a PowerEdge 2650. I think it was purely cosmetic, like switching out a slot cover so all of the slot covers would match when a general toured the datacenter. Someone suggested I do the work with the server still powered on, to avoid downtime.

Popping the cover off a server with the power going is never a good idea, but an order is an order and I was a contractor, and the second-most junior contractor in the office at that.

Against my better judgment, I popped the cover off the server and resolved to be very careful. My coworker, Marcus, assisted me. He was the most junior contractor in the office.

The server had a card plugged into one of its expansion slots. I remember it being a video card, but a network card would have made more sense. The cosmetic work I had to do required me to remove one screw and then replace it. No big deal. But of course, under pressure, I fumbled the screw.

I watched the screw fall from my hand onto the card, slowly, seemingly taking 100 years. Marcus and I reached out to catch it, but we were too slow. The screw struck the card. I looked at Marcus. Marcus looked at me. We expected a loud pop and some smoke and some disciplinary action.

Beep! The Dell rebooted. Marcus and I both glanced over at the screen and watched the server POST. This is a little worse than using the reboot command instead of init 6.

So then, with our uptime blown, we shut down the server and we did the work properly, with the server off. Then I replaced the cover and powered the server back up. It worked fine. Knowing that place, it’s probably still in service.

Marcus and I never told anyone about our little mishap, but we didn’t change any more slot covers either. If any of the servers were mismatched, the general was just going to have to live with it.
