If you’re looking for the least popular people in any given company, the people who push patches probably rank high on that list. I pushed patches for a living for nearly a decade, so I know. I was good at it though. Let’s talk about patch management best practices.
Who I am
Who am I, and why am I talking about patch management best practices? I got my start pushing patches in the late 1990s, working on Y2K projects. In 2003, when Microsoft introduced a regular patch release schedule and called it Patch Tuesday, patching fell to me because I was the junior sysadmin and the senior guys didn’t want it, and besides, I already had experience.
And for almost five years, I pushed patches on an Air Force contract. Having a clean vulnerability scan just before every Patch Tuesday was about the only way I could manage the tangle of requirements from various government agencies and the local management. So I was the rare guy who had all my systems up to date at the beginning of almost every month. For items that would be difficult to patch within a month, I had to submit a Plan of Action and Milestones (POA&M), which is what the rest of the world calls a risk acceptance, and someone with the rank of colonel or GS-15 had to sign off on it. I would generally get six months for Java updates, and 1-2 months when Microsoft issued a bad patch.
I estimate that in the first decade of this century, I remediated between 800,000 and 900,000 vulnerabilities.
Yes, I patched everything. And do you know how many support cases I opened regarding software updates and security patches? Zero. That’s why I decided to write about patch management best practices.
Have a test environment
One of the reasons we were able to patch everything and live to tell about it was that we had a test environment. Our test environment wasn’t an exact replica of production, and that was a problem, but most places have no test environment at all, because it’s too expensive.
In a perfect world, you should have a test environment configured identically to your production environment, just on a smaller scale. Same hardware, same operating systems, same configuration. The reason is that the idea that a PC is a PC is a PC is a myth. Microsoft problems in particular tend to happen only on specific combinations of hardware. So if your test environment differs from production, you might not see the issue in testing, then get a rash of calls when a duff update breaks the keyboards on certain HP systems.
If you can’t have a test environment, get a subset of your production environment that gets updates first. Make sure you have a representative sample of machines. By that I mean at least one example of every variation of machine that exists in production. Be sure to include virtual machines, running on each hypervisor. Since different hypervisors emulate different hardware, a test on a VM running on VirtualBox isn’t a perfect substitute for one running on VMware ESXi.
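Building that representative sample is mechanical enough to script. Here’s a minimal sketch of the idea, keeping one machine for each unique hardware/OS/hypervisor variation. The inventory format, field names, and machine names are all hypothetical; your asset database will look different.

```python
# Sketch: pick a representative pilot group from an inventory, keeping one
# machine per unique (model, OS, hypervisor) variation. Data is illustrative.
from collections import OrderedDict

def representative_sample(inventory):
    """Return one machine for each unique hardware/OS/hypervisor combination."""
    seen = OrderedDict()
    for machine in inventory:
        key = (machine["model"], machine["os"], machine.get("hypervisor"))
        # Keep the first machine we encounter for each variation.
        seen.setdefault(key, machine)
    return list(seen.values())

inventory = [
    {"name": "ws01", "model": "HP EliteDesk",  "os": "Win10"},
    {"name": "ws02", "model": "HP EliteDesk",  "os": "Win10"},
    {"name": "ws03", "model": "Dell OptiPlex", "os": "Win10"},
    {"name": "vm01", "model": "virtual", "os": "Win2016", "hypervisor": "ESXi"},
    {"name": "vm02", "model": "virtual", "os": "Win2016", "hypervisor": "VirtualBox"},
]

pilot = representative_sample(inventory)
print([m["name"] for m in pilot])  # one machine per variation
```

Note the two VMs land in the pilot separately, because the hypervisor is part of the key.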
Conducting valid tests is one of the key patch management best practices.
Roll updates in phases
Roll your updates in phases, starting with your test environment. Make sure all of your updates take, and do some kind of user acceptance testing to ensure no ill effects. If you have enough time, you may want to take a week in testing before rolling to production.
One of my former employers rolled in three phases: test, then a larger pilot group, then the rest of production. Good security and patch management best practices mean getting the updates down without bricking systems the business needs to make money.
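The phase assignments themselves don’t need to be fancy. A sketch of one way to do it, assuming a three-phase split: hash each hostname so every machine lands in the same phase month after month, rather than reshuffling the groups. The 5/15/80 split is an assumption for illustration.

```python
# Sketch: deterministically assign machines to rollout phases.
# The split percentages (5% test, 15% pilot, 80% production) are hypothetical.
import hashlib

PHASES = ["test", "pilot", "production"]
CUTOFFS = [5, 20, 100]  # cumulative percentage boundaries

def phase_for(hostname):
    """Map a hostname to a stable rollout phase via a hash bucket 0-99."""
    bucket = int(hashlib.sha256(hostname.encode()).hexdigest(), 16) % 100
    for phase, cutoff in zip(PHASES, CUTOFFS):
        if bucket < cutoff:
            return phase

for host in ["ws01", "ws02", "srv-db01"]:
    print(host, phase_for(host))
```

Hashing instead of picking groups by hand means a machine never silently drifts out of the pilot, and new machines get a phase automatically.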
Apply updates using an automated patching process
I first used an automated patching process in 1999, using a product that no longer exists, but it worked better than patching by hand. When I first started on that Air Force contract, I didn’t have an automated patching tool to use, so I scripted the updates. My success rate doing that was around 70 percent. That meant I had to do a lot of scrambling to get that last 30 percent by Patch Tuesday.
Most tools are somewhere between 85 percent and 95 percent effective. I think that’s terrible by modern software standards. But it’s quite a bit better than the 70 percent I got from a script that just automated the process of patching by hand.
Even with a tool, I estimate you need one full-time employee for every 1,000-2,000 systems if your goal is fixing everything like I used to do. Which leads me into the most controversial thing I have to say about patch management best practices.
Don’t try to fix everything
Even though I came up through the school of thought that required you to fix everything, I don’t subscribe to that anymore. I worked in a large company that demanded exactly that, and all they got was a toxic work environment. Trying to fix everything with the technology we have available today while keeping within a reasonable IT budget is going to fail in anything resembling a medium or large-sized business.
Instead, attack your vulnerabilities from two sides and meet in the middle.
Deploy the new updates
Every month, deploy the new updates from Microsoft, Adobe, and your Linux vendors such as Red Hat. Consider skipping any updates known to be causing trouble and deploy those next month. Read what other people are saying on various tech support forums about the month’s batch of patches. Deploying the current month’s updates keeps you ahead of the curve. It keeps you from introducing new vulnerabilities into your network. I call this stopping the bleeding.
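Skipping the known-bad updates is just a filter over the month’s batch. A minimal sketch, with made-up KB numbers; the known-bad list is whatever you gleaned from the support forums this month.

```python
# Sketch: split this month's batch into "deploy now" and "defer to next month."
# All KB numbers here are invented for illustration.
monthly_batch = ["KB5001234", "KB5001567", "KB5002001"]
known_bad = {"KB5001567"}  # e.g., flagged as problematic on support forums

deploy_now = [kb for kb in monthly_batch if kb not in known_bad]
defer_to_next_month = [kb for kb in monthly_batch if kb in known_bad]

print(deploy_now)            # ['KB5001234', 'KB5002001']
print(defer_to_next_month)   # ['KB5001567']
```

The deferred list becomes next month’s reminder to check whether the vendor re-released the patch.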
Deploy the updates addressing any existing high-severity vulnerabilities
Unfortunately the most popular deployment tool, Microsoft SCCM, has no concept of severity. So you may end up having to do a lot of manual picking and choosing to deploy those updates. A tool like Flexera Software Vulnerability Manager makes this much easier, since it does have the concept of severity.
Every security and patching tool measures severity a bit differently, so the criteria for high, medium and low will vary slightly. But there’s enough overlap that this will get you most of what your security team wants. It’s not uncommon to find that deploying the updates to fix all of your high-severity vulnerabilities ends up fixing half of your total vulnerabilities, or more.
Even if you’re facing millions of vulnerabilities, don’t fret. You can fix millions of vulnerabilities pretty fast by focusing on the few hundred thousand that are most likely to cause your company harm. You’ll find for most high-severity vulnerabilities you fix, several lower-severity ones come along for the ride.
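If your tool can’t sort by severity for you, the triage logic is simple enough to sketch yourself: actively exploited issues first, then by severity. The CVE names, flags, and severity labels below are illustrative.

```python
# Sketch: triage a vulnerability list so the worst items surface first.
# Data and severity labels are hypothetical.
vulns = [
    {"cve": "CVE-A", "severity": "high",   "exploited": True},
    {"cve": "CVE-B", "severity": "low",    "exploited": False},
    {"cve": "CVE-C", "severity": "high",   "exploited": False},
    {"cve": "CVE-D", "severity": "medium", "exploited": True},
]

RANK = {"high": 0, "medium": 1, "low": 2}

# Sort key: exploited items first (False sorts before True, so negate),
# then by severity rank.
queue = sorted(vulns, key=lambda v: (not v["exploited"], RANK[v["severity"]]))
print([v["cve"] for v in queue])  # ['CVE-A', 'CVE-D', 'CVE-C', 'CVE-B']
```

Notice the exploited medium outranks the unexploited high; that’s the prioritization doing its job.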
I’ve never seen a large company try to fix everything at once and succeed. They’re much more likely to get paralyzed and just get worse. Don’t let lack of perfection get in the way of being good.
Getting to 100% is more work than getting to 90%
Back when I deployed patches, I had a process that worked over 90 percent of the time and required about five minutes of real work to prep. I would fire up my patching software, hit ctrl-A, right-click, select install missing patches, then go to lunch. When I got back from lunch, I would look for errors, then do it again.
If I just did that on Friday, then came in on Monday and rebooted systems, which typically took about six hours, I would have successfully deployed about 90 percent of my updates.
Most months, I spent 8-10 hours looking for updates that failed or only worked partially and manually fixing those. If a security guy ever tells you that you don’t patch, he or she is probably seeing this residual work I used to do by hand and couldn’t automate.
This is where the big savings in prioritization comes in. A high-severity update in active use in breaches right now is worth the manual work. A low-severity update with no reliable exploits associated with it isn’t worth that effort.
Cheat where you can
If you can cheat by deploying patches to a master image, then deploy the image out to multiple machines from there, do it. It’s less failure-prone than patching multiple times, and if any of the patches give you trouble and require manual intervention to get down, then you only have to do it once.
If you can prove a system is no longer in use, get it decommissioned. This saves the effort of testing and patching and fixing something if it breaks.
Don’t just patch your servers
Some companies try to save time and money by just scanning and patching their servers, since that’s where the data lives. Of course you should patch your servers, but only patching your servers is like locking your bedroom door at night but leaving the front door unlocked or even open. Most security incidents start on user workstations. If you deploy your updates to the workstations, you cut down on attackers’ options.
What I found when I moved from the infrastructure world into security was that when workstations get patched effectively, security incidents plummet.
Don’t cheap out
Sometimes companies will use WSUS just because it’s free. Yes, it’s free and does an OK job of deploying Microsoft updates. But most breaches involve third-party software, which WSUS won’t help with. The average cost of a data breach is $3.86 million, and patching is one of the very few things you can do to prevent a breach. If patching software that works better than 90 percent of the time and makes it easy to deploy just the high-severity updates costs a fraction of that, it’s a justifiable expense.
Look into SCCM but don’t stop there. Also look into the offerings available from companies like Ivanti, Flexera, SolarWinds, and IBM. If they automate more of the work by patching at a higher success rate and they cover the non-Microsoft software in your environment (including Linux), that’s functionality that’s worth paying a premium to get.
A skilled carpenter doesn’t buy important tools from Harbor Freight. Using Harbor Freight-quality tools for patching has no place in patch management best practices.
Use the risk acceptance process
The last thing I want to say about patch management best practices concerns how to use the risk acceptance process. It’s not just about getting out of patching, or getting out of migrating that ancient Windows NT 3.51 server nobody likes to talk about. When you can’t update Java, look into the Java Deployment Rule Set, which allows you to build whitelists and route Java apps to specific JREs. It doesn’t completely mitigate the problem, but it goes a long way. Partner with security to deploy the Deployment Rule Set, then get a risk acceptance in place for those old JREs.
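To give a flavor of what that looks like: a Deployment Rule Set is an XML file (packaged and signed into a JAR that the JRE picks up). The sketch below, with a hypothetical intranet URL and JRE version, routes one legacy app to an old JRE and blocks Java everywhere else; check Oracle’s documentation for the exact schema your JRE version supports.

```xml
<!-- Sketch of a ruleset.xml: route one legacy intranet app to an old JRE,
     block everything else. Location and version are illustrative. -->
<ruleset version="1.0+">
  <rule>
    <id location="https://legacyapp.example.com" />
    <action permission="run" version="1.7.0_80" />
  </rule>
  <rule>
    <id />
    <action permission="block">
      <message>Java is blocked by policy. Contact the help desk.</message>
    </action>
  </rule>
</ruleset>
```

The catch-all rule at the bottom is what turns the whitelist into a real control: anything you didn’t explicitly route gets blocked.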
Work with your security and networking team to build a list of business-critical web sites that use Flash, then whitelist Flash content on those sites and block it from the rest. Now you’ve slammed the door on one of the easiest ways to get into corporate networks, and updating Flash becomes much less of an emergency every month. If you’re protecting Flash with a whitelist, a VP is probably willing to accept the risk associated with a 70% success rate in updating Flash.
Those are two ideas. You’ll come up with others.