Just turning on automatic updates is one of those bumper sticker-style solutions to IT problems that won’t go away. It sounds really good, and of course it would be cheap. And since nobody’s doing it, it sounds like a new idea. As someone who’s been working in this space more than 20 years, I can tell you there’s a reason nobody does it. And it’s a good reason. It’s even a reason most proponents of bumper sticker-style solutions love to cite as a reason not to do something: unintended consequences.
While allowing systems to auto update seems like a cheap way to solve a difficult IT problem, the unintended consequences can be devastating. There are reasons to do automatic updates in limited circumstances, but it’s easy to cause bigger problems than you solve.
Why you should listen to this rando pontificating about patching
You probably don’t know me from Adam, so let me tell you my qualifications. I’m a sysadmin turned vulnerability management professional. I cut my teeth on Y2K updates, and then after Bill Gates invented Patch Tuesday, I spent 8 years of my career pushing patches almost exclusively. From 2005 to 2009, I pushed patches to fix 800,000 vulnerabilities, with a success rate of 100 percent. I had to have a clean Tenable scan at the end of each Patch Tuesday cycle. If I did have any findings, I needed a risk acceptance for each one. And I had to ask for the risk acceptance early in the cycle, not at the end.
This wasn’t my team. The other sysadmins had other responsibilities. I was the one pushing the updates. I did something that some people believe is impossible. It’s not impossible, but it’s difficult.
In 2009, I shifted to security roles and dabbled in vulnerability management prior to shifting to pure vulnerability management roles in 2013. I’ve specialized in vulnerability management ever since, including stints working at Qualys, a vulnerability management vendor, and deepwatch, a managed security provider. Between my positions at Qualys and deepwatch, I would estimate about 70 companies shared their struggles with patch and vulnerability management with me.
If there was an easy solution, one of those 70 companies would have found it.
The problem with automatic updates
People’s beliefs about updates tend to fall into two extremes: Updates break stuff, or updates never break stuff. The truth falls somewhere in between. The perception that they break stuff comes from enough people having had a duff update render their home system unbootable. It’s rare, but it happens often enough that people remember the one update that broke something more than they remember the countless updates that worked fine.
The other problem is Microsoft has cut way back on the quality control on their updates. And there’s a perception that Apple updates don’t break stuff since they control both the hardware and the software, but that’s no panacea either. It can take 6-12 months to sort out every unintended consequence of a major Apple OS update.
Just because the system booted doesn’t mean the update didn’t break something. For example, in April 2021, the combination of Apple’s Big Sur update and Google’s Chrome updates broke Mimecast at my workplace. Pushing untested updates in the name of security broke another security tool.
I found myself in the awkward position of being a vulnerability management guy telling IT to slow down. And yes, I sometimes inserted a couple of extra words between “slow” and “down.”
For me, the difference between automatic updates and managed updates isn’t theoretical. A toxic mix of untested updates devastated my team. But from 2005 to 2009, when I was patching with a 100% success rate, guess how many outages I caused? Zero. Here’s how.
How I fixed 800,000 vulnerabilities with no outages
I personally fixed 800,000 vulnerabilities without causing a single outage. For that matter, my updates were only credibly accused of causing an outage once. Just once. Testing later exonerated that accused update. I was more frequently accused of rebooting during the day, though investigations exonerated me in that case too. I had scripts for properly rebooting systems, and I built in failsafes to allow me plenty of time to abort.
But the secret of fixing huge numbers of vulnerabilities shouldn’t be much of a secret. We tested and we paid attention. If a patch caused any problems in the lab, I wrote up a risk acceptance for 30-60 days so we could work with the vendor to sort things out.
I also watched the forums where sysadmins hang out to see what they were talking about. The last site of that type still standing that I know of is Patchmanagement.org, but that’s sufficient. Twitter is also a good place to watch.
If there were rumblings of a duff update, we had a decision to make. If there was evidence of active exploitation going on, we deployed it unless we could replicate the problem in our test environment. But in the absence of evidence the threat was imminent, I wrote up a risk acceptance. Qualys and Tenable have threat intelligence baked into their tools now. Use it. If Qualys doesn’t have Active Attacks or Malware RTIs associated with the vulnerability, or Tenable doesn’t give it a VPR score over 7, you have time to get it right.
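As a rough illustration, the decision described above can be sketched in a few lines of Python. The class, field names, and return strings here are hypothetical, not part of any Qualys or Tenable API. You would populate the fields from whatever export your scanner gives you.

```python
from dataclasses import dataclass

# Threshold from the text: a Tenable VPR score over 7 means treat it as urgent.
VPR_THRESHOLD = 7.0


@dataclass
class PatchReport:
    """What you know about one update at decision time (hypothetical fields)."""
    actively_exploited: bool         # e.g. a Qualys Active Attacks or Malware RTI is present
    vpr_score: float                 # Tenable VPR, on a 0-10 scale
    rumored_problems: bool           # rumblings on forums, Twitter, and so on
    problem_reproduced_in_lab: bool  # you broke something with it in your own testing


def is_imminent_threat(report: PatchReport) -> bool:
    """True when threat intelligence says you do not have time to wait."""
    return report.actively_exploited or report.vpr_score > VPR_THRESHOLD


def decide(report: PatchReport) -> str:
    """Return 'deploy' or 'risk_acceptance' for a single update."""
    if report.problem_reproduced_in_lab:
        # A confirmed problem always gets a risk acceptance while the vendor sorts it out.
        return "risk_acceptance"
    if not report.rumored_problems:
        # Clean in the lab and nobody is complaining: ship it.
        return "deploy"
    # Rumors only: deploy if the threat is imminent, otherwise buy time.
    return "deploy" if is_imminent_threat(report) else "risk_acceptance"
```

For example, decide(PatchReport(False, 5.2, True, False)) comes back as a risk acceptance, because there are rumors of trouble and nothing says the threat is imminent.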
You’ll probably find there’s only about one hair-on-fire emergency a year, if that.
What to do if you don’t have a test environment
Most companies aren’t willing to spend money on a test environment. Shareholders don’t see any value in it, so any proposal to build one is likely to get shot down quickly. And even the test environment I had when working for the government was flawed. The hardware wasn’t at all like the production environment, and production was physical machines while the test environment was mostly virtual. It was built on the premise that a PC is a PC is a PC, which isn’t true. Some problems only manifest themselves if certain combinations of drivers are present. But an imperfect test environment is way better than no test environment. And you can do almost OK without a test environment.
With no test environment, you need to have a pool of users volunteer. Find the most tech savvy users you can across each line of business and solicit their help. You may need to give them some incentive to participate. Do what it takes. Their feedback is invaluable and it’s cheaper than building a good lab.
Deploy the updates to this user acceptance testing (UAT) environment and wait a week. If there are no reports of problems, proceed to deploy to production. If there are reports of problems, either from your UAT team or your outside sources, write up a risk acceptance. Include the plan for remediating once a revised update or suitable workaround surfaces.
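The UAT gate above boils down to a small rule you could encode in whatever tracks your patch cycle. Here is a minimal Python sketch of it. The function name, argument names, and return strings are made up for illustration, and the seven-day soak period is simply the "wait a week" from the text.

```python
from datetime import date, timedelta

# "Wait a week" from the text; adjust to fit your own cycle.
UAT_SOAK = timedelta(days=7)


def uat_verdict(deployed_to_uat: date, today: date,
                uat_problem_reports: int, outside_problem_reports: int) -> str:
    """Decide what happens to an update that is sitting in the UAT pool.

    Returns 'risk_acceptance', 'wait', or 'deploy_to_production'.
    """
    if uat_problem_reports > 0 or outside_problem_reports > 0:
        # Any report, from your volunteers or your outside sources, stops the rollout.
        return "risk_acceptance"
    if today - deployed_to_uat < UAT_SOAK:
        # No complaints yet, but the week is not up.
        return "wait"
    return "deploy_to_production"
```

So an update pushed to the volunteers on a Tuesday with no complaints gets a "wait" on Friday and a "deploy_to_production" the following Tuesday.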
How I fixed 800,000 vulnerabilities at scale
My 100% success rate was a much bigger accomplishment than fixing 800,000 vulnerabilities. I could have fixed the first 80 percent or so, roughly 640,000 vulnerabilities, with minimal effort. Getting that last 20 percent was the hard part. If you just deploy the updates and reboot, you’ll get a success rate of between 75 and 90 percent, depending on the tool you use. That’s grunt work. The skill comes in investigating the patches that didn’t succeed, finding the root cause, and getting those patches down. Fixing a failed update requires skill similar to that required to conduct a forensic investigation. Knowing what to do with ETL files helps.
My standard advice is to deploy all the new updates in your environment as they come out (minus any troublemakers), plus the 10 most common updates in your environment that have Qualys Active Attacks or Malware RTIs, a Tenable VPR score over 7, or an EPSS score over 67 percent. This is a sustainable approach to your tech debt that can succeed over the long haul. It’s no panacea, but it’s a plan your IT team can live with, especially if you give them hard data that proves their accomplishments every year. They’ll rack up a nice pinball score of closed vulnerabilities, and while I’m way more interested in MTTR and average vulnerability age than in counts, that pinball score will impress HR. Play the game. Help them get raises.
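If you can export your open findings from your scanner, building that list each cycle is a short script. What follows is a sketch under some loud assumptions: the dictionary keys are invented for illustration and don’t match any vendor’s export format, and the EPSS value is treated as a percentage.

```python
from collections import Counter

TOP_N = 10  # "the 10 most common updates" from the text


def is_high_threat(finding: dict) -> bool:
    """True if threat intelligence flags this finding (hypothetical keys)."""
    return (
        finding.get("qualys_active_or_malware_rti", False)
        or finding.get("vpr", 0) > 7
        or finding.get("epss_percent", 0) > 67  # assumption: EPSS stored as a percentage
    )


def build_cycle(new_updates: list, troublemakers: set,
                open_findings: list, top_n: int = TOP_N) -> list:
    """Return this cycle's deployment list.

    new_updates:   IDs of updates released this cycle
    troublemakers: IDs you are holding back under a risk acceptance
    open_findings: one dict per open finding, each with an 'update' key naming its fix
    """
    # Everything new, minus the troublemakers.
    cycle = [u for u in new_updates if u not in troublemakers]

    # Count how many open findings each backlog update would close,
    # looking only at findings the threat intelligence cares about.
    counts = Counter(f["update"] for f in open_findings if is_high_threat(f))

    # Add the most common high-threat backlog updates.
    for update, _count in counts.most_common(top_n):
        if update not in cycle:
            cycle.append(update)
    return cycle
```

The troublemaker filter here only applies to the new updates, since that is all the advice above spells out. If a backlog update is also under a risk acceptance, filter it out the same way.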
What about the 20 percent of updates that fail? Circle back on them when the threat intelligence says they’re your biggest problem. I was able to fix vulnerabilities at a 100% success rate because I had fewer than 1,000 systems to deal with. If you don’t dedicate one FTE to patching every 1,000 systems in your environment, you won’t get the level of success I did. And most organizations aren’t willing to budget that.
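To put numbers on that, here is the back-of-the-envelope arithmetic as a Python sketch. The function names are hypothetical. The one-FTE-per-1,000-systems ratio and the 75 to 90 percent first-pass success rate are the figures from this article, not industry constants.

```python
import math


def patching_ftes(system_count: int, systems_per_fte: int = 1000) -> int:
    """Full-time patchers needed at one FTE per 1,000 systems, rounded up."""
    return math.ceil(system_count / systems_per_fte)


def failures_to_investigate(total_fixes: int, first_pass_rate: float) -> int:
    """How many fixes are left to chase by hand after deploy-and-reboot."""
    return round(total_fixes * (1 - first_pass_rate))


# 800,000 fixes at a 75 to 90 percent first-pass rate leaves
# somewhere between 80,000 and 200,000 to run down manually.
worst_case = failures_to_investigate(800_000, 0.75)
best_case = failures_to_investigate(800_000, 0.90)
```

A 25,000-system environment works out to 25 people doing nothing but patching, which is why most organizations balk.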
A trick that works better than automatic updates
This argument doesn’t make me popular among security professionals, but I’ve been saying for a while that the best thing most companies could do for their security posture is to find a few involuntarily retired senior system administrators, hire them, and have them work on patching exclusively. Look for people with 20 years of experience and who made close to $100,000 a year.
Most companies can find a million dollars for a fancy new tool like XDR, but they can’t find a million dollars for labor. Having worked for Qualys and heard security directors complain about spending millions of dollars on tools and not having much to show for it, I’ve come to the conclusion that most companies need a patch deployment tool (I like Ivanti’s tools much better than SCCM), a vulnerability management tool like Qualys or Tenable, a tool like Nucleus to route the vulnerability data where it needs to go, and more labor.
When automatic updates make sense
When it comes to managed updates vs automatic updates, I will concede there is a time when automatic updates make sense. It can make sense to configure your web browsers to auto update, with the caveat that you need to allow more than one browser in your environment. It’s fairly unlikely that both Chrome and Firefox are going to release duff updates in the same month, so if Chrome destroys itself, your users can fall back on Firefox to access what they need.
Third party web browser updates are especially difficult to push out via SCCM, so letting them manage themselves can make sense. Then SCCM becomes the fallback if the browsers fail to update themselves.
This works best for companies with a lot of work-from-home employees. If you have tens of thousands of users in office buildings, make sure you have enough bandwidth. Browser updates can be pretty large. That’s one other reason not to allow unmanaged updates. Having each system download gigabytes of patches at will chews up a lot of bandwidth.
“My system at home updates just fine!”
The other argument I’ve heard regarding automatic updates vs managed updates, even from security professionals, is that their home systems update better than the systems at work. But they’re usually just using Automatic Updates to gauge that. Guess what? SCCM says it updates just fine too, but Qualys and Tenable often say otherwise.
I’ve run the Qualys and Tenable agents on some of my home systems for years, and automatic updates aren’t succeeding 100 percent of the time. If you load a Qualys or Tenable agent on four or five home systems and measure their actual performance over time, you’ll find, like I did, that the automatic updates are succeeding around 80 or 90 percent of the time, which isn’t far from SCCM’s success rate.
And if you’re not convinced, check out the command line tool winget. Run winget upgrade to list the installed software that has updates waiting. Then tell me your home system updates just fine.