A cute puppy and vulnerability management: an unlikely combination

If It Ain’t Breaking Stuff, Fix It

Author’s Note: You may be wondering why I chose a picture of an insanely adorable puppy as the image for this blog. Well, I couldn’t think of an image representative of the old adage “if it ain’t broke, don’t fix it,” and then concluded that you can never go wrong with a cute puppy.

Recently, Microsoft announced that some of its Windows Server 2022 February updates caused problems: specifically, certain virtual machines were not rebooting as expected after the patch installation. The announcement is notable for two characteristics: 1) its usefulness, and 2) its rarity. Certainly, remediation teams appreciate the heads-up that a given patch is likely to cause an issue, and I’m sure they’d welcome such communication more often and more regularly. But it raises the question of whether notifying remediation professionals that a patch is likely to break something would materially change their behavior or process. Since the default assumption for most patches is that they’re likely to be disruptive, does the knowledge that a specific patch has a history of disruption actually help reduce the organization’s cyber risk? It’s a bit like the police telling the community that an armed robbery suspect on the loose is “dangerous and shouldn’t be approached.” Most of us have already assumed as much.

We have a better idea.

Why?

The top two reasons vulnerability remediation teams give for slow patching are lack of resources and fear of disruption. They seem like different factors, but in fact they’re closely related. Resources (i.e., people) are required to deploy patches in test environments, test them, and then deploy them carefully to production, with team members standing by to either reverse the update or deal with the fallout in the event of a failure or disruption. This patch deployment process is common, and it’s motivated by one thing: a constant fear of breaking stuff. If remediation teams weren’t concerned (or were substantially less concerned) about patches causing network or select-system downtime, they’d need fewer resources to patch, and their patching cadence would be substantially faster. That being the case, the top two reasons given for slow patching are essentially just one: fear of disruption.
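To see why that cycle eats so much staff time, here’s a minimal sketch of the conventional, disruption-wary workflow. Every function below is a hypothetical stub standing in for real tooling, not an actual patching API; the point is how many steps exist purely as insurance against breakage:

```python
# A hypothetical sketch of the conventional, caution-driven patch cycle.
# All functions are illustrative stubs, not a real patching API.

def deploy(patch: str, host: str) -> None:
    print(f"deploying {patch} to {host}")

def run_regression_tests(hosts: list[str]) -> bool:
    return True  # stand-in for hours of manual and automated testing

def detect_disruption(host: str) -> bool:
    return False  # stand-in for monitoring dashboards and user reports

def rollback(patch: str, host: str) -> None:
    print(f"rolling back {patch} on {host}")  # the on-call engineer's job

def cautious_patch_cycle(patch: str, test_hosts: list[str], prod_hosts: list[str]) -> str:
    # Step 1: staff stand up a test environment and validate the patch there
    for host in test_hosts:
        deploy(patch, host)
    if not run_regression_tests(test_hosts):
        return "rejected in testing"
    # Step 2: roll out to production slowly, with engineers on call to reverse course
    for host in prod_hosts:
        deploy(patch, host)
        if detect_disruption(host):
            rollback(patch, host)
            return "rolled back in production"
    return "fully deployed"

print(cautious_patch_cycle("patch-123", ["test-01"], ["prod-01", "prod-02"]))
```

Every stage in that loop is staffed because of the assumption that any patch might break something.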

So, if we can arm remediation teams with data on patches with a demonstrated history of NOT being disruptive, that information has the potential to truly change behavior and address both of the factors that inhibit remediation teams from patching more aggressively. If we could assure a remediation professional that a given patch won’t cause a disruption, they could conceivably skip the extensive testing and would no longer need team members on call to respond to a potential major disruption. In theory, the patch could be designated to auto-update, freeing scarce remediation resources for patches known to be problematic, all the while reducing MTTR (Mean Time to Remediate) and quickly closing the window in which the bad guys can exploit the vulnerability. Multiply this approach by hundreds or thousands of vulnerabilities, and the organization’s cyber risk drops dramatically.
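To make the idea concrete, here’s a minimal sketch of how disruption-history data might drive that routing decision. The data structure, function, and thresholds are all hypothetical illustrations, not trackd’s actual logic:

```python
from dataclasses import dataclass

@dataclass
class PatchTrackRecord:
    """Hypothetical aggregate of a patch's observed deployment history."""
    patch_id: str
    deployments: int  # total observed installs across monitored hosts
    rollbacks: int    # installs reversed because of a disruption

def eligible_for_auto_update(record: PatchTrackRecord,
                             min_deployments: int = 1_000,
                             max_rollback_rate: float = 0.001) -> bool:
    """Route a patch straight to auto-update only when its track record is
    both large enough to trust and clean enough to skip staged testing.
    Thresholds are illustrative; each organization would tune its own."""
    if record.deployments < min_deployments:
        return False  # not enough evidence; use the normal test cycle
    return record.rollbacks / record.deployments <= max_rollback_rate

# Example: a widely deployed patch with a single observed rollback
record = PatchTrackRecord("patch-456", deployments=25_000, rollbacks=1)
print(eligible_for_auto_update(record))  # True -> auto-update candidate
```

In practice, the evidence bar (how many observed deployments, over what window, and what counts as a rollback) would be set by each organization’s risk tolerance, which is exactly the point of the next paragraph.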

To be sure, the amount of data needed to convince a remediation team to trust that a patch can be auto-updated with little or no precaution is likely to be considerable, and will vary from organization to organization. But as we like to say here at trackd, this is a human problem, not a technical one. Less than 2% of patches are rolled back in today’s vulnerability remediation environment, but the extreme caution triggered by patching’s tainted history (that 2% figure was much higher 15 years ago) still informs vulnerability remediation policy today. As is the case in just about any field, data is the key to overcoming perception, and data is exactly what we’re working on.