Vulnerability patching isn't sexy, and often the reasons for patches not working aren't either.

Why Patches Fail

Episode 1: Pending Reboots

When most IT operators think about a patch failing, they – especially those with some experience and long memories – are justified in conjuring images of catastrophic disruptions and long nights repairing the damage. But in contemporary network environments, two facts are far more likely to influence patching reality:

  • Less than 2% of patches are ever rolled back, so failures, on the rare occasions they do occur, are unlikely to bring down entire segments of the network
  • “Patch Failures” are often attributable to pedestrian reasons

In this blog series, we’ll look at some of the more mundane – albeit common and operationally significant – reasons patches fail. Over the course of the next few months, we’ll discuss the most prevalent patching failure mechanisms we’ve recorded among our trackd platform users:

  • Windows Update Agent deferral windows
  • Insider and preview builds
  • Bad vendor data
  • Host later installed search failures
  • Host changing platforms over time
  • Supersedence (not really a “failure”, but can skew vulnerability metrics)

At trackd, we assess and catalog the patches our users attempt through our platform that are unsuccessful, as well as those that cause disruptions (the former means the patch failed to apply and caused no disruption; the latter means the patch was applied successfully but resulted in an operational disruption). In this inaugural installment of our “Why Patches Fail” series, we address one of the more common reasons patches fail to install: pending reboots.

It would be news to few that many patches require a device reboot to complete installation, but in many cases the device is never rebooted by either the user or the IT team. When that happens, not only does the device remain vulnerable, but subsequent patches cannot be installed on that machine until a reboot is performed.
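How does an agent or an admin even know a device is in this state? On Windows, a pending reboot is signaled by a handful of well-known registry indicators. The sketch below (Python, illustrative only and not a description of trackd’s implementation) probes the most common ones; a production tool would consult additional signals as well.

```python
# Minimal sketch: detect a pending reboot on Windows by probing
# well-known registry indicators. Illustrative only -- a production
# agent would consult additional signals (SCCM, cluster state, etc.).
import winreg

PENDING_REBOOT_KEYS = [
    # Present only while Component Based Servicing has a reboot queued
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\RebootPending",
    # Present only while Windows Update is waiting on a reboot
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired",
]

def key_exists(path: str) -> bool:
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path):
            return True
    except FileNotFoundError:
        return False

def file_renames_pending() -> bool:
    # PendingFileRenameOperations lists files to be moved/deleted at next boot
    session_manager = r"SYSTEM\CurrentControlSet\Control\Session Manager"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, session_manager) as key:
            value, _ = winreg.QueryValueEx(key, "PendingFileRenameOperations")
            return bool(value)
    except FileNotFoundError:
        return False

def reboot_pending() -> bool:
    return any(key_exists(path) for path in PENDING_REBOOT_KEYS) or file_renames_pending()

if __name__ == "__main__":
    print("Reboot pending:", reboot_pending())
```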

Patching complexities can multiply quickly with pending reboots as an omnipresent variable. One example is a device with three vulnerabilities, each requiring a separate patch. The first one may install without incident, but absent a reboot, the following two will fail.
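To make that example concrete, here is a toy model (hypothetical names, not trackd code) of a patch queue in which any install attempted while a reboot is outstanding is reported as a failure:

```python
# Toy model of sequential patch installs on a single device.
# Hypothetical example: any patch applied while a reboot is
# pending is reported as a failure.

class Device:
    def __init__(self, name: str):
        self.name = name
        self.reboot_pending = False

    def apply_patch(self, patch: str) -> str:
        if self.reboot_pending:
            return f"{patch} on {self.name}: FAILED (reboot pending)"
        # Assume this patch needs a reboot to finish installing
        self.reboot_pending = True
        return f"{patch} on {self.name}: installed, reboot required"

    def reboot(self) -> None:
        self.reboot_pending = False

device = Device("laptop-042")
for patch in ["patch-A", "patch-B", "patch-C"]:
    print(device.apply_patch(patch))
# patch-A installs; patch-B and patch-C fail until the device is rebooted
```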

Pending reboots can also produce some odd patching scenarios and operational anomalies. Take the case where a single patch fixes multiple vulnerabilities across multiple devices. That patch may be successfully applied to Device 1 and Device 2, but if Device 3 requires a reboot before the patch can be installed, the attempt on Device 3 will fail, forcing the operations team to apply the same patch again (after a reboot).

Why are pending reboots such a frequent cause of patch failures, or, to put it another way, why aren’t devices routinely rebooted after a patch installation? Unsurprisingly, there are two primary reasons:

  • Users or operators are unaware a reboot is required
  • Concern for operational disruption

Although both are valid, concern for operational disruption is by far the more common reason for the pending-reboot challenge. IT operators are loath to force devices on their networks to reboot for fear of enraging users and their leadership, which can result in formal bans on remote reboots (we ran into one admin who had this happen because of a petulant executive). Thus, even frequent notifications imploring users to reboot their machines are ineffective when the threat of a forced reboot is taken off the table. The pending/forced reboot issue highlights the ongoing conflict between the need for security and the desire to minimize operational risk.

Another dimension to the forced-reboot challenge is that different operating systems handle post-patching reboots differently. For example, macOS gives users no option to delay a reboot; if you update a macOS device, an immediate reboot is required. Windows, on the other hand, gives users the option to forgo an immediate reboot after a patch installation, and Linux, similar to Windows, doesn’t require users to reboot after an update either. These different OS reboot policies add another level of complexity to the cadence and timing of OS patching, and can result in subsequent patching failures.
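Each platform also surfaces an outstanding reboot differently. As one illustration (again a hedged sketch, not trackd’s agent logic): Debian- and Ubuntu-based Linux systems drop a marker file at /var/run/reboot-required when an updated package needs a restart, while Windows relies on the registry indicators shown earlier and macOS simply forces the restart as part of the update.

```python
# Sketch of OS-specific pending-reboot checks (illustrative only).
# On Debian/Ubuntu, packages needing a restart drop a marker file;
# other distributions and operating systems surface this differently.
import platform
from pathlib import Path
from typing import Optional

def linux_reboot_pending() -> bool:
    # Debian/Ubuntu convention: a marker file is written when a restart is needed
    return Path("/var/run/reboot-required").exists()

def reboot_pending() -> Optional[bool]:
    system = platform.system()
    if system == "Linux":
        return linux_reboot_pending()
    if system == "Windows":
        # See the registry-based checks sketched earlier in this post
        return None
    # macOS exposes no simple marker; updates force an immediate restart
    return None

if __name__ == "__main__":
    print(f"{platform.system()}: reboot pending -> {reboot_pending()}")
```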

The trackd platform not only highlights devices that require a reboot to warn admins that applying a patch might fail, but also collects patching telemetry data from devices with our agents installed. As of this writing, our platform reports that about 17% of all patching failures are the result of pending reboots, a figure that translates into countless hours of frustrating troubleshooting and operational gymnastics to overcome.
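For the curious, a figure like that is simply the share of failure records attributed to a given cause. The hypothetical sketch below (invented field names and sample data, not trackd’s schema or numbers) shows the shape of that aggregation:

```python
# Hypothetical aggregation over patch-failure telemetry to compute the
# share of failures attributable to pending reboots. Field names and
# sample data are invented for illustration; not trackd's schema or numbers.
from collections import Counter

failure_records = [
    {"device": "host-01", "reason": "pending_reboot"},
    {"device": "host-02", "reason": "deferral_window"},
    {"device": "host-03", "reason": "pending_reboot"},
    {"device": "host-04", "reason": "bad_vendor_data"},
    {"device": "host-05", "reason": "insider_build"},
    {"device": "host-06", "reason": "platform_change"},
]

reasons = Counter(record["reason"] for record in failure_records)
share = reasons["pending_reboot"] / len(failure_records)
print(f"Pending reboots account for {share:.0%} of failures in this sample")
```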