Keeping software up to date in large-scale cloud infrastructure systems through software patching presents significant challenges, especially when compared to performing similar tasks in corporate computing environments or smaller-scale data centers. For example, compared to many traditional corporate data center installations, cloud infrastructure environments often include many different types of computing systems utilizing hardware from many different vendors. Further, these heterogeneous server computers may execute a large number of different operating systems, each requiring different patches, and these operating systems may in turn execute many different types and/or versions of applications (e.g., web server applications, database server applications, etc.) that similarly require different patches. Moreover, these different patches may need to be applied in a particular order. Further, these operating systems and/or applications may require the use of application-specific update tools. Accordingly, keeping the large numbers of applications executing across a wide variety of operating systems and server computing devices effectively patched can be tremendously difficult and typically involves a large amount of human effort.
Further complicating cloud infrastructure software patching is the need to make configuration-type changes during these patching processes.
As described above, an administrator regularly needs to apply patches to an application deployment to address functional bugs, security or performance issues, and so on. Patches typically include updates to relatively small portions of binary files that were originally provided for an application via a product installer (or a previous patch), and when applied, these updates replace the previous copies on disk. However, in order for the patched deployment to continue functioning properly, these updated binaries often must be accompanied by changes to non-binary application artifacts that also drive the deployment.
Thus, configuration files, in-database data, and/or database schemas utilized by applications (herein referred to under the common label of “configuration data”) may all require specific actions to be performed to effect these changes, as well as changes to configurations present in other applications, such as those upon which a patched application depends or those that depend on it. Further complicating matters is that, depending on the nature of the application and the changes required, configuration changes (or “actions”) may need to be performed while a deployed application is offline, after the deployed application is brought back online, or both.
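As a purely illustrative sketch of the distinction drawn above, configuration actions can be modeled as records tagged with the phase or phases in which they must run. The names used here (`Phase`, `ConfigAction`, `actions_for`, and the sample actions) are hypothetical and are not drawn from any particular product or patching tool.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    OFFLINE = "offline"  # run while the deployed application is offline
    ONLINE = "online"    # run after the application is brought back online


@dataclass(frozen=True)
class ConfigAction:
    name: str         # hypothetical action identifier
    target: str       # kind of configuration data the action changes
    phases: frozenset # phases in which the action must be performed


def actions_for(phase, actions):
    """Return the names of the actions required in the given phase, in input order."""
    return [a.name for a in actions if phase in a.phases]


# Hypothetical actions accompanying a single patch.
ACTIONS = [
    ConfigAction("migrate_schema", "database schema", frozenset({Phase.OFFLINE})),
    ConfigAction("update_config_file", "configuration file", frozenset({Phase.OFFLINE})),
    ConfigAction("refresh_seed_data", "in-database data", frozenset({Phase.ONLINE})),
    # Some actions may need to run in both phases.
    ConfigAction("rebuild_index", "in-database data",
                 frozenset({Phase.OFFLINE, Phase.ONLINE})),
]

offline_actions = actions_for(Phase.OFFLINE, ACTIONS)
online_actions = actions_for(Phase.ONLINE, ACTIONS)
```

In this sketch, an action tagged with both phases appears in both lists, which mirrors the case where a change must be performed while the application is offline and again after it is back online.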
Some approaches to performing application patching together with configuration patching include one or a combination of several techniques.
In some deployments, configuration-related changes are described as actions needing to be accomplished manually, and may be documented with an associated application patch (e.g., within an included README file) or with external reference (e.g., a support page on a vendor website).
Another approach is to utilize configuration-changing actions that may be semi-automated, meaning that tooling is provided that is to be invoked by the administrator at one or more defined points in the patching process. The administrator may then need to provide information unique to the target deployment (e.g., file system paths, hostnames/ports, passwords) in order for this tooling to perform its job.
Another approach is to utilize configuration-changing actions that are fully automated, via the inclusion of a shell script (or similar) that a tool performing binary patching knows how to invoke.
However, these approaches have issues in terms of efficiency, repeatability, and traceability. These issues can be manageable if only one or a small number of patches need to be applied to a single product, but a deployment is often composed of a large set of discrete products and components, and a large number of patches to all of them must be applied together. This greatly magnifies the problem: an administrator now has to manually orchestrate actions spread out among a large set of patches and their “manual” configuration changes, each of which may use a combination of approaches. Moreover, the documented per-patch steps can overlap significantly in terms of common actions performed by each patch, such as the clearing of a cache that in reality may only need to be run once. Additionally, the correct, but often unstated, ordering of the actions to be performed among multiple patches is often critical to prevent issues, but administrators are typically left completely on their own to muddle through the deployment patching process using trial and error, which can result in significant system downtime.
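To make the overlap and ordering concerns concrete, the following minimal sketch merges the documented steps of two patches, collapses actions they share, and derives one valid ordering. It assumes each patch's steps can be written as a mapping from an action to the actions that must precede it. The patch contents and the helper names (`merge_plans`, `ordered_actions`) are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical per-patch steps: action name -> actions that must run first.
PATCH_A = {
    "stop_server": set(),
    "update_schema": {"stop_server"},
    "clear_cache": {"update_schema"},
    "start_server": {"clear_cache"},
}
PATCH_B = {
    "stop_server": set(),
    "edit_config_file": {"stop_server"},
    "clear_cache": {"edit_config_file"},
    "start_server": {"clear_cache"},
}


def merge_plans(*plans):
    """Union the prerequisites of each action so shared actions appear once."""
    merged = {}
    for plan in plans:
        for action, prereqs in plan.items():
            merged.setdefault(action, set()).update(prereqs)
    return merged


def ordered_actions(*plans):
    """Return one ordering that respects every stated prerequisite.

    TopologicalSorter raises graphlib.CycleError if the prerequisites conflict.
    """
    return list(TopologicalSorter(merge_plans(*plans)).static_order())


plan = ordered_actions(PATCH_A, PATCH_B)
```

Here the shared cache-clearing step appears once in `plan` and is scheduled only after both patch-specific changes, rather than being repeated for each patch. The sketch works only when prerequisites are stated explicitly, which is precisely what per-patch documentation often omits.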
Accordingly, there is a tremendous need to find simplified mechanisms to allow for configuration patching of systems, including within large-scale cloud infrastructure systems.