Software-as-a-Service, or “SaaS,” is a software delivery model in which a service provider hosts a software application online (e.g., “in the cloud”) for remote access by one or more users. Examples of software applications that are commonly offered via this model include databases, enterprise resource planning (ERP) applications, document/content management systems, and so on. A virtual infrastructure that supports SaaS includes a number of virtual machines (VMs) that are each configured to run an instance of the offered software application. One aspect of managing such a virtual infrastructure involves upgrading the software application in each VM on a periodic basis, for example, to patch bugs or add new features.
In current implementations, this software upgrade process is typically handled by an update agent resident in each VM. The update agent communicates with a central update server and searches for updates (also referred to as “patches”) that are applicable to the software application running in the VM. When the update agent finds a relevant patch on the central update server, the update agent downloads the patch and applies it within the VM.
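The per-VM flow described above can be illustrated with a minimal sketch. It is not drawn from any particular product: the names Patch, CentralUpdateServer, UpdateAgent, and check_and_apply are hypothetical, the server is an in-memory stand-in rather than a networked service, and "applying" a patch is reduced to bumping a version string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Patch:
    """A patch that moves one application from one version to the next (hypothetical model)."""
    patch_id: str
    app_name: str
    from_version: str
    to_version: str
    payload: bytes


@dataclass
class CentralUpdateServer:
    """In-memory stand-in for the central update server (assumption: no real networking)."""
    patches: List[Patch] = field(default_factory=list)
    download_count: int = 0

    def find_patches(self, app_name: str, version: str) -> List[Patch]:
        # Return the patches applicable to the given application and installed version.
        return [p for p in self.patches
                if p.app_name == app_name and p.from_version == version]

    def download(self, patch: Patch) -> bytes:
        # Every call represents one separate download by one VM.
        self.download_count += 1
        return patch.payload


@dataclass
class UpdateAgent:
    """Update agent resident in a single VM (hypothetical model)."""
    vm_name: str
    app_name: str
    app_version: str
    local_patch_copies: List[bytes] = field(default_factory=list)

    def check_and_apply(self, server: CentralUpdateServer) -> Optional[str]:
        # 1. Search the central server for a patch relevant to this VM's application.
        candidates = server.find_patches(self.app_name, self.app_version)
        if not candidates:
            return None
        patch = candidates[0]
        # 2. Download a copy of the patch for this VM.
        self.local_patch_copies.append(server.download(patch))
        # 3. Apply it within the VM (simplified here to a version bump).
        self.app_version = patch.to_version
        return patch.patch_id


if __name__ == "__main__":
    server = CentralUpdateServer(patches=[Patch("p1", "erp", "1.0", "1.1", b"fix")])
    agents = [UpdateAgent(f"vm{i}", "erp", "1.0") for i in range(3)]
    for agent in agents:
        agent.check_and_apply(server)
    print(server.download_count)
```

In this sketch each agent contacts the server and keeps its own copy of the patch, so the number of downloads grows with the number of VMs.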
While the foregoing approach works well for relatively small VM deployments, it can be problematic for the large-scale VM deployments that are becoming increasingly common in virtual infrastructures that support SaaS. For instance, in a large-scale VM deployment, many VMs may attempt to download patches from the central update server concurrently. This can significantly increase the network load on the central update server and result in slow downloads, dropped connections, and other issues. Further, since the approach above requires each VM to download and apply a separate instance of a given patch, this approach can cause storage “bloat” due to multiple copies of the same patch being stored in backend storage, as well as host-side performance issues in scenarios where many VMs attempt to apply a resource-intensive patch at substantially the same time. Yet further, the application of a patch may fail for various reasons, such as a network or storage outage or a configuration error. When a large number of VMs are in the process of applying a patch, it can be difficult to track the status of each VM in order to identify and address patch failures.