Distributed systems provide various computer services (including applications) to clients via a collection of nodes/servers, for example arranged as clusters. When computer software is deployed in production on distributed systems, it is difficult for enterprises and the like to keep the software up-to-date with the latest fixes and software improvements without disrupting the services offered by the distributed systems. As a result, to update distributed systems, administrators perform relatively elaborate planning aimed at updating the software inventory on the nodes/servers in the distributed system collection without impacting the services offered by the collection; this is sometimes referred to as “being service-aware” with respect to updates.
Typical administrator steps include migrating/re-hosting the services from and to each server/node in such a distributed system collection, so that a node can be updated while ensuring, for example, that the fault-tolerant “critical mass” (e.g., a cluster quorum) for the overall collection holds throughout the updating process, and using node-centric updating software to update each node. Some administrators perform these tasks manually, while others use ad-hoc scripts to attempt to automate portions of the process. In some cases there may be an end-to-end tool for a specific type of clustered service coupled with a specific type of software update management software. In any event, such information technology (IT) processes are laborious and error-prone, require IT specialists to administer, and are expensive to maintain on an ongoing basis.
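For illustration only, the administrator steps described above (re-host services off a node, confirm that quorum still holds, update the node, then move the services back) can be sketched as a simple rolling loop. This is a minimal in-memory sketch, not any particular tool's implementation: the `Node` class, `has_quorum`, and `rolling_update` are hypothetical names, quorum is assumed to mean a strict majority of nodes online, and setting `version` stands in for running real node-centric updating software.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical in-memory model of one server/node in the collection.
    name: str
    version: str
    online: bool = True
    services: list[str] = field(default_factory=list)


def has_quorum(nodes, excluding=None):
    """Return True if a strict majority of nodes stays online without `excluding`."""
    online = sum(1 for n in nodes if n.online and n is not excluding)
    return online > len(nodes) // 2


def rolling_update(nodes, new_version):
    """Update one node at a time, re-hosting its services while it is offline."""
    for node in nodes:
        if node.version == new_version:
            continue  # already up to date
        if not has_quorum(nodes, excluding=node):
            raise RuntimeError(f"taking {node.name} offline would break quorum")

        # Migrate each service to the currently least-loaded online peer.
        peers = [n for n in nodes if n is not node and n.online]
        moved = []
        for svc in list(node.services):
            target = min(peers, key=lambda n: len(n.services))
            target.services.append(svc)
            node.services.remove(svc)
            moved.append((svc, target))

        # Take the node offline, "update" it, and bring it back.
        node.online = False
        node.version = new_version  # stand-in for a real node-centric updater
        node.online = True

        # Re-host the services back onto the updated node.
        for svc, target in moved:
            target.services.remove(svc)
            node.services.append(svc)
```

Under these assumptions, a three-node collection can be updated one node at a time, whereas a two-node collection is refused because taking either node offline would leave no majority. A real script would also have to handle failed updates, reboots, and services that cannot be moved, which is part of why such ad-hoc automation tends to be error-prone.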
As the number of distributed systems grows, the operational cost of these manual processes/scripts, and of the IT administrators needed to run/maintain them, becomes a significant operating expense burden for IT organizations. This is especially true for small and medium-sized businesses, and for organizations that tend to have a number of branch offices without local IT experts available.