1. Technical Field
The present disclosure relates to cluster architectures/systems, and more specifically to mitigating the reduction in availability level during maintenance of nodes in a cluster.
2. Related Art
A cluster refers to a group of computers/digital processing systems (referred to as “nodes”), typically connected by fast networks, which work together to operate as a single computer/system in providing services. Clusters are often used to improve the performance of services by having multiple nodes provide the same service, commonly to provide fault tolerance through redundant nodes and fast response times through load balancing when servicing a large number of users. Clusters are commonly used to perform computationally intensive tasks such as weather simulations, (air/rail) traffic management, etc.
Availability level of a cluster is a measure of the quality of services that can be provided by the cluster, typically in terms of one or more of the number of software applications (offering the services) that can be executed simultaneously, the number of instances of each software application/service executed, the number of service/user requests that can be handled at the same time, etc. In general, the availability level of a cluster is determined by the number of nodes that are available/ready for processing the user requests (in addition to the hardware/software capability of each node). As such, a larger number of processing nodes in a cluster generally results in a higher availability level, while a smaller number of processing nodes results in a lower availability level.
There is a general need to perform maintenance related tasks on various nodes of a cluster, for example, to ensure that the most recent version of the software/services is being executed (and offered to the users). Maintenance implies that one or more software or hardware components are being repaired/replaced/patched/added/deleted, etc., such that the corresponding node may be required to stop (or at least substantially reduce) processing user requests.
It may accordingly be appreciated that maintenance of nodes may cause reduction in the availability level (since the number of nodes available/ready for processing the user requests is reduced). It is generally desirable that the reduction in availability level during maintenance of nodes in a cluster be mitigated (lessened).
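The relationship described above between the number of ready nodes and the availability level can be illustrated with a minimal sketch. The `Node` class, its `capacity` field (requests a node can handle at the same time), and the `availability_level` function are hypothetical names introduced here for illustration only. Summing per-node capacity is just one possible way of quantifying availability level and is not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of a cluster node (illustration only)."""
    name: str
    capacity: int                 # assumed: requests the node can handle at once
    in_maintenance: bool = False  # True while the node cannot process requests


def availability_level(nodes):
    """Assumed measure: total capacity of nodes ready to process requests."""
    return sum(n.capacity for n in nodes if not n.in_maintenance)


# A four-node cluster in which every node has the same capability.
cluster = [Node(f"node-{i}", capacity=100) for i in range(4)]

before = availability_level(cluster)

# Taking one node down for maintenance removes its capacity from the cluster.
cluster[0].in_maintenance = True
during = availability_level(cluster)

reduction = before - during
print(before, during, reduction)
```

Under these assumptions, taking one of four equal nodes out of service lowers the measured availability level by one quarter, which is the kind of reduction that is desirable to mitigate.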
In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.