When applications run in a cluster environment, any node shutdown or startup introduces a brownout during which application throughput drops and response time increases. While unexpected failures can always bring nodes down, nodes must sometimes be brought down deliberately in order to patch or upgrade their software or hardware.
Many mission-critical applications require fault tolerance and consistent, uninterrupted high performance. Even scheduled maintenance, which must be performed at certain intervals, incurs downtime that is detrimental to application performance. If the length of downtime is directly linked to loss of revenue, then scheduled maintenance downtime can cause a significant financial loss. Reducing the impact of such downtime (also referred to as a “brownout”) to a negligible level can help avoid significant loss of revenue in such systems.
Previous attempts to minimize this impact have included solutions that focus on fast application failover strategies. However, in a shared disk architecture, each cluster node is the master for a set of resources that the applications need to access, and rebuilding resource mastership when a node leaves or joins the cluster inherently introduces a brownout period that impacts application performance. In previous solutions, the brownout time is directly proportional to the quantity of resources. Thus, the larger the system's memory, the longer the brownout period.
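By way of illustration only, the proportionality described above can be sketched as follows. The modulo-based assignment of resources to masters, the function names, and the node labels are hypothetical assumptions made for this sketch and are not taken from any particular system; the count of remastered resources merely stands in for brownout work.

```python
def assign_masters(resource_ids, nodes):
    """Map each resource to a master node (hypothetical modulo scheme)."""
    return {rid: nodes[rid % len(nodes)] for rid in resource_ids}


def remaster_on_departure(masters, departing, survivors):
    """Reassign every resource mastered by the departing node.

    Returns the number of resources that were remastered, used here
    as a stand-in for the work performed during the brownout.
    """
    moved = 0
    for rid, node in masters.items():
        if node == departing:
            # Only values of existing keys change, so iterating is safe.
            masters[rid] = survivors[rid % len(survivors)]
            moved += 1
    return moved


def remaster_cost(total_resources, nodes, departing):
    """Count the resources remastered when one node leaves the cluster."""
    masters = assign_masters(range(total_resources), nodes)
    survivors = [n for n in nodes if n != departing]
    return remaster_on_departure(masters, departing, survivors)


if __name__ == "__main__":
    cluster = ["n1", "n2", "n3", "n4"]  # hypothetical node labels
    for total in (1000, 4000):
        print(total, remaster_cost(total, cluster, "n2"))
```

Under these assumptions, quadrupling the number of resources quadruples the number of resources that must be remastered when a node departs, which mirrors the relationship between resource quantity and brownout length noted above.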
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.