In a cluster of nodes, the availability of resources is an important consideration. For this reason, some of the resources in the cluster are made redundantly available to increase the reliability and availability of the cluster. When a node or resource fails, a mechanism typically exists to enable the use of a similar resource on another node.
Current systems stack the resources of a node. In a stack, resources develop dependencies on one another. Thus, a mid-tier resource may depend on a lower-tier resource, while an application or other top-level program may depend on the lower-tier resources. In the past, managing the dependencies among the resources in the stack has been problematic when a failure occurs.
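The dependency relationships described above can be pictured as a small directed graph. The following is a minimal sketch, not a description of any particular cluster system: the resource names (`disk`, `filesystem`, `database`, `application`) and the helper `affected_by` are hypothetical and chosen only for illustration. It shows how the failure of one lower-tier resource reaches every resource stacked above it.

```python
from collections import deque

# Hypothetical stack: each resource maps to the resources it depends on directly.
DEPENDS_ON = {
    "application": ["database", "filesystem"],  # top-level program
    "database": ["filesystem"],                 # mid-tier resource
    "filesystem": ["disk"],                     # lower-tier resource
    "disk": [],                                 # bottom of the stack
}


def affected_by(failed, depends_on):
    """Return every resource that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a resource up to its dependents.
    dependents = {}
    for name, deps in depends_on.items():
        dependents.setdefault(name, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected
```

Under these assumptions, calling `affected_by("filesystem", DEPENDS_ON)` reports the database and the application as affected, while a failure of the application itself affects nothing else in the stack.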
One solution has been to reconstruct the stack of the node where the failure occurred, in its entirety, on a different node. This is typically done even when only one resource has failed. As a result, the failure of a single resource on one node delays the system's ability to provide redundant services for that node. The delay is often long enough to degrade the quality and interrupt the availability of the services provided by the node cluster.
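The cost of the whole-stack approach can be illustrated with a short sketch. Everything here is an assumption made for illustration: the four-resource stack, the node name `node-b`, and the function `whole_stack_failover` are hypothetical and do not come from any real cluster product. The point is only that the restart plan covers every resource in the stack, in dependency order, regardless of which single resource failed.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stack: each resource maps to the resources it depends on.
STACK = {
    "application": ["database"],
    "database": ["filesystem"],
    "filesystem": ["disk"],
    "disk": [],
}


def whole_stack_failover(stack, failed_resource, target_node):
    """Plan a failover that rebuilds the ENTIRE stack on `target_node`.

    `failed_resource` is validated but otherwise ignored: the plan is the
    same no matter which resource failed, which is the source of the delay.
    """
    if failed_resource not in stack:
        raise KeyError(failed_resource)
    # Dependencies must start before the resources that rely on them.
    start_order = list(TopologicalSorter(stack).static_order())
    return [(target_node, resource) for resource in start_order]


plan = whole_stack_failover(STACK, "database", "node-b")
```

In this sketch a single failed resource (`database`) still produces a plan with four restart steps, one for each resource in the stack, all of which must complete before service resumes on the other node.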
Another solution that has been tried is to use the cluster framework to facilitate the switch-over between nodes when a failure occurs. In this type of solution, the framework that interconnects the nodes of the cluster rebuilds the stack of the node where the failure occurred. In such systems, the availability of the node cluster depends on the responsiveness of the framework and on the framework's ability to reconstruct the stack of resources on a different node.