In computer networks and cloud computing environments (also referred to herein as “computing environments”), a plurality of resource consumers share a plurality of resources. Examples of a resource consumer include one or more jobs, files, data caches, databases, sets of data, applications, and/or sets of operations. Examples of a resource include one or more processors, servers, data storages, virtual machines, and/or platforms.
The resources may be associated with different failure domains. A failure domain includes a particular set of resources that are affected by a single point of failure. If a problem occurs with the single point of failure, then each resource in the failure domain also fails.
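The relationship between resources and failure domains can be illustrated with a minimal sketch. The resource names, the single-point-of-failure labels, and both helper functions below are hypothetical and chosen only for illustration. The sketch also assumes, for simplicity, that each resource depends on exactly one single point of failure.

```python
from collections import defaultdict

# Hypothetical mapping (an assumption, not from the text above):
# resource name -> the single point of failure it depends on.
RESOURCE_TO_SPOF = {
    "server-a": "rack-1-power",
    "server-b": "rack-1-power",
    "server-c": "rack-2-power",
}


def build_failure_domains(resource_to_spof):
    """Group resources by the single point of failure that affects them."""
    domains = defaultdict(set)
    for resource, spof in resource_to_spof.items():
        domains[spof].add(resource)
    return dict(domains)


def failed_resources(domains, failed_spof):
    """Return every resource that fails when the given single point of failure fails."""
    # Return a copy so callers cannot mutate the stored domain membership.
    return set(domains.get(failed_spof, set()))


domains = build_failure_domains(RESOURCE_TO_SPOF)
```

In this sketch, a problem with `rack-1-power` takes down both `server-a` and `server-b`, while `server-c`, which belongs to a different failure domain, is unaffected.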
One performance objective of a computing environment that includes a set of resources is to maintain a particular level of resiliency. Resiliency is the ability of the computing environment to maintain an acceptable level of service subsequent to one or more resource failures.
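One possible way to make resiliency concrete is sketched below. The sketch assumes that an "acceptable level of service" can be approximated by a minimum number of surviving resources, and that failures occur at the granularity of whole failure domains. Both assumptions, along with the function names and the `min_surviving` and `max_failures` parameters, are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def surviving_resources(resource_to_domain, failed_domains):
    """Return the resources whose failure domain is not among the failed ones."""
    return {r for r, d in resource_to_domain.items() if d not in failed_domains}


def is_resilient(resource_to_domain, min_surviving, max_failures=1):
    """Check whether at least `min_surviving` resources remain after any
    combination of up to `max_failures` failure-domain failures.

    `min_surviving` stands in for an "acceptable level of service"
    (an assumption made for this sketch).
    """
    domains = sorted(set(resource_to_domain.values()))
    for k in range(1, max_failures + 1):
        for failed in combinations(domains, k):
            remaining = surviving_resources(resource_to_domain, set(failed))
            if len(remaining) < min_surviving:
                return False
    return True


# Hypothetical placement: two virtual machines share one failure domain.
placement = {"vm-1": "rack-1", "vm-2": "rack-1", "vm-3": "rack-2"}
tolerates_one_failure = is_resilient(placement, min_surviving=1)
needs_two_survivors = is_resilient(placement, min_surviving=2)
```

Under these assumptions, the placement keeps at least one resource available after any single failure-domain failure, but it cannot guarantee two surviving resources, because losing `rack-1` leaves only `vm-3`.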
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.