In a typical cloud data center environment, a large collection of interconnected servers provides computing and/or storage capacity to run various applications. For example, a data center may comprise a facility that hosts applications and services for subscribers, i.e., customers of the data center. The data center may, for example, host all of the infrastructure equipment, such as networking and storage systems, redundant power supplies, and environmental controls. In a typical data center, clusters of storage systems and application servers are interconnected via a high-speed switch fabric provided by one or more tiers of physical network switches and routers. More sophisticated data centers provide infrastructure spread throughout the world, with subscriber support equipment located in various physical hosting facilities.
Within a data center or other massively distributed complex system, faults and failures are not equivalent. A fault may still allow continued operation of the components of the system that rely on the faulted component. However, a fault may develop into, and tends to indicate, a pending failure of one or more components of the system, and such a failure deleteriously affects the operation of the system.