This specification relates to resource management in data centers.
Data centers are often used by service providers to deliver Internet-based services to users. A data center, such as a server farm, typically contains hundreds or thousands of processing devices. Within a data center, the processing devices are arranged (or grouped) in clusters. Each cluster is configured to perform a distributed task in a parallel or pipelined fashion.
Modern processors used in data centers have several cores within a CPU chip. At any given instant, some of these cores are more active than others for a variety of reasons. Furthermore, the cores that are more active change over time. Each CPU can tolerate a finite number of cycles before becoming unreliable. Additionally, the operating temperature of a CPU contributes to the CPU's wear. Thus, very active cores operating at high temperatures wear out faster and fail earlier than cores that are less active or that operate at lower temperatures. When any core fails, the whole CPU needs to be replaced even though the majority of its cores are still functional. This incurs a significant replacement cost and reduces the useful lifetime of a CPU chip. Because there are typically many thousands of processing devices in a data center, device management can be costly.