Large scale computing systems, sometimes called warehouse scale computers, are computing environments that are designed to host large scale services like cloud storage, web searching, and other data or computationally intensive applications. Large scale computing systems include multiple clusters of computing machines, each having a cluster manager. The cluster manager receives tasks from the large scale computing system and allocates the tasks among the computing machines in its cluster. Each computing machine houses a number of processors, or cores, in a number of central processing units (CPUs). For example, a computing machine may have 2-4 CPUs, and each CPU may have 4-8 processors.
Each task in a large scale computing system is a particular instance of an application, or executable binary code, in the large scale computing system. The task also includes a configuration file that specifies the machine level resources required by the application. The resources may include the number of processors, amount of memory, and disk space that is allocated to the application. Some applications may be latency sensitive, meaning that they have high quality of service standards and cannot tolerate substantial performance degradation. Examples of latency sensitive applications include Internet searches, online map functions, and e-mail services. Other applications are not latency sensitive and can tolerate greater interruptions in the quality of service. These applications are called batch applications, and some examples of batch applications include file backup, offline image processing, and video compression.
The cluster manager is responsible for allocating tasks among computing machines. Several tasks may be executed on one computing machine. However, as the number of tasks on a computing machine increases, each task may suffer performance degradation because the tasks share certain resources like memory and bus bandwidth. This may be problematic with latency sensitive tasks because the quality of service should be maintained within a certain threshold. Cluster managers tend to dedicate one computing machine to a latency sensitive task to ensure there is no performance degradation. However, such an allocation strategy ignores the possibility that other tasks may be executed on the same computing machine as the latency sensitive task without substantially disrupting its performance. The result is an under-utilization of resources because the cluster of computing machines is not operating at full capacity.