The present application relates generally to data processing. It finds particular application in conjunction with task scheduling in distributed compute systems using a map-reduce framework, and will be described with particular reference thereto. However, it is to be appreciated that the present application is also amenable to other like applications.
Map-reduce frameworks are a key technology for implementing big data applications. In these frameworks, a computational job is broken down into map and reduce tasks. The tasks are then allocated to a set of nodes (i.e., servers) so the tasks can be done in parallel. A map task processes a data block and generates a result for this block. A reduce task takes all these intermediate mapping results and combines them into the final result of the job.
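By way of illustration only, the division of a job into map and reduce tasks can be sketched as follows. The word-count workload and the names `map_task` and `reduce_task` are assumptions chosen for the example and are not part of any particular framework. The map tasks run sequentially here, whereas a framework would run them in parallel on separate nodes.

```python
from collections import Counter
from functools import reduce


def map_task(block: str) -> Counter:
    """Process one data block and return its intermediate result (word counts)."""
    return Counter(block.split())


def reduce_task(intermediate: list[Counter]) -> Counter:
    """Combine all intermediate map results into the final result of the job."""
    return reduce(lambda acc, counts: acc + counts, intermediate, Counter())


# Hypothetical input, already divided into three blocks.
blocks = ["a b a", "b c", "a c c"]

# One map task per block; a framework would distribute these across nodes.
intermediate = [map_task(block) for block in blocks]

# A single reduce task merges the per-block results.
result = reduce_task(intermediate)
print(dict(result))
```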
A popular map-reduce framework is HADOOP. HADOOP comprises a storage solution known as the Hadoop Distributed File System (HDFS), which is an open source implementation of the Google File System (GFS). HDFS is able to store large files across several machines, and using MapReduce, such files can be processed in a distributed fashion, moving the computation to the data, rather than the data to the computation. An increasing number of so-called “big data” applications, including social network analysis, genome sequencing, and fraud detection in financial transaction data, require horizontally scalable solutions, and have demonstrated the limits of relational databases.
A HADOOP cluster includes a NameNode and many DataNodes (e.g., tens to thousands). When a file is copied into the cluster, it is divided into blocks, for example, of 64 megabytes (MBs). Each block is stored on three or more DataNodes depending on the replication policy of the cluster, as shown in FIG. 1. Once the data is loaded, computational jobs can be executed over it. New jobs are submitted to the NameNode, where map and reduce tasks are scheduled onto the DataNodes, as shown in FIG. 2.
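The block division and replication described above can be sketched as follows. This is a minimal illustration under stated assumptions: the round-robin placement rule and the names `split_into_blocks` and `place_replicas` are hypothetical, and the actual placement policy of a given cluster may differ (for example, it may take rack topology into account).

```python
import math

BLOCK_SIZE_MB = 64   # example block size from the description
REPLICATION = 3      # example replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Return the number of blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks: int, data_nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct DataNodes.

    Assumption: simple round-robin placement, used only for illustration.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough DataNodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            data_nodes[(block_id + r) % len(data_nodes)]
            for r in range(replication)
        ]
    return placement


# Hypothetical cluster of five DataNodes and a 200 MB file.
nodes = ["dn0", "dn1", "dn2", "dn3", "dn4"]
num_blocks = split_into_blocks(200)
placement = place_replicas(num_blocks, nodes)
for block_id, replicas in placement.items():
    print(block_id, replicas)
```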
A map task processes one block and generates a result for this block, which is written back to the storage solution. The NameNode will schedule one map task for each block of the data, and it will do so by selecting one of the DataNodes that store a copy of that block, thereby avoiding moving large amounts of data over the network. A reduce task takes all these intermediate mapping results and combines them into the final result of the job.
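The data-local scheduling of map tasks described above can be sketched as follows. The description states only that one of the DataNodes holding a copy of the block is selected. The tie-breaking rule used here (prefer the replica holder with the fewest tasks already assigned) and the name `schedule_map_tasks` are assumptions made for the example.

```python
def schedule_map_tasks(placement: dict[int, list[str]]) -> dict[int, str]:
    """Assign one map task per block to a DataNode that stores that block.

    Assumption: among the replica holders, the node with the fewest tasks
    assigned so far is chosen; ties go to the first replica listed.
    """
    load: dict[str, int] = {}
    assignment: dict[int, str] = {}
    for block_id in sorted(placement):
        replicas = placement[block_id]
        chosen = min(replicas, key=lambda node: load.get(node, 0))
        assignment[block_id] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return assignment


# Hypothetical replica locations for three blocks.
example_placement = {
    0: ["dn0", "dn1", "dn2"],
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn0"],
}
assignment = schedule_map_tasks(example_placement)
print(assignment)
```

Note that this sketch treats every DataNode as equally capable and considers only data locality and task count.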
One challenge with map-reduce frameworks, such as HADOOP, is that most frameworks assume a homogeneous cluster of nodes (i.e., that all compute nodes in the cluster have the same hardware and software configuration) and assign tasks to servers regardless of their capabilities. However, heterogeneous clusters are prevalent. As nodes fail, they are typically replaced with newer hardware. Further, research has shown benefits to heterogeneous clusters, as compared to homogeneous clusters (see, e.g., Saisanthosh Balakrishnan, Ravi Rajwar, Mike Upton, and Konrad Lai. 2005. The Impact of Performance Asymmetry in Emerging Multicore Architectures. In Proceedings of the 32nd annual international symposium on Computer Architecture (ISCA '05). IEEE Computer Society, Washington, D.C., USA, 506-517). Intuitively, more specialized hardware can better suit a variety of differing job resource profiles. By failing to account for heterogeneity, known map-reduce frameworks are not able to match jobs to the best compute nodes, consequently compromising global metrics, such as throughput or maximum delay.
Scheduling issues in heterogeneous clusters are investigated in Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz, and Ion Stoica. 2008. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX conference on Operating systems design and implementation (OSDI'08). USENIX Association, Berkeley, Calif., USA, 29-42. However, that work does not characterize HADOOP jobs, but rather proposes a scheduling strategy that speculatively launches redundant copies of the tasks that are projected to run longer than any other.
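The idea of selecting the task projected to run longest for redundant execution can be sketched as follows. This is an illustrative sketch only and is not presented as the cited authors' implementation. It assumes each running task reports a progress fraction and its elapsed time, and that the remaining time is estimated by linear extrapolation. The names `RunningTask`, `estimated_time_left`, and `pick_speculation_candidate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningTask:
    task_id: str
    progress: float   # fraction of work completed, between 0 and 1
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: RunningTask) -> float:
    """Extrapolate remaining time, assuming the task keeps its average rate."""
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_speculation_candidate(tasks: list[RunningTask]) -> Optional[RunningTask]:
    """Return the running task projected to finish last, or None if none qualify."""
    candidates = [t for t in tasks if 0.0 < t.progress < 1.0 and t.elapsed_s > 0.0]
    if not candidates:
        return None
    return max(candidates, key=estimated_time_left)


# Hypothetical snapshot of three running tasks.
tasks = [
    RunningTask("t1", progress=0.9, elapsed_s=90.0),
    RunningTask("t2", progress=0.2, elapsed_s=100.0),
    RunningTask("t3", progress=0.5, elapsed_s=50.0),
]
candidate = pick_speculation_candidate(tasks)
print(candidate.task_id if candidate else None)
```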
Further, while tasks belonging to the same job are very similar to each other in terms of their individual resource profiles, tasks belonging to different jobs can have very different profiles in terms of their resource requirements, such as the degree to which they utilize a central processing unit (CPU), memory, disk input/output (I/O), or network I/O. Jobs may also have certain service level requirements. Known map-reduce frameworks do not efficiently schedule tasks to satisfy service level requirements while optimally utilizing available resources.
The present application provides a new and improved system and method which overcome the above-referenced problems and others.