The present application relates generally to improving the throughput of a multi-server processing system. It finds particular application in conjunction with task scheduling in distributed compute systems using a map-reduce framework, and will be described with particular reference thereto. However, it is to be appreciated that the present application is also amenable to other like applications.
Map-reduce frameworks are a key technology for implementing big data applications. In these frameworks, a computational job is broken down into map and reduce tasks. The tasks are then allocated to a set of nodes (e.g., servers) so the tasks can be done in parallel. A map task processes a data block and generates a result for this block. A reduce task takes all these intermediate mapping results and combines them into the final result of the job.
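By way of illustration only, the division of a job into map and reduce tasks can be sketched as a word count. The function names `map_task` and `reduce_task` and the sample blocks are hypothetical and are not taken from any particular framework; in a real cluster the map tasks would run in parallel on separate nodes rather than in a single loop.

```python
from collections import Counter
from functools import reduce
from typing import Iterable

def map_task(block: str) -> Counter:
    """Process one data block and return the intermediate result for that block."""
    return Counter(block.split())

def reduce_task(intermediate: Iterable[Counter]) -> Counter:
    """Combine all intermediate map results into the final result of the job."""
    return reduce(lambda a, b: a + b, intermediate, Counter())

# Hypothetical data blocks; each would normally live on a different node.
blocks = ["a b a", "b c", "a c c"]

# One map task per block (sequential here, parallel in a real cluster).
intermediate = [map_task(b) for b in blocks]

# A single reduce task merges the per-block counts.
result = reduce_task(intermediate)
```

Each call to `map_task` depends only on its own block, which is what allows the map tasks to be distributed across nodes.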
A popular map-reduce framework is HADOOP® (registered TM of Apache Software Foundation). HADOOP® comprises a storage solution known as HADOOP® Distributed File System (HDFS), which is an open source implementation of the Google File System (GFS). HDFS is able to store large files across several machines, and using MapReduce, such files can be processed in a distributed fashion, moving the computation to the data, rather than the data to the computation. An increasing number of so-called “big data” applications, including social network analysis, genome sequencing, and fraud detection in financial transaction data, require horizontally scalable solutions, and have demonstrated the limits of relational databases.
A HADOOP® cluster includes a NameNode (i.e., a node that keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept, but does not store the data itself) and many DataNodes (i.e., nodes that store data). When a file is copied into the cluster, it is divided into blocks, for example, of 64 megabytes (MB). Each block is stored on three or more DataNodes depending on the replication policy of the cluster, as shown in FIG. 1. Once the data is loaded, computational jobs can be executed over it. New jobs are submitted to the NameNode, where map and reduce tasks are scheduled onto the DataNodes, as shown in FIG. 2.
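The division of a file into blocks and the replication of each block can be sketched as follows. This is a minimal sketch under stated assumptions: the function `place_blocks`, the DataNode names, and the round-robin placement are hypothetical and chosen only for illustration. An actual HDFS deployment selects replica locations according to its own placement policy.

```python
import math

BLOCK_SIZE_MB = 64  # example block size given in the text
REPLICATION = 3     # assumed replication factor

def place_blocks(file_size_mb, datanodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [datanode, ...]} using round-robin placement.

    Round-robin is an illustrative assumption, not the HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    # The last block may be only partially filled, hence the ceiling.
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for i in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the replicas of block i.
        placement[i] = [datanodes[(i + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

# A hypothetical 200 MB file on a four-node cluster yields four blocks.
placement = place_blocks(200, ["dn1", "dn2", "dn3", "dn4"])
```

Because the replication factor does not exceed the number of DataNodes, the replicas of any one block always land on distinct nodes in this sketch.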
This is illustrated at a high level in FIG. 3. With reference thereto, NameNode 310 splits a job 330 into tasks 340. The tasks 340 are then assigned to individual DataNodes 320. There may be a multitude of DataNodes 320, and, in one embodiment, the number of DataNodes ranges from tens to thousands.
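One simple way to assign tasks to DataNodes, shown purely as an illustrative baseline and not as the scheduling approach of the present application or of HADOOP® itself, is to create one map task per block and give it to the least-loaded DataNode that already stores a replica of that block. The function `assign_tasks` and the block locations below are hypothetical.

```python
from collections import defaultdict

def assign_tasks(block_locations):
    """Assign one map task per block to the least-loaded DataNode holding a replica.

    block_locations maps a block index to the DataNodes that store that block.
    Ties are broken by node name so the result is deterministic.
    """
    load = defaultdict(int)  # number of tasks assigned to each node so far
    assignment = {}
    for block, replicas in sorted(block_locations.items()):
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment

# Hypothetical replica locations for a four-block file.
block_locations = {
    0: ["dn1", "dn2", "dn3"],
    1: ["dn2", "dn3", "dn4"],
    2: ["dn3", "dn4", "dn1"],
    3: ["dn4", "dn1", "dn2"],
}
assignment = assign_tasks(block_locations)
```

Restricting each task to nodes that hold a replica keeps the computation next to the data, and the load counter spreads the tasks evenly across the nodes.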