The present invention relates generally to the field of task scheduling, and more specifically to scheduling MapReduce tasks.
MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. A MapReduce program is composed of a Map procedure that performs filtering and sorting, and a Reduce procedure that performs a summary operation. A MapReduce system orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance. MapReduce systems are therefore responsible for efficiently managing workloads associated with a job.