Large-scale data processing needs are met using distributed and parallel computing data centers. MapReduce is a programming model used by these data centers for various applications such as indexing, mining, social networking, recommendation services, and advertising backends. MapReduce includes a map phase and a reduce phase. In the map phase, a dataset is partitioned into several smaller chunks that are assigned to individual nodes for partial computation of results. The results are computed at each node in the form of key-value pairs on the original data set. During the reduce phase, key-value pairs generated as a result of the map phase are aggregated based on the key-value pairs. Within a data center, a centralized master program orchestrates the assignment and scheduling of jobs, each of which includes several map and reduce tasks. The assignment and scheduling functions determine which tasks are assigned to a particular node. The master keeps track of the progress of individual tasks in order to determine where to assign and when to schedule individual tasks. Since several MapReduce jobs are often performed in parallel, scheduling problems often arise due to varying workloads and job size distributions.