MapReduce is a programming model for processing large data sets. The MapReduce programming model comprises a map procedure that performs filtering and sorting and a reduce procedure that perform a summary operation. Typically, a MapReduce job is performed on clusters of computers, such as clusters of storage servers in a distributed file system. For example, a file system may have clusters of storage servers, such that each cluster includes a master node and one or more worker nodes. During the “map” phase, a master node may receive a job request to perform an operation using a file or data located in the memory. The master node may divide the job into smaller sub-jobs, and may distribute the sub-jobs to the worker nodes. The worker nodes may process the sub-jobs in parallel and may pass the results back to the master node. During the “reduce” phase, the master node may collect the results of the sub-jobs and combine the results by performing a summary operation to form the output for the job request. MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, machine learning, and statistical machine translation.