Over the past decade, the MapReduce programming model has gained traction both in research and in practice. Mainstream MapReduce frameworks [Apache Hadoop; J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI '04, 2004] provide significant advantages for large-scale distributed parallel computation. In particular, MapReduce frameworks can transparently support fault tolerance, elastic scaling, and integration with a distributed file system.
Additionally, MapReduce has attracted interest as a parallel programming model in its own right, independent of the difficulties of distributed computation [C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. HPCA '07, pp. 13-24, 2007]. MapReduce has been shown to be expressive enough to capture important parallel algorithms in a number of domains, while still abstracting away the low-level details of parallel communication and coordination.
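To make this division of responsibilities concrete, the following is a minimal single-process sketch of the programming model, applied to word counting. It is illustrative only: `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical names, not the API of Hadoop or of any multi-core MapReduce runtime. The user writes only a map function and a reduce function. The sketch's "runtime" schedules map tasks on a thread pool, groups intermediate pairs by key, and calls reduce once per key. (In CPython, threads do not execute Python bytecode in parallel, so the point here is the structure of the model rather than speedup.)

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

I = TypeVar("I")            # input record type
K = TypeVar("K", bound=Hashable)  # intermediate key type
V = TypeVar("V")            # intermediate value type
R = TypeVar("R")            # reduce output type


def run_mapreduce(
    inputs: Iterable[I],
    map_fn: Callable[[I], Iterable[Tuple[K, V]]],
    reduce_fn: Callable[[K, List[V]], R],
    workers: int = 4,
) -> Dict[K, R]:
    """Hypothetical toy runtime: the caller never handles scheduling or grouping."""
    # Map phase: the runtime decides how map tasks are scheduled on workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda record: list(map_fn(record)), inputs))
    # Shuffle phase: group every intermediate value under its key.
    groups: Dict[K, List[V]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# User code: only these two functions are application-specific.
def wc_map(line: str) -> Iterable[Tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)


if __name__ == "__main__":
    docs = ["the quick brown fox", "The lazy dog", "quick quick"]
    print(run_mapreduce(docs, wc_map, wc_reduce))
```

A real framework would additionally partition the input, run map and reduce tasks on separate cores or machines, and recover from task failures; none of that changes the two user-supplied functions.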