MapReduce is a programming paradigm for processing large datasets in parallel on a large cluster of distributed processing devices. Such a cluster of distributed processing devices may typically be known as a massively distributed computing platform (MDCP). A MapReduce job is a data processing job composed of a map phase, which performs filtering and sorting on a large dataset, and a reduce phase, which performs a summary operation on the filtered and sorted dataset. A MapReduce system is a system that includes and controls the distributed processing devices as they execute the various tasks in parallel, and that manages communications and data transfers between the distributed processing devices and other components of the system. A MapReduce system also provides redundancy and fault tolerance. One example of a MapReduce system is an open-source implementation known as the Apache Hadoop™ (Apache Software Foundation) framework. Pivotal HD™ is an example of a Hadoop™-based implementation available from EMC Corporation of Hopkinton, Mass. It is known that analyzing and optimizing MapReduce jobs are challenging tasks.
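The map/reduce split described above can be illustrated with a minimal, single-process sketch of a word count. This is a toy illustration of the paradigm, not the Hadoop™ API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative assumptions, and in a real MapReduce system each phase would run in parallel across the cluster's processing devices.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map: emit intermediate (key, value) pairs -- here, (word, 1)
    # for each word in each input document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort: group intermediate pairs by key, as the framework
    # does between the map and reduce phases.
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    # Reduce: apply a summary operation to each key's values -- here, a sum.
    for key, values in grouped:
        yield key, sum(values)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = dict(reduce_phase(shuffle(map_phase(docs))))
print(counts)
```

Because the map function processes each record independently and the reduce function processes each key independently, the framework is free to distribute both phases across many devices; the shuffle step is where it routes each key's intermediate values to a single reducer.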