One effective framework is distributed parallel computing, which disperses processing tasks across multiple processors operating on one or more computing devices so that the tasks may be executed in parallel. Important implementations of large-scale distributed parallel computing systems include MapReduce by Google®, Dryad by Microsoft®, and the open source Hadoop® MapReduce implementation. Google® is a registered trademark of Google Inc. Microsoft® is a registered trademark of the Microsoft Corporation in the United States, other countries, or both. Hadoop® is a registered trademark of the Apache Software Foundation.
Generally, MapReduce has emerged as a dominant paradigm for processing large datasets in parallel on compute clusters. As an open source implementation, Hadoop has quickly become popular owing to its success in a variety of applications, such as social network mining, log processing, video and image analysis, search indexing, and recommendation systems. In many scenarios, long batch jobs and short interactive queries are submitted to the same MapReduce cluster, sharing limited common computing resources while pursuing different performance goals. It has thus been recognized that, in order to meet these challenges, an efficient scheduler can be helpful, if not critical, in providing a desired quality of service for the MapReduce cluster.
Generally, it has been recognized that improving data locality for MapReduce jobs can be critical for the performance of large-scale Hadoop clusters, embodying the principle of moving computation close to data for big data platforms. Scheduling tasks in the vicinity of stored data can significantly diminish network traffic, which is crucial for system stability and efficiency. Though issues of data locality have been investigated extensively for map tasks, most conventional schedulers ignore data locality for reduce tasks, which must fetch the intermediate data generated by map tasks, and this causes performance degradation.
This problem of reducing the fetching cost for reduce tasks has been identified recently. However, solutions proposed in that connection are exclusively based on a greedy approach, relying on the intuition of placing reduce tasks on the slots that are closest to the majority of the already generated intermediate data. The consequence is that, in the presence of job arrivals and departures, assigning the reduce tasks of the current job to the nodes with the lowest fetching cost can prevent a subsequent job with an even better match of data locality from being launched on the already taken slots.
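The drawback of the greedy approach can be illustrated with a small sketch. Everything in it is a hypothetical assumption made for illustration only: a cluster of two nodes ("A" and "B") with one reduce slot each, a fetching cost equal to data size multiplied by hop distance, and the function names `fetch_cost`, `greedy_placement`, `total_cost`, and `best_placement`. It does not reproduce any particular scheduler. The exhaustive search is included purely as a point of comparison and presumes knowledge of all jobs in advance, which a real scheduler would not have.

```python
from itertools import permutations

def fetch_cost(data, node, distance):
    """Cost of running a reduce task on `node`: each block of intermediate
    data is weighted by the (assumed) hop distance from where it is stored."""
    return sum(size * distance[src][node] for src, size in data.items())

def greedy_placement(jobs, nodes, distance):
    """Place each job, in arrival order, on the free node with the lowest
    fetching cost for that job alone (the greedy intuition)."""
    free = set(nodes)
    placement = {}
    for name, data in jobs:
        # sorted() only makes tie-breaking deterministic
        best = min(sorted(free), key=lambda n: fetch_cost(data, n, distance))
        placement[name] = best
        free.remove(best)
    return placement

def total_cost(jobs, placement, distance):
    """Sum of fetching costs over all placed jobs."""
    return sum(fetch_cost(data, placement[name], distance) for name, data in jobs)

def best_placement(jobs, nodes, distance):
    """Exhaustive comparison baseline; feasible only for tiny examples."""
    names = [name for name, _ in jobs]
    candidates = (dict(zip(names, perm)) for perm in permutations(nodes, len(names)))
    return min(candidates, key=lambda p: total_cost(jobs, p, distance))

# Hypothetical two-node cluster: local fetch costs 0, remote fetch costs 1 per unit.
nodes = ["A", "B"]
distance = {"A": {"A": 0, "B": 1}, "B": {"A": 1, "B": 0}}

# job1 arrives first with data spread almost evenly; job2's data is all on A.
jobs = [
    ("job1", {"A": 6, "B": 5}),
    ("job2", {"A": 10, "B": 0}),
]

greedy = greedy_placement(jobs, nodes, distance)
best = best_placement(jobs, nodes, distance)
print("greedy:", greedy, "cost", total_cost(jobs, greedy, distance))
print("best:  ", best, "cost", total_cost(jobs, best, distance))
```

In this example the greedy rule gives the first job node A, where its cost is marginally lower (5 rather than 6), and thereby forces the second job, whose intermediate data resides entirely on A, onto node B at a cost of 10. Swapping the two assignments would lower the total fetching cost from 15 to 6.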