This invention relates to an efficient approach to job processing in a shared pool of resources. More specifically, the invention relates to assessing the virtual and physical topology of the shared resources and processing jobs responsive to the combined topology.
MapReduce is a framework for processing highly distributable problems across huge datasets using a large number of computer nodes. The nodes are commonly referred to collectively as a cluster in instances where all of the nodes use the same hardware, or as a grid in instances where the nodes use different hardware. Computational processing can occur on data stored either in a filesystem or in a database. Specifically, a master node receives a job input and partitions the job into smaller sub-jobs, which are distributed to the other nodes in the cluster or grid. In one embodiment, the nodes in the cluster or grid are arranged in a hierarchy, and the sub-jobs may be further partitioned and distributed. The nodes responsible for processing the sub-jobs return processed data to the master node. More specifically, the processed data is collected and combined by the master node to form an output. Accordingly, MapReduce is an algorithmic technique for the distributed processing of large amounts of data associated with a job.
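By way of illustration only, the partition, distribute, and combine flow described above may be sketched as follows. This is a minimal single-machine sketch, not an implementation of any particular MapReduce framework: the function names (partition, process_sub_job, combine, run_job) are hypothetical, a thread pool stands in for the worker nodes, and word counting is assumed as the example job.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(job_input, num_sub_jobs):
    """Master node step: split the job input into roughly equal sub-jobs."""
    size = max(1, -(-len(job_input) // num_sub_jobs))  # ceiling division
    return [job_input[i:i + size] for i in range(0, len(job_input), size)]


def process_sub_job(lines):
    """Worker node step: count the words in one sub-job (assumed example task)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def combine(partial_results):
    """Master node step: merge the processed data returned by the workers."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


def run_job(job_input, num_workers=3):
    """Partition the job, process sub-jobs in parallel, and combine the output."""
    sub_jobs = partition(job_input, num_workers)
    # Threads stand in for the nodes of a cluster or grid (an assumption).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_sub_job, sub_jobs))
    return combine(partial_results)
```

For example, calling run_job(["a b a", "b c", "a"], num_workers=2) partitions the three input lines into two sub-jobs, counts the words in each sub-job independently, and merges the partial counts into a single output. In a hierarchical arrangement, a worker could itself call partition on its sub-job before processing.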
As described above, MapReduce enables distribution of data processing across a network of nodes. Although there is a convenience factor associated with the use of MapReduce, there are performance issues associated with current uses of MapReduce for processing jobs.