Computing systems and associated networks have revolutionized the way human beings work, play, and communicate. Nearly every aspect of our lives is affected in some way by computing systems. Computing systems are particularly adept at processing data. When processing large amounts of data (often referred to simply as “big data”) that itself might be distributed across multiple network nodes, it is often most efficient to divide data processing amongst the various network nodes. These divisions of logical work are often referred to as “vertices” in the plural, or a “vertex” in the singular. Not only does this allow for efficiencies of parallelizing, but it also allows for the data that is being processed to be closer to the processing node that is to process that portion of the data.
One common programming model for performing such parallelization is often referred to as the map-reduce programming model. In the mapping phase, data is divided by key (e.g., along a particular dimension of the data). In the reduce phase, the overall task is then divided into smaller portions that can be performed by each network node, such that the intermediate results obtained thereby can then be combined into the final result of the overall job. Many big data analytical solutions build upon the concept of map reduce.
When running a job, the speed at which the job completes will depend on the number of resources (e.g., processing nodes, cores, memory, network bandwidth, disk IO bandwidth and so forth). Some types of resources will have greater effect on the efficiency of job completion than others. For instance, processing cores and memory typically have greater impact on processing efficiency.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.