Computing systems and associated networks have revolutionized the way human beings work, play, and communicate. Nearly every aspect of our lives is affected in some way by computing systems. Computing systems are particularly adept at processing data. When processing large amounts of data (often referred to simply as “big data”) that itself might be distributed across multiple network nodes, it is often most efficient to divide data processing amongst various network nodes. For instance, those various network nodes may be processing nodes within a cloud computing environment.
To divide data processing amongst the various processing nodes, the code is compiled into segments called vertices, with each vertex to be assigned for processing on a corresponding processing node. Not only does this allow for efficiencies of parallelizing, but it also allows for the data that is being processed to be closer to the processing node that is to process that portion of the data.
One common programming model for performing such parallelization is often referred to as the map-reduce programming model. In the mapping phase, data is divided by key (e.g., along a particular dimension of the data). In the reduce phase, the overall task is then divided into smaller portions that can be performed by each network node, such that the intermediate results obtained thereby can then be combined into the final result of the overall job. Many big data analytical solutions build upon the concept of map reduce.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.