Data sets requiring analysis have grown greatly in size over the years, and computing systems and strategies have been designed to try to keep up with that growth. However, the performance of present systems continues to lag behind the pace at which data set sizes increase.
MapReduce techniques as discussed, e.g., in U.S. Patent Application Publication No. 2008/0086442 and/or Dean et al., “MapReduce: Simplified Data Processing on Large Clusters,” OSDI 2004, provide one way to approach large data set processing. However, such existing techniques could be made faster and more efficient.
Furthermore, specific applications/algorithms, when implemented with a MapReduce programming model, may have synchronization points (barriers) within a workflow at which one stage cannot begin until another stage has completely finished processing. Such barriers may also cause inefficiencies.
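By way of illustration only, the barrier described above may be sketched as follows. This is a minimal single-machine word count written for this discussion, not an implementation taken from the cited references; the names `map_task`, `reduce_task`, and `word_count` are hypothetical, and a thread pool stands in for a cluster of workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(line):
    """Hypothetical map function: emit (word, 1) pairs for one input line."""
    return [(word, 1) for word in line.split()]


def reduce_task(item):
    """Hypothetical reduce function: sum the counts collected for one word."""
    word, counts = item
    return word, sum(counts)


def word_count(lines, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map stage. list() blocks until every map task has returned,
        # so this line acts as the synchronization point (barrier).
        mapped = list(pool.map(map_task, lines))

        # Shuffle: group values by key. Grouping needs the complete map
        # output, which is why the reduce stage cannot start any earlier.
        groups = defaultdict(list)
        for pairs in mapped:
            for word, count in pairs:
                groups[word].append(count)

        # Reduce stage: begins only after the barrier above is passed.
        return dict(pool.map(reduce_task, groups.items()))
```

In this sketch, a single slow map task delays the entire reduce stage, even when all other workers are idle, which is one way such a barrier may cause inefficiency.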