Data analytics is the science of examining raw data in order to draw conclusions about that information. Data analytics is used in many sectors to improve decision-making. As the use of data analytics continues to grow, the volume of data to be analyzed grows as well. Furthermore, data is being stored longer and longer as the value of that data becomes increasingly appreciated.
Given the ever-increasing volume of data, and the heightened reliance on techniques such as data analytics for sifting through it, tools have been developed to sort, analyze, and manipulate the data in an effort to make it more manageable. One such tool is MapReduce. MapReduce is a software framework introduced by Google Inc. in 2004 to support distributed computing on large data sets across clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming. MapReduce uses two primitive functions, “Map” and “Reduce,” to process incoming data.
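By way of illustration only, the functional-programming primitives that inspired the framework can be sketched in Python as follows. The sample values and the names `readings`, `squared`, and `total` are hypothetical and are not taken from any particular MapReduce implementation.

```python
from functools import reduce

# Hypothetical sample input.
readings = [3, 8, 5, 12]

# "Map": apply the same function to every element independently.
squared = list(map(lambda x: x * x, readings))

# "Reduce": fold the mapped values into a single result.
total = reduce(lambda acc, x: acc + x, squared, 0)
```

Because each element is mapped independently of the others, the map step lends itself to being distributed across many machines, while the reduce step combines the partial results.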
Referring now to prior art FIG. 1, an existing system 100 for processing incoming data 102 in accordance with the MapReduce framework is illustrated. As shown in FIG. 1, incoming data 102 is broken up into chunks, with each chunk being provided to one of a plurality of identical mapper modules 104a, 104n. The mapper modules 104a, 104n then perform processing on their respective chunks of the incoming data 102, such as, but not limited to, filtering, transformation, or aggregation. The mapper modules 104a, 104n then transmit the mapped data 108 to the reducer module(s), such as reducer module 106. Although FIG. 1 depicts only two mapper modules 104a, 104n and a single reducer module 106, those having ordinary skill in the art will appreciate that any number of mapper modules and reducer modules may be employed. For example, in one embodiment, there are fewer reducer modules than mapper modules. The one or more reducer modules (e.g., reducer module 106) then process (e.g., aggregate) the mapped data 108 to provide reduced data 110.
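The one-way data flow of system 100 may be approximated, purely as a minimal single-process sketch, by the following Python code. The word-counting task, the round-robin chunking, and the function names `split_into_chunks`, `mapper`, and `reducer` are assumptions chosen for illustration; they are not drawn from FIG. 1 or from any specific MapReduce implementation.

```python
from collections import Counter
from typing import Iterable, List


def split_into_chunks(incoming_data: List[str], num_mappers: int) -> List[List[str]]:
    """Break the incoming data into one chunk per mapper (round-robin)."""
    return [incoming_data[i::num_mappers] for i in range(num_mappers)]


def mapper(chunk: Iterable[str]) -> Counter:
    """Lower-case each record in the chunk and count its words locally."""
    counts: Counter = Counter()
    for record in chunk:
        for word in record.lower().split():
            counts[word] += 1
    return counts


def reducer(mapped_data: Iterable[Counter]) -> Counter:
    """Aggregate the partial counts received from every mapper."""
    total: Counter = Counter()
    for partial in mapped_data:
        total.update(partial)
    return total


# Hypothetical incoming data (cf. incoming data 102).
incoming_data = ["Map reduce map", "Reduce the data", "map the data"]

chunks = split_into_chunks(incoming_data, num_mappers=2)
mapped_data = [mapper(chunk) for chunk in chunks]   # cf. mapped data 108
reduced_data = reducer(mapped_data)                 # cf. reduced data 110
```

In this sketch, data moves only from `mapper` to `reducer`; nothing computed by `reducer` is ever passed back to a `mapper` call.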
Thus, conventional systems (e.g., system 100) for processing incoming data in accordance with the MapReduce framework are unidirectional. That is, in conventional systems the mapper modules transmit data to the reducer module(s), but the reducer module(s) do not transmit any data back to the mapper modules. The unidirectional manner in which existing systems process data in accordance with the MapReduce framework may result in less-than-optimal performance in many scenarios.
Accordingly, it is desirable to provide techniques for processing incoming data using a plurality of mapper modules and one or more reducer modules, such that the reducer module(s) can transmit data back to the mapper modules.