1. Field
The following description relates to a workload-aware distributed data processing apparatus and method for processing large data based on hardware acceleration.
2. Description of the Related Art
As Internet technology has developed, an increasingly large amount of data has been created and distributed over the Internet. In such an environment, where a large amount of data is available for use, a variety of companies, for example, portal companies, increase their competitiveness by accumulating massive amounts of data and by extracting significant information and providing it to users on request. Accordingly, various research has been conducted on establishing large-scale clusters at a reduced cost to enable distributed data processing and distributed parallel data processing.
A distributed parallel processing programming model supports distributed parallel computation of a large amount of data stored in a cluster that is formed at a low cost from a large number of nodes. The distributed parallel processing programming model includes two steps: a “Map step” based on a map function written by a user and a “Reduce step” based on a reduce function. These two steps are performed in turn. However, as the amount of data to be processed increases, it becomes more difficult to reduce data analysis time and thereby improve performance.
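The two-step model described above can be illustrated with a minimal, single-process sketch. It is only an illustration under stated assumptions: the word-count task, the in-memory grouping of intermediate pairs by key, and all function names (`map_step`, `group_by_key`, `reduce_step`, `run_job`, `word_count_map`, `word_count_reduce`) are hypothetical and are not taken from this description. An actual cluster would distribute the records and the grouped keys across many nodes.

```python
from collections import defaultdict


def map_step(records, map_fn):
    """Apply the user-written map function to every input record.

    Each call to map_fn returns a list of (key, value) pairs; the pairs
    from all records are collected into one list.
    """
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def group_by_key(pairs):
    """Group intermediate values by key (assumed here; done in memory)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(grouped, reduce_fn):
    """Apply the reduce function once per key to that key's values."""
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def run_job(records, map_fn, reduce_fn):
    """Run the Map step and then the Reduce step, in turn."""
    intermediate = map_step(records, map_fn)
    return reduce_step(group_by_key(intermediate), reduce_fn)


# Hypothetical user-supplied functions for a word-count job.
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(word, counts):
    return sum(counts)


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    print(run_job(lines, word_count_map, word_count_reduce))
```

In this sketch the user supplies only the map and reduce functions, while `run_job` fixes the order of the two steps. Because every record passes through both steps, the total work grows with the amount of input data, which is consistent with the difficulty noted above of reducing analysis time as data volume increases.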