1. Field of the Invention
Embodiments of the present invention relate to an apparatus and method for processing a large amount of data.
2. Description of the Related Art
To process a large amount of data, for example, weather satellite image data, a distributed/parallel processing technology that configures a cluster using a plurality of servers, and enables the plurality of servers to simultaneously perform an operation is required. In addition, to provide a method of processing a large amount of data, a task may need to be efficiently distributed to each server and be processed by configuring a distributor in the case of using a server on a cluster.
In general, a client may request the distributor to execute a predetermined task. The task may wait in a waiting queue of the distributor for load balancing and may be allocated to a predetermined server and be executed based on a predetermined order. When an already waiting task is present in the waiting queue and an idle server is present, the waiting task may be allocated to the idle server and be executed.
Even though a super computer may be generally used to process a large amount of data, current development in the distributed/parallel processing technology has achieved a processing rate more than a processing rate of the super computer by configuring and simultaneously driving PC clusters. For example, Google is processing search queries using millions or more of Linux-based PCs. Even a large cloud service provider such as Amazon Corporation is providing a computing service by configuring a large cluster.
In a transaction environment, such as a bank, that requires frequent updating, deleting, and inserting data, a database management system may be required.
In the case of infrequently and collectively processing large bulk data, for example, in the case of constructing a database of a large search engine, a new paradigm is required. Google Corporation is using a MapReduce framework. Hadoop is a new framework that has been developed in an open source by copying the MapRduce framework of Google Corporation. Hadoop is used for various applications since it makes a MapReduce operation easy, for example, makes it easy to generate an inverted index table of webpages. However, the above schemes may be applicable only in a case in which precedence between tasks is not considered and thus, may inefficiently operate.