To improve throughput when executing a large-scale MapReduce job over a wide area, it is extremely important to avoid communication delays. Conventionally, the following techniques have been developed to suppress the drop in throughput caused by such delays.
For example, the Open Science Data Cloud (OSDC) makes it possible to execute Hadoop over a wide area by connecting data centers with dedicated lines.
Moreover, in a wide area network, Transmission Control Protocol (TCP) processing is sometimes a cause of delays. For this reason, the UDP-based Data Transfer Protocol (UDT), which is built on the User Datagram Protocol (UDP), has been proposed. Furthermore, a technique called Sector/Sphere employs UDT as its core technology to realize a distributed file system and parallel data processing.
In a technique called CloudBLAST, throughput is improved by using a transport-layer Wide Area Network (WAN) technique called ViNe (Virtual Network).
However, the techniques described above are not always suitable for a MapReduce job. For example, not only are they costly, but they also make it difficult to flexibly analyze data generated daily at data centers located around the world. Furthermore, when a transport-layer protocol other than TCP is used, the existing firewall framework may not be usable, which poses a security problem.
Moreover, no conventional art exists that can improve the throughput of MapReduce processing executed over a wide area.