This disclosure relates to the field of network communication technologies in general, and to data merging in distributed computing in particular.
As Internet technologies develop at a very fast pace, processing the tremendous amount of information that exists on the Internet has become a difficult problem. If a single computer is used to process a large amount of data, the computer is required to have very powerful processing abilities and port capabilities. However, not only are high-performance single-unit computers very expensive, but the processing methods that rely on a single computer are also very limited. In order to increase the processing abilities of computational task systems, distributed computing is used to divide a large problem into many small problems, which are distributed to many computers. Distributed computing is a method of computer processing in which different parts of a program are run simultaneously on two or more computers that communicate with each other over a network. Distributed computing can exploit the idle resources of many interconnected computers on the Internet to process the large amount of information on the Internet.
In distributed computing, project data that requires a large amount of computation is split into many small pieces, which are computed separately by many different computers. These different computers, called distributed nodes, send their distributed computational results back to a central computer. Upon receiving the uploaded distributed computational results, the central computer merges the results to obtain the final data or a solution. Accordingly, distributed computing usually consists of several components, including components for task splitting, task computation and result merging. For task splitting, different methods are used to split the task depending on the nature of the application. The goal is usually to divide the task evenly while keeping the resulting sub-tasks independent of one another. After task splitting, the sub-tasks are assigned to different distributed nodes. For task computation, each distributed node performs the corresponding distributed computation to obtain a computational result for its distributed sub-task. For result merging, the processing results of the different distributed nodes are merged in a server computer to obtain a final processing result.
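By way of illustration only, the three components described above (task splitting, task computation and result merging) may be sketched as follows. This is a minimal sketch under stated assumptions and does not represent any particular existing system: the function names `split_task`, `compute_subtask`, `merge_results` and `run_distributed` are hypothetical, worker threads on a single machine stand in for distributed nodes on a network, and counting page visits is chosen only as a simple sub-task whose partial results can be merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_task(records, num_nodes):
    """Task splitting: divide the records into near-equal, independent chunks."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def compute_subtask(chunk):
    """Task computation: one (simulated) node counts the pages in its chunk."""
    return Counter(chunk)


def merge_results(partial_results):
    """Result merging: the central computer combines the partial counts."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total


def run_distributed(records, num_nodes=3):
    """Run the split, compute and merge steps in sequence."""
    chunks = split_task(records, num_nodes)
    # Assumption: worker threads stand in for distributed nodes on a network.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(compute_subtask, chunks))
    return merge_results(partials)


# Hypothetical input: pages visited, one entry per request.
visits = ["/home", "/cart", "/home", "/help", "/cart", "/home"]
totals = run_distributed(visits)
```

In this sketch the chunks share no state, so each sub-task can be computed without reference to the others, and the merge step only needs to add the partial counts together. A real deployment would additionally have to handle node failures, data synchronization and uneven load, none of which is addressed here.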
In current distributed computing technology, the processing methods differ from one application to another. To process an application, the user must consider details such as concurrent processing, fault tolerance and load balancing in distributed computing. As a result, the coding may become very complicated. Take as an example an analysis of the access paths of web sites visited by Internet users. This task requires writing code for task splitting, access path analysis and result merging. Within this code, the program is also required to handle data synchronization, concurrency, fault tolerance and load balancing. The coding for distributed computing is usually application-specific. For a new application, all of these processes may have to be repeated, together with consideration of problems such as task splitting, result merging and data synchronization. This creates a burdensome environment for implementing distributed computing.