Many communication-intensive fields of computing, such as parallel computation, data centers, and distributed file systems, require improved communication performance.
Parallel computation is an evolving field of computing. It refers to executing a single task on a plurality of processors, after the task has been decomposed and specially adapted, so as to obtain a result more quickly. Parallel computation rests on the following fact: the process of solving a problem can generally be divided into a plurality of smaller tasks, and these smaller tasks can be executed simultaneously in a coordinated manner.
A parallel computer executes a parallel algorithm. The parallel algorithm is decomposed into many small tasks, which are executed on many different processing devices, and the partial results are finally combined to obtain a data processing result. In this specification, the processing devices that execute the small parts of a parallel program are called “computing nodes”, and the parallel computer is composed of the computing nodes and other processing nodes (e.g., input/output nodes and service nodes).
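The decomposition and final combination described above can be illustrated with a minimal sketch. It is not taken from the specification: the function names `split`, `partial_sum_of_squares`, and `parallel_sum_of_squares` are hypothetical, the sum of squares is an arbitrary example task, and threads in a single process merely stand in for computing nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Decompose the input into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """The small task that each (hypothetical) computing node executes."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, nodes=4):
    """Run the small tasks concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, split(data, nodes)))
    # Final step: the partial results are combined into one result.
    return sum(partials)
```

For example, `parallel_sum_of_squares(list(range(10)), nodes=3)` splits the ten inputs into chunks of sizes 4, 3, and 3 and returns 285. On a real parallel computer, the chunks and the partial results would travel between nodes as messages, which is where the communication cost discussed below arises.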
To execute the parallel program, the nodes of the parallel computer often need to exchange large amounts of data. Typically, the nodes exchange data by message passing.
As one of the main components of the parallel program, communication influences the performance of the parallel program to a large extent.
Currently, MPI (Message Passing Interface) is the de facto standard for communication among the computing nodes that execute a parallel program on a parallel computer. MPI is an existing parallel communication library, that is, a module of computer program instructions that carries out data communication on the parallel computer. MPI is published by the MPI Forum, an open group with representatives from many organizations that define and maintain the MPI standard.
The communication functions of a parallel program mainly include point-to-point communication functions and collective communication functions. A point-to-point communication function carries out the data exchange between two parallel processes, and includes blocking communication (MPI_Send, MPI_Recv), non-blocking communication (MPI_Isend, MPI_Irecv), and the like. A collective communication function carries out the data exchange among a plurality of processes (a process group), and includes MPI_Barrier, MPI_Bcast, MPI_Allgather, MPI_Alltoall, and the like. In this specification, the term “point-to-point communication function” is interchangeable with the term “point-to-point operation”, and the term “collective communication function” is interchangeable with the term “collective operation” (sometimes also referred to in the field as “collective action”, “group communication”, and the like).
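The difference between the two kinds of operation can be illustrated with a small model. The sketch below does not use MPI itself: the class `Mailboxes` and the functions `bcast` and `run_broadcast` are hypothetical names, in-process queues stand in for the network, and threads stand in for parallel processes. It shows a broadcast, a collective operation in which every process of the group takes part, built naively from point-to-point sends and receives.

```python
import queue
import threading


class Mailboxes:
    """Hypothetical stand-in for a process group: one inbox per rank."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, dest, value):
        """Point-to-point operation: deliver `value` to rank `dest`."""
        self._inbox[dest].put(value)

    def recv(self, rank):
        """Point-to-point operation: block until a message for `rank` arrives."""
        return self._inbox[rank].get()


def bcast(comm, rank, value, root=0):
    """Collective operation: every rank calls this and gets the root's value."""
    if rank == root:
        # Naive linear broadcast: the root sends to each other rank in turn.
        for dest in range(comm.size):
            if dest != root:
                comm.send(dest, value)
        return value
    return comm.recv(rank)


def run_broadcast(size, payload, root=0):
    """Start `size` ranks as threads and return the value each one ends up with."""
    comm = Mailboxes(size)
    results = [None] * size

    def worker(rank):
        value = payload if rank == root else None
        results[rank] = bcast(comm, rank, value, root)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model the root performs one send per receiving process, so the cost of the collective operation grows with the size of the group. Real MPI libraries typically use more elaborate algorithms than this linear scheme, but the sketch suggests why collective operations can come to dominate communication time as the number of processes increases.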
The inventors have found that the scalability of the parallel computing capability of a parallel computer is often limited by data communication performance. As the scale grows, the proportion of communication time in the total execution time increases, mainly because of the time spent in collective communication functions. For example, with 2048 processes, computation time accounts for only 39.2% of the execution time. To improve the scalability of the application, the communication performance of the system, particularly the collective communication performance, must therefore be improved.
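The significance of the 39.2% figure can be worked through with a short calculation. The sketch below rests on two assumptions that the specification does not state: that all of the remaining execution time is communication, and that an improvement shortens only the communication share, as in an Amdahl-style estimate. The function names are hypothetical.

```python
def non_computation_share(computation_share):
    """Share of execution time not spent computing (assumed to be communication)."""
    return 1.0 - computation_share


def speedup_if_communication_accelerated(computation_share, factor):
    """Amdahl-style estimate: only the non-computation share shrinks by `factor`."""
    comm = non_computation_share(computation_share)
    return 1.0 / (computation_share + comm / factor)


# Figure quoted in the text for 2048 processes.
share = 0.392
comm = non_computation_share(share)                       # 0.608 of the time
twice_as_fast = speedup_if_communication_accelerated(share, 2.0)
upper_bound = 1.0 / share                                 # communication made free
```

Under these assumptions, communication takes 60.8% of the execution time, halving it shortens the whole run by a factor of about 1.44, and eliminating it entirely would give at most a factor of about 2.55.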
The inventors have also found that communication performance needs to be improved in an operating system image broadcast application on a cloud platform of a data center, and in a file backup application of a distributed file system.