1. Field of the Invention
The present invention relates to a distributed-memory multiprocessor system in which memory is distributed over multiple processors, and more particularly to such a system that increases the efficiency of writing into a virtual global space during redundant execution.
2. Description of the Related Art
To increase processing speed, computers are increasingly being built as multiprocessor systems. A distributed memory system is commonly adopted as the memory system for such multiprocessor systems. In a distributed memory system, data transfers occur frequently because the data is divided among the processors. In particular, when the processing of a huge or multi-dimensional array is distributed over processors in the field of scientific and engineering computation, for example by FORTRAN programs, faster data access has been desired.
In the parallel processing of a multiprocessor system, each processor may perform separate processing, or all processors may perform the same processing (redundant execution). It may seem unnecessary for all processors to perform the same process redundantly. However, to have only one of the processors perform the process, parallel processing would have to be stopped and switched to processing by a single processor, which decreases the performance efficiency of the system. Therefore, redundant execution is generally used in the parallel processing of a multiprocessor system. Because redundant execution involves all the processors, speeding it up is also very effective in speeding up the overall processing.
FIG. 1 illustrates an example of the prior art. This figure shows the space structure of the storage areas in a distributed-memory multiprocessor system.
Virtual global space 2 is a virtual storage space that all the processors are allowed to access in common. Program 1 is executed by each processor. Local spaces 3-1 to 3-4 are accessed by processors P1 to P4, respectively.
When a distributed-memory multiprocessor system processes a huge quantity of data or a multi-dimensional array of data using a scientific and engineering computing program, the data is distributed among the processors. In this case, the processors may simultaneously perform the same processing, such as data initialization. Execution in which each processor performs the same processing is called redundant execution.
In the system shown in FIG. 1, the processors P1 to P4 carry out redundant execution of the program 1. Here, A is a global variable and b is a local variable. The program 1 instructs the processors P1 to P4 to write the value of the variable b which they have in their respective local spaces 3-1 to 3-4 into the variable A in the virtual global space 2.
In the conventional system shown in FIG. 1, all the processors P1 to P4 write into the same global variable A during redundant execution. However, writing into the virtual global space 2 requires a processor-to-processor data transfer whenever the write area is not within the writing processor's own memory. Thus, when writing into the virtual global space 2 during redundant execution, processor-to-processor data transfers occur frequently, reducing the processing speed.
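By way of illustration only, the communication cost of the conventional scheme can be sketched in Python as follows. The function name count_transfers_conventional and the string labels for the processors are hypothetical and are not part of the system described. The sketch assumes that one transfer is needed for every writing processor other than the one whose memory holds the variable.

```python
def count_transfers_conventional(executing, owner):
    """Count network transfers when every processor writes the global variable.

    executing: labels of the processors in redundant execution (hypothetical).
    owner:     label of the processor whose memory holds the variable.
    """
    # Each writer whose own memory does not hold the variable must send
    # its value over the network to the owner.
    return sum(1 for p in executing if p != owner)


executing = ["P1", "P2", "P3", "P4"]
# Variable A is assumed, for this example, to reside in the memory of P2.
print(count_transfers_conventional(executing, owner="P2"))  # prints 3
```

Under these assumptions, four processors writing the same value produce three transfers, all of which carry identical data.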
A drawback of the conventional system is that, since all the processors perform the same processing during redundant execution, every processor writes into the global variable A simultaneously, and communications overhead increases accordingly. This reduces processing efficiency during redundant execution.
It is therefore an object of the present invention to provide a multiprocessor system that increases the speed of writing into the virtual global space during redundant execution.
The present invention is directed to a distributed-memory multiprocessor system in which multiple processors, each having its own individual memory, are interconnected by a communications network so that fast redundant execution in parallel processing is permitted. The memory area of each processor is divided into a private area, which only that processor is allowed to access, and a shared area, which other processors as well as that processor are allowed to access. The private area corresponds to the local space inherent in an individual processor. The shared area, together with the shared areas of the other processors, forms a virtual global space that all the processors are allowed to access in common. The range of the virtual global space can be set up arbitrarily. That is, it can also be set up on the memories of processors other than the processors that take part in the parallel processing.
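The division of each memory into a private area and a shared area may be pictured with the following minimal Python sketch. The names ProcessorMemory and global_directory are hypothetical, and modelling each area as a dictionary is an assumption made only for illustration; the actual memory organization is not limited to this form.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorMemory:
    pid: str
    # Private area: local space that only this processor may access.
    private: dict = field(default_factory=dict)
    # Shared area: this processor's portion of the virtual global space.
    shared: dict = field(default_factory=dict)


def global_directory(memories):
    """Map each global variable name to the processor whose shared area holds it."""
    directory = {}
    for mem in memories:
        for name in mem.shared:
            directory[name] = mem.pid
    return directory


# Four processors, each holding local variable b; global variable A is
# assumed, for this example, to be placed in the shared area of P2.
memories = [ProcessorMemory(pid=f"P{i}", private={"b": 7}) for i in range(1, 5)]
memories[1].shared["A"] = 0
print(global_directory(memories))  # prints {'A': 'P2'}
```

The union of the shared dictionaries plays the role of the virtual global space, while each private dictionary plays the role of a local space.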
A central processing unit (CPU) in each processor calls, for example, a write processing library from the memory and executes it redundantly, serving as a right-of-write determination processing section and a write processing section. The right-of-write determination processing section includes a representative determination processing section and a variable location determination processing section.
When a variable within the virtual global space is written into at the stage of redundant execution processing, in which some of the multiple processors perform the same processing in parallel, the representative determination processing section of each processor determines whether or not that processor is the representative of the processors in redundant execution. The representative determination processing section informs the write processing section of the result of that determination.
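The representative determination may be sketched in Python as shown below. The rule used here, namely that the processor with the smallest identifier is the representative, is an assumption made for illustration only; the description above does not fix a particular selection rule. The function name is_representative and the integer identifiers are likewise hypothetical.

```python
def is_representative(my_id, executing_ids):
    """Return True if this processor represents the redundant-execution group.

    Assumed rule (not prescribed by the description): the processor with
    the smallest identifier in the group is the representative.
    """
    return my_id == min(executing_ids)


executing_ids = [1, 2, 3, 4]
# Every processor evaluates the same test on its own identifier.
print([pid for pid in executing_ids if is_representative(pid, executing_ids)])
# prints [1]
```

Because every processor applies the same rule to the same group, exactly one processor obtains a positive result without any exchange of messages.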
When a variable within the virtual global space is written into during redundant execution, the variable location determination processing section of each processor determines whether or not that variable is present in a memory area of any of the processors in redundant execution. When the variable is present in one of those memory areas, the variable location determination processing section further determines whether or not the variable is present in the memory area of its own processor. The variable location determination processing section informs the write processing section of the results of these determinations.
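The two-stage location determination may be sketched in Python as follows. The names Location and locate_variable are hypothetical, and the sketch assumes that the identifier of the processor holding the variable (owner_id) is already known to every processor.

```python
from typing import NamedTuple


class Location(NamedTuple):
    in_group: bool  # variable resides in a memory of the redundant-execution group
    in_self: bool   # variable resides in this processor's own memory


def locate_variable(my_id, owner_id, executing_ids):
    """Two-stage check performed identically on every processor."""
    # First determination: is the variable held by any processor in the group?
    in_group = owner_id in executing_ids
    # Second determination, made only when the first is positive:
    # is the variable held by this very processor?
    in_self = in_group and owner_id == my_id
    return Location(in_group, in_self)


print(locate_variable(my_id=2, owner_id=2, executing_ids=[1, 2, 3, 4]))
# prints Location(in_group=True, in_self=True)
```

Both results are passed on together, since the write decision depends on whether the variable lies inside the group as well as on whether it lies in the processor's own memory.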
The write processing section of each processor is responsive to the determination result of the representative determination processing section and actually writes into the variable within the virtual global space only when that processor is the representative. Otherwise, it does not write into the variable. Thus, data is transferred to the memory area to be written into from the representative processor only. As a result, contention among the processors for access to the communications network is avoided, which increases the speed of the redundant execution processing. When the variable to be written into is present in the memory of the representative processor, only a data copy within that processor is necessary. That is, in this case, no data transfer over the communications network is needed.
In addition, the write processing section of each processor is responsive to the determination result of the variable location determination processing section and actually writes into the variable when the variable to be written into is present in the memory area of that processor. Otherwise, it does not write into the variable. Therefore, when the variable to be written into is present in the memory of one of the processors in redundant execution, processor-to-processor data transfers become unnecessary, which further increases the speed of the redundant execution processing. When, on the other hand, the variable is present in the memory of a processor other than the processors in redundant execution, only the representative processor makes the data transfer.
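The combined write decision may be illustrated with the following Python simulation, which is a sketch under stated assumptions rather than a definitive implementation. The function name redundant_write, the dictionary shared_areas, and the smallest-identifier rule for choosing the representative are all hypothetical. A network transfer is counted whenever the writing processor is not the processor that holds the variable.

```python
def redundant_write(name, value, owner_id, executing_ids, shared_areas):
    """Simulate one redundant write and return the number of network transfers."""
    transfers = 0
    representative = min(executing_ids)  # assumed selection rule
    for pid in executing_ids:  # every processor runs the same code
        if owner_id in executing_ids:
            # Variable lies inside the group: only its holder writes.
            writes = pid == owner_id
        else:
            # Variable lies outside the group: only the representative writes.
            writes = pid == representative
        if writes:
            shared_areas[owner_id][name] = value
            if pid != owner_id:
                transfers += 1  # value had to cross the network
    return transfers


# Processors 1 to 4 execute redundantly; processor 5 only lends its memory.
shared_areas = {pid: {} for pid in range(1, 6)}
print(redundant_write("A", 7, owner_id=2, executing_ids=[1, 2, 3, 4],
                      shared_areas=shared_areas))  # prints 0
print(redundant_write("A", 9, owner_id=5, executing_ids=[1, 2, 3, 4],
                      shared_areas=shared_areas))  # prints 1
```

In this sketch the write costs no transfer when the variable is held within the group and exactly one transfer when it is held outside the group.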
According to the distributed-memory multiprocessor system of the present invention, processor-to-processor data transfers, which would otherwise require an appreciable amount of execution time in redundant execution processing, can be reduced. Thus, a substantial reduction in execution time is permitted. In particular, processor-to-processor data transfers need not be made when the destination of the data is among the processors in redundant execution. Further, the communications overhead of processor-to-processor data transfers is reduced, thus improving the processing performance of the system.