When parallel programs are executed on a shared-memory multiprocessor system, in which the processors share a common memory, barrier synchronization among the processors is required.
In the prior art, a process called the lock process has been executed to maintain coherence for such barrier synchronization.
This process performs exclusive access control over data and allows each of a plurality of processors to access variables called lock variables exclusively. Completion or incompletion of the barrier synchronization is then determined from the state of these lock variables. However, these exclusive read and write operations require instructions with long processing times, such as the test-and-set instruction provided in the processor. In addition, this exclusive process has the disadvantage that its processing time increases markedly as the number of processors participating in the barrier synchronization increases.
The method of barrier synchronization using lock variables is described on pages 559–561 of the non-patent document "COMPUTER ORGANIZATION & DESIGN: THE HARDWARE/SOFTWARE INTERFACE" (by David A. Patterson and John L. Hennessy, translated by Mitsuaki Narita, published by Nikkei BP, April 1996).
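The lock-variable barrier described above can be illustrated with a minimal sketch. It assumes that a non-blocking acquire of Python's threading.Lock stands in for the processor's test-and-set instruction, and the class and function names (TestAndSetLock, LockBarrier, run_demo) are hypothetical. Each arriving thread must win the single lock variable before updating the shared counter, so arrivals are serialized; sense reversal is added only so the barrier can be reused across rounds.

```python
import threading
import time


class TestAndSetLock:
    """Spin lock standing in for a lock variable.

    ASSUMPTION: a non-blocking acquire of threading.Lock is used as a software
    stand-in for the processor's atomic test-and-set instruction.
    """

    def __init__(self):
        self._flag = threading.Lock()

    def test_and_set(self):
        # Atomically set the lock variable and return its OLD value:
        # False means it was free and the caller now holds it.
        return not self._flag.acquire(blocking=False)

    def acquire(self):
        # Retry until the old value was "free"; every retry is another
        # exclusive access to the same single lock variable.
        while self.test_and_set():
            time.sleep(0)  # yield so other threads can make progress

    def release(self):
        self._flag.release()


class LockBarrier:
    """Counter barrier whose counter is updated under the lock variable."""

    def __init__(self, n):
        self.n = n            # number of participating threads
        self.count = 0        # arrivals in the current round
        self.sense = False    # flipped by the last arriving thread
        self.lock = TestAndSetLock()

    def wait(self, local_sense):
        local_sense = not local_sense
        self.lock.acquire()               # serialized: one thread at a time
        self.count += 1
        last = self.count == self.n
        if last:
            self.count = 0                # reset for the next round
            self.sense = local_sense      # release every waiting thread
        self.lock.release()
        if not last:
            while self.sense != local_sense:
                time.sleep(0)             # spin until released
        return local_sense


def run_demo(n_threads=4, rounds=3):
    barrier = LockBarrier(n_threads)
    arrived = [0] * rounds
    guard = threading.Lock()
    ok = []

    def worker():
        sense = False
        for r in range(rounds):
            with guard:
                arrived[r] += 1
            sense = barrier.wait(sense)
            # After the barrier, every thread must have arrived for round r.
            with guard:
                ok.append(arrived[r] == n_threads)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ok
```

Calling run_demo(4, 3) should yield twelve entries, all True. Because every arrival goes through the same lock variable, the time spent in the serialized section grows with the number of threads, which is the cost the text attributes to the lock process.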
In another known method (Patent Document 1), each processor is provided with a local memory, a counter is held in the shared memory of each processor, and each processor can establish synchronization using this counter. In this method, one processor is designated the master processor and the others are designated slave processors, and the master and slave processors are synchronized on the basis of the counter value in the shared memory.
[Patent Document 1]
JP-A No. 305546/1997
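A master/slave counter scheme of this general kind might look like the following hypothetical sketch. It is not the method of Patent Document 1, whose details are not given here. The names (MasterSlaveCounter, slave_arrive, master_wait, run_master_slave) are illustrative assumptions, and a threading.Lock is assumed to protect counter updates. Slaves only increment the counter, and only the master waits for the count to reach the number of slaves.

```python
import threading
import time


class MasterSlaveCounter:
    """HYPOTHETICAL master/slave counter in shared memory (illustrative only)."""

    def __init__(self, n_slaves):
        self.n_slaves = n_slaves
        self.counter = 0
        self.guard = threading.Lock()   # protects updates to the counter

    def slave_arrive(self):
        # A slave reports that it reached the synchronization point and
        # continues immediately; it never waits for the other slaves.
        with self.guard:
            self.counter += 1

    def master_wait(self):
        # Only the master waits, polling until every slave has reported in.
        while True:
            with self.guard:
                if self.counter == self.n_slaves:
                    seen = self.counter
                    self.counter = 0    # reset for the next synchronization
                    return seen
            time.sleep(0)


def run_master_slave(n_slaves=3):
    shared = MasterSlaveCounter(n_slaves)
    slaves = [threading.Thread(target=shared.slave_arrive)
              for _ in range(n_slaves)]
    for t in slaves:
        t.start()
    seen = shared.master_wait()         # returns once all slaves arrived
    for t in slaves:
        t.join()
    return seen
```

In this sketch, a slave calling slave_arrive returns at once, even when the other slaves have not yet arrived. Only the master observes that all slaves have reached the synchronization point.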
In the former prior art, barrier synchronization among the processors of a multiprocessor through exclusive control by the lock process takes a long time.
In particular, when barrier synchronization is required frequently, the overhead of the lock process markedly reduces the efficiency of parallel processing. In some cases, high-speed processing cannot be achieved because the lock process forces serial execution.
This is considered to occur because the lock process is always carried out as data read and write operations on the main memory, so that executing even a single test-and-set instruction takes a long time.
The former prior art also has the problem that memory access performance is degraded, because all the data read and write operations generated by the lock process are directed to a single address in the main memory.
The latter prior art has a different problem. One processor is designated the master processor and the others slave processors, which ensures synchronization between the master processor and the slave processors. However, because the counter in the shared memory is used only for this master-slave synchronization, synchronization among the slave processors themselves cannot be guaranteed. As a result, high-speed operation cannot be realized when this method is applied to parallel processing.