1. Field of the Invention
The present invention generally relates to a synchronization communication mechanism, and more specifically to a synchronization communication control mechanism employed in a multi-processor system.
2. Description of the Related Art
In multi-processor systems, high-speed shared registers called "communication registers" are sometimes used to hold shared variables for synchronization control, mutual exclusion control, and communication control among processors. A communication register is required to have a shorter access time than a storage unit and/or a relatively high throughput, so that the processors can communicate through it and thereby increase the data processing speed. Because sufficient parallelism cannot be achieved in synchronization control, mutual exclusion control, or communication control in a multi-processor system, these controls may greatly affect the performance of the overall system as the degree of parallelism increases. Consequently, the arrangement of the communication register greatly affects the performance improvement that can be obtained in a multi-processor system.
Barrier synchronization will now be described as one example of the above-described synchronization control.
Barrier synchronization is a process in which each of a plurality of processors waits in a barrier synchronization routine until all of these processors have entered the routine. This barrier synchronization routine is represented in FIG. 9. It is assumed that, as initial values, the number of processors executing the barrier synchronization is stored in word #0 of the communication register, a non-zero value is stored in word #1 of the communication register, and zero values are stored in scalar registers S0 and S1.
The commands in this routine are interpreted as follows:
FDCR S0, CR#0: the value of word #0 of the communication register is stored into scalar register S0, and then word #0 is decremented by 1.
BL S0, loop 1: if the value of scalar register S0 is greater than zero, control branches to loop 1.
SCR S1, CR#1: the value of scalar register S1 is stored into word #1 of the communication register.
B looped: control jumps unconditionally to looped.
LCR S2, CR#1: the value of word #1 of the communication register is loaded into scalar register S2.
BNE S2, loop 1: if the value of scalar register S2 is not zero, control branches to loop 1.
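To make the command semantics concrete, the following Python sketch traces the routine for one fixed arrival order, with words #0 and #1 modeled as a plain list. The arrival order, the function name `trace_barrier`, and the log format are illustrative assumptions. The final-processor test is written as "word #0 reached zero after the FDCR", which is how the text identifies the final processor, rather than as a literal rendering of the BL branch condition.

```python
def trace_barrier(n: int) -> list[str]:
    """Trace the barrier routine for n processors arriving in order P0..P(n-1).

    Hypothetical illustration only: a Python list stands in for the
    communication register, and accesses happen one at a time.
    """
    words = [n, 1]  # word #0 = processor count, word #1 = non-zero flag
    log = []
    for p in range(n):
        s0 = words[0]      # FDCR S0, CR#0: fetch the old value ...
        words[0] -= 1      # ... then decrement word #0
        if words[0] == 0:  # this arrival drove word #0 to zero: final processor
            words[1] = 0   # SCR S1, CR#1 with S1 holding zero
            log.append(f"P{p}: FDCR read {s0}, final, writes 0 to word #1")
        else:
            log.append(f"P{p}: FDCR read {s0}, waits in loop 1")
    for p in range(n - 1):
        s2 = words[1]      # LCR S2, CR#1: each waiter's next read sees zero
        log.append(f"P{p}: LCR read {s2}, leaves loop 1")
    return log


for line in trace_barrier(4):
    print(line)
```

For n = 4, the first four entries show FDCR reading 4, 3, 2, and 1, and only P3 writes zero to word #1; the three waiting processors then leave loop 1.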
When each processor enters the barrier routine, the value of word #0 of the communication register is first saved into scalar register S0 and then decremented. Since the number of processors has been stored as the initial value of word #0, the value of word #0 becomes zero once all of the processors have entered the barrier routine. The processors other than the last processor to enter the routine jump to loop 1 and wait there until the final processor enters the routine. Whether a processor is the final processor can be judged by checking the value of word #0 that it read with the FDCR command. The final processor writes zero into word #1 of the communication register, thereby announcing the completion of the barrier to the other processors.
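The same routine can be sketched with real concurrency, assuming Python threads stand in for processors and a single lock stands in for the communication register's one access port. The class `CommRegister`, its `fdcr`/`scr`/`lcr` methods, and the `run` helper are hypothetical names chosen to mirror the commands; this is a software model, not the hardware mechanism described here.

```python
import threading
import time


class CommRegister:
    """Hypothetical model of a communication register with a single access port."""

    def __init__(self, n_processors: int) -> None:
        # Initial state assumed in the text: word #0 = processor count, word #1 non-zero.
        self.words = [n_processors, 1]
        self._port = threading.Lock()  # only one access is served at a time

    def fdcr(self, w: int) -> int:
        """FDCR: return the old value of word w, then decrement word w."""
        with self._port:
            old = self.words[w]
            self.words[w] = old - 1
            return old

    def scr(self, w: int, value: int) -> None:
        """SCR: store a scalar value into word w."""
        with self._port:
            self.words[w] = value

    def lcr(self, w: int) -> int:
        """LCR: load word w."""
        with self._port:
            return self.words[w]


def barrier(cr: CommRegister) -> bool:
    """Run the barrier routine; return True for the final (releasing) processor."""
    s1 = 0                     # scalar register S1 initially holds zero
    s0 = cr.fdcr(0)            # FDCR S0, CR#0
    if s0 - 1 == 0:            # this processor drove word #0 to zero
        cr.scr(1, s1)          # SCR S1, CR#1: announce completion
        return True
    while cr.lcr(1) != 0:      # loop 1: LCR S2, CR#1 then BNE S2, loop 1 (spin)
        time.sleep(0)          # yield the interpreter so other threads can progress
    return False


def run(n: int) -> tuple[list[bool], list[int]]:
    """Start n threads at the barrier; return who released it and the final words."""
    cr = CommRegister(n)
    results = [False] * n

    def worker(i: int) -> None:
        results[i] = barrier(cr)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, cr.words
```

Calling `run(4)` returns exactly one `True` (the final processor) and leaves both words at zero. Every `lcr` call in the spin loop competes for the same lock as the `fdcr` calls of late arrivals, which is the contention discussed for the conventional system.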
In the above-described conventional multi-processor system, only one of the communication register access requests issued by the plurality of processors can access the communication register unit at a time. This may cause large overhead in synchronization, mutual exclusion, and communication controls that use the communication registers.
In this case, each processor other than the final processor, after executing the FDCR command, repeatedly executes the LCR command within loop 1 until the final processor sets word #1 to zero. This repeated execution is referred to as a "spin lock". Since the spin lock is performed by all of the waiting processors that have entered the routine, accesses concentrate on the communication register, and heavy access contention may occur. Because of this contention, the FDCR access issued by a processor that newly enters the barrier synchronization routine may be forced to wait. In the worst case, the waiting time may amount to the time needed to serve the accesses of all of the processors that are in the spin lock condition waiting for the barrier synchronization.
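A toy arbitration model illustrates this worst case. It assumes, hypothetically, a single-port register that serves one request per cycle in FIFO order, and that each spinning processor always has one LCR request queued and re-issues it as soon as it is served. Under these assumptions a newly arriving FDCR waits one cycle per spinning processor. The function name `fdcr_wait_cycles` is illustrative, and the model does not represent the arbitration scheme of any particular system.

```python
from collections import deque


def fdcr_wait_cycles(spinning: int) -> int:
    """Cycles a newly arriving FDCR waits behind `spinning` pending LCR requests.

    Hypothetical model: one access per cycle, FIFO arbitration, and every
    spinning processor keeps exactly one LCR request in the queue.
    """
    queue = deque(("LCR", p) for p in range(spinning))
    queue.append(("FDCR", spinning))  # newcomer joins at the back (worst case)
    cycle = 0
    while True:
        op, proc = queue.popleft()  # the port serves one request this cycle
        if op == "FDCR":
            return cycle            # cycles spent waiting before being served
        queue.append((op, proc))    # the spinner immediately re-issues its LCR
        cycle += 1


for k in (0, 1, 3, 7):
    print(k, fdcr_wait_cycles(k))
```

With k processors spinning, the newcomer's FDCR is served only after k LCR accesses, so its wait grows linearly with the number of waiting processors.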
Referring now to the time chart shown in FIG. 10, when the above-described barrier synchronization is executed by four processors, each processor in turn decrements word #0, and thereafter each processor checks whether the other processors have completed their operations. As a consequence, 8 cycles are required to accomplish the barrier synchronization among these four processors. In other words, (2 × N) cycles are required for N processors, where N is an integer denoting the number of processors.
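The 2 × N count can be reproduced with an idealized cycle model, assuming the register serves one access per cycle, N serialized FDCR accesses come first, and N serialized completion checks follow. `barrier_time_chart` is a hypothetical helper; it reproduces only the cycle count, not the exact contents of FIG. 10.

```python
def barrier_time_chart(n: int) -> list[tuple[int, int, str]]:
    """Return (cycle, processor, access) entries for an idealized n-processor barrier.

    Assumed model: one register access per cycle; phase 1 is the serialized
    decrement of word #0, phase 2 is one completion check per processor.
    """
    chart = []
    cycle = 0
    for p in range(n):  # phase 1: each processor decrements word #0 in turn
        chart.append((cycle, p, "FDCR CR#0"))
        cycle += 1
    for p in range(n):  # phase 2: each processor checks completion in turn
        chart.append((cycle, p, "check CR#1"))
        cycle += 1
    return chart


chart = barrier_time_chart(4)
for entry in chart:
    print(entry)
print("total cycles:", chart[-1][0] + 1)
```

For four processors the chart spans cycles 0 through 7, i.e., 8 cycles; in general the total is N decrement cycles plus N check cycles.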