Parallel processing is currently used for high-performance computing.
“Parallel processing” in computers means running one task on a plurality of processor cores. This technique improves processing efficiency by exploiting the fact that a task for solving a problem can in most cases be divided into smaller tasks.
To execute parallel processing for one task, “synchronization”, which makes the processes of the processor cores wait for one another, is required. A barrier is one synchronization method in parallel processing: execution of a thread or a process is suspended at a particular point in the source code in order to wait for other threads or processes, and the execution is resumed when all the other threads or processes have arrived at the barrier.
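As an illustration of the barrier concept described above, the following minimal sketch uses Python's standard `threading.Barrier`. It is not the hardware mechanism discussed in this document. The names `NUM_WORKERS`, `partial`, `totals` and `worker` are hypothetical and chosen only for this example.

```python
import threading

NUM_WORKERS = 4                      # hypothetical number of threads
partial = [0] * NUM_WORKERS          # phase-1 result of each worker
totals = [0] * NUM_WORKERS           # phase-2 result of each worker
barrier = threading.Barrier(NUM_WORKERS)


def worker(i):
    # Phase 1: each worker computes its own piece of the task.
    partial[i] = (i + 1) * 10
    # Barrier: execution is suspended here until all workers have arrived.
    barrier.wait()
    # Phase 2: every piece is now guaranteed to have been written.
    totals[i] = sum(partial)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals)  # [100, 100, 100, 100]
```

Without the `barrier.wait()` call, a worker could sum `partial` before the other workers have written their pieces, which is the situation that barrier synchronization prevents.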
FIGS. 1A to 1C illustrate conventional synchronization within a barrier bank.
FIG. 1A illustrates an initial state, FIG. 1B illustrates the synchronization being performed, and FIG. 1C illustrates completion of the synchronization.
In the barrier bank 1001, which is a minimum range of barrier synchronization, processor cores 1002-a (a=1 to 4) and a barrier synchronization mechanism 1003 are provided. The barrier bank 1001 is provided within a central processing unit (CPU).
Each of the processor cores 1002-a has a simultaneous multi-threading function, and has hardware threads 1004-a-b (b=1, 2) for executing a thread.
In FIGS. 1A to 1C, each processor core provides two hardware threads. Software (such as an operating system (OS)) that issues a software thread handles each of the hardware threads 1004-a-b as a logical CPU (virtual CPU).
The barrier synchronization mechanism 1003 has a barrier state 1005 and a bitmap group 1006.
The barrier state 1005 is information used to control the barrier synchronization. A value of the barrier state 1005 is “0” or “1”.
The bitmap group 1006 includes a plurality of bitmaps. Each of the bitmaps is information indicating whether a hardware thread 1004 has arrived at a barrier synchronization point. As many bitmaps are prepared as there are hardware threads 1004 within the barrier bank 1001, and one bitmap is allocated to each of the hardware threads 1004. Namely, each of the bitmaps indicates whether its allocated hardware thread 1004 has arrived at a barrier synchronization point. A value of each of the bitmaps is “0” or “1”.
In the initial state illustrated in FIG. 1A, the barrier state 1005 and all the bitmaps are “0”.
Assume that, in FIG. 1B, each of the hardware threads 1004 executes a thread and that the hardware threads 1004-1-1, 1004-3-1 and 1004-4-2 have arrived at a barrier synchronization point.
The hardware threads 1004-1-1, 1004-3-1 and 1004-4-2 each read the barrier state 1005 and write a value obtained by inverting the read value to their respectively allocated bitmaps. Here, because the barrier state 1005 is “0”, each of the hardware threads 1004-1-1, 1004-3-1 and 1004-4-2 writes “1” to the bitmap allocated to it.
Thereafter, the hardware threads 1004-1-2, 1004-2-1, 1004-2-2, 1004-3-2 and 1004-4-1 arrive at the barrier synchronization point, and each writes “1” to the bitmap allocated to it.
In FIG. 1C, the barrier synchronization mechanism 1003 changes the barrier state 1005 to “1” upon detecting that all the bitmaps have been set to “1”.
Each of the hardware threads 1004 verifies that the barrier state 1005 has become equal to the value (namely, “1”) that it wrote to its bitmap, and resumes processing up to the next barrier synchronization point.
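The sequence of FIGS. 1A to 1C can be sketched as a small, single-threaded software model. This is only an illustrative sketch, not the hardware implementation. The class `BarrierBank`, its methods, and the mapping of hardware thread 1004-a-b to list index 2*(a-1)+(b-1) are hypothetical. The behavior for a second round, in which the barrier state changes back to “0” once all the bitmaps are “0”, is an assumption inferred from the inversion rule and is not stated in the description.

```python
class BarrierBank:
    """Toy model of the barrier synchronization mechanism 1003 (hypothetical API)."""

    def __init__(self, num_threads):
        self.barrier_state = 0              # models the barrier state 1005
        self.bitmaps = [0] * num_threads    # models the bitmap group 1006

    def arrive(self, thread_id):
        """A hardware thread reads the barrier state and writes the inverted
        value to its own bitmap; returns the written value."""
        written = 1 - self.barrier_state
        self.bitmaps[thread_id] = written
        self._detect()
        return written

    def _detect(self):
        # The mechanism changes the barrier state once every bitmap holds
        # the inverted value (assumed to work the same way in later rounds).
        target = 1 - self.barrier_state
        if all(bit == target for bit in self.bitmaps):
            self.barrier_state = target

    def may_resume(self, written):
        """A thread may resume when the barrier state equals what it wrote."""
        return self.barrier_state == written


# FIG. 1A: initial state, eight hardware threads (4 cores x 2 threads).
bank = BarrierBank(num_threads=8)

# FIG. 1B: 1004-1-1, 1004-3-1 and 1004-4-2 arrive first (indices 0, 4, 7).
written = {t: bank.arrive(t) for t in (0, 4, 7)}
state_1b = (bank.barrier_state, list(bank.bitmaps))
early_resumed = [bank.may_resume(w) for w in written.values()]

# The remaining five hardware threads arrive (indices 1, 2, 3, 5, 6).
for t in (1, 2, 3, 5, 6):
    written[t] = bank.arrive(t)

# FIG. 1C: all bitmaps are 1, so the barrier state has become 1.
state_1c = (bank.barrier_state, list(bank.bitmaps))
all_resumed = all(bank.may_resume(w) for w in written.values())

print(state_1b)  # (0, [1, 0, 0, 0, 1, 0, 0, 1])
print(state_1c)  # (1, [1, 1, 1, 1, 1, 1, 1, 1])
```

In this model the early arrivals cannot resume in the FIG. 1B state because the barrier state is still “0”, and all eight threads can resume only after the last bitmap has been written.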
Conventionally, the barrier synchronization is implemented by a centralized barrier synchronization management mechanism.
The centralized barrier synchronization management mechanism has the following problem: as the number of threads for which barrier synchronization is performed increases, the barrier synchronization management mechanism becomes more complex and more difficult to realize, and the number of threads requested to be simultaneously processed grows, leading to a longer processing time.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2006-259821
[Patent Document 2] Japanese National Publication of International Patent Application No. 2004-529414
[Non-patent Document 1] “Evaluation of Barrier Synchronization Mechanism Considering Hierarchical Processor Grouping”, Kaito YAMADA, et al., IEICE Technical Report, Vol. 108, No. 28, ICD2008-20, pp. 19-24, May, 2008.