For example, Japanese Patent Application Laid-Open Publication No. H02-105961 (Patent Document 1) describes a method of synchronizing processors in a multiprocessor system in which “N” processors are each connected, via a bus interface, to a system bus having a broadcast function. Specifically, each processor has an N-bit synchronization register, each bit of which corresponds to one of the N processors. When a processor completes its own phase, it sets its corresponding bit of the synchronization register to ‘1’ and sends a notification to the other processors via the system bus, and the other processors update their synchronization registers in response to the notification. Consequently, each processor can carry out synchronization processing once it recognizes that all the bits of its synchronization register are ‘1’.
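The register-and-broadcast scheme of Patent Document 1 can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions: the N-bit register is modeled as a Python integer bitmask, a broadcast is assumed to reach every other processor immediately, and the class names `SystemBus` and `Processor` are hypothetical, not taken from Patent Document 1.

```python
class SystemBus:
    """Hypothetical broadcast bus: delivers a notification to every processor except the sender."""

    def __init__(self):
        self.processors = []

    def attach(self, proc):
        self.processors.append(proc)

    def broadcast(self, sender_id):
        for proc in self.processors:
            if proc.pid != sender_id:
                proc.on_notification(sender_id)


class Processor:
    def __init__(self, pid, n, bus):
        self.pid = pid
        self.n = n
        self.sync_register = 0  # N-bit register held as an int; bit k stands for processor k
        self.bus = bus
        bus.attach(self)

    def complete_phase(self):
        # Set own bit to 1, then notify every other processor over the bus.
        self.sync_register |= 1 << self.pid
        self.bus.broadcast(self.pid)

    def on_notification(self, sender_id):
        # Another processor finished its phase: mirror that in the local register.
        self.sync_register |= 1 << sender_id

    def synchronized(self):
        # Synchronization is possible once all N bits read as 1.
        return self.sync_register == (1 << self.n) - 1


N = 4
bus = SystemBus()
procs = [Processor(i, N, bus) for i in range(N)]
for p in procs[:-1]:
    p.complete_phase()
print([p.synchronized() for p in procs])  # → [False, False, False, False]
procs[-1].complete_phase()
print([p.synchronized() for p in procs])  # → [True, True, True, True]
```

Because every processor keeps its own copy of the register, no processor has to read shared state to decide that the phase is over; the broadcast keeps the copies consistent.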
Japanese Patent Application Laid-Open Publication No. H10-091591 (Patent Document 2) describes a method of carrying out barrier synchronization among a plurality of clusters, each of which includes a plurality of processors, by providing an inter-cluster communication register shared among the clusters. The number of clusters is set in the inter-cluster communication register, the representative processor of each cluster decrements this value by one, and barrier synchronization processing is completed when the value reaches 0.
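The countdown scheme of Patent Document 2 can be sketched with threads standing in for the cluster representatives. This is an illustrative model only: the class name `InterClusterRegister` is hypothetical, a lock is assumed to make each decrement atomic, and waiting is modeled with a condition variable rather than by polling the register.

```python
import threading


class InterClusterRegister:
    """Hypothetical model of the shared inter-cluster communication register."""

    def __init__(self, num_clusters):
        self._count = num_clusters          # initialized to the number of clusters
        self._cond = threading.Condition()  # its lock makes each decrement atomic

    def arrive_and_wait(self):
        # Called by a cluster's representative processor once its cluster is done.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()     # count reached 0: barrier complete
            else:
                while self._count != 0:
                    self._cond.wait()       # wait instead of polling the register

    @property
    def count(self):
        with self._cond:
            return self._count


NUM_CLUSTERS = 3
register = InterClusterRegister(NUM_CLUSTERS)
log = []
log_lock = threading.Lock()


def representative(cluster_id):
    with log_lock:
        log.append(("arrive", cluster_id))
    register.arrive_and_wait()
    with log_lock:
        log.append(("leave", cluster_id))


threads = [threading.Thread(target=representative, args=(c,)) for c in range(NUM_CLUSTERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(register.count)  # → 0
```

Every "arrive" entry is logged before any "leave" entry, since no representative can leave until all of them have decremented the register.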
Japanese Patent Application Laid-Open Publication No. 2000-305919 (Patent Document 3) and Japanese Patent Application Laid-Open Publication No. 2005-071109 (Patent Document 4) describe methods of carrying out software synchronization by providing, in a shared memory of a multiprocessor system, synchronization flag regions each corresponding to one of the processors. Furthermore, Japanese Patent Application Laid-Open Publication No. 2006-259821 (Patent Document 5) describes a multiprocessor system having a hierarchical cache structure and a method of carrying out synchronization that utilizes this hierarchy. For example, when each of CPU0 and CPU1 in a processor module has its own primary cache and a secondary cache shared by both is provided at the level above the two primary caches, a plurality of threads executed by CPU0 are synchronized by flag variables reserved in the primary cache, and CPU0 and CPU1 are synchronized with each other by a flag variable reserved in the secondary cache.
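The flag-based approach can be sketched as follows. A single `FlagRegion` used by all threads corresponds to the flat shared-memory scheme of Patent Documents 3 and 4; layering one region per CPU under a shared upper region gives the two-level flavor of Patent Document 5. This is a sketch under assumptions: caches are not actually modeled (plain Python lists merely stand in for flag variables at each level), busy-waiting is approximated by polling with a short sleep, and the names `FlagRegion`, `primary`, and `secondary` are hypothetical.

```python
import threading
import time


class FlagRegion:
    """One synchronization flag per participant (hypothetical stand-in for a flag region)."""

    def __init__(self, num_participants):
        self.flags = [False] * num_participants

    def set_flag(self, idx):
        self.flags[idx] = True

    def wait_all(self):
        # Poll (with a short sleep) until every participant's flag is set.
        while not all(self.flags):
            time.sleep(0.0001)


NUM_CPUS = 2
THREADS_PER_CPU = 3

# Lower level: one region per CPU, standing in for flags kept in that CPU's primary cache.
primary = [FlagRegion(THREADS_PER_CPU) for _ in range(NUM_CPUS)]
# Upper level: one flag per CPU, standing in for flags kept in the shared secondary cache.
secondary = FlagRegion(NUM_CPUS)

log = []
log_lock = threading.Lock()


def record(event, cpu, tid):
    with log_lock:
        log.append((event, cpu, tid))


def worker(cpu, tid):
    record("arrive", cpu, tid)
    primary[cpu].set_flag(tid)
    primary[cpu].wait_all()          # threads on the same CPU synchronize first
    if tid == 0:
        secondary.set_flag(cpu)      # one representative thread reports for its CPU
    secondary.wait_all()             # then all CPUs synchronize with each other
    record("leave", cpu, tid)


threads = [threading.Thread(target=worker, args=(c, t))
           for c in range(NUM_CPUS) for t in range(THREADS_PER_CPU)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Only one thread per CPU touches the upper-level flags, which mirrors the idea of keeping most synchronization traffic in the lower cache level.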
In addition, “Fast barrier synchronization hardware”, C. J. Beckmann, C. D. Polychronopoulos, Proceedings of Supercomputing '90, November 1990, pp. 180-189 (Non-Patent Document 1) describes a configuration comprising, among other elements, a single P-bit register shared by P processors and a detection circuit that detects when all the bits of the P-bit register have become zero and transmits a detection signal to the P processors. To carry out barrier synchronization after the P processors execute parallel processing, each processor writes zero to its corresponding bit of the P-bit register when it finishes its own processing. When all the processors have finished, the detection signal is transmitted to all of them, thereby achieving barrier synchronization. Non-Patent Document 1 also shows a configuration provided with a register array of (P-1) P-bit registers for processing multiple loops in parallel.
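The shared P-bit register with all-zero detection can be modeled as a bitmask. This is a minimal sketch, assuming that every bit starts at 1 (implied by each processor writing zero on completion) and that a `reset` step re-arms the register for the next barrier; the class name `BarrierRegister` and the `reset` method are hypothetical and are not described in Non-Patent Document 1.

```python
class BarrierRegister:
    """Hypothetical model of a single P-bit register shared by P processors."""

    def __init__(self, p):
        self.p = p
        self.bits = (1 << p) - 1    # assumed initial state: every bit set to 1
        self.detected = False       # output of the all-zero detection circuit

    def write_zero(self, proc_id):
        # A processor clears its own bit when its share of the parallel work is finished.
        self.bits &= ~(1 << proc_id)
        self._detect()

    def _detect(self):
        # Detection circuit: signal all processors once every bit is zero.
        if self.bits == 0:
            self.detected = True

    def reset(self):
        # Re-arm for the next barrier (an assumption, not described in the source).
        self.bits = (1 << self.p) - 1
        self.detected = False


P = 4
reg = BarrierRegister(P)
for pid in (2, 0, 3):
    reg.write_zero(pid)
print(bin(reg.bits), reg.detected)  # → 0b10 False
reg.write_zero(1)
print(bin(reg.bits), reg.detected)  # → 0b0 True
```

The order in which processors finish does not matter; only the final all-zero state triggers the detection signal.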