The present invention relates to a parallel computer system that uses a master node and worker nodes to perform a computation process by repeating parallel processing and synchronization processing. The present invention also relates to barrier synchronization in such a parallel computer system.
The recent remarkable progress in IT devices, such as storage and network devices, has increased both the scale of the problems to be addressed and the size of the data to be processed. This leads to a growing demand for computer systems with more powerful processing capability to perform processes such as, for example, Fast Fourier Transform processing, processing using genetic algorithms, and simulation processing. However, since the operating frequency of microprocessors peaked in 2004, the approach to improving the processing capability of computer systems has shifted from raising the operating frequency to large-scale parallelization. For this reason, parallel processing technology will become more and more important in future computer systems.
In general, a computer system for performing large-scale parallel processing is configured as a computer cluster in which plural servers are connected via a high-speed network. The computer cluster generally includes a master node that manages the flow of the computation process, and plural worker nodes (also referred to as slave nodes) that actually perform the computation process. This is called the master-worker method. The master-worker method achieves parallel processing through three steps: assignment of a computation process (hereinafter referred to as a task) from the master node to the worker nodes, task processing on the worker nodes, and a synchronization process (called barrier synchronization) that waits for completion of the task processes assigned to all the worker nodes. Here, the synchronization process plays an important role in ensuring the order of operations in a program. In general, the synchronization process is realized by a communication process (called synchronous communication) from each worker node that has completed its task process to the master node, and by a check on the master node, using flag management and a counter, of whether all the task processes are completed (hereinafter referred to as the counting process). Note that the task process is the process that a worker node should complete during the period from one synchronization point to the next synchronization point.
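The master-worker flow described above can be illustrated with a minimal sketch. It is not taken from the cited documents: it assumes threads in a single process stand in for the worker nodes, in-memory queues stand in for the high-speed network, and the names run_master_worker, task_queues, and done_queue are hypothetical, chosen only for illustration.

```python
import queue
import threading


def run_master_worker(num_workers, num_steps, task):
    """Illustrative master-worker loop with counter-based barrier synchronization.

    Assumption: threads stand in for worker nodes and queues for the network.
    `task(worker_id, step)` is the task process run between synchronization points.
    """
    # Master -> worker channels used for task assignment (one per worker).
    task_queues = [queue.Queue() for _ in range(num_workers)]
    # Worker -> master channel used for synchronous communication.
    done_queue = queue.Queue()

    def worker(worker_id):
        while True:
            step = task_queues[worker_id].get()
            if step is None:  # shutdown signal from the master
                return
            value = task(worker_id, step)       # task process
            done_queue.put((worker_id, value))  # notify the master of completion

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()

    results = []
    for step in range(num_steps):
        # Task assignment from the master to every worker.
        for q in task_queues:
            q.put(step)

        # Counting process: per-worker flags plus a counter on the master.
        finished = [False] * num_workers
        values = [None] * num_workers
        completed = 0
        while completed < num_workers:
            worker_id, value = done_queue.get()  # blocks until a worker reports
            if not finished[worker_id]:
                finished[worker_id] = True
                values[worker_id] = value
                completed += 1
        # Barrier reached: all task processes of this step are complete.
        results.append(values)

    # Stop the workers and wait for them to exit.
    for q in task_queues:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

In this sketch the master does not assign the tasks of the next step until the counter reaches the number of workers, which is what preserves the order of operations between synchronization points. Note also that every completion message passes through the single done_queue on the master, so the counting process is handled serially there.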
Examples of documents describing the processing technology of the related art include Japanese Patent Application Laid-Open Publication Nos. 2001-51966 and 2005-71280.