The present invention relates to a distributed processing control method and a distributed processing system, and, more particularly, to a distributed processing control method and a distributed processing system for effecting the distributed parallel processing of a program such as an application program in a distributed processing system provided with a plurality of processors.
Conventionally, in technical calculation such as the simulation of hydrodynamics by solving a partial differential equation with a finite element method or the like, it has been necessary to prepare an extremely large number of arrays. Since the amount of calculation also increases in this case, the capacity of memory must be increased and a large memory area prepared in order to hold the arrays in memory, and management becomes difficult. To cope with the amount of calculation, a multiprocessor system is used and processing is done in parallel, so that processing speed can be improved and the apparent amount of calculation decreased.
However, even if a multiprocessor system is used, with the number of CPUs increased and a large memory area prepared, the respective CPUs are connected to a common bus and access shared memory, which raises the risk of memory contention.
In recent years, attention has been drawn to the distributed processing system, in which a plurality of CPUs, each provided with local memory, is distributed in order to effect distributed parallel processing without using shared memory as in the case of the multiprocessor system. This distributed processing system is constituted by connecting a plurality of processors together into a system. Where one application program is executed in parallel, the methods below are generally employed.
According to the first method, for parallel execution of an application program, a programmer considers beforehand conditions such as the number or location of processors, the type or amount of data to be allocated to each processor, and the type of calculation executed in each processor, and writes a program based on those conditions. The program is then executed so that one application program is executed in parallel on the distributed parallel processing system. In this case, an executable module may need to be activated on the plurality of processors, or data communication between processors may be required, at the location of the processor designated by the programmer. Generally, an execution-time library for activating this executable module, communicating the required data, and the like is prepared in advance. The programmer writes a program that includes routines to access the execution-time library. Accordingly, the execution sequence of calculation and the time of data communication are determined by the programmer.
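The first method can be illustrated by a minimal sketch, given here only as an assumption-laden example and not as any particular execution-time library. Threads and in-memory queues stand in for processors and communication lines, and the names NUM_WORKERS, worker, and run are hypothetical. The programmer fixes the number of workers, the data allocated to each, the calculation each performs, and the order of communication.

```python
import queue
import threading

NUM_WORKERS = 2  # assumption: number of processors fixed by the programmer in advance


def worker(rank, inbox, outbox):
    # Receive the slice of data the programmer allocated to this worker.
    chunk = inbox.get()
    # Perform the calculation the programmer assigned (here, a sum of squares).
    outbox.put((rank, sum(x * x for x in chunk)))


def run(data):
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(NUM_WORKERS)
    ]
    for thread in threads:
        thread.start()
    # Explicit, programmer-ordered communication: slice r goes to worker r.
    for rank in range(NUM_WORKERS):
        inboxes[rank].put(data[rank::NUM_WORKERS])
    for thread in threads:
        thread.join()
    # Collect one partial result per worker and combine them.
    partial = dict(results.get() for _ in range(NUM_WORKERS))
    return sum(partial[rank] for rank in range(NUM_WORKERS))
```

In this sketch every decision about allocation and communication timing is written into the program text, which is the characteristic of the first method.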
In the second method for parallel execution of an application program, the programmer writes a program in a parallel description language for the distributed processing system and executes, in the distributed processing system, an executable module generated by the language processing system for that language. In this case, the programmer generally designates in the program the configuration of the distributed processing system for parallel processing of the application program, e.g., the number of processors, or the distributed arrangement of data or the like upon execution, either statically (so as to have content which does not change in a time series) or dynamically (so as to have content corresponding to a time series change). Also, in the second method, the execution-time library for distributed parallel processing is prepared beforehand as in the first method. Calls to this library are often automatically embedded into the executable module by the language processing system. That is, the execution sequence of calculation or the time of data communication is automatically determined by the language processing system.
There is also a method other than those described above. In this method, a program written for a conventional single processor is converted to an executable module for the distributed processing system by directly or indirectly using an automatic parallel conversion program or the like, and the obtained executable module is executed. In this case, the programmer may write the program for the single processor in a conventional programming language. The programmer must consider the configuration or the like of the distributed processing system upon activation of the automatic parallel conversion program, but need not consider the execution sequence of calculation, the time of data communication, or the like.
Since the respective processors of the distributed processing system do not usually share memory or a common communication line, there is less restriction on the number of processors that can be connected, but the overhead of communication among the respective processors becomes a constraint.
Accordingly, where distributed parallel processing of an application program is done in a distributed processing system provided with a plurality of processors as mentioned above, an important problem arises in that the overhead required for data communication between the respective processors must be reduced. To achieve this purpose, the following methods are mainly used:
In the first method, the language processing system, the automatic parallel conversion program, or the like analyzes the amount of data to be communicated, the other party of communication, the time at which the data is required, and the like, and automatically schedules the communication and generates corresponding execution code in the executable module. That is, scheduling is static, and calculation and data communication occur at execution in accordance with a previously designated order. When this method is used, processing at execution can be simplified.
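Static scheduling of this kind can be sketched as follows, purely as an illustration under stated assumptions. SCHEDULE is a hypothetical per-processor operation list of the sort a language processing system might emit in advance, and execute is a hypothetical single-threaded interpreter that replays it. A processor whose next operation is a receive simply waits until the matching send has occurred.

```python
from collections import deque

# Hypothetical schedule fixed before execution: ordered operations per processor.
SCHEDULE = {
    0: [("compute", "a", 2), ("send", 1, "a"), ("recv", 1, "b")],
    1: [("compute", "b", 3), ("recv", 0, "a"), ("send", 0, "b")],
}


def execute(schedule):
    state = {proc: {} for proc in schedule}   # local variables of each processor
    channels = {}                             # (source, destination) -> queued values
    position = {proc: 0 for proc in schedule}  # next operation index per processor
    progressed = True
    while progressed:
        progressed = False
        for proc, ops in schedule.items():
            if position[proc] >= len(ops):
                continue  # this processor has finished its list
            op = ops[position[proc]]
            if op[0] == "compute":
                _, name, value = op
                state[proc][name] = value * value
            elif op[0] == "send":
                _, dest, name = op
                channels.setdefault((proc, dest), deque()).append(state[proc][name])
            elif op[0] == "recv":
                _, source, name = op
                channel = channels.get((source, proc))
                if not channel:
                    continue  # blocked: the scheduled partner has not sent yet
                state[proc][name] = channel.popleft()
            position[proc] += 1
            progressed = True
    return state
```

Because the order of operations and the other party of each communication are written into the schedule, nothing is decided at execution. This keeps execution simple, but the schedule cannot adapt when the other party is only known at execution.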
As a technique corresponding to such a first method, a program is allocated to processors in accordance with priority in order to effect scheduling relative to a plurality of processors in a parallel computer (see PUPA 5-151180). This technique relates to a scheduling method for a parallel computer provided with a plurality of processors comprising the steps of storing programs executable in each of these processors, providing a plurality of queues having a lock for exclusive processing to prevent programs from being simultaneously allocated to processors, locking those required among the plurality of queues when the program is allocated to a processor, comparing programs with the highest priority among the respective queues, selecting the program with the highest priority among queues, releasing the lock on queues, and allocating the program to the processor.
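The steps of this scheduling technique can be sketched as follows. This is a minimal illustration and not the implementation of the cited publication. LockedQueue and allocate are hypothetical names, a larger number is assumed to mean higher priority, and for simplicity the sketch locks all queues rather than only those required.

```python
import heapq
import threading


class LockedQueue:
    """A queue of executable programs guarded by a lock for exclusive processing."""

    def __init__(self):
        self.lock = threading.Lock()
        self.heap = []  # entries are (-priority, program_name)

    def push(self, priority, program_name):
        with self.lock:
            heapq.heappush(self.heap, (-priority, program_name))


def allocate(queues):
    """Return the highest-priority program across all queues, or None if all are empty."""
    # Lock the queues in a fixed order so concurrent callers cannot deadlock.
    for q in queues:
        q.lock.acquire()
    try:
        # Compare the head (highest-priority) program of each non-empty queue.
        candidates = [q for q in queues if q.heap]
        if not candidates:
            return None
        best = min(candidates, key=lambda q: q.heap[0])
        _, program_name = heapq.heappop(best.heap)
        return program_name
    finally:
        # Release the locks before the program is handed to a processor.
        for q in reversed(queues):
            q.lock.release()
```

As a usage example, if one queue holds programs with priorities 1 and 5 and another holds a program with priority 3, successive calls to allocate return the programs in the order 5, 3, 1.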
However, the plurality of queues in this scheduling has no private matrixes corresponding to respective processors and simply gives priority to an executable program by applying a lock thereto if necessary.
Further, the first method of effecting scheduling in a static manner has the following disadvantages: (1) In a case where the configuration of the distributed processing system for executing an executable module is determined at execution, this method cannot handle a case in which the other party for communication or the like can only be determined upon execution. (2) In the distributed processing system, a time lag may arise between sending and receiving sides due to influences such as an error in a communication path or the dispatch of processes in an operating system. When, in the application program, sending and receiving are repeated and scheduling is static, overhead often becomes large.
In the second method, the language processing system, the automatic parallel conversion program, or the like records information, including communication data for the other party in communication, in a table, and scheduling is managed dynamically by referencing this table upon execution. This method can handle a case in which the information in the table can only be determined upon execution. Since scheduling is dynamic, the disadvantages of the first method can be overcome.
In a technique corresponding to the second method, a linkage table in which transmission information is registered is referenced to control communication (see PUPA 1-251266). According to this technique, in a system for communicating a message to a plurality of processors, an identification number indicating the type of message is applied to the message to be communicated. Each processor is provided with a linkage table in which transmission information indicating whether or not the message is sent from each output port for each type of message is registered. Each processor references the corresponding transmission information in the linkage table from the identification number of the received message and controls the message communication based on the content thereof.
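The table lookup described here can be sketched as follows, as an illustration only. LINKAGE_TABLE, PORTS, and forward_ports are hypothetical names, the port names assume a mesh with four output ports per processor, and the table contents are invented for the example.

```python
PORTS = ("up", "down", "left", "right")  # assumption: four output ports per processor

# Hypothetical linkage table of one processor:
# message type identification number -> whether to send from each output port.
LINKAGE_TABLE = {
    1: {"up": False, "down": True, "left": False, "right": True},
    2: {"up": False, "down": False, "left": False, "right": False},
}


def forward_ports(table, message):
    """Return the output ports from which the received message is to be sent."""
    # Look up the transmission information by the message's identification number.
    flags = table.get(message["type_id"], {})
    # An unregistered type has no transmission information and is not forwarded.
    return [port for port in PORTS if flags.get(port, False)]
```

For a received message of type 1, forward_ports returns the ports "down" and "right". A message of type 2, or of a type not registered in the table, is sent from no port.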
However, the table disclosed in this technique is intended for a multiprocessor system connected in the form of a mesh; it only indicates the direction in which a message may be routed (up, down, left, or right) and does not designate any other information.
The second method also has a disadvantage in that execution-time library processing becomes complicated and thus cannot be easily used.