The present invention relates to a parallel processor system including a current processor group made up of current processors and a standby processor group made up of standby processors and, in particular, to a parallel processor system and a change-over control method for the parallel processor system in which a change-over from the current processor group to the standby processor group is controlled when a failure occurs in one or more processors of the current processor group.
In a conventional parallel processor system, when a failure occurs in one or more processors during a job, the faulty processor or processors are removed from the running system configuration, and the job processing is executed by the remaining processors in a degenerated state. In degenerated operation, however, the number of available processors is decreased, and the job processing performance of the overall parallel processor system is therefore lowered. In some cases, some jobs cannot be executed completely or correctly.
To prevent this problem, JP-A-3-132861 describes a technology in which the processors constituting a parallel processor system are grouped into blocks of several processors each, and from one to several processors (fewer than the number of processors in the pertinent block) are assigned as standby processors for that block. When a processor in a block fails, the failed processor is replaced with one of the standby processors prepared in advance.
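By way of illustration only, the block-wise replacement described above may be sketched as follows. The class name `Block`, the method `handle_failure`, and the processor names are hypothetical and do not appear in JP-A-3-132861; the behavior when no standby processor remains is assumed here to be degenerated operation of the block.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """One block of processors with its own pre-assigned standby processors."""
    active: List[str]
    standby: List[str] = field(default_factory=list)

    def handle_failure(self, failed: str) -> Optional[str]:
        """Replace `failed` with a standby processor of the same block.

        Returns the name of the replacement, or None when no standby
        processor is left (assumption: the block then runs degenerated,
        with one fewer active processor).
        """
        slot = self.active.index(failed)  # ValueError if not an active processor
        if self.standby:
            replacement = self.standby.pop(0)
            self.active[slot] = replacement
            return replacement
        del self.active[slot]
        return None


# A block of four processors with a single standby processor.
block = Block(active=["p0", "p1", "p2", "p3"], standby=["s0"])
first = block.handle_failure("p2")   # a standby processor is available
second = block.handle_failure("p0")  # no standby processor remains
```

In this sketch the first failure leaves the block at its full size, whereas the second failure reduces the number of active processors because the standby processors of the block have been used up.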
Moreover, there has also been known a technology in which a parallel processor system includes a group of current processors that ordinarily execute job processing and a group of standby processors equal in number to the current processors. When a failure occurs in a processor of the current processor group, a change-over operation is conducted to substitute the standby processor group for the current processor group so that the job processing continues.
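The group change-over described above may be sketched, purely as an example, in the following manner. The function name `change_over_on_failure` and its return convention are assumptions introduced for illustration; the treatment of the replaced group after the change-over is not specified in the description and is left open here.

```python
from typing import List, Tuple


def change_over_on_failure(
    current: List[str], standby: List[str], failed: str
) -> Tuple[List[str], List[str], bool]:
    """Substitute the whole standby group when any current processor fails.

    Returns (group now executing jobs, group now idle, whether a
    change-over took place).
    """
    if len(current) != len(standby):
        raise ValueError("standby group must be equal in size to the current group")
    if failed in current:
        # The entire group is switched, regardless of how many processors failed.
        return standby, current, True
    return current, standby, False


current = ["c0", "c1", "c2", "c3"]
standby = ["s0", "s1", "s2", "s3"]
running, idle, switched = change_over_on_failure(current, standby, "c1")
```

Because the two groups are equal in size, the number of processors executing the job is the same before and after the change-over.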
In both of these technologies, when a processor fails, a standby processor or a standby processor group is used to continue the job processing. Consequently, the number of processors executing the processing is not decreased, and the overall processing performance of the parallel processor system is maintained.
In the first technology above, however, when the number of failed processors exceeds the number of standby processors of the pertinent block, the number of processors actually executing the processing becomes smaller than the number available in the normal state. This leads to a problem of deterioration in the processing performance of the overall parallel processor system.
In the second technology described above, the number of available processors is not lowered even when a failure occurs in a plurality of processors. However, the current processor group is replaced with the standby processor group even when only a single processor fails. Consequently, the switching operation from the current processor group to the standby processor group is effected even when, for example, the remaining processors have sufficient job processing capacity to carry out the job satisfactorily, causing a problem that unnecessary change-over operations are frequently performed.
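The unnecessary change-over noted above may be illustrated with a small numerical sketch. The function names and the figures used (a group of 16 processors, one failure, and a job assumed to require 12 processors) are hypothetical values chosen only for illustration and are not taken from the prior art.

```python
def conventional_switches(failed_count: int) -> bool:
    """Second technology: any failure triggers a change-over of the whole group."""
    return failed_count >= 1


def switch_actually_needed(group_size: int, failed_count: int, required: int) -> bool:
    """A change-over is needed only if the survivors cannot carry the job.

    `required` is the assumed number of processors the job needs.
    """
    return group_size - failed_count < required


group_size, failed_count, required = 16, 1, 12  # illustrative values only
unnecessary = (
    conventional_switches(failed_count)
    and not switch_actually_needed(group_size, failed_count, required)
)
```

With these assumed values, 15 processors remain for a job requiring 12, yet the conventional scheme still switches to the standby processor group, so `unnecessary` evaluates to true.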