In recent years, owing to the remarkable increase in the performance of computer systems and in the speed of networks, parallel computing systems that process a plurality of application programs simultaneously have emerged, and faster processing has been achieved in multi-processor systems in which a plurality of computer systems are connected via networks. Under these circumstances, there is a demand for a processing system that can operate with higher reliability and without stopping processing, even if a failure occurs in a complex parallel computing system in which a plurality of application programs are distributed to a plurality of computer systems and processed in parallel.
Accordingly, inventions such as those disclosed in Japanese Unexamined Patent Publication (Kokai) Nos. H1-217642, H1-217643, and H3-132861 have been proposed. These include an invention in which a spare element processor is provided so that, when a failure occurs, processing is continued by switching from the failed element processor to the spare element processor, and an invention that can cope with a failure of a managing node by multiplexing the managing nodes with multiple processors.
Furthermore, as an invention aimed at improving reliability through non-stop operation of a multi-processor system, there is the invention entitled “management processing system of a server program” disclosed in Japanese Unexamined Patent Publication (Kokai) No.H3-3351519. In that invention, at least two bi-space management servers, namely a currently used bi-space management server and a standby bi-space management server, are prepared for managing a currently used server and a standby server. The information held in the currently used bi-space management server is written to a memory, and if the currently used bi-space management server shuts down, the standby bi-space management server takes over as the currently used bi-space management server.
In the conventional parallel computing system, however, as shown in FIG. 1, a managing node 100, which includes a job scheduler 110 for allocating jobs to the computing node groups and a computing node managing program 120, manages all the computing node groups 130 that perform calculation and processing. As described above, in the prior art the managing node and the computing node groups are integrated with each other, and processing can be continued only by switching a processor or the managing node to a spare processor or a spare managing node. Consequently, if a failure occurs in part of the network, or in a larger system unit or other component, such as a power supply, a job that has already been queued may not be handed over to a computing node group capable of processing it. Part of the processing then cannot be continued, and the whole system is affected.