1. Field of the Invention
The present invention relates to a management system and method for a parallel computer system, and more particularly to techniques suitable for application to a management system for a parallel computer system capable of controlling maintenance and management of a plurality of nodes constituting the parallel computer system even if main processors at the nodes are not running.
2. Description of the Related Art
Many maintenance and management systems and methods have been proposed for a computer system having a plurality of computers.
A console display for concentrated maintenance and management of a plurality of UNIX machines is disclosed in JP-A-6-214763. This technique prevents an increase in the maintenance and management load by using a single console for a plurality of UNIX machines.
In summary, a center console at a server that performs concentrated maintenance and management of a plurality of UNIX machines is provided with a destination table that stores the command destinations for each type of maintenance and management, and each command is executed by referring to the destination table.
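The destination-table idea summarized above can be illustrated with a minimal sketch. It is not taken from JP-A-6-214763; the table contents, the host names, the `dispatch` function, and the `send` callback are all hypothetical and serve only to show how a command type might be resolved to its destinations.

```python
from typing import Callable, Dict, List

# Hypothetical destination table: each type of maintenance/management
# command maps to the machines that should receive it. All names here
# are assumptions made for illustration.
DESTINATION_TABLE: Dict[str, List[str]] = {
    "backup": ["unix-a", "unix-b"],
    "shutdown": ["unix-a", "unix-b", "unix-c"],
    "disk_check": ["unix-c"],
}


def dispatch(
    command_type: str,
    table: Dict[str, List[str]],
    send: Callable[[str, str], str],
) -> List[str]:
    """Look up the destinations for a command type and send it to each one.

    `send` stands in for whatever transport the console would use; it
    takes (host, command_type) and returns that host's reply.
    """
    if command_type not in table:
        raise KeyError(f"no destinations registered for {command_type!r}")
    return [send(host, command_type) for host in table[command_type]]


if __name__ == "__main__":
    # A stub transport that only formats a reply string.
    replies = dispatch("backup", DESTINATION_TABLE, lambda h, c: f"{h}:{c}")
    print(replies)
```

In this sketch the operator names only the command type, and the table decides which machines are addressed, which is the sense in which a single console can serve a plurality of machines.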
A console switching control method for a composite computer system constituted by a plurality of computers, which can avoid erroneous maintenance and management of each computer connected to a single system console, is disclosed in JP-A-5-120247.
In summary, the service processors of a plurality of computers are connected to a switch system to which the system console is connected. The system console sequentially switches among the computers having output messages by using identifiers that discriminate between the computers, so that the single system console is shared by the plurality of computers. Maintenance and management of a computer by the system console are permitted when the identifier used for switching to that computer coincides with the identifier of the switched computer itself connected to the system console.
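The identifier check described above can be sketched as follows. This is an illustrative model only, not the method of JP-A-5-120247; the `Computer` and `SystemConsole` classes, the `switch_ports` mapping, and the method names are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Computer:
    """A managed computer that reports its own identifier (hypothetical model)."""
    identifier: str
    pending_messages: List[str] = field(default_factory=list)


class SystemConsole:
    """Single console shared by several computers through a switch.

    `switch_ports` maps the identifier the console uses for switching
    to the computer actually attached at that switch position.
    """

    def __init__(self, switch_ports: Dict[str, Computer]) -> None:
        self.switch_ports = switch_ports

    def poll(self) -> List[str]:
        """Switch to each computer in turn and collect its output messages."""
        collected: List[str] = []
        for ident, computer in self.switch_ports.items():
            for message in computer.pending_messages:
                collected.append(f"[{ident}] {message}")
            computer.pending_messages.clear()
        return collected

    def may_operate(self, ident: str) -> bool:
        """Permit maintenance when the switching identifier coincides with
        the identifier reported by the connected computer itself."""
        return self.switch_ports[ident].identifier == ident


if __name__ == "__main__":
    ports = {"A": Computer("A", ["boot ok"]), "B": Computer("C")}
    console = SystemConsole(ports)
    print(console.poll())
    print(console.may_operate("A"), console.may_operate("B"))
```

Here position "B" is deliberately attached to a computer that reports identifier "C", so the check refuses the operation. That is the kind of erroneous maintenance the identifier comparison is meant to avoid.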
A concentrated message management system for centrally managing messages from a plurality of computers constituting a distributed processing system is disclosed in JP-A-5-20281.
In summary, a concentrated management node is selected from a plurality of computers connected to the network, and the concentrated management node centrally manages the operation state messages of the other nodes.
The present inventors have studied the above conventional techniques and found the following problems.
The above-described conventional management systems for a computer system constituted by a plurality of computers utilize functions provided by network software running on each computer under management. Therefore, if a computer under management is not running, if its operating system is not running, or if its network software is not active, maintenance and management are impossible.
In the case of the single console display for concentrated maintenance and management of a plurality of UNIX machines, each computer is required to run a UNIX operating system. If this operating system is not running, concentrated maintenance and management by the console display are impossible.
In the case of the switch system at the system console for a composite computer system, the switch system is connected between the system console and each service processor. Therefore, dedicated hardware for the switch system is required.
In the case of the concentrated message management system, messages are transmitted to the concentrated management node from a plurality of computers. Therefore, if the system of this concentrated management node is shut down, concentrated message management is impossible. Furthermore, since each message is transmitted via nodes of the network, the state of each node cannot be managed by the concentrated management node if the operating system of each node and the network are not active.