The invention relates to a method and system for monitoring and managing a diverse hardware platform from a central control node in a distributed, parallel, heterogeneous computing environment. More particularly, the invention relates to an intermediary program for use with a hardware monitor program in a distributed, parallel, heterogeneous computing environment, such as the IBM RISC System/6000 Scalable POWERparallel Systems (SP), that emulates network frame hardware by making any diverse node hardware appear like any other node hardware that is part of a network frame in a heterogeneous computing environment.
A parallel, distributed computing system, such as the IBM RS/6000 SP, is a computer network consisting of linked mini-computers or personal computers directed to the sharing of information created and maintained at each of the mini-computers or workstations within the network. In the IBM RS/6000 SP, a central control node (the "Control Workstation" or "CWS") serves as the monitor and control node for the entire system. The CWS is the single point of control for both the hardware and software of the entire distributed system, that is, respectively, the network frame hardware and network frame software. All of the distributed systems (nodes) are physically connected to the CWS by means of one or more communications channels. The hardware functions of the nodes are monitored and managed directly from the CWS through this physical link. A distributed node can be powered on, powered off, reset, etc. from the central CWS. In addition, hardware aspects such as temperature, voltage, cooling fan speed, etc. are monitored for all nodes and maintained on the CWS.
A software program that runs continually on the CWS is responsible for this monitoring and management of the node hardware and provides an interface for administrator interaction with the node hardware. On the RS/6000 SP this program is the hardware monitor. Software functions can also be centrally managed from the CWS. For instance, the nodes can be installed, rebooted, and shut down from the one central control node. Administrative functions such as starting up applications, retrieving node-specific information, and managing system-wide high-availability software are also performed from the central control node.
These administrative functions are embodied in client requests, which the hardware monitor translates into frame commands. These commands include power on/off, reset, etc. The commands are sent directly to the network frame, which, in turn, sends each request to the intended diverse node. The responses are state data returned by the network frame in the form of a frame packet and packaged by the hardware monitor. State data includes power status, temperature, voltages, fan speeds, etc.
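The request and response flow described above can be illustrated with a minimal sketch. The source names the commands and the kinds of state data but does not give their encoding, so the command codes, the packet layout, and the names `encode_frame_command`, `decode_frame_packet`, and `NodeState` below are all assumptions made for illustration only.

```python
import struct
from dataclasses import dataclass

# Hypothetical command codes; the source lists the commands, not their values.
COMMANDS = {"power_on": 1, "power_off": 2, "reset": 3}

# Hypothetical frame packet layout (network byte order, no padding):
# slot, power flag, temperature, voltage, fan speed.
PACKET_FORMAT = "!BBffH"


@dataclass
class NodeState:
    slot: int
    powered_on: bool
    temperature_c: float
    voltage_v: float
    fan_rpm: int


def encode_frame_command(request: str, slot: int) -> bytes:
    """Translate a client request into a two-byte frame command (code, slot)."""
    if request not in COMMANDS:
        raise ValueError(f"unsupported request: {request}")
    return struct.pack("!BB", COMMANDS[request], slot)


def decode_frame_packet(packet: bytes) -> NodeState:
    """Unpack the state data carried in a frame packet."""
    slot, power, temp, volts, fan = struct.unpack(PACKET_FORMAT, packet)
    return NodeState(slot, bool(power), temp, volts, fan)
```

In this sketch the hardware monitor would call `encode_frame_command` for each client request and `decode_frame_packet` for each packet the frame returns; a real frame protocol would carry more fields than shown here.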
One problem arising in a distributed computing environment is that the collection of nodes may be heterogeneous in nature: the nodes may include differing hardware types and may run a variety of different operating systems, enablement software, and application software. The hardware monitor is responsible for the hardware monitoring and management of each of the different types of hardware nodes. The functions to be performed on each node are generally the same, but due to the diverse nature of the hardware, the protocols used to manage and monitor these different types of nodes are different. Implementing the different protocol interfaces within the hardware monitor program would complicate that program and its internal structure. Adding support for a new node hardware type would also tend to be complicated: since changes would affect the code path for all other node types, implementing new node support would risk destabilizing functions that already work for existing node hardware types. Hence, a method and system are needed that can emulate network frame hardware and introduce hardware monitor support for diverse hardware so that diverse node hardware appears to function as any other node hardware in a network frame.
There is therefore a need for a method and system for monitoring and managing a diverse hardware platform from a central control node in a distributed, parallel, heterogeneous computing system.
There is also a need for a method and system for expanding the single point of control in a distributed, parallel, heterogeneous computing system to diverse hardware in a more "risk-averse" fashion.
There is another need for a method and system that defines and maintains a single protocol which a hardware monitor can use to monitor and manage a diverse hardware platform.
There is yet another need for a method and system for expanding the notion of a single point of control to heterogeneous hardware without affecting the reliability, availability, scalability, and performance of established software supporting that hardware.
There is also a need for a method and system to insulate the hardware monitor from large and potentially risky changes and segregate the vast majority of new code into the present invention.
There is furthermore a need for a method and system whose application effectively emulates the function provided by network frame hardware.
There is also yet another need for a method and system that makes any diverse hardware appear to function as network frame hardware in a distributed, parallel, heterogeneous computing environment.
An exemplary embodiment of the present invention is a method for facilitating communication between diverse node hardware and a hardware monitor program, the hardware monitor program monitoring and managing node hardware in a network frame and providing an interface for administrative interaction with that node hardware. The method includes receiving a request encoded in a hardware monitor protocol from the hardware monitor. An intermediary program decodes the request encoded in the hardware monitor protocol. The intermediary program encodes the request using a diverse hardware protocol of the intended diverse node hardware. The intermediary program sends the request encoded in the diverse hardware protocol to the intended diverse node hardware.
Once the request is received, the intended diverse node hardware sends a response encoded in the diverse hardware protocol to the intermediary program. The intermediary program receives and decodes the response. The intermediary program encodes the response using the hardware monitor protocol. The intermediary program sends the response encoded in the hardware monitor protocol to the hardware monitor.
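The decode, re-encode, and forward steps of the method, in both directions, can be sketched as follows. This is only an illustration under stated assumptions: the two-byte hardware monitor protocol, the text-based diverse hardware protocol, and the names `Intermediary` and `FakeDiverseNode` are hypothetical and do not appear in the source.

```python
import struct

# Hypothetical hardware monitor protocol: one byte command code, one byte slot.
MONITOR_COMMANDS = {1: "power_on", 2: "power_off", 3: "reset"}


class FakeDiverseNode:
    """Stand-in for diverse node hardware speaking a hypothetical text protocol."""

    def __init__(self):
        self.powered = False

    def handle(self, line: str) -> str:
        verb = line.strip()
        if verb == "POWER ON":
            self.powered = True
        elif verb == "POWER OFF":
            self.powered = False
        elif verb != "RESET":
            return "ERR unknown"
        return f"OK power={'on' if self.powered else 'off'}"


class Intermediary:
    """Translates between the monitor protocol and the diverse protocol."""

    def __init__(self, nodes):
        self.nodes = nodes  # maps slot number to a diverse node

    def handle_monitor_request(self, request: bytes) -> bytes:
        # Decode the request encoded in the hardware monitor protocol.
        code, slot = struct.unpack("!BB", request)
        name = MONITOR_COMMANDS[code]
        # Encode it in the diverse hardware protocol and send it on.
        line = name.replace("_", " ").upper() + "\n"
        reply = self.nodes[slot].handle(line)
        # Decode the response from the diverse node hardware.
        status, _, detail = reply.partition(" ")
        ok = status == "OK"
        powered = detail == "power=on"
        # Encode the response in the monitor protocol: slot, ok flag, power flag.
        return struct.pack("!BBB", slot, int(ok), int(powered))
```

A hardware monitor would call `handle_monitor_request` exactly as it would address a frame, and would never see the text protocol used on the other side of the intermediary.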
Another exemplary embodiment of the present invention is a system having a diverse hardware platform monitored and managed from a central control node in a computing environment. The computing environment is a distributed, parallel, heterogeneous computing environment having a network frame and network frame hardware. The network frame hardware utilizes at least one central control node. The central control node executes both a hardware monitor program for monitoring and managing node hardware and for providing an interface for administrative interaction with the node hardware and an intermediary program for facilitating communication between the hardware monitor and a plurality of diverse node hardware. The intermediary program emulates network frame hardware by facilitating communication between the hardware monitor and diverse node hardware.
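The emulation idea can be illustrated by giving the intermediary the same interface as a frame, so that the hardware monitor treats both alike. The sketch below is a simplified illustration only; the `send_command` interface and the classes `NetworkFrame`, `EmulatedFrame`, and `HardwareMonitor` are assumed names, not the structure disclosed in the source.

```python
class NetworkFrame:
    """Stand-in for real network frame hardware (hypothetical interface)."""

    def __init__(self):
        self._powered = {}

    def send_command(self, slot: int, command: str) -> dict:
        if command == "power_on":
            self._powered[slot] = True
        elif command == "power_off":
            self._powered[slot] = False
        return {"slot": slot, "powered_on": self._powered.get(slot, False)}


def make_diverse_node():
    """Return a callable standing in for diverse hardware with a text protocol."""
    state = {"on": False}

    def node(text_command: str) -> str:
        if text_command == "POWER ON":
            state["on"] = True
        elif text_command == "POWER OFF":
            state["on"] = False
        return "on" if state["on"] else "off"

    return node


class EmulatedFrame:
    """Intermediary exposing the frame interface while driving diverse nodes."""

    def __init__(self, diverse_nodes):
        self._nodes = diverse_nodes  # maps slot number to a node callable

    def send_command(self, slot: int, command: str) -> dict:
        reply = self._nodes[slot](command.replace("_", " ").upper())
        return {"slot": slot, "powered_on": reply == "on"}


class HardwareMonitor:
    """Issues the same command to every frame, real or emulated."""

    def __init__(self, frames):
        self.frames = frames

    def power_on(self, slot: int) -> list:
        return [frame.send_command(slot, "power_on") for frame in self.frames]
```

Because `HardwareMonitor` depends only on `send_command`, supporting a new hardware type in this sketch means writing another emulating class rather than changing the monitor itself.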
These and other features and advantages of the present invention will be apparent from the following brief description of the drawings, detailed description, and appended claims and drawings.