1. Technical Field
The invention relates generally to monitoring a problem condition in a communications system, and more particularly, to a communications protocol implementation that monitors the communications system for the presence of the problem condition.
2. Background Art
A systems network architecture (SNA) network provides high availability for mainframe systems, such as a zSeries eServer offered by International Business Machines Corp. of Armonk, N.Y. (IBM). Operating systems, such as IBM's z/OS, exploit features of the SNA network to provide high performance for applications executing in a mainframe system. However, workloads processed by these mainframe systems are increasingly being driven by client requests flowing over an internet protocol (IP) network infrastructure. As a result, considerable emphasis has been placed on ensuring that the z/OS IP network infrastructure delivers the same high availability attributes as those provided by the SNA network.
The use of a dynamic virtual IP address (DVIPA) is an important virtualization technology that assists in providing high availability z/OS solutions using IP networks in a cluster system (sysplex) environment. DVIPA makes it possible to decouple an IP address from a physical network adapter interface. To this extent, DVIPA can be viewed as a virtual destination that is not bound to a particular system/network interface, and therefore is not affected by a failure of any particular system/network interface. This results in a highly flexible configuration that provides the high availability on which many z/OS solutions depend.
DVIPA can be deployed using one of various configurations. Each configuration provides protection against a failure of a system, network interface and/or application. For example, in multiple application-instance DVIPA, a set of applications executing in the same z/OS image is represented by a DVIPA. This DVIPA allows clients to reach these applications over any network interface attached to the z/OS image and allows for automatic rerouting of traffic around a failure in a particular network interface. Additionally, should the primary system fail or enter a planned outage, the DVIPA can be automatically moved to another system in the sysplex. Further, a unique application-instance DVIPA can be associated with a particular application instance in the sysplex. In this case, the DVIPA can be dynamically moved to any system in the sysplex on which the application is executing. This DVIPA provides automatic recovery in scenarios where a particular application or system fails. In particular, a new instance of the application running on another system can trigger the DVIPA to be moved to the other system, allowing client requests to continue to reach the application. Still further, a distributed DVIPA represents a cluster of one or more applications executing on various systems within a sysplex. In this case, new client transmission control protocol (TCP) connection requests can be load balanced across application instances active anywhere in the sysplex, thereby providing protection against the failure of any system, network interface and/or application in the sysplex, while also providing an ability to deploy a highly scalable solution within the sysplex.
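The distributed DVIPA behavior described above can be illustrated with a minimal sketch. It is not the z/OS implementation: the class name Distributor, the system names, and the round-robin selection policy are all assumptions made for illustration, since the text says only that new TCP connection requests are load balanced across active application instances.

```python
class Distributor:
    """Hypothetical model of a distributed DVIPA spreading new connections.

    Round-robin is an assumed policy chosen for simplicity; the actual
    load-balancing policy is not specified in the text.
    """

    def __init__(self, targets):
        self.targets = list(targets)      # systems hosting an application instance
        self.active = set(self.targets)   # instances currently able to accept work
        self._next = 0                    # round-robin cursor

    def mark_down(self, target):
        # A failed system, network interface, or application is taken out of rotation.
        self.active.discard(target)

    def mark_up(self, target):
        # A recovered instance rejoins the rotation.
        if target in self.targets:
            self.active.add(target)

    def route(self):
        """Return the target for the next new TCP connection request."""
        if not self.active:
            raise RuntimeError("no active application instance for this DVIPA")
        # Scan at most one full cycle, skipping instances that are down.
        for _ in range(len(self.targets)):
            target = self.targets[self._next % len(self.targets)]
            self._next += 1
            if target in self.active:
                return target
        raise RuntimeError("unreachable: active set is non-empty")
```

In this sketch, marking one target down causes later calls to route() to skip it, so new connections continue to reach the remaining instances while existing targets keep their place in the rotation.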
IP requires a single owner of each IP address. Consequently, when DVIPA is implemented, a single system in the sysplex is responsible for ownership of each DVIPA. The owner system of each DVIPA advertises its ownership to the network. This advertisement is performed using a dynamic routing protocol, such as Open Shortest Path First (OSPF), via a routing daemon (for example, OMPROUTE) that is associated with each system. In particular, the routing daemon broadcasts (advertises) the DVIPA to other routing daemons in the network. DVIPA technology provides high availability by automatically detecting the failure of a major component, such as a hardware system, an operating system, a TCP/IP protocol stack, a network adapter or an application, and automatically initiating recovery actions. To this extent, ownership of the DVIPA can move to a backup system and the routing daemon on the backup system will broadcast ownership of the DVIPA. In this manner, client requests can continue to be processed successfully by the sysplex. As a result, DVIPA provides high availability TCP/IP communications to an application running in a sysplex environment when a major component fails.
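The single-owner rule and the takeover by a backup system can be sketched as follows. This is a simplified model under stated assumptions, not the actual sysplex or OMPROUTE logic: the names System, Sysplex, define_dvipa and handle_failure, the ordered backup list, and the example address are hypothetical, and advertise() merely records what a routing daemon would announce.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """One sysplex member together with its (hypothetical) routing daemon state."""
    name: str
    healthy: bool = True
    advertised: set = field(default_factory=set)

    def advertise(self, dvipa):
        # Stand-in for the routing daemon announcing ownership of the DVIPA.
        self.advertised.add(dvipa)

    def withdraw(self, dvipa):
        self.advertised.discard(dvipa)


class Sysplex:
    def __init__(self, systems):
        self.systems = {s.name: s for s in systems}
        self.owner = {}    # dvipa -> name of the single owning system (or None)
        self.backups = {}  # dvipa -> ordered backup system names (assumed ordering)

    def define_dvipa(self, dvipa, owner, backups):
        self.owner[dvipa] = owner
        self.backups[dvipa] = list(backups)
        self.systems[owner].advertise(dvipa)

    def handle_failure(self, failed):
        """Move every DVIPA owned by the failed system to its first healthy backup."""
        self.systems[failed].healthy = False
        for dvipa, owner in list(self.owner.items()):
            if owner != failed:
                continue
            self.systems[failed].withdraw(dvipa)
            new_owner = next(
                (name for name in self.backups[dvipa] if self.systems[name].healthy),
                None,
            )
            self.owner[dvipa] = new_owner
            if new_owner is not None:
                # The backup's routing daemon now advertises ownership.
                self.systems[new_owner].advertise(dvipa)
```

For example, defining a DVIPA owned by one system with two backups and then calling handle_failure() on the owner leaves exactly one system advertising the address, which mirrors the requirement that each IP address have a single owner. Note that the model assumes every routing daemon behaves correctly, which is exactly the assumption the following paragraph calls into question.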
However, no mechanism monitors the health of each routing daemon. Consequently, if a routing daemon has problems, DVIPA information may no longer be advertised, incorrect DVIPA information may be advertised to other routing daemons, or the like. To this extent, a need exists for an improved communications protocol implementation that monitors the health of a communications system, such as a routing daemon.