In the field of high availability for networked computer systems, one form of high-availability software is known as clustering software. Clustering software manages the networking operations of a group, or cluster, of networked computer systems and attempts to ensure the highest availability of running applications for external system users despite networking hardware or software failures. One of the functions of clustering software is to detect and recover from a network fault, such as a link failure (a failure of a connection to the computer network), in a network interface card (“NIC”) configured for operation in a computer system in the cluster. This function is often referred to as network fault monitoring.
Clustering software systems have been designed and built for various types of computer networking protocols, including Ethernet, and for various networked computer operating systems, including versions of UNIX such as Hewlett-Packard's HP-UX. In the HP-UX operating system, network fault monitoring is accomplished through the use of the Data Link Provider Interface (“DLPI”). The DLPI is a set of Application Programming Interfaces (“APIs”) that operate at the second-lowest layer, the data link layer, of a computer system's networking protocol stack.
The layers of a networking protocol stack, according to the Open Systems Interconnection (“OSI”) seven-layer model (established by the International Organization for Standardization (“ISO”) in 1978), typically consist of the following layers, moving from bottom (closest to the hardware) to top (closest to the user). The physical layer comprises the networking hardware used to make connections to the network (example physical layer protocols include token ring and bus). The data link layer splits data into frames for sending on to the physical layer, receives acknowledgement frames, and performs error checking (the data link layer may comprise the driver software for the NIC). The network layer, or communications subnet layer, determines the routing of data packets from sender to receiver (the most common network layer protocol is Internet Protocol (“IP”)). The transport layer, or “host-host layer,” determines how to minimize communications errors and establish point-to-point connections between two host computers such that messages between the two host computers arrive uncorrupted and in the correct order (exemplary transport layer protocols are Transmission Control Protocol (“TCP”) and User Datagram Protocol (“UDP”)). Above these are the session layer, the presentation layer, and the application layer, the last of which is concerned with the user's view of the network.
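By way of illustration only, the layer ordering described above may be sketched as a small enumeration. The names `OsiLayer`, `EXAMPLE_PROTOCOLS`, and `layers_bottom_up` are hypothetical and chosen for this sketch; the example protocols listed are limited to those named in the text.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered from the bottom (hardware) up."""
    PHYSICAL = 1
    DATA_LINK = 2      # the layer at which DLPI operates
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


# Example protocols named in the text, keyed by layer (illustrative, not exhaustive).
EXAMPLE_PROTOCOLS = {
    OsiLayer.NETWORK: ["IP"],
    OsiLayer.TRANSPORT: ["TCP", "UDP"],
}


def layers_bottom_up():
    """Return the layers ordered from closest-to-hardware to closest-to-user."""
    return sorted(OsiLayer)


if __name__ == "__main__":
    for layer in layers_bottom_up():
        examples = ", ".join(EXAMPLE_PROTOCOLS.get(layer, [])) or "-"
        print(f"{int(layer)} {layer.name:<12} {examples}")
```

In this numbering the data link layer is layer 2, which corresponds to the second-lowest position in the protocol stack.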
DLPI is used by the clustering software on the HP-UX operating system to monitor for network faults. The clustering software generates DLPI traffic across all NICs being monitored and collects the resulting data, for all data packets sent and received by the NICs, in a Management Information Base (“MIB”) compliant with the Simple Network Management Protocol (“SNMP”). The statistics tracked by the MIB can then be used to determine whether each NIC is up or down.
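The following is a minimal sketch of how packet statistics of this kind might be used to classify a NIC as up or down. It is an assumption for illustration, not the method used by any particular clustering software: the counter fields `in_packets` and `out_packets`, the decision rule (both counters must advance between two snapshots taken around a burst of generated traffic), and the interface names `lan0` and `lan1` are all hypothetical. No actual DLPI or SNMP calls are made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NicCounters:
    """Hypothetical snapshot of MIB-style packet counters for one NIC."""
    in_packets: int
    out_packets: int


def nic_is_up(before: NicCounters, after: NicCounters) -> bool:
    """Assumed rule: the NIC is up only if both counters advanced.

    Counter wraparound is ignored in this sketch.
    """
    sent = after.out_packets - before.out_packets
    received = after.in_packets - before.in_packets
    return sent > 0 and received > 0


def classify(before: dict, after: dict) -> dict:
    """Map each monitored NIC name to "up" or "down"."""
    return {
        name: "up" if nic_is_up(before[name], after[name]) else "down"
        for name in before
    }


if __name__ == "__main__":
    # Snapshots taken before and after generating test traffic (made-up values).
    before = {"lan0": NicCounters(100, 200), "lan1": NicCounters(50, 60)}
    after = {"lan0": NicCounters(110, 210), "lan1": NicCounters(50, 70)}
    print(classify(before, after))
```

In the made-up snapshots, `lan1` transmits packets but receives none, so this rule would report it as down while reporting `lan0` as up.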