A. Technical Field
The present invention relates to detecting failures in network communication and, more particularly, to systems and methods for detecting failure of a node, a link, or both using a software-defined networking infrastructure.
B. Description of the Related Art
In general, conventional network elements spend substantial resources determining end-to-end connectivity between nodes. FIG. 1 shows a schematic diagram of a conventional network system/topology 100. As depicted, the two nodes 102 and 108 communicate with each other through an Ethernet cloud 104, where multiple switches 106 are used to form a multi-hop topology between the two nodes. Each of the nodes 102 and 108 may be a server, and multiple applications 103 and 109 may be installed in the nodes 102 and 108, respectively. For brevity, only two nodes are shown in FIG. 1.
Each of the switches 106 is coupled to a route controller 112 in a network controller 111, as indicated by lines 120, and sends a status notice to the route controller 112. The route controller 112 monitors the connectivity between the switches 106 within the Ethernet cloud 104, but not the status of the nodes 102 and 108. As such, the nodes 102 and 108 need to keep checking the aliveness of the communication session between them to avoid data loss when one of the nodes and/or a switch on the communication path is down.
Aliveness of the communication session(s) between the nodes 102 and 108 may be checked at various levels, such as the link level, the protocol level, or the application level. Typically, the aliveness check is performed using hello packets, where a hello packet (or, equivalently, a keepalive packet) is data sent to test the connection between two nodes. For instance, when the node 108 receives a hello packet from the node 102, the node 108 sends a return packet to the node 102. When the aliveness/connectivity check between two nodes is based on the conventional hello/return packets, the total time and resource consumption for connectivity checks in the system 100 is directly proportional to the total number of nodes/links in the system 100. Furthermore, when the aliveness/connectivity check is performed at the application level, each end application pair is responsible for maintaining its own connectivity check by periodically exchanging hello packets. For instance, in FIG. 1, the node 102 having n applications/virtual machines running thereon and the node 108 having m applications/virtual machines running thereon may exchange n×m hello packets to check the aliveness of the communication session between them. This conventional approach unnecessarily consumes traffic bandwidth of the system 100 and CPU processing capacity of the nodes 102 and 108. Accordingly, there is a need for efficient systems and methods for detecting and eliminating failed nodes/interconnects between end nodes.
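By way of illustration only, the n×m scaling of the application-level check described above may be sketched as follows. The sketch assumes that every application on the node 102 keeps a separate hello exchange with every application on the node 108; the function names and application names are hypothetical and do not appear in FIG. 1, and return packets are not counted.

```python
from itertools import product


def application_hello_pairs(apps_on_102, apps_on_108):
    """List every (sender, receiver) application pair that would
    maintain its own hello exchange under the assumed scheme."""
    return list(product(apps_on_102, apps_on_108))


def hello_packets_per_interval(n, m):
    """Hello packets sent from one node to the other in a single
    check interval: one per application pair, i.e. n * m."""
    return n * m


# Hypothetical application names, used only for this example.
apps_102 = ["app_a1", "app_a2", "app_a3"]   # n = 3
apps_108 = ["app_b1", "app_b2"]             # m = 2

pairs = application_hello_pairs(apps_102, apps_108)
count = hello_packets_per_interval(len(apps_102), len(apps_108))

print(len(pairs), count)  # both equal n * m = 6
```

Under this assumption, adding a single application to either node adds as many hello exchanges as there are applications on the other node, which is the bandwidth and CPU cost noted above.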