As known in the art, Software Defined Networking (SDN) is a computer networking paradigm in which the system(s) that make decisions about where traffic is sent (i.e., the control plane) are decoupled from the system(s) that forward traffic to their intended destinations (i.e., the data plane). By way of example, FIG. 1A depicts a simplified representation of an SDN network 100 comprising an SDN controller 102 and three network switches 104, 106, 108. In this example, SDN controller 102 constitutes the control plane of network 100 and is responsible for, e.g.: (1) maintaining a global view of network 100; (2) determining (via one or more applications running on, or in communication with, controller 102) forwarding rules to be followed by switches 104-108 in order to achieve a desired network behavior; and (3) causing those rules to be programmed into the hardware forwarding tables of switches 104-108. Switches 104-108 constitute the data plane of network 100 and are responsible for, e.g., forwarding, at line rate, network traffic in accordance with the forwarding rules determined by SDN controller 102.
In current SDN networks, the detection of network faults is handled centrally by the SDN controller via Link Layer Discovery Protocol (LLDP). An example of a conventional fault detection method 150 that can be performed by SDN controller 102 of FIG. 1A using LLDP is depicted in FIG. 1B. At step (1) (reference numeral 152), SDN controller 102 constructs and sends out an LLDP packet with a “packet_out” message to each connected switch. SDN controller 102 typically performs this step every second.
At step (2) (reference numeral 154), each switch (104, 106, 108) receives the LLDP packet sent by SDN controller 102 and forwards the packet on all of its outgoing ports (to other switches in the network).
Finally, at step (3) (reference numeral 156), each switch (104, 106, 108) receives the LLDP packets forwarded by other switches and sends those packets back to SDN controller 102. If there are no topology failures in the network, SDN controller 102 should receive these return packets approximately every second (i.e., at the same rate that the packets were sent out at step (1)). If SDN controller 102 does not receive a return packet from a particular switch within a predefined LLDP timeout period (e.g., 3 seconds), SDN controller 102 can conclude that one or more ports or links along the path from that switch have failed.
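By way of illustration only, the controller-side timeout check of step (3) can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular SDN controller: the class name LldpFaultDetector, its method names, and the use of explicit timestamps in place of a real clock are all hypothetical. Only the one-second probe interval and the three-second timeout are taken from the description above.

```python
from dataclasses import dataclass, field

LLDP_INTERVAL_S = 1.0  # probe period described above ("every second")
LLDP_TIMEOUT_S = 3.0   # example LLDP timeout period described above


@dataclass
class LldpFaultDetector:
    """Hypothetical bookkeeping an SDN controller might keep for method 150."""
    timeout_s: float = LLDP_TIMEOUT_S
    last_seen: dict = field(default_factory=dict)

    def register_switch(self, switch_id, now):
        # Start the timeout clock when the switch is first probed (step (1)).
        self.last_seen[switch_id] = now

    def on_return_packet(self, switch_id, now):
        # Step (3): a return packet attributed to this switch has arrived.
        self.last_seen[switch_id] = now

    def suspected_faults(self, now):
        # Switches whose most recent return packet is older than the timeout.
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.timeout_s)


# Simulated run: switches 104 and 106 keep answering, switch 108 goes silent.
detector = LldpFaultDetector()
for switch_id in (104, 106, 108):
    detector.register_switch(switch_id, now=0.0)

for tick in (1.0, 2.0, 3.0, 4.0):
    detector.on_return_packet(104, now=tick)
    detector.on_return_packet(106, now=tick)

faults_at_3s = detector.suspected_faults(now=3.0)  # timeout not yet exceeded
faults_at_4s = detector.suspected_faults(now=4.0)  # switch 108 is overdue
```

In this sketch the controller itself must record every return packet and scan its table on each check, which is the centralized processing burden discussed in connection with method 150.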
While the fault detection method shown in FIG. 1B is functional, it suffers from a number of limitations. First, since method 150 requires that SDN controller 102 send out LLDP packets on a continuous basis to switches 104-108 and monitor for the receipt of those packets before determining whether a fault has occurred, method 150 cannot easily scale to support a very large network or to support faster detection times. For instance, if SDN controller 102 increased the rate at which it sent out LLDP packets in order to improve detection times, SDN controller 102 would also need to be able to process the incoming return packets at that higher rate, which may not be possible. Similarly, if network 100 increased in size to encompass more switches, SDN controller 102 would need to be able to handle the greater volume of outgoing and incoming LLDP traffic caused by the additional switches.
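The scaling concern can be made concrete with a rough, back-of-the-envelope estimate. The sketch below assumes one "packet_out" message per switch per probe and one return packet per direction of each inter-switch link per probe. The function name controller_lldp_load, the link counts, and the larger network size are hypothetical values chosen only for illustration; FIG. 1A does not specify how switches 104-108 are interconnected, so a full mesh of three links is assumed here.

```python
def controller_lldp_load(num_switches, inter_switch_links, probes_per_second):
    """Estimate LLDP messages per second handled by the controller.

    Assumptions (illustrative only): one packet_out per switch per probe,
    and one return packet per link direction per probe.
    """
    packets_out = num_switches * probes_per_second
    packets_in = 2 * inter_switch_links * probes_per_second
    return packets_out, packets_in


# Network 100 with an assumed full mesh of 3 links, probed once per second.
small_network = controller_lldp_load(3, 3, 1)

# A hypothetical larger network probed ten times per second.
large_network = controller_lldp_load(1000, 4000, 10)
```

Under these assumptions the controller load grows with the product of network size and probe rate, so increasing either one raises the volume of LLDP traffic the controller must generate and process.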
Second, since SDN controller 102 acts as the point-of-detection, SDN controller 102 must communicate with the affected switch(es) upon detecting a fault in order to initiate a repair (e.g., provisioning and switch-over to a backup path). This extra communication step can slow down the overall repair process.
Third, method 150 of FIG. 1B can only be used to detect faults that affect the integrity of a network topology, such as port, link, or node failures. Method 150 cannot detect flow-specific failures that do not affect the network topology, but may nevertheless result in unexpected forwarding behavior (e.g., a mis-programmed flow or incorrect flow priorities).
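To illustrate what a flow-specific failure of this kind might look like, the following sketch models a simplified flow table in which the highest-priority matching rule determines the action. The table layout, the lookup function, the addresses, and the port names are all hypothetical and are not drawn from any particular switch implementation. The point is only that both tables below describe a fully intact topology, yet the mis-prioritized table forwards traffic out of the wrong port.

```python
def lookup(flow_table, dst):
    """Return the action of the highest-priority rule matching dst.

    Simplified model: a rule matches if its "match" field equals dst
    or is the wildcard "*". Unmatched traffic is dropped.
    """
    matches = [rule for rule in flow_table if rule["match"] in (dst, "*")]
    if not matches:
        return "drop"
    return max(matches, key=lambda rule: rule["priority"])["action"]


# Intended programming: the specific rule outranks the wildcard rule.
intended_table = [
    {"match": "10.0.0.2", "priority": 200, "action": "output:2"},
    {"match": "*", "priority": 100, "action": "output:1"},
]

# Mis-programmed priorities: the wildcard rule now shadows the specific rule.
misprogrammed_table = [
    {"match": "10.0.0.2", "priority": 50, "action": "output:2"},
    {"match": "*", "priority": 100, "action": "output:1"},
]

intended_action = lookup(intended_table, "10.0.0.2")
actual_action = lookup(misprogrammed_table, "10.0.0.2")
```

Because every port, link, and node remains operational in this scenario, LLDP packets would continue to circulate normally and method 150 would report no fault.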