In a conventional network architecture, the forwarding and control planes are coupled in that both control and data packets are transmitted on the same link; hence control traffic and data traffic are equally affected when a failure occurs. Put simply, control traffic is the traffic between a network element that controls how flows of data are to be processed and forwarded, referred to herein as a controller, and a forwarding element, referred to herein as a switch. Data traffic is the data payload that is to be transferred from one node to another node in a network. Throughout this application, forwarding elements are referred to as switches. However, the use of the term switch shall not be construed to limit such forwarding elements to Ethernet or layer 2 switches.
This coupling of the forwarding and control planes in a conventional network architecture usually results in an overly complicated control plane and complex network management. This is known to create a large burden and a high barrier to the development of new protocols and technologies. Despite rapid improvements in line speeds, port densities, and performance, network control plane mechanisms have advanced at a much slower pace than forwarding plane mechanisms. To overcome these disadvantages, the OpenFlow split architecture protocol has been developed.
A split-architecture network design introduces a separation between the control and forwarding components of a network. Among the use cases of such an architecture are the access/aggregation domain of carrier-grade networks, enterprise networks, Internet service provider (ISP) networks, mobile backhaul networks, cloud computing, multilayer (L3 & L2 & L1, OTN, WDM) support networks and data centers, all of which are among the main building blocks of a network architecture. Therefore, proper design, management and performance optimization of these networks are of great importance.
Unlike the conventional network architecture which integrates both the forwarding and the control planes in the same network element, a split architecture network decouples these two planes and executes the control plane on servers that might be in different physical locations from the forwarding elements. The use of a split architecture in a network enables the simplification of the switches implementing the forwarding plane and shifts the intelligence of the network into a number of controllers that oversee the switches. The control traffic (sent as, e.g., flow entries, packets, frames, segments, protocol data units) in split-architecture networks can be transmitted on different paths from the data traffic (sent as, e.g., packets, frames, segments, protocol data units) or even on a separate network. Therefore, the reliability of the control plane in these networks is no longer directly linked with that of the forwarding plane. However, disconnection between the control plane and the forwarding plane in the split architecture could disable the forwarding plane; when a switch is disconnected from its controller, it cannot receive any instructions on how to forward new flows, and becomes practically offline.
In a split architecture network, the controller collects information from switches, and computes and distributes the appropriate forwarding decisions to the switches. Controllers and switches use a protocol to communicate and exchange information. An example of such a protocol is OpenFlow (see www.openflow.org), which provides an open, standard method for communication between a switch and a controller and has drawn significant interest from both academia and industry.
FIG. 1 is a diagram 100 showing an overview of the OpenFlow interface between a switch 109 and a controller 101. Switch 109 is a component of network element 105. Controller 101 communicates with switch 109 over secure channel 103 using the OpenFlow protocol. The flow or forwarding table 107 in an OpenFlow switch is populated with entries from controller 101. As seen in FIG. 2, each entry consists of: a rule 201 defining matches for fields in packet headers; an action 203 associated with the flow match 204; and a collection of statistics 205 on the flow 206.
When an incoming packet matches a particular rule, the associated actions are performed on the packet. As seen in FIG. 2, a rule 201 contains key fields 202 from several headers in the protocol stack, for example Ethernet MAC addresses, IP addresses, IP protocol, TCP/UDP port numbers as well as the incoming port number. To define a flow, all the available matching fields may be used. However, it is also possible to restrict the matching rule to a subset of the available fields by using wildcards for the unwanted fields.
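To make the rule/action/statistics structure and wildcard matching concrete, the following Python sketch models a flow table. It is illustrative only: the field names (`in_port`, `ip_dst`, `tp_dst`), the action strings, and the use of list order in place of OpenFlow entry priorities are assumptions, not the OpenFlow wire format. A field omitted from a rule is treated as a wildcard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    # Rule: header field -> required value. A field absent from the rule is a wildcard.
    rule: Dict[str, object]
    # Action applied on a match; the strings used here are illustrative only.
    action: str
    # Per-flow statistics updated on every match.
    packet_count: int = 0
    byte_count: int = 0

    def matches(self, header: Dict[str, object]) -> bool:
        # Every non-wildcarded field must equal the packet's header value.
        return all(header.get(name) == value for name, value in self.rule.items())


@dataclass
class FlowTable:
    # List order stands in for entry priority in this sketch (first match wins).
    entries: List[FlowEntry] = field(default_factory=list)

    def process(self, header: Dict[str, object], size: int) -> Optional[str]:
        for entry in self.entries:
            if entry.matches(header):
                entry.packet_count += 1
                entry.byte_count += size
                return entry.action
        # Table miss: in OpenFlow the switch would typically consult the controller.
        return None


table = FlowTable([
    FlowEntry({"ip_dst": "10.0.0.2", "tp_dst": 80}, "output:2"),  # all other fields wildcarded
    FlowEntry({"in_port": 1}, "output:3"),
])
print(table.process({"in_port": 1, "ip_dst": "10.0.0.2", "tp_dst": 80}, 1500))  # output:2
print(table.process({"in_port": 4, "ip_dst": "10.0.0.9"}, 64))                # None
```

The second rule matches on the incoming port alone, so any packet arriving on port 1 that misses the first rule is still handled by it.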
The de-coupled control platform of the split architecture eases the task of modifying the network control logic and provides a programmatic interface upon which developers can build a wide variety of new protocols and management applications. In this model, the data and control planes can evolve and scale independently, while the cost of the data plane elements is reduced.
It is well known that link and switch failures can adversely affect network performance. For example, a failure of a few milliseconds may easily result in terabyte data losses on high-speed links. Studies of the resilience of networks have historically assumed an in-band control model, meaning that the control plane and data plane have the same resilience properties. Although split-architecture networks use an out-of-band model, link and switch failures are still a concern because a single controller is directly coupled by a link to each network element acting as a switch. In such a network, if the link between the controller and a switch fails, the switch is unable to update its forwarding table and eventually fails.
In conventional networks, where both control and data packets are transmitted on the same link, the control and data information are equally affected when a failure happens. In a split architecture, disconnection between the controller and the forwarding plane could disable the forwarding plane: when a switch is disconnected from its controller, it cannot receive any instructions on how to forward new flows and becomes practically offline.
In the existing split-architecture network design proposals and preliminary implementations, each switch is pre-programmed with a path to reach the controller. Upon a link or node failure, the switch relies on the controller to detect the failure and re-compute a new path for the switch. Detection of any failure in switches or links by the controller must be based on some implicit mechanism, such as the controller no longer receiving Hello messages from a switch. This introduces large delays in the network for detecting the exact location of the failure and re-establishing the controller-switch connections. If no backup path can be configured for a switch, then the connection of the switch to the controller will be interrupted in case of a failure in the primary path to the controller.
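The following Python sketch illustrates the implicit, Hello-based detection described above from the controller's side. The class and method names and the timeout value are hypothetical. Note what the sketch can and cannot conclude: a switch is flagged only after the timeout elapses, and the controller learns only that the switch has gone silent, not which link or node on the path actually failed.

```python
import time
from typing import Dict, List, Optional


class HelloMonitor:
    """Controller-side sketch: flag switches whose Hello messages have stopped."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                 # seconds of silence tolerated (assumed)
        self.last_seen: Dict[str, float] = {}  # switch id -> time of last Hello

    def on_hello(self, switch_id: str, now: Optional[float] = None) -> None:
        self.last_seen[switch_id] = time.monotonic() if now is None else now

    def suspected(self, now: Optional[float] = None) -> List[str]:
        # Only silence is observable here; the failed link or node is not identified.
        now = time.monotonic() if now is None else now
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)


monitor = HelloMonitor(timeout=3.0)
monitor.on_hello("s1", now=0.0)
monitor.on_hello("s2", now=0.0)
monitor.on_hello("s1", now=2.5)
print(monitor.suspected(now=4.0))  # ['s2']
```

The detection delay is bounded below by the timeout, which illustrates why relying on such implicit signals slows down re-establishing the controller-switch connection.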
A link failure can occur on a link transporting control traffic, data traffic or both, and indicates that traffic traversing the link can no longer be transferred over it. The failure can be either of a link between two switches or of a link between one controller and the switch to which it connects. In most cases, these links fail independently.
A switch failure indicates that a network element or forwarding element is unable to originate, respond, or forward any packet or other protocol data unit. Switch failures can be caused by software bugs, hardware failures, misconfigurations, and similar issues. In most cases, these switches fail independently.
Special failure cases include connectivity loss between a switch and a controller: a switch can lose connectivity to its controller due to failures on the intermediate links or nodes along the path between the switch and the controller. Whenever a switch cannot communicate with its assigned controller, the switch will discard all the packets on the forwarding plane managed by the controller, even though the path on the forwarding plane is still valid. In other embodiments, a subset of the traffic can be forwarded on the forwarding plane, or similar limited functionality can continue, for a limited amount of time until a connection with the assigned controller or another controller is re-established. Therefore, this can be considered a special case of switch failure.
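The two behaviors described above, discarding all forwarding-plane traffic versus continuing with limited functionality for a bounded time, can be sketched as follows. This is a hypothetical Python model: the mode names, the grace period, and the set of flows still served while disconnected are assumptions chosen for illustration and do not correspond to any particular switch implementation.

```python
from enum import Enum
from typing import Optional, Set


class Mode(Enum):
    DROP_ALL = "drop_all"  # discard all forwarding-plane packets once disconnected
    LIMITED = "limited"    # keep forwarding a subset of flows for a grace period


class SwitchSketch:
    def __init__(self, mode: Mode, grace: float, allowed_flows: Set[str]) -> None:
        self.mode = mode
        self.grace = grace                  # seconds of limited operation (assumed)
        self.allowed_flows = allowed_flows  # flows still served while disconnected
        self.lost_at: Optional[float] = None

    def controller_lost(self, now: float) -> None:
        self.lost_at = now

    def controller_restored(self) -> None:
        self.lost_at = None

    def forwards(self, flow_id: str, now: float) -> bool:
        if self.lost_at is None:
            return True  # connected: normal operation (flow-table lookup omitted)
        if self.mode is Mode.LIMITED and now - self.lost_at <= self.grace:
            return flow_id in self.allowed_flows
        return False     # DROP_ALL mode, or the grace period has expired


sw = SwitchSketch(Mode.LIMITED, grace=5.0, allowed_flows={"f1"})
sw.controller_lost(now=10.0)
print(sw.forwards("f1", now=12.0), sw.forwards("f2", now=12.0))  # True False
print(sw.forwards("f1", now=16.0))                                # False
```

Once the grace period expires the switch behaves exactly like the drop-all case, which is why the loss of controller connectivity is treated as a special case of switch failure.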
For the most part, controllers and switches are tasked with minimizing the distance between nodes using a routing protocol such as Open Shortest Path First (OSPF). OSPF is currently one of the most widely deployed interior gateway routing protocols. OSPF (see IETF RFC 2328) is a link-state protocol in which each router floods link-state information, describing the state of its links to its neighbors, to all the nodes in the routing domain. Using this information, every router constructs a topology map of the entire network in the domain. Each router maintains a link-state database which reflects the entire network topology. Based on this topology map and the link cost metrics, the routers determine the shortest paths to all other routers using Dijkstra's algorithm. This information is in turn used to create routing tables that are used for forwarding IP packets.
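As a reference point for the shortest-path computation described above, the Python sketch below runs Dijkstra's algorithm over a link-state map (here simply a dictionary of link costs) and derives a next-hop routing table from it. It omits everything OSPF-specific, such as LSA flooding, areas, and equal-cost multipath; the topology and router names are made up for illustration.

```python
import heapq
from typing import Dict, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # router -> {neighbor: link cost}


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    dist: Dict[str, float] = {source: 0.0}  # best known cost from source
    prev: Dict[str, str] = {}               # predecessor on the shortest path
    heap = [(0.0, source)]
    visited: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def next_hops(prev: Dict[str, str], source: str) -> Dict[str, str]:
    # Routing table: destination -> first hop on the shortest path from source.
    table: Dict[str, str] = {}
    for dest in prev:
        hop = dest
        while prev[hop] != source:
            hop = prev[hop]
        table[dest] = hop
    return table


links: Graph = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R3": 1},
    "R3": {"R1": 5, "R2": 1},
}
dist, prev = dijkstra(links, "R1")
print(dist)                   # {'R1': 0.0, 'R2': 1.0, 'R3': 2.0}
print(next_hops(prev, "R1"))  # {'R2': 'R2', 'R3': 'R2'}
```

Traffic from R1 to R3 goes via R2 (cost 2) rather than over the direct link (cost 5); the metric reflects only link cost, not how likely either path is to fail.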
The primary disadvantage of using a shortest-path routing protocol is that it does not consider network resilience or protection. In evaluating a network design, network resilience is an important factor, as a failure of a few milliseconds may easily result in terabyte data losses on high-speed links. As used herein, resilience is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation. A network element or forwarding element that has greater resilience is better protected from faults and challenges to normal operation than a network element or forwarding element that has lesser resilience. As used herein, failure probability is the frequency with which an engineered system or component fails, expressed as the number of failures per hour, or as the long-run probability that each node fails.
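The definition of failure probability can be tied to the resilience concern with a small calculation. Assuming, as noted above for most cases, that links and switches fail independently, a path survives only if every element on it survives, so its availability is the product of (1 - p_i) over its elements. The per-element probabilities below are hypothetical; they show how a path with fewer hops can be less resilient than a longer one, which a pure shortest-path metric would not capture.

```python
from math import prod
from typing import Iterable


def path_availability(failure_probs: Iterable[float]) -> float:
    # Assumes independent failures: the path is up only if every element on it is up.
    return prod(1.0 - p for p in failure_probs)


# Hypothetical per-element failure probabilities (links and switches on each path).
short_path = [0.1, 0.1]         # 2 elements, each relatively failure-prone
long_path = [0.01, 0.01, 0.01]  # 3 elements, each more reliable
print(round(path_availability(short_path), 4))  # 0.81
print(round(path_availability(long_path), 4))   # 0.9703
```

Under these assumed numbers, the two-element path fails about 19% of the time while the three-element path fails about 3% of the time.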
What is desired is a method and apparatus that generates a controller routing tree based on resilience or protection factors and provides backup links between a switch and a controller. Such a controller routing tree would be generated in a controller based on information communicated between the switch and the controller, and would be used to configure secondary outgoing links in a switch to serve as backup paths between the switch and the controller, with the switch operable to detect a link or node failure and cause a backup path from the switch to the controller to be selected.
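One way to picture a controller routing tree with backup links is sketched below in Python. This is a hypothetical illustration, not the method described in this application: it builds a hop-count (breadth-first) tree rooted at the controller, uses each switch's parent as its primary outgoing link, and picks as backup any neighbor outside the switch's own subtree. Such a neighbor's tree path to the controller does not pass through the switch, so it avoids the failed primary link. The sketch protects against single link failures only and does not weigh resilience or failure probabilities, which the desired method would.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]  # node -> neighbors (undirected topology)


def routing_tree(graph: Graph, controller: str) -> Dict[str, Optional[str]]:
    # BFS shortest-hop tree: parent[s] is s's primary next hop toward the controller.
    parent: Dict[str, Optional[str]] = {controller: None}
    queue = deque([controller])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def subtree(parent: Dict[str, Optional[str]], root: str) -> Set[str]:
    # Nodes whose tree path to the controller passes through root (root included).
    nodes: Set[str] = set()
    for n in parent:
        x: Optional[str] = n
        while x is not None:
            if x == root:
                nodes.add(n)
                break
            x = parent[x]
    return nodes


def backup_links(graph: Graph, parent: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    # For each switch, choose a non-parent neighbor outside its subtree, if any exists.
    backups: Dict[str, Optional[str]] = {}
    for s, p in parent.items():
        if p is None:
            continue  # the controller itself needs no backup
        below = subtree(parent, s)
        candidates = [n for n in graph[s] if n != p and n not in below]
        backups[s] = candidates[0] if candidates else None
    return backups


topo: Graph = {"C": ["A", "B"], "A": ["C", "B", "D"], "B": ["C", "A"], "D": ["A"]}
tree = routing_tree(topo, "C")
print(tree)                      # {'C': None, 'A': 'C', 'B': 'C', 'D': 'A'}
print(backup_links(topo, tree))  # {'A': 'B', 'B': 'A', 'D': None}
```

Switch D has a single link, so no backup can be configured for it; its connection to the controller is interrupted if that primary path fails.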