In a conventional network architecture, the forwarding and control planes are coupled in that both control and data packets are transmitted on the same link; hence, the control traffic and data traffic are equally affected when a failure occurs. To simplify, control traffic is the traffic between a network element, referred to herein as a controller, that controls how flows of data are to be processed and forwarded and a forwarding element, referred to herein as a switch. Data traffic is the data payload that is sought to be transferred from one node to another node in a network. Throughout this application, forwarding element(s) are referred to as switch(es). However, the use of the term switch shall not be construed to limit such forwarding elements to Ethernet or layer 2 switches.
This coupling of the forwarding and control planes in a conventional network architecture usually results in an overly complicated control plane and complex network management. Disadvantageously, this is known to create a large burden and a high barrier to new protocols and technology developments. For the most part, controllers and switches are tasked with minimizing the distance between nodes using a routing protocol such as Open Shortest Path First (OSPF). OSPF (IETF RFC 2328) is a link-state protocol in which a router floods link-state information describing its links to its neighbors to all the nodes in the routing domain. Using this information, every router constructs the topology map of the entire network in the domain. Each router maintains a link-state database which reflects the entire network topology. Based on this topology map and the link cost metrics, the routers determine the shortest paths to all other routers using Dijkstra's algorithm. This information is in turn used to create routing tables that are used for forwarding of IP packets.
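By way of illustration only, the shortest-path computation described above can be sketched as follows. This is a minimal sketch and not an OSPF implementation: the dictionary `LSDB`, the function name `routing_table`, the router names and the link costs are all hypothetical, and the link-state database is assumed to be already synchronized.

```python
import heapq

# Hypothetical link-state database: router -> {neighbor: link cost}.
LSDB = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

def routing_table(lsdb, source):
    """Run Dijkstra's algorithm from `source`.

    Returns {destination: (total cost, next hop)} for every router
    reachable from `source`, excluding `source` itself.
    """
    table = {}
    visited = set()
    # Heap entries are (cost so far, node, first hop taken from source).
    heap = [(0, source, source)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in visited:
            continue  # a cheaper path to this node was already settled
        visited.add(node)
        if node != source:
            table[node] = (cost, first_hop)
        for neighbor, link_cost in lsdb[node].items():
            if neighbor not in visited:
                # Leaving the source, the neighbor itself is the first hop.
                hop = neighbor if node == source else first_hop
                heapq.heappush(heap, (cost + link_cost, neighbor, hop))
    return table

table_a = routing_table(LSDB, "A")
```

In this example, router A reaches C and D through B because the summed link costs on that path are lower than the cost of the direct A-C link. Nothing in the computation accounts for how well a path is protected against failures.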
The primary disadvantage of using a shortest-path routing protocol is that it does not consider network resilience or protection. In evaluating a network design, network resilience is an important factor, as a failure of a few milliseconds may easily result in terabyte data losses on high-speed links. As used herein, resilience is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation. A network element or forwarding element that has greater resilience is better protected from faults and challenges to normal operation than a network element or forwarding element that has lesser resilience. As used herein, failure probability is the frequency with which an engineered system or component fails, expressed as the number of failures per hour, or as the probability that each node fails in the long term.
Despite the rapid improvement in line speeds, port densities, and performance, the network control plane mechanisms have advanced at a much slower pace than the forwarding plane mechanisms. To overcome the cited disadvantages, the OpenFlow split architecture protocol has been developed.
A split-architecture network design introduces a separation between the control and forwarding components of a network. Among the use cases of such architecture are the access/aggregation domain of carrier-grade networks, enterprise networks, Internet service provider (ISP) networks, mobile backhaul networks, cloud computing, multilayer (L3, L2 and L1, OTN, WDM) support networks and data centers, all of which are among the main building blocks of a network architecture. Therefore, proper design, management and performance optimization of these networks are of great importance.
Unlike the conventional network architecture which integrates both the forwarding and the control planes in the same network element, a split architecture network executes the control plane on control elements (e.g., a controller) that might be in different physical locations from the forwarding elements (e.g., switches). The use of a split architecture enables the simplification of the switches implementing the forwarding plane and shifts the intelligence of the network into a number of controllers that oversee the switches. The control traffic (sent as, e.g., flow entries, packets, frames, segments, protocol data units) in a split-architecture network can be transmitted on different paths from the data traffic (sent as, e.g., packets, frames, segments, protocol data units) or even on a separate network. Therefore, the reliability of the control plane in these networks is no longer directly linked with that of the forwarding plane. However, disconnection between the control plane and the forwarding plane in a split architecture network could disable the forwarding plane. When a switch is disconnected from its controller, it cannot receive any instructions on how to forward new flows and becomes offline for all practical purposes.
In a split architecture network, the controller collects information from switches, and computes and distributes the appropriate forwarding decisions to the switches. Controllers and switches use a protocol to communicate and exchange information. An example of such protocol is OpenFlow (see www.openflow.org), which provides an open and standard method for communication between a switch and a controller, and it has drawn significant interest from both academia and industry.
FIG. 1 is a diagram 100 showing an overview of the OpenFlow interface between a switch 109 and a controller 101. Switch 109 is a component of network elements 105. Controller 101 communicates with switch 109 over secure channel 103 using the OpenFlow protocol. The flow or forwarding table 107 in an OpenFlow switch is populated with entries from controller 101. As seen in FIG. 2, each entry consists of: a rule 201 defining matches for fields in packet headers; an action 203 associated with the flow match 204; and a collection of statistics 205 on the flow 206.
When an incoming packet matches a particular rule, the associated actions are performed on the packet. As seen in FIG. 2, a rule 201 contains key fields 202 from several headers in the protocol stack, for example Ethernet MAC addresses, IP address, IP protocol, TCP/UDP port numbers as well as the incoming port number. To define a flow, all the available matching fields may be used. But it is also possible to restrict the matching rule to a subset of the available fields by using wildcards for the unwanted fields.
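By way of illustration only, the rule, action and statistics structure and the wildcard matching described above can be sketched as follows. This is a simplified model and not the OpenFlow wire format: the names `FlowEntry` and `lookup`, the field names and the action strings are hypothetical. Two simplifications are assumed: a field omitted from a rule is treated as wildcarded, and entries are tried in list order rather than by priority. Handing an unmatched packet to the controller is likewise an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class FlowEntry:
    rule: dict        # header field -> required value; omitted fields are wildcards
    action: str       # action applied to matching packets
    packets: int = 0  # per-flow statistic: number of packets matched

def lookup(flow_table, packet):
    """Return the action of the first entry whose rule matches `packet`."""
    for entry in flow_table:
        if all(packet.get(name) == value for name, value in entry.rule.items()):
            entry.packets += 1  # update the statistics for this flow
            return entry.action
    # Assumption: a table miss is handed to the controller.
    return "send_to_controller"

flow_table = [
    # Matches on two fields; every other field is wildcarded.
    FlowEntry(rule={"ip_dst": "10.0.0.2", "tcp_dst": 80}, action="output:2"),
    # Matches on the incoming port only.
    FlowEntry(rule={"in_port": 1}, action="output:3"),
]

web = {"in_port": 1, "ip_dst": "10.0.0.2", "tcp_dst": 80}
ssh = {"in_port": 1, "ip_dst": "10.0.0.9", "tcp_dst": 22}
unknown = {"in_port": 5, "ip_dst": "10.0.0.9", "tcp_dst": 22}

actions = [lookup(flow_table, p) for p in (web, ssh, unknown)]
```

The third packet matches no rule, which is the situation in which a switch depends on its controller for instructions on how to forward a new flow.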
The de-coupled control platform of the split architecture eases the task of modifying the network control logic and provides a programmatic interface upon which developers can build a wide variety of new protocols and management applications. In this model, the data and control planes can evolve and scale independently, while the cost of the data plane elements is reduced.
It is well known that link and switch failures can adversely affect network performance. For example, a failure of a few milliseconds may easily result in terabyte data losses on high-speed links. A link failure can occur on a link transporting control traffic, data traffic or both, and it indicates that traffic traversing the link can no longer be transferred over the link. The failure can be either of a link between two switches or of a link between a controller and the switch to which it connects. In most cases, these links fail independently.
A switch failure indicates that a network element or forwarding element is unable to originate, respond, or forward any packet or other protocol data unit. Switch failures can be caused by software bugs, hardware failures, mis-configurations and similar issues. In most cases, these switches fail independently.
Special failure cases include connectivity loss between a switch and a controller: A switch can lose connectivity to its controller due to failures on the intermediate links or nodes along the path between the switch and the controller. Whenever a switch cannot communicate with its assigned controller, the switch will discard all the packets on the forwarding plane managed by the controller, even though the path on the forwarding plane is still valid. In other situations, a subset of the traffic can be forwarded by the forwarding plane or similar limited functionality can continue for a limited amount of time until a connection with an assigned controller or another controller is re-established. Therefore, this can be considered as a special case of switch failure.
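By way of illustration only, the following sketch shows how a single link failure on an in-band control path can disconnect switches from the controller while the forwarding-plane link between those switches remains intact. The topology, the node names and the function `reachable_from` are hypothetical, and the control traffic is assumed to share the links shown.

```python
from collections import deque

def reachable_from(start, links, failed=frozenset()):
    """Breadth-first search over undirected `links`, skipping failed ones."""
    adjacency = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue  # a failed link carries no traffic in either direction
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical chain: controller - s1 - s2 - s3.
links = [("controller", "s1"), ("s1", "s2"), ("s2", "s3")]
switches = {"s1", "s2", "s3"}

connected = reachable_from("controller", links, failed={("s1", "s2")})
disconnected = switches - connected
```

Here the failure of the s1-s2 link leaves both s2 and s3 without a path to the controller, although the link between s2 and s3 is still usable for data traffic.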
Conventional split architecture design assumes the use of either fully in-band or fully out-of-band connectivity between forwarding and control planes. As used herein, in-band connections mean that data and control traffic share the same physical connections and out-of-band connections mean that data and control traffic use different physical connections. In conventional networks, where both control and data packets are transmitted on the same link, the control and data information are equally affected when a failure happens. When used in a split architecture, disconnection between the controller and the forwarding plane could disable the forwarding plane as the switch is unable to receive any instructions on how to forward new flows.
In conventional split-architecture network designs, each switch is pre-programmed with a path to reach the controller. Upon a link or node failure, the switch relies on the controller to detect such failure and re-compute the new path for the switch. Detection of any failures in switches or links by the controller must be based on some implicit mechanisms, such as when Hello messages are not received by the controller from a switch. This introduces significant delays in the network, as the controller must detect the exact location of the failure and then re-establish the controller-switch connections. If no backup path can be configured for a switch, then the connection of the switch to the controller will be interrupted.
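By way of illustration only, the implicit detection mechanism described above can be sketched as a timeout on received Hello messages. The class `HelloMonitor`, its method names and the timeout value are hypothetical and are not taken from any protocol specification. Timestamps are passed in explicitly so that the example is deterministic.

```python
class HelloMonitor:
    """Controller-side bookkeeping of the last Hello seen from each switch."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated (assumed value)
        self.last_seen = {}     # switch id -> time of the most recent Hello

    def record_hello(self, switch_id, now):
        self.last_seen[switch_id] = now

    def suspected_failures(self, now):
        """Return the switches whose Hello messages are overdue at time `now`."""
        return sorted(
            switch_id
            for switch_id, seen_at in self.last_seen.items()
            if now - seen_at > self.timeout
        )

monitor = HelloMonitor(timeout=3.0)
monitor.record_hello("s1", now=0.0)
monitor.record_hello("s2", now=0.0)
monitor.record_hello("s1", now=2.0)  # s2 stays silent after time 0

overdue = monitor.suspected_failures(now=4.0)
```

A missed Hello tells the controller only that a switch has become unreachable. It does not indicate which link or node along the path has failed, which is one reason the subsequent localization and re-establishment steps add delay.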
Studies of the resilience of networks have historically assumed an in-band control model, meaning that the control plane and data plane have the same resilience properties. The existing work on the connectivity between the control plane and forwarding plane in the split architecture assumes either fully in-band or fully out-of-band connections. In the fully in-band scenario, a single infrastructure is used for both data and control traffic. In the fully out-of-band scenario, the control traffic is carried over a separate network from the data network. While the latter scenario provides a more reliable connection to the switch for control traffic, it can be very costly to set up a completely separate network for the control traffic. Although split-architecture networks use an out-of-band model, link and switch failures are still a concern as a single controller is directly coupled by a link to each network element acting as a switch. In such a network, if the link between the controller and switch fails, the switch is unable to update its forwarding table and eventually fails.
When using a split architecture in the access/aggregation network environment, the advantages of sending control traffic out-of-band may not always hold. First, the network can be geographically distributed. Thus, a direct link between every switch and the controller may require long-distance fiber and costly deployment. Second, even in a single geographic location, when the size of the network grows to a large scale, building a separate out-of-band dedicated network for the control plane can be expensive. What is desired is a hybrid design for connection between the controller and the switches that is capable of incorporating both in-band and out-of-band models.