Data centers are attracting increasing interest, as several network functionalities are expected to be provided by servers according to the Network Function Virtualization (NFV) paradigm. Data centers are also where the “Cloud” has its computational and storage engines. However, many challenges remain in data center deployments, especially regarding performance, energy efficiency, resiliency, scalability, and the efficient transport of data inside a data center and among data centers.
One of the main issues of data centers is that there are tens of thousands of individual flows simultaneously exchanged among machines. The challenge to network control lies not only in the volume of flows but also in the fact that they need to be continually updated. To address this, a common approach is to decouple the control function from the forwarding mechanism by abstracting the forwarding functionalities. FIG. 1 is a schematic diagram of a typical arrangement, illustrating a controller (1), a switch (2) and a connecting link (3). The switch has a plurality of flow tables that are used to perform packet lookup and forwarding. Each entry in a flow table typically comprises match fields (e.g. ingress port, source address, etc.), counters and instructions (e.g. forward the packet to a given port, drop the packet, continue the processing in the next table, etc.) that are applied to matching packets. The controller (1) can add, update and delete flow entries in the flow tables.
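By way of illustration only, the flow table structure described above can be sketched as follows. The class names, the field names and the use of a priority value to choose among several matching entries are assumptions made for this sketch and are not taken from any particular switch implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    """One flow table entry: match fields, a counter and instructions."""
    match: dict                 # e.g. {"in_port": 1, "eth_src": "aa:bb:cc:dd:ee:ff"}
    instructions: list          # e.g. ["output:2"] or ["drop"]
    priority: int = 0           # assumption: higher priority wins on overlap
    packet_count: int = 0       # counter updated on every matching packet

    def matches(self, packet: dict) -> bool:
        # A packet matches when every specified match field agrees with it.
        return all(packet.get(k) == v for k, v in self.match.items())


@dataclass
class FlowTable:
    """Hypothetical flow table that a controller can add to or delete from."""
    entries: list = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)

    def delete(self, match: dict) -> None:
        self.entries = [e for e in self.entries if e.match != match]

    def lookup(self, packet: dict):
        # Return the instructions of the best matching entry, or None.
        candidates = [e for e in self.entries if e.matches(packet)]
        if not candidates:
            return None
        best = max(candidates, key=lambda e: e.priority)
        best.packet_count += 1
        return best.instructions
```

For example, after adding an entry with match `{"in_port": 1}` and instructions `["output:2"]`, a lookup for a packet arriving on port 1 returns those instructions and increments the entry's counter, while a packet on any other port returns `None`.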
The connecting link (3) may use the OpenFlow protocol. OpenFlow (OF) was initially used in research and academia, where it served as an enabler for network demonstrations in experimental settings, but it is now finding its way into the market. OF is well suited for data centers due to the relatively low-priced hardware and high flexibility.
Typically, the switch initiates a Transmission Control Protocol (TCP) connection to the controller using the transport port 6653. When an OF connection is first established, each side of the connection sends a HELLO message for protocol version negotiation. After the switch (2) and the controller (1) have exchanged HELLO messages and successfully negotiated a common version number, the controller sends a FEATURES_REQUEST message to identify the switch and read its basic capabilities, and the switch responds with a FEATURES_REPLY message.
FIG. 2 is a signaling diagram illustrating this process. The controller (1) sends a HELLO message (4) to the switch and the switch (2) sends a HELLO message (5) to the controller. A FEATURES_REQUEST message (6) is sent to the switch by the controller and the switch replies with a FEATURES_REPLY message (7). OF connection maintenance is done by the underlying TCP connection mechanisms and by the periodic exchange of ECHO messages. Flow modification messages are sent from the controller to the switch to modify entries in the latter's flow tables when a change in packet forwarding is required.
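The message exchange of FIG. 2 can be illustrated with a minimal sketch of the fixed 8-byte OpenFlow header (version, type, length, transaction identifier). The helper names are hypothetical, and the negotiation rule shown (the lower of the two advertised versions) is the basic case only; it assumes that neither side uses the optional version bitmap carried in later protocol versions.

```python
import struct

# OpenFlow header: version (1 byte), type (1 byte), length (2), xid (4),
# all in network byte order.
OFP_HEADER = struct.Struct("!BBHI")

OFPT_HELLO = 0
OFPT_FEATURES_REQUEST = 5
OFPT_FEATURES_REPLY = 6


def pack_message(version: int, msg_type: int, xid: int, body: bytes = b"") -> bytes:
    """Build a message: header followed by an optional body."""
    length = OFP_HEADER.size + len(body)
    return OFP_HEADER.pack(version, msg_type, length, xid) + body


def unpack_header(data: bytes):
    """Return (version, type, length, xid) from the start of a message."""
    return OFP_HEADER.unpack_from(data)


def negotiate_version(local_version: int, remote_hello: bytes) -> int:
    """Basic negotiation: use the lower of the two advertised versions."""
    remote_version, msg_type, _, _ = unpack_header(remote_hello)
    if msg_type != OFPT_HELLO:
        raise ValueError("expected a HELLO message")
    return min(local_version, remote_version)
```

In this sketch, a controller advertising version 0x04 that receives a HELLO carrying version 0x01 would settle on 0x01 before sending its FEATURES_REQUEST.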
An issue arises when a break in the link (3) between the switch and the controller occurs. A loss of IP connectivity can be caused by a failure in the data center internal network, including hardware or software faults, or by network overload or other events. A break in the link may also be required for a software or hardware upgrade.
A break in the connection between controller (1) and switch (2) means that the switch does not receive flow modification messages from the controller. After a disconnection the switch must immediately enter one of the following modes, depending upon configuration:
    “fail secure” mode: the only change to switch behavior is that packets and messages destined to the controllers are dropped.
    “fail standalone” mode: the switch is free to use flow tables in any way it wishes; the switch may delete, add or modify any flow entry.
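The two modes can be contrasted with a small sketch. The class and method names are hypothetical, and the model is deliberately simplified: it assumes that, while connected, only the controller changes the flow table.

```python
from enum import Enum


class FailMode(Enum):
    SECURE = "fail secure"
    STANDALONE = "fail standalone"


class Switch:
    """Hypothetical switch model showing behavior after a disconnection."""

    def __init__(self, fail_mode: FailMode):
        self.fail_mode = fail_mode
        self.connected = True
        self.flow_table = {}    # match (hashable) -> instructions

    def on_disconnect(self) -> None:
        self.connected = False

    def send_to_controller(self, message) -> bool:
        # In either mode, messages for the controller cannot be delivered
        # while the link is down; True means delivered, False means dropped.
        return self.connected

    def local_flow_change(self, match, instructions) -> bool:
        # Assumption: local changes are only made while disconnected,
        # and only in "fail standalone" mode.
        if not self.connected and self.fail_mode is FailMode.STANDALONE:
            self.flow_table[match] = instructions
            return True
        return False
```

In "fail secure" mode the flow table is left exactly as the controller last wrote it, whereas in "fail standalone" mode it may diverge from the controller's view, which is why the states must be reconciled once the connection returns.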
While disconnected, the switch periodically attempts to re-establish the connection to the controller. When the OF channel is re-established, it is necessary to re-synchronize the states of the switch and the controller, i.e. ensure that the flow tables in the switch have entries corresponding to the flow modification messages which have been sent to it by the controller. In order to achieve this synchronization, the controller (1) has two options:
    1. Retrieve all flow entries with a FLOW_STATS_REQUEST to re-synchronize its state with the switch state.
    2. Delete all flow entries with a FLOW_MOD request to start from a clean state on the switch. Then, reinstall all flows.
FIG. 3 is a signaling diagram for the first option. The process starts with a FLOW_STATS_REQUEST message (8) from the controller to the Switch. This asks the switch to provide the current status of its flow tables. The switch responds with a FLOW_STATS_REPLY message (9) for each of its flow entries. The total resynchronization time Tsync1 (10) is from the sending of the FLOW_STATS_REQUEST message to the receipt of the last FLOW_STATS_REPLY. By this process, the status of the switch's flow tables is reconstructed by the controller.
FIG. 4 is a signaling diagram for the second option. The process starts with a FLOW_MOD_DELETE (11) message being sent from the controller to the switch, instructing the latter to delete all its flow table entries. New FLOW_MOD messages (12) are then sent to the Switch. A FLOW_MOD message is sent for every flow entry. The synchronization time Tsync2 (10) starts with the sending of the FLOW_MOD_DELETE message and ends with the receipt of the last FLOW_MOD message.
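The two re-synchronization options of FIGS. 3 and 4 can be compared with a simple message-counting sketch. The function names and the dictionary representation of flow entries are assumptions for illustration; the count of one reply or one FLOW_MOD per flow entry follows the description above.

```python
def resync_by_stats(switch_flows: dict):
    """Option 1: the controller rebuilds its view from the switch's replies."""
    messages = 1                          # one FLOW_STATS_REQUEST
    controller_view = {}
    for match, instructions in switch_flows.items():
        controller_view[match] = instructions
        messages += 1                     # one FLOW_STATS_REPLY per flow entry
    return controller_view, messages


def resync_by_reinstall(switch_flows: dict, controller_flows: dict):
    """Option 2: wipe the switch, then reinstall every flow."""
    messages = 1                          # one FLOW_MOD_DELETE for all entries
    switch_flows.clear()
    for match, instructions in controller_flows.items():
        switch_flows[match] = instructions
        messages += 1                     # one FLOW_MOD per flow entry
    return switch_flows, messages
```

In both cases the number of messages is one more than the number of flow entries, so for a table of N entries the exchange grows linearly with N.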
Both options are time- and bandwidth-consuming, and their cost scales with the number of flows. In a data center there may be tens of thousands of individual flows running through a network at any given moment. Throughout the re-synchronization phase, the switch is not operational and cannot be used for new services or traffic recovery.