Software-defined networking (SDN) involves a plurality of hosts in communication over a physical network infrastructure, each host having one or more virtualized endpoints, such as virtual machines (VMs) or containers, that are connected to one another over logical overlay networks that are decoupled from the underlying physical network infrastructure. One common characteristic of software-defined networking is a separation of the control plane from the data plane. The control plane is concerned with determining the logical overlay network topology and maintaining information about network entities such as logical switches, logical routers, and virtualized endpoints. The control plane translates the logical topology information into network configuration data, such as forwarding table entries that populate forwarding tables at the virtual switches at each host. In large data centers having hundreds or thousands of hosts and/or logical networks, the processing and communication requirements are such that a single computer system is incapable of performing all the necessary tasks for all the hosts and networks. To address this problem, various techniques for scaling out the control plane have been implemented. For example, to distribute some of the processing load to the hosts, the control plane may be divided into a central control plane (CCP) and local control planes (LCPs) at each host.
Sharding is a mechanism used to provide high scalability and availability of a CCP by identifying “master” nodes among a plurality of nodes within the CCP for handling data from particular sources or of particular types. One type of sharding is logical sharding, which typically involves assigning (e.g., using a hash algorithm to determine an assignment of) one node of a CCP (also referred to as a CCP node) as the logical master of each specific logical network entity, such as a logical switch or logical router, in the network. The hash algorithm may be based on hashing of unique logical entity identifiers, and the assignments may be determined by one or more CCP nodes and shared by all CCP nodes in the form of a sharding table, which may comprise a table including logical entity identifiers and CCP node identifiers. The sharding table may also be published to a plurality of transport nodes (e.g., hosts), which may comprise physical or virtual devices, such as hypervisors running on host machines, configured to implement logical entities. The transport nodes may use the sharding table to determine which CCP node is the master of a given logical entity. A logical entity reports network configuration data only to the CCP node that is its logical master, and that CCP node stores the network configuration data and provides it to other logical entities for which it is the logical master, as well as to transport nodes.
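The hash-based assignment described above can be illustrated with a minimal sketch. The description only states that a hash of the unique logical entity identifier may be used; the choice of SHA-256 with a modulo over the sorted node list, and the names `master_for`, `build_sharding_table`, `ls-1`, and `ccp-1`, are assumptions made purely for illustration.

```python
import hashlib


def master_for(entity_id, ccp_nodes):
    """Pick the logical master of an entity by hashing its unique identifier.

    Assumption: SHA-256 modulo the number of nodes. Any stable hash would do.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ccp_nodes)
    # Sort so that every CCP node computes the same assignment regardless
    # of the order in which it learned about its peers.
    return sorted(ccp_nodes)[index]


def build_sharding_table(entity_ids, ccp_nodes):
    """Return a table mapping logical entity identifiers to CCP node identifiers."""
    return {entity: master_for(entity, ccp_nodes) for entity in entity_ids}


# Hypothetical logical switches and routers.
entities = ["ls-1", "ls-2", "lr-1", "lr-2"]

# Sharding table with three CCP nodes, then after one node is removed.
table_before = build_sharding_table(entities, ["ccp-1", "ccp-2", "ccp-3"])
table_after = build_sharding_table(entities, ["ccp-1", "ccp-2"])

# Entities whose logical master changed as a result of the sharding change.
moved = [e for e in entities if table_before[e] != table_after[e]]
```

A transport node holding a copy of the table would look up the master of a logical entity with `table_before["ls-1"]`. In this sketch, a change in cluster membership can reassign any entity, which is the kind of sharding change in which a logical entity must move from an old logical master to a new one.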
While logical sharding works well in many instances, it can raise heretofore-unrecognized problems in certain corner cases. A first corner case can occur when a sharding change results in two logical entities being reassigned from an old logical master to a new logical master. Such a sharding change can occur, for example, when a node is added to or removed from the CCP cluster. There may be a brief interval during which the first logical entity has not yet connected to the new logical master, and so the new logical master has not yet received network configuration data from the first logical entity. If the second logical entity connects to the new logical master during this interval, the new logical master will not have a complete set of relevant network configuration data to provide to the second logical entity. This may affect existing traffic, as the new logical master and the second logical entity will have an incomplete set of network configuration data until the first logical entity connects to the new logical master.
A second corner case is when a first and second logical entity have both been reassigned from an old logical master to a new logical master, but only the first logical entity has received the change and moved to the new logical master. The first logical entity will have sent a flush message to the old master upon disconnecting, and so the old logical master will have deleted the network configuration data from the first logical entity. As a result, the second logical entity will not have access to the network configuration data from the first logical entity until it becomes aware of its master change and moves to the new logical master.
A third corner case can occur when a new logical entity joins the network during the interval between the time that a sharding change occurs and the time when all logical entities have moved to their new logical master. In this case, the new logical entity cannot get a full picture of all of the relevant network configuration data from either the old logical master or the new logical master until all logical entities have completely moved to their new logical master.
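The three corner cases can be replayed with a toy model. This is only a sketch under simplifying assumptions: the `CcpNode` class, its `report`, `flush`, and `view` methods, and the entity names `A` and `B` are hypothetical, and each CCP node is modeled as a dictionary of the configuration data reported by the entities currently connected to it.

```python
class CcpNode:
    """Toy CCP node that stores config reported by entities it is master of."""

    def __init__(self, name):
        self.name = name
        self.configs = {}  # entity id -> network configuration data

    def report(self, entity_id, config):
        """An entity connects and reports its configuration data."""
        self.configs[entity_id] = config

    def flush(self, entity_id):
        """An entity disconnects and its data is deleted from this node."""
        self.configs.pop(entity_id, None)

    def view(self):
        """Configuration data this node can currently provide."""
        return dict(self.configs)


def fresh_cluster():
    """Old master holds data from entities A and B; the new master is empty."""
    old, new = CcpNode("old"), CcpNode("new")
    old.report("A", "config-A")
    old.report("B", "config-B")
    return old, new


FULL = {"A": "config-A", "B": "config-B"}

# First corner case: B connects to the new master before A does.
old, new = fresh_cluster()
old.flush("B")
new.report("B", "config-B")
case1_view = new.view()  # the new master cannot yet provide A's data

# Second corner case: A has moved and flushed; B is still on the old master.
old, new = fresh_cluster()
old.flush("A")
new.report("A", "config-A")
case2_view = old.view()  # the old master no longer holds A's data

# Third corner case: a new entity joins while the cluster is in this state.
# Neither master can provide the full picture.
case3_old_view, case3_new_view = old.view(), new.view()

# Once B also moves, the new master holds the complete set.
old.flush("B")
new.report("B", "config-B")
settled_view = new.view()
```

In each intermediate state, the view available from a master is a strict subset of `FULL`; only after every reassigned entity has moved does the new master hold the complete set.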
Each of these corner cases, though rare, can result in traffic flapping, in which a CCP node may publish two or more alternating versions of network configuration data. As a result, transport nodes may be unable to properly report or receive all relevant network configuration data about logical entities during a sharding change. Consequently, a sharding mechanism is needed that allows for consistent processing of network configuration data during sharding changes.