Cloud services are driving the creation of data centers that hold tens to hundreds of thousands of servers. Additionally, a data center concurrently supports multiple distinct applications, some of which require bandwidth-intensive all-to-all communications among servers in the data center. The large scale and the development of applications bring challenges to the network fabric of a data center.
The objectives of a data center network are to interconnect a large number of servers and to provide efficient, fault-tolerant routing and forwarding services to higher-level applications. There are two main choices for the data center network fabric: a layer-2 network or a layer-3 network.
        In a layer-2 network, the data center is treated as a single Ethernet. Ethernet makes network management easy (plug-and-play, seamless virtual machine migration, etc.). However, Ethernet cannot scale to networks with tens of thousands of servers.
        A layer-3 approach overcomes the scalability problem, but it sacrifices Ethernet's simplicity and imposes an administrative burden.
New methods and systems are proposed by researchers to address the scalability problem of Ethernet, for supporting a “plug-and-play”, large-scale data center network.
Ethernet is a common LAN (Local Area Network) technology in use today. It identifies nodes in the network by their MAC addresses. Unlike IP addresses, which are hierarchical, a MAC address has a flat structure and is unique worldwide. A forwarding table in a switch stores mapping records from destination MAC addresses to outgoing ports.
FIG. 1 is a schematic diagram illustrating how a switch creates a forwarding table in a self-learning manner. When a data frame arrives at a switch, the switch checks the source MAC address of the data frame and stores in its forwarding table a mapping between that source MAC address and the port on which the data frame arrived. By virtue of this self-learning mechanism and the use of MAC addresses, network management becomes easier and the switch is capable of “plug-and-play”.
However, Ethernet cannot be scaled to networks with tens of thousands of servers for the following reasons. Firstly, MAC addresses are not hierarchical and thus multiple MAC addresses cannot be aggregated together. Since the forwarding table of a switch stores mapping records between destination MAC addresses and outgoing ports, the MAC addresses of all hosts in the entire network need to be stored in the forwarding table of each switch. The buffer size of the switch therefore limits the number of hosts in the network. Secondly, for a data frame having an unknown destination MAC address (that is, a destination MAC address not stored in the forwarding table), the switch sends (broadcasts) the frame to all ports except the port at which it arrived, as shown in FIG. 2. In addition, some basic network services (such as Address Resolution Protocol, Dynamic Host Configuration Protocol, etc.) are performed by broadcasting. This broadcasting mechanism also restrains Ethernet from expanding to a large scale. Finally, Ethernet uses Spanning Tree Protocol (STP) to avoid loops, but forwarding along a single tree results in inefficient routing and unbalanced link loads.
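The self-learning and flooding behaviour described above can be illustrated with a minimal Python sketch. The class name `LearningSwitch`, the port numbering and the shortened MAC strings are hypothetical and chosen only for illustration; real switches also age out entries, which is omitted here.

```python
class LearningSwitch:
    """Toy Ethernet switch: learns source MACs and floods unknown destinations."""

    def __init__(self, num_ports):
        self.ports = list(range(num_ports))
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, src_mac, dst_mac, in_port):
        """Process one frame; return the list of ports it is sent out on."""
        # Self-learning: remember which port leads back to the sender.
        self.table[src_mac] = in_port
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the ingress segment: nothing to forward
            # (standard filtering, not discussed in the text).
            return []
        return [out_port]


switch = LearningSwitch(num_ports=4)
first = switch.receive("aa:aa", "bb:bb", in_port=0)  # bb:bb unknown, so flooded
reply = switch.receive("bb:bb", "aa:aa", in_port=2)  # aa:aa already learned
print(first, reply, switch.table)
```

Note that the table holds one entry per host seen, with no aggregation, which is the first scalability limit listed above.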
Thus, a data center network cannot be constructed as a large LAN. One solution is to employ a mixture of layer-2 and layer-3 configurations. That is, a data center is constructed as a plurality of LANs which are connected by IP routes. Each LAN consists of tens or hundreds of machines and forms an IP subnet. The mixture of layer-2 and layer-3 configurations can overcome the scalability problem, but it sacrifices Ethernet's simplicity and imposes administrative burden. An object of the present disclosure lies in solving the scalability problem of Ethernet so as to support a “plug-and-play”, large-scale data center network.
Reference 1 proposes the PortLand protocol, i.e. a set of layer-2 addressing, routing and forwarding protocols for data center networks. According to this protocol, Pseudo MAC (PMAC) addresses are assigned to all hosts in the network to encode their positions in the topology. PMAC addresses enable efficient forwarding with less switch state. A more detailed description of the PortLand system is given below.
FIG. 3 is a schematic diagram of architecture of a PortLand system.
In the PortLand system, each end host is assigned a PMAC (for example, PMAC “00.00.01.02.00.01” is assigned to an end host having a MAC address of “00.19.B9.FA.88.E2” and an IP address of “10.5.1.2”, as shown in FIG. 3), which encodes the location of the end host in the topology. PMAC addresses, rather than actual MAC addresses, are stored in the forwarding table for forwarding data packets. PMAC addresses are hierarchical and thus the switches can maintain smaller forwarding tables. Additionally, the PortLand system introduces a centralized fabric manager (FM) 300 which maintains configuration information and the state of the network. In contrast to the pure broadcasting mechanism of conventional Ethernet, FM 300 can facilitate ARP resolution and improve fault tolerance.
The PortLand protocol comprises a set of addressing, routing and forwarding protocols based on Fat Tree topology (see Reference 3). In Fat Tree topology, switches are divided into three layers: edge layer, aggregation layer and core layer. All switches in respective layers are identical, each including k ports. FIG. 3 shows a Fat Tree topology with k=4. The fat tree is split into k individual pods. As shown in dotted line boxes of edge layer and aggregation layer in FIG. 3, a pod is formed by k switches in each box. In general, the fat tree can support non-blocking communications among k³/4 hosts by using 5k²/4 individual k-port switches.
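The host and switch counts quoted above can be checked with a short calculation. The sketch below assumes the standard fat-tree construction, in which each pod contains k/2 edge and k/2 aggregation switches, each edge switch serves k/2 hosts, and there are (k/2)² core switches; the per-layer split is not spelled out in the text, and the function name `fat_tree_size` is hypothetical.

```python
def fat_tree_size(k):
    """Return switch and host counts for a fat tree built from k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    edge = k * half          # k pods, k/2 edge switches per pod
    aggregation = k * half   # k pods, k/2 aggregation switches per pod
    core = half * half       # (k/2)^2 core switches
    hosts = edge * half      # each edge switch serves k/2 hosts
    return {
        "pods": k,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,  # equals 5k^2/4
        "hosts": hosts,                         # equals k^3/4
    }


print(fat_tree_size(4))   # the k=4 topology of FIG. 3
print(fat_tree_size(48))
```

For k=4 this gives 16 hosts and 20 switches, and for k=48 it gives 27,648 hosts and 2,880 switches.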
Edge switches assign a 48-bit PMAC address to each directly connected host. PMAC addresses are hierarchical and encode the locations of hosts in the topology. The format of a PMAC address is as follows:
        pod.position.port.vmid
wherein
        ‘pod’ has 16 bits and represents the pod number of the edge switch,
        ‘position’ has 8 bits and indicates the position of the edge switch in the pod,
        ‘port’ has 8 bits and represents the port number of the edge switch to which the host is connected, and
        ‘vmid’ has 16 bits and indicates the number of the virtual machine on the physical machine (there may be a plurality of virtual machines on the same physical machine).
In the example shown in FIG. 3, the third host from the left has a PMAC address of “00.00.01.02.00.01”, wherein
        pod ‘00.00’ indicates pod 0,
        position ‘01’ represents position 01 in pod 0,
        port ‘02’ indicates that the end host is connected to port 02 of the edge switch, and
        vmid ‘00.01’ indicates that the virtual machine on the end host is number 1.
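The pod.position.port.vmid layout can be sketched as a pair of encode and decode helpers. This is only an illustration: the names `PMAC`, `encode_pmac` and `decode_pmac` are hypothetical, and the sketch assumes each of the six octets is written as two hexadecimal digits separated by dots, which matches the pod 0 example above.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    pod: int       # 16 bits: pod number of the edge switch
    position: int  # 8 bits: position of the edge switch in the pod
    port: int      # 8 bits: edge-switch port the host is connected to
    vmid: int      # 16 bits: virtual machine number on the physical machine


_WIDTHS = (16, 8, 8, 16)  # field widths in bits, 48 bits in total


def encode_pmac(pmac):
    """Pack the four fields into 48 bits and render them as dotted octets."""
    value = 0
    for field, width in zip(pmac, _WIDTHS):
        if not 0 <= field < (1 << width):
            raise ValueError(f"field value {field} does not fit in {width} bits")
        value = (value << width) | field
    return ".".join(f"{octet:02X}" for octet in value.to_bytes(6, "big"))


def decode_pmac(text):
    """Split a dotted 6-octet PMAC string back into its four fields."""
    raw = bytes(int(part, 16) for part in text.split("."))
    if len(raw) != 6:
        raise ValueError("a PMAC address has exactly 6 octets")
    return PMAC(
        pod=int.from_bytes(raw[0:2], "big"),
        position=raw[2],
        port=raw[3],
        vmid=int.from_bytes(raw[4:6], "big"),
    )


print(encode_pmac(PMAC(pod=0, position=1, port=2, vmid=1)))
print(decode_pmac("00.00.01.02.00.01"))
```

Encoding pod 0, position 1, port 2, vmid 1 reproduces the address “00.00.01.02.00.01” of the example host.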
As shown in FIG. 3, PMAC addresses, rather than actual MAC addresses, are stored in the forwarding table (shown in the table on the right side of FIG. 3) for forwarding data packets. In contrast to the flat structure of MAC addresses, PMAC addresses are hierarchical, so the switches need to maintain only small forwarding tables. The forwarding tables of the switches in the respective layers are as follows:
        In a core switch: pod.* → outgoing port
        For example, ports 0 and 1 of the rightmost core switch of FIG. 3 are connected to pod 0 and pod 1 respectively. Therefore, all data packets having destination MAC addresses “00.00.*” are sent to port 0, which means that all data packets destined to pod 0 are sent to port 0. Similarly, all data packets destined to pod 1 are sent to port 1.
        In an aggregation switch: pod.position.* → outgoing port
        For example, the rightmost aggregation switch of FIG. 3 belongs to pod 3, i.e. 00.11. Thus, all data packets having destination MAC addresses “00.11.*” are destined to pod 3. Of these, all data packets having destination MAC addresses “00.11.00.*” are sent to the edge switch with position ‘0’ (i.e. through port 0), and all data packets having destination MAC addresses “00.11.01.*” are sent to the edge switch with position ‘1’ (i.e. through port 1). Data packets having other destination MAC addresses are sent to other pods through port 2 or port 3 as uplink data packets.
        In an edge switch: pod.position.port.* → outgoing port
        For example, the rightmost edge switch of FIG. 3 belongs to pod 3, position 1, i.e. 00.11.01. It has two hosts directly connected thereto, wherein all data packets having destination MAC addresses “00.11.01.00.*” are sent to port 0, and all data packets having destination MAC addresses “00.11.01.01.*” are sent to port 1. Data packets having other destination MAC addresses are sent to other hosts through port 2 or port 3 as uplink data packets.
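The per-layer prefix matching described above can be sketched as three small lookup functions for the k=4 topology. The sketch works on decoded PMAC fields rather than address strings. It makes two assumptions that the text does not state: that port i of a core switch leads to pod i for every pod (the text gives this only for pods 0 and 1), and that an uplink is chosen by a simple deterministic rule (the text says only “port 2 or port 3”). All function names are hypothetical.

```python
from typing import NamedTuple


class PMAC(NamedTuple):
    pod: int
    position: int
    port: int
    vmid: int


def pick_uplink(dst, uplinks):
    """Assumed rule: spread traffic over the uplinks by a simple field sum."""
    return uplinks[(dst.pod + dst.position + dst.port) % len(uplinks)]


def core_forward(dst):
    """Core switch: match pod.* only (assumes port i leads to pod i)."""
    return dst.pod


def aggregation_forward(dst, my_pod, uplinks=(2, 3)):
    """Aggregation switch: match pod.position.*; other pods go upward."""
    if dst.pod == my_pod:
        return dst.position  # downlink port towards that edge switch
    return pick_uplink(dst, uplinks)


def edge_forward(dst, my_pod, my_position, uplinks=(2, 3)):
    """Edge switch: match pod.position.port.*; everything else goes upward."""
    if (dst.pod, dst.position) == (my_pod, my_position):
        return dst.port  # downlink port towards the attached host
    return pick_uplink(dst, uplinks)


# Rightmost switches of FIG. 3: pod 3, edge position 1.
local = PMAC(pod=3, position=1, port=1, vmid=1)
remote = PMAC(pod=0, position=1, port=2, vmid=1)
print(aggregation_forward(local, my_pod=3))          # downlink to position 1
print(edge_forward(local, my_pod=3, my_position=1))  # downlink to host port 1
print(edge_forward(remote, my_pod=3, my_position=1)) # one of the uplinks
print(core_forward(remote))                          # port towards pod 0
```

Each function inspects only the address fields relevant to its layer, so the number of table entries depends on k rather than on the number of hosts.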
The PortLand protocol comprises a set of layer-2 addressing, routing and forwarding protocols for data center networks. However, the PortLand protocol is limited to the Fat Tree topology and cannot be applied to other topologies. For example, the PMAC format used by the PortLand protocol, pod.position.port.vmid, cannot be directly applied to the Clos topology proposed in Reference 2.
FIG. 4 is a schematic diagram of a Clos topology. The Clos topology is also divided into three layers. Its Top of Rack (ToR) switches are similar to the edge switches of the Fat Tree topology in that they are connected directly to end hosts. All switches in the Fat Tree topology are identical. In the Clos topology, however, the core switches and aggregation switches have 10G ports, and the ToR switches are connected to the aggregation switches via 10G uplink ports but to hosts via 1G downlink ports.
There is no concept of a pod in the Clos topology. Thus, the PMAC format (pod.position.port.vmid) cannot be used directly. If each aggregation switch were treated as a pod (for example, aggregation switches S3 and S4 corresponding to pod 0 and pod 1 respectively), the ToR switches could not be addressed unambiguously. Since each ToR switch is connected to two aggregation switches, such as ToR switch S7 connected to aggregation switches S3 and S4, the pod number of ToR switch S7 could be either 0 or 1. Consequently, the PortLand protocol proposed in Reference 1 cannot be directly applied to the Clos topology proposed in Reference 2.
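The ambiguity can be made concrete with a small sketch. It models only the one wiring fact stated above (ToR switch S7 uplinked to aggregation switches S3 and S4) together with the tentative pod assignment S3 → pod 0, S4 → pod 1; the dictionary and function names are hypothetical.

```python
# Wiring fact taken from the text: ToR switch S7 uplinks to S3 and S4.
tor_uplinks = {"S7": ["S3", "S4"]}

# Tentative mapping discussed in the text: one pod per aggregation switch.
pod_of_aggregation = {"S3": 0, "S4": 1}


def candidate_pods(tor):
    """Return every pod number the ToR switch could inherit from its uplinks."""
    return sorted({pod_of_aggregation[agg] for agg in tor_uplinks[tor]})


candidates = candidate_pods("S7")
print(candidates)  # more than one candidate, so the 'pod' field is not unique
```

Because S7 has more than one candidate pod, no single value can be written into the 16-bit pod field of the PMAC addresses of its hosts.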