A data center can have tens of thousands of servers that provide a variety of services to the data center's customers. When providing these services, servers typically need to communicate (by sending packets of data) with one or more other servers or external computing devices. For example, if a group of servers performs a parallel algorithm, each server may need to notify the other servers that its portion of the algorithm has been completed. As another example, servers that host an e-commerce web site need to receive communications from computing devices accessing the web site and send responsive communications to those computing devices. In addition, the servers that host the web site may need to communicate with other servers that host a database of products for sale and with servers that host a database of order and payment information. In large data centers, it is important that communications be delivered both reliably and in a timely manner.
To support such delivery, a data center includes a network interconnection system. Although the network interconnection system could employ a full mesh, in which every server is directly connected to every other server, the number of such connections is O(n²), where n is the number of servers. To avoid such a large number of connections, a typical network interconnection system includes various routing devices, such as routers and switches, that are arranged hierarchically. With a hierarchical arrangement, each server is connected to at least one routing device at the lowest level, the routing devices at the lowest level are connected to routing devices at the next higher level, and so on up the hierarchy to the root routing devices.
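To make the O(n²) growth concrete, here is a minimal Python sketch that counts the direct links a full mesh would need, assuming exactly one dedicated link per pair of servers. The function name `full_mesh_links` is illustrative only.

```python
def full_mesh_links(n: int) -> int:
    """Number of point-to-point links needed to connect n servers pairwise."""
    # Every unordered pair of servers needs its own link: n choose 2 = n(n - 1) / 2.
    return n * (n - 1) // 2


for n in (100, 1_000, 10_000, 50_000):
    print(f"{n:>6} servers -> {full_mesh_links(n):>13,} links")
```

Under this assumption, 10,000 servers would already need 49,995,000 direct links, which illustrates why a hierarchy of shared routing devices is used instead.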
FIG. 1 illustrates a network interconnection system of a data center with hierarchically arranged routing devices. The network interconnection system 100 includes four levels 110, 120, 130, and 140. The inter-data-center level 110 is the first level (or root, top, or highest level) and includes an inter-data-center set 111 of routing devices, such as routing devices 111a through 111b. The routing devices of the inter-data-center level provide connections to other data centers and the Internet. The data center level 120 is the second level (or next lower level) and includes data center sets 121 through 129 of routing devices. Data center set 121 includes routing devices 121a through 121b, and data center set 129 includes routing devices 129a through 129b. The routing devices of the data center level are connected to the routing devices of the inter-data-center level. The cluster level 130 is the third level (or next lower level) and includes cluster sets 131 through 139 of routing devices. A cluster is a collection of servers whose communications are routed through a cluster set. Cluster set 131 includes routing devices 131a through 131c, and cluster set 139 includes routing devices 139a through 139c. The routing devices of the cluster level are connected to the routing devices of the data center level. The leaf level 140 is the fourth level (or lowest level) and includes leaf sets 141 through 149 of routing devices. Each leaf set may include only one routing device, such as routing device 141a, 142a, or 149a, which may be a top-of-rack switch. The routing devices of the leaf level are connected to the routing devices of the cluster level and to the individual servers in their racks (e.g., via a local area network). Other network interconnection systems may include more or fewer levels depending on the size of the network, the bandwidth of the connections, timing constraints, and so on.
For example, some network interconnection systems include a level between the inter-data-center level and the data center level, referred to as a border level. The border level may connect data centers within a geographic region.
A network interconnection configuration for a data center defines the topology of the network interconnection system. The network interconnection configuration specifies the number of levels, the number of sets in each level, and the number of routing devices in each set. For example, a network interconnection configuration may specify five levels, with the first level having one set of three routing devices, the second level having four sets of four routing devices, the third level having eight sets of ten routing devices, and so on. The network interconnection configuration also specifies the connections between the levels. For example, the network interconnection configuration may specify that each routing device of the second level is connected to each routing device of the first level. The network interconnection configuration may also specify the connections between the third level and the second level as follows. Each routing device of a set in the third level is connected to only one set of the second level, but is connected to every routing device in that set. Furthermore, the network interconnection configuration may specify that at least one routing device in each set at the third level is connected to each of the sets of the second level, which means that the number of routing devices in a set at the third level must be greater than or equal to the number of sets at the second level. If the third level has cluster sets with eight routing devices each and the second level includes four sets, then the connections for the routing devices of a cluster set may be as specified in Table 1.
TABLE 1
Routing Device              Connects To
Cluster Set 1, device a     Every device in Data Center Set 1
Cluster Set 1, device b     Every device in Data Center Set 1
Cluster Set 1, device c     Every device in Data Center Set 2
Cluster Set 1, device d     Every device in Data Center Set 2
Cluster Set 1, device e     Every device in Data Center Set 3
Cluster Set 1, device f     Every device in Data Center Set 3
Cluster Set 1, device g     Every device in Data Center Set 4
Cluster Set 1, device h     Every device in Data Center Set 4
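The configuration rules above can be sketched in Python as a small data structure plus two checks. This is a hypothetical illustration: `LevelConfig`, `covers_every_upper_set`, and `cluster_set_connections` are names introduced here, only the three levels spelled out in the example are modeled, and splitting a set's devices into equal contiguous groups is just one assignment that satisfies the rule (it happens to reproduce Table 1).

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass(frozen=True)
class LevelConfig:
    """Hypothetical record for one level: how many sets it has and how many devices per set."""
    name: str
    num_sets: int
    devices_per_set: int


# Only the first three levels of the five-level example are spelled out in the text.
EXAMPLE = [
    LevelConfig("first", num_sets=1, devices_per_set=3),
    LevelConfig("second", num_sets=4, devices_per_set=4),
    LevelConfig("third", num_sets=8, devices_per_set=10),
]


def covers_every_upper_set(upper: LevelConfig, lower: LevelConfig) -> bool:
    """True if each lower-level set can have a device wired to every upper-level set.

    Each lower-level device connects to exactly one upper-level set, so a lower-level
    set needs at least as many devices as the upper level has sets.
    """
    return lower.devices_per_set >= upper.num_sets


def cluster_set_connections(devices_per_set: int, num_upper_sets: int) -> dict[str, int]:
    """Map each device (a, b, c, ...) of one lower-level set to a 1-based upper-level set.

    Devices are split into contiguous, near-equal groups, one group per upper-level set.
    """
    if devices_per_set < num_upper_sets:
        raise ValueError("need at least one device per upper-level set")
    if devices_per_set > len(ascii_lowercase):
        raise ValueError("this sketch labels devices a-z only")
    return {
        ascii_lowercase[i]: i * num_upper_sets // devices_per_set + 1
        for i in range(devices_per_set)
    }


print(covers_every_upper_set(EXAMPLE[1], EXAMPLE[2]))  # True (10 >= 4)
for device, dc_set in cluster_set_connections(8, 4).items():
    print(f"Cluster Set 1, device {device}: every device in Data Center Set {dc_set}")
```

Because each group has at least one device whenever a set has at least as many devices as there are upper-level sets, every data center set is reached; with eight devices and four sets the output matches the rows of Table 1.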
Each routing device and server in a data center has an address. Most data centers are Internet Protocol (“IP”) networks that employ an IP addressing scheme, such as IP version 4 (“IPv4”) or IP version 6 (“IPv6”), to address devices. IPv4 specifies a 32-bit IP address that is divided into a network address portion and a host address portion. IPv4 addresses are typically represented by four numbers that range from 0 to 255 and are separated by periods, such as “10.168.1.1.” IPv4 originally allowed the network address portion to be 8, 16, or 24 bits, referred to as class A, B, or C IP addresses, respectively. The combination of IP address and class uniquely identifies a host, which is represented as “10.168.1.1/B” for an IPv4 class B address. Because these three network address sizes meant that many host addresses might go unused, IPv4 was updated to employ Classless Inter-Domain Routing (“CIDR”), in which the network address can vary in size from 1 to 31 bits. The combination of the IP address and the number of bits in the network address, referred to as a network address mask, uniquely identifies a host, which is represented as “10.168.1.1/20” for an IPv4 address with a 20-bit network address. The 32-bit IP address of IPv4 was once thought to be large enough to uniquely identify all hosts of each network. Because of the rapid growth of the Internet and of computer networks for both organizations and individuals, however, a 32-bit IP address proved to be too small. IPv6 was developed to overcome the 32-bit limitation of IPv4. IPv6 specifies 128-bit IP addresses and can thus address 2^96 (over 10^28) times as many addresses as IPv4.
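The CIDR notation and the IPv4/IPv6 comparison can be checked with Python's standard-library `ipaddress` module. This sketch only illustrates the arithmetic described above; the variable names are arbitrary.

```python
import ipaddress

# CIDR: the number after "/" is how many leading bits form the network address.
iface = ipaddress.ip_interface("10.168.1.1/20")
print(iface.network)                # 10.168.0.0/20 (host bits cleared)
print(iface.network.num_addresses)  # 4096, i.e. 2**(32 - 20) addresses share this network

# IPv6 (128-bit) versus IPv4 (32-bit) address space.
ratio = 2**128 // 2**32
print(f"{ratio:.3e}")               # 7.923e+28, i.e. 2**96
```

The `/20` mask keeps the first 20 bits of 10.168.1.1, which clears the low 4 bits of the third octet and all of the fourth, so the network address is 10.168.0.0.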
Each routing device has a unique IP address and has some number of ports through which direct connections are made to other devices (e.g., routing devices or servers). Table 2 illustrates example IP addresses of the devices to which the ports of a routing device may be connected.
TABLE 2
Port    IP Address
1       100.0.0.1
2       100.0.0.2
3       100.0.0.3
4       100.0.0.4
5       192.168.0.0
6       192.168.0.1
7       156.0.0.1
8       156.0.0.1

Table 2 indicates that port1 is connected to the device with the IP address of 100.0.0.1. Table 2 also indicates that port7 and port8 are both connected to the device with the IP address of 156.0.0.1.
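A port table like Table 2 maps naturally onto a dictionary. The following Python sketch is a hypothetical representation (the names `PORT_NEIGHBORS` and `ports_to` are illustrative), showing how to find every port that leads to a given neighbor, including parallel links such as port7 and port8.

```python
# Port-to-neighbor mapping from Table 2: port number -> IP address of the connected device.
PORT_NEIGHBORS: dict[int, str] = {
    1: "100.0.0.1", 2: "100.0.0.2", 3: "100.0.0.3", 4: "100.0.0.4",
    5: "192.168.0.0", 6: "192.168.0.1", 7: "156.0.0.1", 8: "156.0.0.1",
}


def ports_to(neighbor_ip: str) -> list[int]:
    """Return every port directly connected to the given neighbor, in port order."""
    return sorted(port for port, ip in PORT_NEIGHBORS.items() if ip == neighbor_ip)


print(ports_to("156.0.0.1"))  # [7, 8]: two parallel links to the same device
print(ports_to("100.0.0.1"))  # [1]
```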
The routing devices use routing tables to control the routing of packets through the appropriate connections to ensure that the packets get from their source devices to their destination devices. Each packet includes a destination address (e.g., an IP address) and typically also includes a source address. As a packet is routed, each routing device through which the packet passes is considered to be a “hop” along the path of connections between routing devices from the source address to the destination address. The routing table of a routing device specifies, for each possible destination address, the next hop to which the routing device will send a packet with that destination address. Table 3 illustrates an example routing table.
TABLE 3
Address Range     Via
10.0.0.24/31      100.0.0.1 (port1) or 100.0.0.2 (port2)
10.0.0.0/24       100.0.0.1 (port1) or 100.0.0.2 (port2) or 100.0.0.3 (port3) or 100.0.0.4 (port4)
120.0.128.0/25    192.168.0.0 (port5) or 192.168.0.1 (port6)
other             156.0.0.1 (port7) or 156.0.0.1 (port8)
The routing table of Table 3 maps address ranges to the ports through which packets with a destination address within each range are to be routed. In this example, the addresses are CIDR IPv4 addresses, and each address range is specified by an IP address and a mask. The first rule (i.e., entry) of Table 3 specifies the address range “10.0.0.24/31.” Because the mask is 31, the addresses in the range share the same 31 high-order bits as 10.0.0.24. Since only the lowest-order bit can vary, the range has two addresses: 10.0.0.24 and 10.0.0.25. The second rule specifies the address range “10.0.0.0/24.” Because the mask is 24, only the lower 8 bits can vary, and the range includes the 256 addresses from 10.0.0.0 to 10.0.0.255. The ranges “10.0.0.24/31” and “10.0.0.0/24” therefore both include the addresses 10.0.0.24 and 10.0.0.25. However, when a routing device receives a packet, it considers the rules in order of decreasing mask length and applies the first rule whose range contains the packet's destination address. So, even though both ranges include addresses 10.0.0.24 and 10.0.0.25, a packet destined for either address is handled by the first rule, which has the longer mask, and is sent to a next hop specified by that rule. Each rule includes the “via,” or next hops, to which a packet with a destination address within the range of the rule is to be routed. The first rule specifies that the next hop is either through port1 or port2, and the third rule specifies that the next hop is both port5 and port6 (i.e., the packet is sent via both connections).
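The longest-mask-first lookup can be sketched in Python with the standard `ipaddress` module. This is a simplified, hypothetical model of Table 3: the “other” row is represented as the default route 0.0.0.0/0 (an assumption), a linear scan over rules sorted by mask length stands in for the faster data structures real routing devices use, and the function returns all listed next hops without modeling whether they are used as alternatives or all at once.

```python
import ipaddress

# Table 3 as (address range, [(next-hop IP, port), ...]); "other" is modeled as 0.0.0.0/0.
ROUTES = [
    ("10.0.0.24/31", [("100.0.0.1", 1), ("100.0.0.2", 2)]),
    ("10.0.0.0/24", [("100.0.0.1", 1), ("100.0.0.2", 2), ("100.0.0.3", 3), ("100.0.0.4", 4)]),
    ("120.0.128.0/25", [("192.168.0.0", 5), ("192.168.0.1", 6)]),
    ("0.0.0.0/0", [("156.0.0.1", 7), ("156.0.0.1", 8)]),
]

# Parse each range once and sort so that longer masks (more specific ranges) are tried first.
TABLE = sorted(
    ((ipaddress.ip_network(prefix), via) for prefix, via in ROUTES),
    key=lambda entry: entry[0].prefixlen,
    reverse=True,
)


def lookup(destination: str) -> tuple[ipaddress.IPv4Network, list[tuple[str, int]]]:
    """Return the matching range and its next hops using longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    for network, via in TABLE:
        if addr in network:
            return network, via
    raise LookupError(f"no route to {destination}")


for dst in ("10.0.0.25", "10.0.0.26", "120.0.128.100", "120.0.128.200", "8.8.8.8"):
    network, via = lookup(dst)
    print(dst, "->", network, [f"port{port}" for _, port in via])
```

With this table, 10.0.0.25 matches the /31 rule even though the /24 rule also covers it, while 120.0.128.200 lies outside the /25 range (which ends at 120.0.128.127) and falls through to the default entry.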
A data center is dynamic in the sense that clusters of servers may be added, removed, or resized as needed to support the computing needs of customers. As customers' needs change, the routing tables of the routing devices need to be updated to meet those needs. In addition, various problems in a data center may result in the network interconnection system not functioning as intended. For example, if a routing device fails, a routing device connected to the failed routing device may update its routing table so that the failed routing device is no longer a next hop. If a packet can reach its destination only via the failed routing device, then the packet is undeliverable. Even if a packet could reach its destination using a different routing device (e.g., because of built-in redundant paths), the benefits of having the redundant paths (e.g., increased overall bandwidth) may be lost. Similarly, if a desired connection between routing devices is never made or fails (e.g., because a technician mistakenly removed the connection), packets may not reach their destinations. Because of the size and complexity of a network interconnection system, it can be very difficult and time-consuming to manually verify its correctness. Currently, problems are typically detected only after an incident has occurred, such as a routing device logging an undeliverable message. When such a problem is detected, a technician may be assigned to investigate and correct the problem.
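The effect of removing a failed next hop can be sketched in Python. The routing table here is hypothetical (in particular, the 172.16.0.0/16 range with a single next hop is invented for illustration), and the sketch treats a range with no remaining next hop as undeliverable, ignoring the possibility that a less specific range might still carry that traffic.

```python
# Hypothetical routing table: address range -> next-hop device addresses.
routing_table: dict[str, list[str]] = {
    "10.0.0.24/31": ["100.0.0.1", "100.0.0.2"],
    "10.0.0.0/24": ["100.0.0.1", "100.0.0.2", "100.0.0.3", "100.0.0.4"],
    "172.16.0.0/16": ["100.0.0.3"],  # illustrative range with no redundant path
}


def remove_failed_next_hop(
    table: dict[str, list[str]], failed: str
) -> tuple[list[str], list[str]]:
    """Remove a failed device from every rule's next hops (modifies table in place).

    Returns (degraded, unreachable): ranges that lost a next hop but still have one,
    and ranges left with no next hop at all.
    """
    degraded: list[str] = []
    unreachable: list[str] = []
    for prefix, via in table.items():
        remaining = [hop for hop in via if hop != failed]
        if len(remaining) == len(via):
            continue  # the failed device was not a next hop for this range
        table[prefix] = remaining  # updating an existing key is safe during iteration
        if remaining:
            degraded.append(prefix)
        else:
            unreachable.append(prefix)
    return degraded, unreachable


degraded, unreachable = remove_failed_next_hop(routing_table, "100.0.0.3")
print("fewer redundant paths:", degraded)  # ['10.0.0.0/24']
print("undeliverable:", unreachable)       # ['172.16.0.0/16']
```

The two result lists correspond to the two consequences described above: ranges that keep working with reduced redundancy, and ranges whose packets can no longer be delivered.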