A typical data center is a collection of servers running applications that service requests from clients, which may be connected to the data center via the Internet. The applications of a data center may provide services such as instant messaging, electronic mail, searching, gaming, and serving web pages. A data center may also host internal services such as a distributed file system.
Because of the rapid growth in the number of Internet users and in the number of application services provided to those users, the number of servers needed in large data centers is increasing very quickly. For example, one search service has more than 450,000 servers in its data centers, with an average of over 15,000 servers per data center. The number of servers in the data centers appears to be doubling every 14 months.
Because the servers of a data center need to communicate with each other, the servers are interconnected via a network architecture. Some of the goals of establishing a network architecture are scalability, fault tolerance, and high network capacity. Scalability refers to the ability of the network to support a large number of servers and allow for incremental expansion of the network. Fault tolerance refers to the ability of the network to continue functioning in the presence of server, communication link, and server rack failures. (A server rack failure may occur when a rack that houses many servers loses power.) High network capacity refers to the communication bandwidth needed to support the applications of the data center.
The network architecture of typical data centers is generally a tree-based architecture. At the lowest level of the tree, the servers in a rack (e.g., 20-80 servers) are connected to a rack switch. At the next higher level, server racks are connected using core switches, each of which connects up to a few hundred server racks. A two-level tree architecture thus can support a few thousand servers. To sustain the rapid growth in demand for servers, additional higher levels are needed, and these levels use faster and more expensive switches.
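The capacity arithmetic of such a tree can be illustrated with a minimal sketch. The function names and the sample figures (40 servers per rack, 100 racks per core switch) are assumptions chosen for illustration, as is the simplification that every level above the rack switch has the same fan-out; none of these come from a specific deployment.

```python
def two_level_capacity(servers_per_rack: int, racks_per_core_switch: int) -> int:
    """Servers reachable under one core switch in a two-level tree."""
    if servers_per_rack <= 0 or racks_per_core_switch <= 0:
        raise ValueError("counts must be positive")
    return servers_per_rack * racks_per_core_switch


def levels_needed(target_servers: int, servers_per_rack: int, fanout: int) -> int:
    """Smallest number of tree levels that can hold target_servers.

    Level 1 is the rack switch. Each additional level is assumed
    (hypothetically) to multiply capacity by the same fanout.
    """
    if target_servers <= 0 or servers_per_rack <= 0:
        raise ValueError("counts must be positive")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 1
    capacity = servers_per_rack
    while capacity < target_servers:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Assumed sample values: 40 servers per rack, 100 racks per core switch.
    print(two_level_capacity(40, 100))
    # Levels needed for a data center of 15,000 servers under the same assumptions.
    print(levels_needed(15_000, 40, 100))
```

Under these assumed values, two levels hold a few thousand servers, so a data center of 15,000 servers already requires a third level.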
The tree-based architecture does not scale well in terms of supported bandwidth. The core switches, as well as the rack switches, are bandwidth bottlenecks in a tree-based architecture. The aggregate bandwidth of the servers in a rack is typically one or two orders of magnitude larger than the uplink speed of a rack switch. The bandwidth bottleneck is even more severe at higher-level core switches. The tree-based architecture is also susceptible to a “single point of failure.” A single failure at a rack switch may disconnect the server rack from the network, whereas a single failure at a core switch may result in thousands of servers being unable to communicate with each other. Although the chances of a “single point of failure” impacting a tree-based network can be reduced by using redundant switches, this redundancy does not solve the problem because a failure can still occur and disconnect thousands of servers from the network.
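Both weaknesses can be made concrete with a small sketch. The function names, the link speeds (1 Gbps per server and a 1 Gbps rack uplink), and the rack and switch counts are hypothetical values picked for illustration, and the sketch assumes no redundant switches.

```python
def oversubscription_ratio(servers_per_rack: int,
                           server_link_gbps: float,
                           uplink_gbps: float) -> float:
    """Aggregate server bandwidth in a rack divided by the rack uplink speed."""
    if uplink_gbps <= 0:
        raise ValueError("uplink speed must be positive")
    return servers_per_rack * server_link_gbps / uplink_gbps


def servers_cut_off(failed_switch: str,
                    servers_per_rack: int,
                    racks_per_core_switch: int) -> int:
    """Servers disconnected by a single switch failure (no redundancy assumed)."""
    if failed_switch == "rack":
        return servers_per_rack
    if failed_switch == "core":
        return servers_per_rack * racks_per_core_switch
    raise ValueError("failed_switch must be 'rack' or 'core'")


if __name__ == "__main__":
    # Assumed sample values: 40 servers per rack, 1 Gbps links, 1 Gbps uplink.
    print(oversubscription_ratio(40, 1.0, 1.0))
    # Servers lost to one rack-switch failure and to one core-switch failure.
    print(servers_cut_off("rack", 40, 100))
    print(servers_cut_off("core", 40, 100))
```

With these assumed figures, the rack's aggregate bandwidth exceeds its uplink by a factor of 40, which falls within the one to two orders of magnitude described above, and a core switch failure isolates a hundred times as many servers as a rack switch failure.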