1. Field of the Invention
The invention relates to networking.
2. Description of the Related Art
As networks grow ever larger and more complicated, delays are introduced in more locations and more physical hardware is required, which has costs in terms of both money and space. It would be desirable to reduce cost, space and delays in a network.
Server development has been enhanced by the inclusion of Peripheral Component Interconnect Express (PCIe) links inside the server. As shown in FIG. 1, a modern processor 100 may include several PCIe root complexes. Memory 102 is directly connected to the processor 100. A series of PCIe devices 104 can be directly connected to the processor 100 or to a PCIe fabric switch 106, which is connected to the processor 100. The PCIe devices 104 can serve various functions, such as storage controllers for direct attached storage, network interface controllers (NICs) for Ethernet connections to a local area network (LAN), host bus adapters (HBAs) for Fibre Channel (FC) connections to a storage area network (SAN) and host channel adapters (HCAs) for InfiniBand connections for clustering.
There have been efforts to use PCIe as a cluster interconnect, as shown in FIG. 2. Each server or host 200 is connected to an edge PCIe fabric switch 204. A layer of core PCIe fabric switches 206 then links together the edge PCIe fabric switches 204. Shared I/O 202, such as storage, NICs or HBAs, is also connected to an edge PCIe fabric switch 204, which is in turn connected to the core PCIe fabric switches 206. This configuration allows very high speed, very low overhead communication between the hosts 200 in the cluster and high speed access to the shared I/O 202.
FIG. 3 illustrates a proposed rack scale use of PCIe interconnects. This is a variant of the cluster interconnect of FIG. 2, configured for use in normal data center racks. A series of host chassis 302, such as 1U high chassis for higher density, are used to provide the basic processing capability. A host chassis 302 includes a host 304, primarily the processor 100 and memory 102, and a PCIe retimer 306. Because the PCIe links are longer than they would be if located entirely on a normal motherboard, retiming is necessary. A storage chassis 308 includes a storage controller 310, typically a RAID controller; a storage array 312, an array of hard drives to provide bulk storage; and a PCIe retimer 306. The storage chassis 308 provides a direct attached bulk storage function. A flash chassis 314 includes a series of solid-state disk (SSD) controllers 316, which are connected to an array of flash memory devices 318. The SSD controllers 316 are connected to a PCIe bridge 320, as illustrated, because the exemplary SSD controllers 316 are not PCIe compatible. If the SSD controllers 316 were PCIe compatible, then PCIe retimers could have been used instead. The flash chassis 314 provides high speed, non-volatile storage for use by the processors 100, often in online transaction processing (OLTP) applications. A graphics processing unit (GPU) chassis 322 includes an array of GPUs 324, which can be used for high speed array and vector processing, for example. The GPUs 324 are connected to a PCIe bridge 326. At the top of the rack (TOR) is an interconnect chassis 328. The illustrated interconnect chassis 328 includes one HBA 330 and two NICs 332. The HBA 330 is connected to a SAN fabric 334, to which conventional external storage 336 is connected. The NICs 332 are connected to a LAN 338 to provide general Ethernet connectivity, for example to the Internet.
A PCIe fabric switch 340 connects to the HBA 330 and NICs 332 in the interconnect chassis 328 and to the PCIe retimers 306 and PCIe bridges 320 and 326, providing overall interconnection of the various chassis to form a complete computer system.
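For illustration only, the interconnections described for FIG. 3 can be sketched as a small graph and searched for the chain of elements that lies between two endpoints. The following Python sketch is not part of any described system. It models a single host chassis, the node labels simply reuse the reference numerals above, the links inside the storage chassis 308 are inferred from the description, and the helper names `build_graph` and `shortest_path` are hypothetical.

```python
from collections import deque

# Links taken from the FIG. 3 description, with one host chassis modeled.
# The links inside the storage chassis (array -> controller -> retimer)
# are an assumption inferred from the text, not stated explicitly.
LINKS = [
    ("host 304", "PCIe retimer 306 (host chassis)"),
    ("PCIe retimer 306 (host chassis)", "PCIe fabric switch 340"),
    ("storage array 312", "storage controller 310"),
    ("storage controller 310", "PCIe retimer 306 (storage chassis)"),
    ("PCIe retimer 306 (storage chassis)", "PCIe fabric switch 340"),
    ("flash memory devices 318", "SSD controllers 316"),
    ("SSD controllers 316", "PCIe bridge 320"),
    ("PCIe bridge 320", "PCIe fabric switch 340"),
    ("GPUs 324", "PCIe bridge 326"),
    ("PCIe bridge 326", "PCIe fabric switch 340"),
    ("PCIe fabric switch 340", "HBA 330"),
    ("PCIe fabric switch 340", "NICs 332"),
    ("HBA 330", "SAN fabric 334"),
    ("SAN fabric 334", "external storage 336"),
    ("NICs 332", "LAN 338"),
]


def build_graph(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the list of nodes from src to dst, or None."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    graph = build_graph(LINKS)
    for target in ("external storage 336", "flash memory devices 318", "LAN 338"):
        print(" -> ".join(shortest_path(graph, "host 304", target)))
```

Under these assumptions, every path from the host 304 to storage or to a network passes through a PCIe retimer 306 and the PCIe fabric switch 340 before reaching a bridge, HBA or NIC, so each access crosses several intermediate elements.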
While the rack configuration of FIG. 3 is an advance over using a series of individual hosts, each having processor, memory, storage, HBA and NIC, with TOR switches for the SAN and LAN, it is really nothing more than an exploded and reconfigured host, with all of the attendant delays and slowdowns associated with a typical server. Thus, while it is an improvement, there are still many delays present in interconnecting with other devices.