A conventional Peripheral Component Interconnect (PCI) bus is a local parallel bus that allows peripheral cards to be added to a single computer system. Examples of commercially available peripheral cards with a PCI bus interface include SCSI (data storage) cards, wireless LAN add-in cards, digital TV tuner add-in cards, USB and FireWire (IEEE 1394) controllers, and Gigabit Ethernet (GbE) add-in cards. The PCI bus communicates with a single CPU or multiple CPUs of the computer system through a PCI-bridge controller. Several PCI bridges may exist in a computer system and couple a variety of input/output (I/O) devices with the single CPU or multiple CPUs of the computer system.
PCI-Express (PCIe) is a modification of the conventional PCI bus. Rather than a shared, parallel bus structure, PCIe uses a point-to-point, high-speed switched architecture. Each PCIe link is a serial communications channel between two devices. The bandwidth of a PCIe link may be scaled linearly by adding signal pairs to form multiple lanes. A lane is defined as two unidirectional differential pairs that provide 2.5 giga-transfers per second (GTps), 5.0 GTps, or 8.0 GTps in each direction. Up to 32 of these lanes may be combined in x2, x4, x8, x16 and x32 configurations, creating a parallel interface of independently controlled serial links. Connections between PCIe I/O devices are made through ports. Ports are connected through the scalable serial link. At the 2.5 GTps and 5.0 GTps rates, each byte (8 bits) is transmitted across one lane with 8b/10b encoding and parallel-to-serial conversion. The characteristics of the PCIe port and the lane data transfer rates comply with the PCI-Express Base Specification standardized by PCI-SIG. FIG. 9 shows a data transmission scheme in a PCIe link having one lane.
The PCIe architecture is organized in layers. Packets are initiated at the transaction layer. The transaction layer receives read and write requests from the software layer and creates packets for transmission to the link layer. Each packet has a unique identifier (source ID) that enables response packets to be directed to the correct originator. The packet format supports 32-bit memory addressing and extended 64-bit memory addressing. The link layer is responsible for data integrity and adds a sequence number and a cyclic redundancy check (CRC) to each transaction layer packet.
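The layering described above can be sketched in a few lines of Python. This is an illustrative model only: the class and function names, the field widths, and the use of `zlib.crc32` as the integrity check are assumptions made for the sketch, and they do not reproduce the wire format defined by the PCI-Express Base Specification.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class TransactionPacket:
    """Hypothetical transaction layer packet (field layout is illustrative)."""
    source_id: int   # identifies the originator so responses can be routed back
    address: int     # target memory address (64-bit field also holds 32-bit addresses)
    payload: bytes

    def to_bytes(self) -> bytes:
        # Assumed layout: 16-bit source ID, 64-bit address, then the payload.
        return struct.pack(">HQ", self.source_id, self.address) + self.payload


def link_layer_wrap(seq: int, packet: TransactionPacket) -> bytes:
    """Prepend a sequence number and append a CRC, as the link layer does."""
    body = struct.pack(">H", seq & 0x0FFF) + packet.to_bytes()
    # zlib.crc32 stands in for the link-layer CRC (an assumption of this sketch).
    return body + struct.pack(">I", zlib.crc32(body))


def link_layer_check(frame: bytes) -> bool:
    """Recompute the CRC over the frame body and compare it with the trailer."""
    body = frame[:-4]
    (crc,) = struct.unpack(">I", frame[-4:])
    return zlib.crc32(body) == crc


if __name__ == "__main__":
    frame = link_layer_wrap(5, TransactionPacket(0x0100, 0x1000, b"\x01\x02"))
    print(len(frame), link_layer_check(frame))
```

A receiver that finds a CRC mismatch would treat the frame as corrupted; how retransmission is handled is outside the scope of this sketch.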
The PCIe architecture allows the interconnection of multiple devices to each other. The device that provides this connectivity is called a switch. A switch may contain several ports. A bridge is a switch that contains only two ports. Throughout this specification, the term switch includes both switches and bridges. A switch is called non-blocking when devices connected to it can communicate with each other at their full physical transfer rate. A switch is called blocking when devices cannot communicate at their full physical transfer rate across the switch. For example, commercially available PCIe switches include 24-lane devices with 3 ports and 48-lane devices with 12 ports.
PCIe switches may be connected to each other to provide increased connection capability in order to accommodate more devices. An ensemble of interconnected switches is defined as a switch fabric, which contains multiple ports for connecting upstream devices such as root complexes (CPUs with memories) and downstream devices such as add-in cards and high-performance I/O devices. The terms switch and switch fabric will be used interchangeably hereinafter, as both refer to a switching structure that provides physical ports and logic to forward data between the devices connected to the ports.
Many communications and networking applications such as Gigabit Ethernet (GbE), 10 Gigabit (Gb) Ethernet, Fibre Channel, and InfiniBand require higher-bandwidth I/O. Switches must be scalable to adapt to the increased need for bandwidth. Bandwidth is defined as the rate at which data are moved across a physical connection. For example, the bandwidth of a GbE is 1,000 Mbps, and the bandwidth of a 10 Gb Ethernet is 10,000 Mbps. A PCIe x4 port provides an aggregate bandwidth of 10 Gbps in each direction over four full-duplex point-to-point connections (one physical connector), each lane running at 2.5 Gbps.
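The x4 figure above follows from multiplying the lane count by the per-lane rate. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and the payload figure assumes 8b/10b encoding, in which every 8-bit byte occupies 10 bits on the wire.

```python
def raw_bandwidth_gbps(lanes: int, rate_gt_per_s: float = 2.5) -> float:
    """Raw per-direction bit rate of a link: one bit per transfer per lane."""
    return lanes * rate_gt_per_s


def payload_bandwidth_gbps(lanes: int, rate_gt_per_s: float = 2.5) -> float:
    """Per-direction rate left for data after 8b/10b overhead (assumed encoding)."""
    return raw_bandwidth_gbps(lanes, rate_gt_per_s) * 8 / 10


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"x{lanes}: raw {raw_bandwidth_gbps(lanes):.1f} Gbps, "
              f"payload {payload_bandwidth_gbps(lanes):.1f} Gbps per direction")
```

Under these assumptions a x4 port at 2.5 Gbps per lane carries 10 Gbps of raw bits per direction, of which 8 Gbps is available for data bytes.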
In order to adapt to a large variety of applications, some of which are time sensitive or time critical, such as real-time audio processing or video compression/decompression and distribution, a data communication or computing server system must be scalable. Switches are often a limiting factor in the ability of communications or networking systems to scale, i.e., connections cannot be easily added to support increasing demand for bandwidth.
FIG. 10 shows a conventional switching structure in which 12 devices (servers or endpoints) are connected to a 48-lane switch, each with a x4 port. Switches typically support 48 lanes, and connections to servers or endpoints are typically x4 ports. The lane efficiency, which is the ratio of the number of lanes that connect to endpoints or servers to the number of lanes on all switches in the system, is a measure of the utilization of the switches in the system. For example, the lane efficiency of this conventional system having one 48-lane switch is 1.0, since all ports on the switch connect to endpoints or servers.
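The lane efficiency defined above is a simple ratio. A minimal Python sketch, with a hypothetical function name and parameter names chosen for illustration, is:

```python
def lane_efficiency(devices: int, lanes_per_device: int,
                    switches: int, lanes_per_switch: int) -> float:
    """Ratio of device-facing lanes to all switch lanes in the system."""
    if switches <= 0 or lanes_per_switch <= 0:
        raise ValueError("the system must contain at least one switch lane")
    return (devices * lanes_per_device) / (switches * lanes_per_switch)


if __name__ == "__main__":
    # FIG. 10: 12 devices with x4 ports on a single 48-lane switch.
    print(lane_efficiency(devices=12, lanes_per_device=4,
                          switches=1, lanes_per_switch=48))
```

Any lanes spent on switch-to-switch links increase the denominator without increasing the numerator, so multi-switch systems score below 1.0.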
FIG. 11 shows another conventional configuration of three 48-lane switches connected together to enable connection of 24 devices. In this example, three 48-lane switches are used, and the switches are interconnected with x8 ports in a mesh pattern allowing 24 devices to be interconnected to each other, each device having a x4 port. The lane efficiency of this system is (24 devices×4 lanes/device)/(3 switches×48 lanes/switch) or 96/144=0.67. However, this configuration is blocking, meaning that it is unable to transfer traffic from end to end at the fullest possible bandwidth. For example, if three devices connected to Switch 1 attempt to send data to three devices connected to Switch 3, the required bandwidth is 12 lanes of data, but only 8 lanes connect the two switches.
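The blocking condition in this example reduces to comparing the lanes demanded by the sending devices with the lanes available between the two switches. The Python sketch below illustrates that comparison; the function names are hypothetical, and the check covers only traffic between one pair of switches, not the fabric as a whole.

```python
def required_lanes(senders: int, lanes_per_device: int = 4) -> int:
    """Lanes of traffic offered when each sender drives its full port width."""
    return senders * lanes_per_device


def is_blocking(senders: int, interswitch_lanes: int,
                lanes_per_device: int = 4) -> bool:
    """True when the offered traffic exceeds the inter-switch link width."""
    return required_lanes(senders, lanes_per_device) > interswitch_lanes


if __name__ == "__main__":
    # FIG. 11: three x4 devices on Switch 1 send across a x8 inter-switch link.
    print(required_lanes(3))   # lanes of demand
    print(is_blocking(3, 8))   # demand exceeds the x8 link
    print(is_blocking(2, 8))   # two x4 senders fit within the x8 link
```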
In order for this blocking structure to become non-blocking, three additional 48-lane switches have to be added, for a total of six switches. The six switches are organized in two cross-coupled layers, with two switches in the upper layer and four switches in the lower layer (FIG. 12). Each lower-layer switch connects to each of the upper-layer switches with a x12 port. The resulting efficiency of switch lane use is (24 devices×4 lanes/device)/(6 switches×48 lanes/switch) or 96/288=0.33.
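The lane budget of the two-layer arrangement can be checked with a short calculation. The Python sketch below is illustrative: the names are hypothetical, it assumes the 24 devices are spread evenly across the four lower-layer switches (six per switch, which the text does not state explicitly), and its non-blocking test is only a capacity comparison between device-facing lanes and uplink lanes.

```python
SWITCH_LANES = 48     # lanes per switch
DEVICE_LANES = 4      # x4 port per device
UPLINK_LANES = 12     # x12 port between a lower and an upper switch
UPPER_SWITCHES = 2
LOWER_SWITCHES = 4
DEVICES = 24


def two_layer_summary() -> dict:
    """Tally lane usage for the two-layer fabric of FIG. 12."""
    devices_per_lower = DEVICES // LOWER_SWITCHES        # assumed even split
    down = devices_per_lower * DEVICE_LANES              # device-facing lanes
    up = UPPER_SWITCHES * UPLINK_LANES                   # lanes toward upper layer
    total_lanes = (UPPER_SWITCHES + LOWER_SWITCHES) * SWITCH_LANES
    return {
        "lower_lanes_used": down + up,
        "upper_lanes_used": LOWER_SWITCHES * UPLINK_LANES,
        "non_blocking": up >= down,                      # capacity check only
        "lane_efficiency": DEVICES * DEVICE_LANES / total_lanes,
    }


if __name__ == "__main__":
    for key, value in two_layer_summary().items():
        print(key, value)
```

Under these assumptions every switch uses all 48 of its lanes, and each lower-layer switch has as many uplink lanes (24) as device-facing lanes (24), which is why the arrangement avoids the bottleneck of the three-switch mesh at the cost of a lower lane efficiency.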