A standard Peripheral Component Interconnect (PCI) bus is a local parallel bus that allows peripheral cards to be added to a single computer system. Examples of commercially available peripheral cards with a PCI bus interface include SCSI (data storage) cards, wireless LAN add-in cards, analog and digital TV tuner add-in cards, USB and FireWire (IEEE 1394) controllers, and Gigabit Ethernet add-in cards. The PCI bus communicates with a single CPU or multiple CPUs of the computer system through a PCI bridge controller. Several PCI bridges may exist in a computer system and couple a variety of input/output (IO) devices with the single CPU or multiple CPUs of the computer system.
PCI Express (PCIe) is a modification of the standard PCI bus. PCIe uses a point-to-point high-speed serial communication link instead of a shared bus structure. To maintain software compatibility, it is architected with the same tree-structured IO interconnect topology as PCI. Consequently, a PCIe link is equivalent to a logical PCI bus, i.e., each link is assigned a bus number by the enumerating software.
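The bus numbering described above can be illustrated with a minimal sketch of depth-first enumeration. The `Bridge` class, its field names, and the example topology are hypothetical and chosen only for illustration; the sketch assumes the conventional PCI scheme in which each bridge records a primary, secondary, and subordinate bus number, and it omits everything else real enumeration software does (configuration-space reads, device probing, resource assignment).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bridge:
    """Hypothetical model of a PCI-to-PCI bridge (e.g. a PCIe root or switch port)."""
    name: str
    children: List["Bridge"] = field(default_factory=list)  # bridges below this one
    primary: Optional[int] = None      # bus the bridge itself sits on
    secondary: Optional[int] = None    # bus (link) directly below the bridge
    subordinate: Optional[int] = None  # highest bus number reachable below it


def enumerate_buses(bridge: Bridge, primary: int, next_free: int) -> int:
    """Assign bus numbers depth-first; return the highest number used."""
    bridge.primary = primary
    bridge.secondary = next_free
    last = bridge.secondary
    for child in bridge.children:
        # Each downstream bridge gets the next unused bus number.
        last = enumerate_buses(child, bridge.secondary, last + 1)
    bridge.subordinate = last
    return last


# Example topology (assumed): root port -> switch upstream port -> two downstream ports.
dp0 = Bridge("switch downstream port 0")
dp1 = Bridge("switch downstream port 1")
up = Bridge("switch upstream port", [dp0, dp1])
root_port = Bridge("root port", [up])

highest_bus = enumerate_buses(root_port, primary=0, next_free=1)

for b in (root_port, up, dp0, dp1):
    print(f"{b.name}: primary={b.primary} secondary={b.secondary} subordinate={b.subordinate}")
```

In this sketch every link below a bridge receives its own bus number, which is why a point-to-point PCIe link looks like a logical PCI bus to software written for the older parallel bus.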
PCIe was originally designed for desktops, connecting a root complex (a host CPU with memory) with downstream IO devices, but has since found applications in servers, storage devices, and other communications systems. The base PCIe switching structure of a single root complex has a tree topology, which addresses PCIe endpoints through a bus numbering scheme. Currently, PCIe does not permit sharing of PCI adapters in topologies where there are multiple hosts with multiple shared PCI buses. PCIe peripherals such as Fibre Channel host bus adapters, InfiniBand host channel adapters, and Gigabit Ethernet network interface cards are integrated into a physical server system. This makes the IO system very inflexible, as the server IO capability cannot be scaled in real time or shared with other servers in a multi-root computing system.
CPU computational power has been doubling roughly every 18 months, following Moore's Law. Upgrading the network infrastructure by replacing the current IO interface modules with state-of-the-art modules is one way to keep up with the increase in CPU capability. Because physical servers, especially blade servers, have limited hardware space to accommodate IO ports, and state-of-the-art IO adapters are expensive, engineers are looking for ways to share physical IO resources in multi-root server computing systems.
The PCI-SIG Working Group is developing a new specification that adds IO virtualization capability to PCIe. The new specification, still in development, specifies two levels of IO virtualization: single-root IO virtualization (SR-IOV) and multi-root IO virtualization (MR-IOV). SR-IOV provides a standard mechanism for endpoint devices to advertise their ability to be simultaneously shared among multiple virtual machines running on the same hardware platform (one host CPU). MR-IOV allows sharing of an IO resource between multiple operating systems running on multiple hardware platforms (multiple host CPUs).
In order to support the multi-root topology, PCIe switches and IO devices should be MR-aware (i.e., they are capable of supporting a multi-root system). MR-aware IO adapters and PCIe switches must have additional register sets to support the various root-complex routings, and an MR-aware PCIe switch must contain two or more upstream ports. The MR-IOV specification requires modifications in the data link layer. A change is also necessary in the configuration software to configure the switch fabric and the MR-aware endpoint devices.
The adoption of MR-IOV requires modification in hardware and software. For that reason, MR-aware endpoint devices may not be available for a long time.
An alternative solution is to use non-transparent bridges interposed between root complexes and endpoint devices. A non-transparent bridge is a bridge that exposes a Type 0 control-and-status register (CSR) header on both sides and forwards transactions from one side to the other with address translation. Because it exposes a Type 0 CSR header, the non-transparent bridge appears as an endpoint to discovery and configuration software. Because devices on one side of the bridge are not seen or exposed on the other side, as they would be with a conventional PCI bridge, this bridge is referred to as a non-transparent bridge. Non-transparent bridges add expense and complication to the PCIe system and require higher-layer applications to properly complete discovery and enumeration of a system (FIG. 1).
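The address translation performed when a non-transparent bridge forwards a transaction can be sketched as a simple window mapping. This is only an illustration under stated assumptions: the `TranslationWindow` class, its field names, and the example addresses are hypothetical, and actual non-transparent bridges configure translation through device-specific registers that are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranslationWindow:
    """Hypothetical address window exposed on one side of a non-transparent bridge."""
    local_base: int   # start of the window as seen by the near-side host
    size: int         # window length in bytes
    remote_base: int  # where the window lands in the far-side address space

    def translate(self, addr: int) -> Optional[int]:
        """Return the far-side address, or None if addr falls outside the window."""
        offset = addr - self.local_base
        if 0 <= offset < self.size:
            return self.remote_base + offset
        return None  # not claimed by this window, so not forwarded


# Assumed example values: a 1 MiB window mapped from 0x9000_0000 to 0x2000_0000.
window = TranslationWindow(local_base=0x9000_0000, size=0x0010_0000, remote_base=0x2000_0000)

translated = window.translate(0x9000_1234)
print(hex(translated))  # far-side address for an access inside the window
```

Addresses outside the window are simply not forwarded, which reflects why the hierarchy behind the bridge remains hidden from the host on the near side.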
In summary, current IO adapters and current PCIe devices do not have IO virtualization capabilities. Existing IO adapters and PCIe switches are designed to be controlled by a single device driver in a single operating system. The PCI-SIG Working Group is developing a new specification for multi-root IO virtualization (MR-IOV), but the deployment of MR-IOV capable systems requires new hardware and software, and MR-IOV switches and endpoints are currently not available. Non-transparent bridges are an interim solution for sharing IO resources in a multi-root server system. Their deployment requires the additional installation of dedicated software in each server to access endpoints across the non-transparent bridge.
There is therefore strong motivation for a system and method for sharing endpoints among multiple servers located in different root complexes without the need to modify existing operating systems or deploy new MR-aware switches and MR-aware endpoint devices.