1. Field of the Invention
This invention relates generally to the field of computer system architecture and, more particularly, to an architecture that allows mapping between computing nodes and shared or non-shared I/O devices.
2. Description of the Related Art
Computing systems often contain multiple compute nodes. For example, computing systems may include multiple CPUs, one or more multi-core CPUs, CPUs that operate in multiple operating system domains, and/or multiple single-board computers configured as blades and mounted in a common chassis or drawer. In addition, compute nodes may be interfaced to multiple I/O devices. I/O devices may be any devices that allow data to be transferred to or from the compute nodes. For example, compute nodes may be coupled to one or more network interfaces such as Ethernet, storage area network interfaces such as Fibre Channel, graphics cards, USB or FireWire controllers, etc. Redundant connections may also be desired to improve availability and reliability of the I/O interfaces. In modern computer systems, an interface subsystem placed between the compute nodes and the I/O devices may include a variety of chipsets connecting a host bus on the compute node side to one or more I/O buses on the other side, such as ISA, EISA, PCI, PCI-X, CompactPCI, AGP, etc.
In order to make more effective use of the I/O devices in a system, the interface subsystem may be designed to permit compute nodes to share I/O devices. For instance, in a computer system that uses multiple blades to increase the available processing power, instead of placing I/O interface chipsets and I/O devices on each blade, each blade may interface to a set of shared I/O cards through a midplane that includes hardware to replace the function of the interface chipsets. The resulting architecture may provide a lower overall system cost, higher configuration flexibility, and more complete utilization of I/O devices. One skilled in the art will appreciate that a system of blades coupled to I/O devices through a midplane is but one example of an architecture in which I/O interface chipsets are separate from the compute nodes. What should be appreciated is that regardless of the type of compute nodes and I/O devices provided, some type of I/O interface permits the I/O devices to be shared. Further, the I/O interface may allow compute nodes to be designed, manufactured and sold separately from the I/O devices. Still further, the I/O interface may provide switching between compute nodes and I/O devices. Still further, the I/O interface may allow multiple compute nodes, operating independently and having one or more operating system domains, to share I/O devices as if the devices were dedicated to them.
In addition to the foregoing design considerations, efficient I/O interfaces are typically implemented in hardware or a combination of hardware and software. In the following descriptions, such I/O interfaces may be described as virtualization hardware, although it is understood that some functions of the I/O interface may comprise software and/or hardware. Virtualization hardware may typically include one or more switches to interconnect the compute nodes with the I/O devices. These switches combine together to create a virtual view of a switch fabric for each compute node. That virtual view may or may not correspond to the physical fabric layout.
One implementation of virtualization hardware uses the PCI Express (PCIe) protocol to interconnect compute nodes and I/O devices. In this implementation, the virtualization hardware presents a virtual view of a PCI Express system to each compute node. This virtual view contains virtual PCI Express switches for some or all of the physical PCI Express switches in the fabric. The virtual view also contains virtual I/O devices for some or all of the I/O devices in the fabric.
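The per-node virtual view described above can be illustrated with a minimal sketch. It assumes that a view is simply a subset of the physical switches and I/O devices in the fabric; the class names, the function build_view, and the identifiers such as "sw0" and "nic0" are hypothetical and do not come from any PCI Express specification.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Fabric:
    """Physical PCIe switches and I/O devices (identifiers are hypothetical)."""
    switches: FrozenSet[str]
    devices: FrozenSet[str]


@dataclass(frozen=True)
class VirtualView:
    """What one compute node is shown: some or all of the physical fabric."""
    node: str
    virtual_switches: FrozenSet[str]
    virtual_devices: FrozenSet[str]


def build_view(fabric, node, switches, devices):
    """Return a view exposing only the named switches and devices.

    Every item in the view must be backed by something in the physical fabric.
    """
    switches, devices = frozenset(switches), frozenset(devices)
    if not switches <= fabric.switches:
        raise ValueError("view names a switch that is not in the fabric")
    if not devices <= fabric.devices:
        raise ValueError("view names a device that is not in the fabric")
    return VirtualView(node, switches, devices)


fabric = Fabric(frozenset({"sw0", "sw1"}), frozenset({"nic0", "hba0"}))
# node_a is shown part of the fabric; node_b is shown all of it.
view_a = build_view(fabric, "node_a", {"sw0"}, {"nic0"})
view_b = build_view(fabric, "node_b", {"sw0", "sw1"}, {"nic0", "hba0"})
```

In this sketch two compute nodes may receive different views of the same physical fabric, which is consistent with a virtual view that may or may not correspond to the physical layout.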
In a classic PCI Express (PCIe) I/O system there is one host processor (root) and several I/O devices. The root is associated with a single operating system and each I/O device is bound to that root. As processing power has increased, it has become possible to run multiple independent operating systems on a host processor. This introduces the problem of sharing an I/O device among multiple operating systems. Technology has also advanced in the I/O devices. I/O device bandwidth has evolved to where a single I/O device has more bandwidth than is needed by a single host processor. To be cost efficient it is advantageous to be able to share a single I/O device among multiple host processors, each of which may support multiple independent operating systems. The PCI Special Interest Group (PCI SIG) has defined two independent I/O virtualization standards to solve both of these problems:
    SR-IOV: allows a single host processor supporting multiple operating systems to share a single I/O device.
    MR-IOV: allows multiple host processors to share a single I/O device.
FIG. 1 illustrates a prior art embodiment of an SR-IOV system 100. In the illustrated embodiment, a single host processor 110, a PCIe switch 130, and SR PCIe endpoints 140, 150, and 160 are shown. PCIe switch 130 includes ports 131-134. Host processor 110 includes operating systems (OS) 115 and 116 that are managed by a hypervisor 114. Host processor 110 also includes a PCI root port 112, through which host processor 110 is coupled to switch port 131. Switch ports 132, 133, and 134 are coupled to SR PCIe endpoints 140, 150, and 160, respectively. From each OS's point of view, each endpoint is a resource dedicated to that OS. This allows existing software to be used without modification to communicate with the endpoints. In reality, however, multiple OS's are sharing each endpoint. Note that a conventional PCIe switch may be used in an SR-IOV system. The endpoints have SR-IOV extensions to allow multiple OS's to share the endpoint resources.
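The distinction between what each OS sees and what is physically present may be easier to follow with a small sketch. It assumes, purely for illustration, that the endpoint gives each OS a private handle while one physical endpoint sits behind all of them; the class SharedEndpoint and its methods are hypothetical, and only the reference numerals are taken from FIG. 1.

```python
class SharedEndpoint:
    """One physical SR endpoint that gives each OS its own private handle."""

    def __init__(self, numeral):
        self.numeral = numeral
        self._handles = {}  # OS numeral -> handle private to that OS

    def handle_for(self, os_numeral):
        # Each OS gets (and keeps) its own handle, so from its point of view
        # the endpoint looks like a dedicated resource.
        return self._handles.setdefault(
            os_numeral, {"endpoint": self.numeral, "owner": os_numeral}
        )

    def sharers(self):
        # In reality, every OS holding a handle shares the same endpoint.
        return sorted(self._handles)


endpoint_140 = SharedEndpoint(140)
handle_115 = endpoint_140.handle_for(115)  # OS 115
handle_116 = endpoint_140.handle_for(116)  # OS 116
```

Here handle_115 and handle_116 are distinct objects that both refer to endpoint 140, and endpoint_140.sharers() lists both operating systems.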
FIG. 2 illustrates a prior art embodiment of an MR-IOV system 200. In the illustrated embodiment, host processors 210 and 220, an MR PCIe switch 230, and MR PCIe endpoints 240, 250, and 260 are shown. PCIe switch 230 includes ports 231-235. Host processors 210 and 220 share the endpoints 240, 250, and 260. Each of host processors 210 and 220 may include one or more operating systems and a hypervisor (not shown). In addition, host processor 210 includes a PCI root port 212 and host processor 220 includes a PCI root port 222 through which to communicate with ports 231 and 232, respectively, of switch 230. Switch ports 233, 234, and 235 are coupled to MR PCIe endpoints 240, 250, and 260, respectively. Note that both switch 230 and endpoints 240, 250, and 260 have MR-IOV extensions.
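The connectivity of FIG. 2 can be written down as a few lookup tables. This is a sketch only: the reference numerals follow the description above, while the table names and the helper functions are hypothetical. It assumes, as the description states, that both host processors share all three endpoints.

```python
# Root port -> host processor that contains it.
ROOT_PORTS = {212: 210, 222: 220}
# Switch 230 port -> root port coupled to it.
UPSTREAM = {231: 212, 232: 222}
# Switch 230 port -> MR endpoint coupled to it.
DOWNSTREAM = {233: 240, 234: 250, 235: 260}


def switch_port_for(endpoint):
    """Return the switch 230 port that an endpoint is coupled to."""
    for port, coupled_endpoint in DOWNSTREAM.items():
        if coupled_endpoint == endpoint:
            return port
    raise KeyError(f"endpoint {endpoint} is not coupled to switch 230")


def hosts_sharing(endpoint):
    """Return the host processors that share the given endpoint."""
    switch_port_for(endpoint)  # raises KeyError for an unknown endpoint
    return sorted(ROOT_PORTS[root] for root in UPSTREAM.values())
```

For example, hosts_sharing(250) returns both host processors, in contrast to the single root of the SR-IOV system of FIG. 1.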
FIG. 3 illustrates one embodiment of an SR-IOV software system 300. In the illustrated embodiment, a single host processor 310 hosts three OS's 320, 330, and 340 that share a single SR endpoint 360. Each of OS's 320, 330, and 340 includes a respective one of virtual function (VF) drivers 322, 332, and 342. A physical function (PF) driver 352 may run in a Hypervisor 350 that manages endpoint 360 and may provide services to the VF drivers. Note that all the OS's are running in a single host processor that, in one embodiment, may be a symmetric multiprocessor using a variety of multi-core, multi-chip, or multi-thread techniques. Note that the PF driver 352 may also be located outside the Hypervisor 350, for example in a designated OS (such as OS 320). Such a designated OS may also contain a VF driver.
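The relationship between the PF driver and the VF drivers can be sketched as follows. The sketch assumes that each VF driver forwards requests to the single PF driver, which services them on behalf of endpoint 360; the classes, the method names, and the "link status" request are hypothetical and are not drawn from any real driver API. Only the reference numerals come from FIG. 3.

```python
class PFDriver:
    """Physical function driver providing services to attached VF drivers."""

    def __init__(self, numeral, endpoint):
        self.numeral = numeral
        self.endpoint = endpoint
        self.vf_drivers = {}  # VF driver numeral -> VF driver

    def attach(self, vf_driver):
        self.vf_drivers[vf_driver.numeral] = vf_driver

    def service(self, vf_numeral, request):
        # Only VF drivers that have attached may use PF services.
        if vf_numeral not in self.vf_drivers:
            raise KeyError(f"VF driver {vf_numeral} is not attached")
        return (f"PF {self.numeral}: {request} for VF {vf_numeral} "
                f"on endpoint {self.endpoint}")


class VFDriver:
    """Virtual function driver running inside one OS."""

    def __init__(self, numeral, os_numeral, pf_driver):
        self.numeral = numeral
        self.os_numeral = os_numeral
        self.pf_driver = pf_driver
        pf_driver.attach(self)

    def request(self, what):
        # Forward the request to the PF driver that serves this VF driver.
        return self.pf_driver.service(self.numeral, what)


pf_352 = PFDriver(352, endpoint=360)
vf_drivers = [
    VFDriver(vf, os_numeral, pf_352)
    for vf, os_numeral in [(322, 320), (332, 330), (342, 340)]
]
```

The sketch does not depend on where the PF driver runs, so it applies equally whether PF driver 352 is placed in Hypervisor 350 or in a designated OS.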
One problem is that the endpoint extensions for SR-IOV and MR-IOV are different, so with current designs endpoint manufacturers have to produce two different products if they desire to satisfy both markets. The SR extensions have a lower implementation cost than the MR extensions. It would be desirable for endpoint manufacturers to have to implement only one set of extensions (preferably the SR extensions, because they have a lower cost) that could serve both SR and MR systems. Accordingly, what is needed is a design that can enable an SR endpoint to function as an MR endpoint in an MR system. In addition, it is desirable that the design enable other types of endpoints to operate in MR-IOV systems.