A virtual machine involves a “virtualization” in which an actual physical machine is configured to implement the behavior of the virtual machine. Multiple virtual machines (VMs) can be installed on a physical host machine, referred to as a ‘host’, which includes physical system hardware that typically includes one or more physical processors (PCPUs), physical memory, and various other physical devices, such as an IO storage adapter to perform protocol conversions required to access remote storage, such as over a storage area network (SAN). The virtual machine includes virtual system hardware that ordinarily includes one or more virtual CPUs (VCPUs), virtual memory, at least one virtual disk, and one or more virtual devices, all of which may be implemented in software using known techniques to emulate the corresponding physical components. A VM typically will have both virtual system hardware and guest system software including virtual drivers used for various virtual devices. One or more layers or co-resident software components comprising a virtualization intermediary, e.g. a virtual machine monitor (VMM), hypervisor or some combination thereof, act to instantiate and provision VMs and to allocate host machine resources dynamically and transparently among the VMs so that their respective guest operating systems can run concurrently on a single physical machine.
Multi-core multi-processor systems are becoming increasingly common in commercial server systems because of their performance, scalability and modular design. These systems often include multiple cache nodes at various levels of a cache hierarchy. Caches are commonly used to temporarily store values that might be repeatedly accessed by a processor, in order to speed up processing by avoiding the longer step of loading the values from memory. Multi-level cache hierarchies can be provided where there are several levels of interconnected caches. For example, in a processor system having two cache levels, a level 2 (L2) cache may act as an intermediary between memory and one or more level 1 (L1) caches. A multi-processor system may include a last-level cache (LLC), which is shared by multiple core processors of the system. The LLC ordinarily is the closest cache to system memory and typically is the largest member of a cache hierarchy.
A host machine may employ an IO storage adapter to act as an interface to transfer data between the machine's IO bus and SCSI storage, for example. The IO storage adapter may include physical resources such as one or more processors, memory and other computing devices so as to perform various computing and communication functions. The IO storage adapter may implement a port based SCSI transport protocol, such as Fibre Channel, iSCSI or SAS, to exchange data over a network. In accordance with the iSCSI transport protocol, for example, a SCSI initiator is responsible for packaging a SCSI command descriptor block (CDB), perhaps with the aid of a machine's operating system, and sending the CDB over an IP network. A SCSI target receives the CDB and sends it to a SCSI logical unit managed by the SCSI target. The SCSI target sends back a response to the CDB that includes a completion status that indicates the final disposition of the command.
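As a rough illustration of the packaging step described above, the following sketch builds a 10-byte READ(10) CDB. The function name `build_read10_cdb` and its parameters are hypothetical and chosen only for this example; the byte layout (operation code 0x28, a 4-byte big-endian logical block address, and a 2-byte transfer length) follows the SCSI block command set. Transport encapsulation, such as wrapping the CDB in an iSCSI protocol data unit, is omitted.

```python
import struct

READ_10_OPCODE = 0x28  # SCSI READ(10) operation code


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Pack a READ(10) command descriptor block (hypothetical helper)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA must fit in 32 bits")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits")
    # Layout: opcode, flags, LBA (big-endian), group number,
    # transfer length (big-endian), control byte.
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)


# Example: read 8 blocks starting at logical block 2048.
cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(cdb.hex())
```

A SCSI initiator would hand a CDB such as this to the transport layer, and the SCSI target would later return a completion status for it.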
FIG. 1 is an illustrative drawing showing a virtualization intermediary 110 that manages access to physical storage located within a storage area network (SAN) 106 through an IO storage adapter 102. Processing of IO requests and of IO completions is allocated among multiple physical processors PCPUs (not shown) of a multi-processor host machine 105. Unfortunately, the transfer of data over the IO storage adapter 102 between the multi-processor host machine 105 and storage 106 can result in non-uniform access to cache nodes (not shown) shared among the multiple PCPUs, which in turn can degrade IO performance. In the illustrated example, a VM 104 issues an IO access request to a virtual disk, blocks of which are carved out from a logical device 108 that is disposed within the virtualization intermediary 110 and, logically speaking, behind the adapter 102. Persons skilled in the art will appreciate that logical devices provisioned to the host are the physical objects managed by the virtualization intermediary 110. The virtualization intermediary 110 in turn uses these logical devices to create the virtual disks provided to VMs. The virtualization intermediary 110 includes a request queue 112 to receive IO requests directed from the VM 104 to a logical unit within the storage 106 and a completion queue 114 to receive responses directed from the storage 106 to the VM 104. The request queue 112 and the completion queue 114 are disposed in host memory 107 managed by the virtualization intermediary 110. The virtualization intermediary 110 includes a VMK (virtual machine kernel) hardware adapter (HA) driver 116 to communicate IO commands with the adapter 102. The adapter 102 includes DMA support 118 to exchange actual data directly with host system memory 107.
In operation with a host machine having a PCI bus, for example, the virtualization intermediary 110 may pseudo-randomly select from among multiple PCPUs (PCPU0-PCPU3) 103 of the host 105 to issue a stream of IO request commands within the request queue 112. The storage 106, in turn, sends a response that includes completion information to the adapter 102. The adapter 102 notifies the virtualization intermediary 110 of receipt of such completion response by issuing an interrupt on a vector assigned to a PCI function containing the completion queue 114. The interrupt vector assigned to the IO storage adapter 102 is managed by the virtualization intermediary 110 so as to cause it to deliver each IO completion interrupt to whichever PCPU is identified by the virtualization intermediary 110 to be least loaded across the entire host machine 105. Since the distribution of load on the PCPUs and interrupt delivery is pseudo-random, this approach often results in IO completion processing being allocated in a pseudo-random manner to the available PCPUs.
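The two independent choices described above can be sketched as follows. This is a simplified model and not the virtualization intermediary's actual scheduling code: the helper names `pick_issuing_pcpu` and `pick_completion_pcpu`, the random load snapshot, and the fixed seed are all assumptions made for illustration.

```python
import random

PCPUS = ["PCPU0", "PCPU1", "PCPU2", "PCPU3"]


def pick_issuing_pcpu(rng):
    """Request side: the issuing PCPU is effectively pseudo-random."""
    return rng.choice(PCPUS)


def pick_completion_pcpu(load):
    """Completion side: steer the interrupt to the least loaded PCPU."""
    return min(PCPUS, key=lambda p: load[p])


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 1000
matches = 0
for _ in range(trials):
    # Hypothetical load snapshot: one load value per PCPU.
    load = {p: rng.random() for p in PCPUS}
    if pick_issuing_pcpu(rng) == pick_completion_pcpu(load):
        matches += 1

print(f"same PCPU for request and completion in {matches} of {trials} trials")
```

Because the two choices are made independently in this model, a request and its completion would be expected to land on the same one of four PCPUs only about a quarter of the time.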
Referring to the request queue 112, the first request command CR1 in the illustrated series of commands is issued on PCPU P1. The second request command CR2 in the series is issued on PCPU P0. The third request command CR3 in the series is issued on PCPU P2. The fourth request command CR4 in the series is issued on PCPU P3. The fifth request command CR5 in the series is issued on PCPU P0.
Referring now to the completion queue 114, the first completion response CC1, which corresponds to the first request command CR1, is issued on PCPU P3. The second completion response CC2, which corresponds to the second request command CR2, is issued on PCPU P2. The third completion response CC3, which corresponds to the third request command CR3, is issued on PCPU P0. The fourth completion response CC4, which corresponds to the fourth request command CR4, is issued on PCPU P0. The fifth completion response CC5, which corresponds to the fifth request command CR5, is issued on PCPU P1. Note that responses need not be received in the same order in which the requests were made.
A request command and a corresponding completion response may have a producer-consumer relationship and each may require access to the same given data structure. For example, in the course of PCPU P1 processing of the request command CR1 in the request queue 112, information may be stored in a cache node (not shown) shared among the PCPUs 103 that may be needed for processing by PCPU P3 of the corresponding completion response CC1 in the completion queue 114. Since different processors process the request command CR1 and the corresponding completion response CC1, and these different processors may not share a common cache node, there may be a need for PCPU P3 to access main memory 107 to obtain the given data structure needed to process the completion response CC1. This in turn could result in the need for more processor cycles to process the completion than if the needed information could be obtained directly from a shared cache node.
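The effect can be made concrete with a small sketch that replays the request and completion assignments of the illustrated example. The cache topology is an assumption made only for this sketch: PCPUs P0 and P1 are taken to share one cache node and PCPUs P2 and P3 another, whereas the description above does not specify how the cache nodes are shared. The names `CACHE_NODE`, `REQUEST_PCPU`, `COMPLETION_PCPU` and `cross_node_completions` are likewise hypothetical.

```python
# Assumed topology (not from the description): two cache nodes, two PCPUs each.
CACHE_NODE = {"P0": 0, "P1": 0, "P2": 1, "P3": 1}

# PCPU on which each request command (CR1..CR5) was issued.
REQUEST_PCPU = {1: "P1", 2: "P0", 3: "P2", 4: "P3", 5: "P0"}
# PCPU on which each completion response (CC1..CC5) was processed.
COMPLETION_PCPU = {1: "P3", 2: "P2", 3: "P0", 4: "P0", 5: "P1"}


def cross_node_completions(requests, completions, cache_node):
    """Return the command numbers whose completion ran on a PCPU that
    does not share a cache node with the PCPU that issued the request."""
    return [
        n
        for n in sorted(requests)
        if cache_node[requests[n]] != cache_node[completions[n]]
    ]


misses = cross_node_completions(REQUEST_PCPU, COMPLETION_PCPU, CACHE_NODE)
print(f"{len(misses)} of {len(REQUEST_PCPU)} completions cross cache nodes: {misses}")
```

Under this assumed topology, only the fifth command completes on a PCPU that shares a cache node with the PCPU that issued it; the other four completions may require an access to main memory 107 to obtain the shared data structure.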
Thus, there has been a need for improvement in the processing of IO transmissions.