The invention relates to a mechanism for scheduling virtual interfaces.
Today's computer systems are heavily virtualized to make use of the growing number of cores per processor, e.g. by consolidating many servers into one. While processor virtualization is available in all general purpose processors today, one of the major remaining challenges is the virtualization of I/O devices such that all virtual machines, or even all consumers, have their own private interface to I/O devices, bypassing the hypervisor for better performance. The third generation of PCI Express therefore introduces I/O virtualization extensions (SR-IOV, MR-IOV).
However, hardware virtualization is limited by the resources in the I/O controller. InfiniBand, for example, provides virtual interfaces without dedicated hardware support in the I/O controller and is thus able to provide more virtual interfaces.
The virtual interfaces (VIs) are conventionally implemented using memory queues. Consumers put work requests on the queues and ring their device doorbell to notify the device of new work. Each virtual interface further requires its own context in the device.
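The queue-and-doorbell interaction described above can be illustrated with a minimal sketch. It is a simplified software model, not a device implementation; the names VirtualInterface, Device, ring_doorbell and post_work are hypothetical and chosen only for illustration.

```python
from collections import deque


class VirtualInterface:
    """One VI: a memory queue of work requests owned by a consumer."""

    def __init__(self, vi_id):
        self.vi_id = vi_id
        self.queue = deque()


class Device:
    """Toy device model that services VIs whose doorbell was rung."""

    def __init__(self):
        self.pending = deque()  # VIs with a rung doorbell, in arrival order
        self.completed = []     # (vi_id, request) pairs, in processing order

    def ring_doorbell(self, vi):
        # The doorbell only signals "this VI has new work"; a VI that is
        # already pending is not recorded a second time.
        if vi not in self.pending:
            self.pending.append(vi)

    def process(self):
        # Drain the queue of every VI that rang its doorbell.
        while self.pending:
            vi = self.pending.popleft()
            while vi.queue:
                self.completed.append((vi.vi_id, vi.queue.popleft()))


def post_work(device, vi, request):
    """Consumer side: enqueue the work request, then notify the device."""
    vi.queue.append(request)
    device.ring_doorbell(vi)
```

In this model the consumer never hands the request itself to the device; it only writes to its own queue and signals the doorbell, which is what allows the hypervisor to be bypassed on the data path.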
There are two major challenges in designing a virtualized I/O device, regardless of whether hardware or software virtualization is used:
First, the device needs to support a large number of virtual interfaces, which usually exceeds the storage space that can be implemented in a device. Therefore, devices may use small on-chip caches for a limited number of virtual interface contexts, and store the remainder in large backing store memory.
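The arrangement of a small on-chip context cache in front of a large backing store can be sketched as follows. This is a minimal model under stated assumptions: the least-recently-used replacement policy, the ContextCache class and the layout of a context are all hypothetical, since the text does not specify how a victim context is chosen.

```python
from collections import OrderedDict


class ContextCache:
    """Small cache of VI contexts backed by a large store (assumed LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()  # on-chip: vi_id -> context, oldest first
        self.backing_store = {}     # large memory: vi_id -> context
        self.castouts = 0           # number of contexts written back

    def get(self, vi_id):
        # Hit: mark the context as most recently used.
        if vi_id in self.cache:
            self.cache.move_to_end(vi_id)
            return self.cache[vi_id]
        # Miss: fetch from the backing store, or create a fresh context.
        context = self.backing_store.pop(vi_id, None)
        if context is None:
            context = {"vi_id": vi_id, "head": 0}
        # Cache full: cast out the least recently used context.
        if len(self.cache) >= self.capacity:
            victim_id, victim = self.cache.popitem(last=False)
            self.backing_store[victim_id] = victim
            self.castouts += 1
        self.cache[vi_id] = context
        return context
```

With a capacity of two, touching a third VI casts out the least recently used context to the backing store, and a later access to that VI brings its state back unchanged.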
The second challenge is implementing the device processing such that it is able to support all load scenarios up to the case where all VIs have outstanding work while also scaling performance.
For the processing part, there are two device architecture alternatives: monolithic queue processors or pipelined processing.
A common prerequisite for both is that during processing, virtual interface context data needs to stay in the device cache to avoid inconsistencies and stalls.
Many device implementations use monolithic processing units, because these are easier to design, debug and change late in development. Network processing in particular is very data dependent, requiring several data fetches during processing, at least two for the work request and the payload data. Data misses that need to be resolved by system memory requests stall processing in a monolithic design, such that queue processors spend most of their time just waiting for data. Thus, the major problem of this approach is efficient scaling, as monolithic queue processors use resources inefficiently compared with a pipeline and contend for shared resources. On the other hand, it is quite easy to keep track of the virtual interfaces being processed: the device only needs to check the virtual interfaces in the different processors.
A pipelined design is much more efficient in terms of resource usage and performance scaling. However, it is more difficult to keep track of the virtual interfaces that are currently in flight in the pipeline, either in a processing unit or in a queue. Checking all pipeline stages is tedious and slow, which limits scalability. Non-exhaustive checking, on the other hand, can allow a virtual interface (VI) to be cast out while work remains for it, due to race conditions.
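The tracking problem can be illustrated with a small sketch that contrasts an exhaustive check with a partial one. It models only a static snapshot of a pipeline, not the timing of a real race; the stage names, the VI numbers and the functions vis_in_flight and safe_to_cast_out are hypothetical.

```python
def vis_in_flight(stages):
    """Exhaustive check: collect VI ids from every unit and every queue."""
    found = set()
    for entries in stages.values():
        found.update(entries)
    return found


def safe_to_cast_out(vi_id, stages, checked=None):
    """Return True if vi_id is absent from all checked stages.

    With checked=None every stage is inspected (exhaustive but slow);
    otherwise only the named stages are inspected.
    """
    names = stages.keys() if checked is None else checked
    return all(vi_id not in stages[name] for name in names)


# Snapshot of a two-unit pipeline with queues between the units.
pipeline = {
    "fetch_unit": [3],
    "fetch_to_exec_queue": [7],
    "exec_unit": [5],
    "exec_to_completion_queue": [],
}
```

Checking only the processing units, for instance with safe_to_cast_out(7, pipeline, checked=["fetch_unit", "exec_unit"]), reports VI 7 as safe to cast out although its work is still waiting in a queue, whereas the exhaustive check finds it at the cost of visiting every stage.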