Symmetric multiprocessing (SMP) is a well-known computer architecture whereby a single operating system instance controls multiple processors (CPUs) that are each connected to shared main memory. Each CPU is an execution engine with its own instruction pipeline, and can be one core of a multi-core processor. For example, a quad-core processor can be said to have four CPUs, and a computer system having four quad-core processors, therefore, has sixteen CPUs. The operating system can assign multiple threads to a corresponding multiplicity of CPUs, which execute the threads simultaneously.
In virtualization technology, a virtual machine (VM) is created as a software abstraction of a physical computer system, in which virtual resources of the VM are mapped by virtualization software, commonly referred to as a hypervisor, to underlying physical resources. The SMP architecture can be virtualized in this manner such that a particular VM has a plurality of virtual CPUs (VCPUs) each executing threads assigned to the VCPUs by a guest operating system that runs inside the VM. The hypervisor then assigns the corresponding VCPUs (or other abstraction of tasks) to underlying physical CPUs. There need not be a one-to-one relationship between VCPUs running in a VM and physical CPUs on the host (i.e., the hardware platform supporting the VM). In fact, a host having a single CPU can support VMs having a plurality of VCPUs, and vice versa.
In conventional systems, operating systems generally assume that the processors that the operating system manages run at approximately the same rate. For non-virtualized systems, the processors managed by the operating system are physical, are under the direct control of the operating system, and generally run off the same clock. However, in a virtualized environment, the processors managed by a guest operating system are abstractions that are scheduled by the underlying hypervisor, which time-slices physical CPUs (PCPUs) so that the PCPUs can be shared across a number of VMs and host processes. At any particular point in time, a particular VCPU may be scheduled, descheduled, preempted, or blocked (i.e., waiting for some event). Therefore, inappropriate scheduling of VCPUs belonging to a VM can cause one VCPU to run faster than another VCPU, violating the assumption of the guest operating system, and potentially leading to errors or a panic in the guest operating system.
The term “skew” is used herein to refer to the difference in execution time of one VCPU relative to another VCPU associated with an SMP VM. Skew can be expressed as a time measurement, which indicates an amount of progress one VCPU has made in comparison to another VCPU. The hypervisor uses well-known techniques, such as physical performance counters in the CPUs, to measure execution time and therefore skew. In prior systems, progress is determined by periodically sampling the state of each VCPU to determine whether the VCPU is running, and if so, incrementing a value. Skew is then calculated as the difference between values corresponding to different VCPUs.
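The sampling approach described above for prior systems can be illustrated with a minimal sketch. The names used here (VCPU, progress_ticks, sample, skew) and the one-millisecond sample period are hypothetical and chosen only for illustration; they do not describe any particular hypervisor.

```python
from dataclasses import dataclass


@dataclass
class VCPU:
    """Hypothetical per-VCPU record kept by the sampler."""
    name: str
    running: bool = False
    progress_ticks: int = 0  # incremented once per sample while running


def sample(vcpus):
    """One sampling period: credit every VCPU that is currently running."""
    for v in vcpus:
        if v.running:
            v.progress_ticks += 1


def skew(a, b, sample_period_ms=1.0):
    """Progress of a relative to b, expressed as a time measurement."""
    return (a.progress_ticks - b.progress_ticks) * sample_period_ms


vcpu0 = VCPU("vcpu0", running=True)
vcpu1 = VCPU("vcpu1", running=False)  # e.g., descheduled or blocked

for _ in range(5):          # vcpu1 makes no progress for five samples
    sample([vcpu0, vcpu1])

vcpu1.running = True
for _ in range(3):          # both VCPUs now progress together
    sample([vcpu0, vcpu1])

print(skew(vcpu0, vcpu1))   # 5.0 (vcpu0 is five sample periods ahead)
```

Note that once both VCPUs run together the skew stops growing but does not shrink; it is reduced only when the lagging VCPU runs while the leading VCPU does not.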
Co-scheduling is implemented to reduce skew. Strict co-scheduling involves forcibly stopping sibling VCPUs when a particular VCPU falls too far behind, and restarting all of the VCPUs simultaneously after skew is detected. In relaxed co-scheduling, only a subset of the VCPUs of a VM are co-scheduled simultaneously after skew is detected. More specifically, in relaxed co-scheduling, only VCPUs that are skewed (i.e., lagging) beyond a particular threshold are co-started. This ensures that when any VCPU is scheduled, all other VCPUs that are lagging will also be scheduled, thereby reducing skew. More details of relaxed co-scheduling are described in U.S. patent application Ser. No. 11/707,729, entitled “Defining And Measuring Skew Between Coscheduled Contexts,” filed Feb. 16, 2007, and incorporated herein by reference in its entirety.
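The difference between the two policies can be sketched as the set of VCPUs each one co-starts once skew is detected. This is a simplified illustration only: the function names, the progress values, and the choice to measure lag against the most-advanced sibling are assumptions made for the example, not details taken from the referenced application.

```python
def lagging_vcpus(progress, threshold):
    """VCPUs trailing the most-advanced sibling by more than threshold.

    Measuring lag against the leader is an assumption of this sketch.
    """
    lead = max(progress.values())
    return sorted(name for name, p in progress.items() if lead - p > threshold)


def strict_costart(progress, threshold):
    """Strict: if any sibling lags too far, all VCPUs restart together."""
    return sorted(progress) if lagging_vcpus(progress, threshold) else []


def relaxed_costart(progress, threshold):
    """Relaxed: only the VCPUs lagging beyond the threshold are co-started."""
    return lagging_vcpus(progress, threshold)


# Hypothetical per-VCPU progress, in milliseconds of execution time.
progress_ms = {"vcpu0": 100, "vcpu1": 98, "vcpu2": 90, "vcpu3": 85}
THRESHOLD_MS = 5

print(strict_costart(progress_ms, THRESHOLD_MS))   # all four VCPUs
print(relaxed_costart(progress_ms, THRESHOLD_MS))  # ['vcpu2', 'vcpu3']
```

In this example the strict policy requires four physical CPUs to be available at the same moment, whereas the relaxed policy requires only two in addition to whichever VCPU is being scheduled.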
Strict and relaxed co-scheduling work well with VMs having two to four VCPUs. However, as the number of VCPUs running in a single VM increases, the performance impact of simultaneously stopping and restarting sibling VCPUs becomes increasingly noticeable, even with hosts having a large number of physical CPUs.