Embodiments of the invention pertain to virtual machines. More particularly, embodiments of the invention pertain to allowing a virtual machine monitor (VMM) to recapture control of a processor when the privilege level of guest software running in a virtual machine (VM) meets a privilege level exiting criteria.
A conventional VMM may run on a computer and present to other software the abstraction of one or more virtual machines. Each VM may function as a self-contained platform, running its own operating system (OS), known as a “guest operating system”, and applications, collectively known as the “guest software.” The guest software is said to be running in or on a VM. The guest software expects to operate as if it were running on a dedicated computer rather than a VM. That is, the guest software expects to control various computer operations and have access to hardware resources during these operations. The hardware resources may include processor-resident resources, such as control registers, and resources that reside in memory, such as descriptor tables. However, in a virtual-machine environment, the VMM should have ultimate control over these resources to provide proper operation of VMs and protection from and between VMs. To achieve this, the VMM typically intercepts and arbitrates all accesses made by the guest software to the hardware resources.
Most instruction set architectures (ISAs) define multiple privilege levels to isolate less-privileged applications from more-privileged operating system functionality. For example, one prior art 32-bit architecture has four privilege levels, referred to as ring 0 through ring 3, with ring 0 being the most privileged and ring 3 being the least privileged. The processor provides controlled ways to switch between the different privilege levels. Switches may be explicit, by invoking a special instruction, or implicit, by raising an exception or fault or in response to an external event such as an interrupt. For example, a privilege level change may occur during execution of an instruction, such as a call (CALL), a software interrupt (INT), or an interrupt return (IRET). A privilege level change may also occur as a result of other synchronous or asynchronous events such as, for example, exceptions, external interrupts, faults, task switches, traps, and other similar events.
Operating systems for multi-processor or multi-threaded systems protect data that might be accessed simultaneously from more than one thread or processor with software-implemented locks that ensure mutual exclusion. In cases where locks are usually held for a short time, so-called spin locks may be used. When software operating on one processor or thread attempts to acquire a lock that is already taken by software operating on another processor or thread, the software repeatedly retries the acquisition in a tight code loop. While running in this tight loop, the software does not perform any useful work and the hardware processor thread provides no benefit. On multi-threaded processors or multiple processor systems, the spinning of one thread or processor may take away resources from the other threads or processors, for example by consuming bandwidth, execution units, or power. Therefore, the spinning period should be as short as possible.
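The spinning behavior described above can be illustrated with a minimal sketch of a spin lock built on C11 atomics. The type and function names (demo_spinlock, demo_spin_trylock, demo_spin_lock, demo_spin_unlock) are hypothetical and chosen only for illustration; an actual guest operating system may implement its locks differently.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock type, for illustration only. */
typedef struct {
    atomic_flag taken;   /* set while some thread or processor owns the lock */
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Make one acquisition attempt; returns true if the caller now owns the lock. */
static bool demo_spin_trylock(demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&l->taken, memory_order_acquire);
}

/* Acquire the lock, retrying in a tight loop until it becomes free. */
static void demo_spin_lock(demo_spinlock *l)
{
    while (!demo_spin_trylock(l)) {
        /* Busy-wait: no useful work is done while the lock is held elsewhere. */
    }
}

/* Release the lock so that a spinning thread or processor can acquire it. */
static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->taken, memory_order_release);
}
```

The loop in demo_spin_lock is the tight code loop referred to above: if the lock holder is not running, every iteration is wasted, which is why the time spent spinning should be kept short.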
VMMs virtualizing multi-processor or multi-threaded systems may execute each instance of guest software in a separate VM, or virtual processor (VP). On a non-VM system, each of these instances of guest software would execute on a distinct processor or thread. Collectively, the VPs and all instances of the guest software are referred to as a virtual system. Such a VMM may experience significant performance degradation if it does not take the guest locking behavior into account: when a VP is preempted while holding a lock, the other VPs of the same virtual system may spin on that lock until the preempted VP runs again and releases it. Hence, the VMM should not preempt a VP while that VP holds a lock unless it preempts all VPs of that virtual system. Since locking primitives cannot be directly detected by hardware when locks are implemented in software, heuristics or indirect observation techniques may be used.
One such heuristic is based on common OS behavior. While an OS is not executing in privileged mode or while it is in a low-power state, the OS likely holds no locks. A VMM may take advantage of that knowledge by preempting a virtual processor only when it is executing in an unprivileged mode or is in a low-power state. Preemption of guest software executing in privileged mode is deferred until the guest software switches to unprivileged mode.
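The heuristic above can be sketched as a small piece of VMM scheduling logic. This is an illustrative sketch only, under stated assumptions: the structure vp_state and the functions vp_safe_to_preempt, vmm_request_preempt, and vmm_on_privilege_change are hypothetical names, and only ring 0 is treated as privileged mode, which is an assumption rather than something the text specifies.

```c
#include <stdbool.h>

/* Hypothetical per-VP bookkeeping kept by the VMM, for illustration only. */
typedef struct {
    int  cpl;              /* current privilege level; 0 is the most privileged ring */
    bool low_power;        /* true while the VP is in a low-power state */
    bool preempt_pending;  /* true while a preemption has been deferred */
} vp_state;

/* Assumption: only ring 0 counts as privileged mode. The guest likely holds
 * no locks when it runs outside ring 0 or sits in a low-power state. */
static bool vp_safe_to_preempt(const vp_state *vp)
{
    return vp->cpl != 0 || vp->low_power;
}

/* Called when the VMM wants to take the processor away from the VP.
 * Returns true if the VP may be preempted now, false if preemption is deferred. */
static bool vmm_request_preempt(vp_state *vp)
{
    if (vp_safe_to_preempt(vp)) {
        vp->preempt_pending = false;
        return true;
    }
    vp->preempt_pending = true;   /* remember the request for later */
    return false;
}

/* Called when the guest's privilege level changes.
 * Returns true if a previously deferred preemption should now be carried out. */
static bool vmm_on_privilege_change(vp_state *vp, int new_cpl)
{
    vp->cpl = new_cpl;
    if (vp->preempt_pending && vp_safe_to_preempt(vp)) {
        vp->preempt_pending = false;
        return true;
    }
    return false;
}
```

In this sketch, a preemption request that arrives while the guest runs in ring 0 is only recorded; it takes effect when the guest next switches to a less-privileged ring. That deferral depends on the VMM being notified of the privilege level change.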