1. Field of the Invention
The present invention relates to a computer system that allocates and schedules host operating system (OS) resources among Virtual Machine processes.
2. Description of the Related Art
Modern computer systems include a CPU that executes a number of different tasks and processes. The basic concept of a computer operating system includes “threads,” and a host OS typically uses thread-scheduling mechanisms. Threads are given a certain priority, meaning that some threads take precedence over others. The major function of the CPU thread scheduler is to keep the thread with the highest priority running at all times (or at least most of the time), and, if there is more than one thread with the highest priority, to decide which thread to run for a time quantum. The quantum is a unit of time during which a thread can run without being preempted, unless a thread with a higher priority becomes active. Thus, if the thread does not complete its tasks before its time quantum ends, the thread is preempted and the next waiting thread with the highest priority is dispatched for execution.
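The priority-plus-quantum policy described above can be illustrated with a small simulation (an illustrative sketch only; the function and thread names are hypothetical, and no real OS scheduler uses exactly this code):

```python
from collections import deque

def schedule(threads, quantum):
    """Simulate priority-preemptive, round-robin scheduling.

    `threads` is a list of (name, priority, work_units) tuples; higher
    priority values run first, and threads of equal (highest) priority
    share the CPU in slices of `quantum` work units. Returns the order
    in which time slices were dispatched.
    """
    # One FIFO queue per priority level, to round-robin among equals.
    queues = {}
    for name, prio, work in threads:
        queues.setdefault(prio, deque()).append((name, work))

    timeline = []
    while queues:
        top = max(queues)                  # highest active priority
        name, work = queues[top].popleft()
        timeline.append(name)              # run for one quantum
        work -= quantum
        if work > 0:                       # preempted: back of its queue
            queues[top].append((name, work))
        if not queues[top]:
            del queues[top]
    return timeline

# Two equal-priority threads alternate; the low-priority one waits.
print(schedule([("A", 2, 2), ("B", 2, 2), ("C", 1, 1)], quantum=1))
# → ['A', 'B', 'A', 'B', 'C']
```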
Processors with virtualization support implement a “root mode” and a “non-root mode.” In root mode, instructions have direct access to the real hardware, and root-mode code executes outside any virtualization environment. Non-root mode is intended for running virtual environments, which are at least partially isolated.
There are two well known VMM architectures: stand-alone VMM (so called “type I”) and hosted VMM (so-called “type II”). The stand-alone VMM basically operates as a full-fledged operating system. It runs on the system level and creates Virtual Machines on user level. Such a VMM itself controls all hardware, and, similar to an operating system, requires device drivers for each hardware device. Virtual Machines use VMM services for communicating with actual hardware.
Another type of VMM is a type II VMM that runs as an application on top of an existing host operating system. It runs on a user level and creates Virtual Machines on the user level as well. Since the VMM does not have privileges to use direct execution in the native mode, it employs full binary translation for virtualization. The VMM uses host OS services for memory management, processor scheduling, hardware drivers, and resource management.
Another type of VMM is a hosted VMM, which runs simultaneously with an existing host OS. It coexists with the host OS on the system level and executes Virtual Machines on the non-root level or, in the case of a lightweight hypervisor, on the user level. The VMM and the host OS are each able, independently and without being interrupted, to issue any processor instruction and modify any aspect of the processor state. The hosted VMM uses the host OS services for memory management, processor scheduling, hardware drivers, and resource management.
Hypervisor-based VMMs combine the advantages of the prior art systems and eliminate major disadvantages. The Hypervisor runs on the system level and creates Virtual Machines on the user level. One of the Virtual Machines runs a so-called “primary OS” and has privileges to handle some of the hardware directly. Other Virtual Machines and the Hypervisor use the primary OS and its hardware drivers for communication with the actual hardware. At the same time, the Hypervisor employs efficient memory management, processor scheduling and resource management without the help of the primary OS. The advantages of the Hypervisor-based approach are high Virtual Machine isolation, security, efficient resource management and a small trusted Hypervisor footprint.
In a traditional Hypervisor-based scheme, processor interrupts generated by system calls force the processor to switch state to the kernel level. This switching is implemented inside the interrupt handler.
A Lightweight Hypervisor runs on the system level and handles some of the low-level hardware resources to help the VMM perform more effective virtualization. For example, the Lightweight Hypervisor reloads the Interrupt Descriptor Table (IDT) and protects it from modification by the primary OS and the VMM. The primary OS and the VMMs are not allowed to modify the IDT, and they are not allowed to process interrupts independently. This further provides correct virtualization of interrupts. The Lightweight Hypervisor coexists in all address spaces (the primary OS context and the VMM contexts) and exclusively processes all hardware interrupts and exceptions. It is responsible for interrupt forwarding, context switching between the primary OS and the VMMs, and efficient resource scheduling.
The Hypervisor can be implemented as a driver for the primary OS. It is activated after the first Virtual Machine starts and is deactivated after the last Virtual Machine stops (alternatively, it might never be deactivated and will stay active until host OS reboot). Alternatively, the Hypervisor can be a separate, independently-loaded binary file.
After activation, the Lightweight Hypervisor can load its own IDT instead of the primary OS IDT, and take control over all interrupts and exceptions. The Hypervisor protects its IDT against writes, and controls all attempts by the primary OS and its subsystems to write to the IDT.
The Lightweight Hypervisor is co-resident in the primary OS and all VMM address spaces and handles all interrupts in all contexts. The Hypervisor is responsible for forwarding interrupts to the primary OS and the VMMs. The Hypervisor detects write accesses to the IDT by the primary OS and its subsystems, and maintains an interrupt subscription list for primary OS components.
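The subscription-and-forwarding behavior described above can be sketched as a toy model (the class, method names, and vector numbers here are hypothetical illustrations, not an actual hypervisor interface):

```python
class InterruptForwarder:
    """Toy model of a hypervisor that owns the IDT and forwards
    interrupts to subscribed primary-OS handlers."""

    def __init__(self):
        self._subscribers = {}   # vector -> list of handler callables

    def subscribe(self, vector, handler):
        # Recorded when the hypervisor detects a primary-OS component
        # attempting to install a handler for `vector` in the IDT.
        self._subscribers.setdefault(vector, []).append(handler)

    def dispatch(self, vector):
        # The hypervisor fields the interrupt first, then forwards it
        # to every subscribed primary-OS handler in order.
        return [h(vector) for h in self._subscribers.get(vector, [])]

fw = InterruptForwarder()
fw.subscribe(0x20, lambda v: f"timer handler got vector {v:#x}")
print(fw.dispatch(0x20))
# → ['timer handler got vector 0x20']
```

Unsubscribed vectors simply yield an empty forwarding list, mirroring the case where the hypervisor handles an interrupt entirely on its own.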
The VMM/Hypervisor-based virtual systems described above (referred to hereinafter as the “Lightweight Hypervisor”) are shown for exemplary purposes only. The invention can also be used with other types of virtual systems that have a mechanism for at least partial handling of resources.
In Hypervisor-based VMMs, the virtualization system is divided into two parts:
(a) A Hypervisor, which allows at least partial handling of hardware resources, low-level software resources and events, in order to provide more effective integration with the primary OS (or with a primary OS service that allows communication with hardware in general), and more effective simultaneous functioning of several VMMs running on a single host.
(b) A VMM, which allows guest operating systems to run in their own virtualization environments. In such an architecture, the VMM is the mechanism for virtualizing a guest OS environment. The Hypervisor is the mechanism for effective coexistence of several VMMs simultaneously, and it allows handling of the necessary resources in the most effective manner for further use in the virtualization engine (in a VMM).
Another example of a Hypervisor-based architecture is Open Bus Hypervisor (described in U.S. Provisional Patent Application No. 60/951,147, filed Jul. 20, 2007, incorporated herein by reference) which enables deeper integration with primary OS, handles hardware virtualization CPU features, maintains communication between different parts of virtualization environment, etc.
Xen multiplexes physical resources at the granularity of an entire operating system and provides performance isolation between them. It virtualizes all architectural features required by existing standard ABIs (application binary interfaces). Xen presents a Virtual Machine abstraction that is not identical to the underlying hardware—an approach which has been dubbed “paravirtualization.” This approach requires modifications to the guest operating system; however, changes to the ABI are not required, and hence no modifications are required to guest applications.
Xen currently schedules domains according to a Borrowed Virtual Time (BVT) scheduling algorithm. This algorithm has a special mechanism for low-latency wake-up (or dispatch) of a domain when it receives an event. Fast dispatch minimizes the effects of virtualization on OS subsystems that are designed to run in a timely fashion.
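The core idea of BVT can be sketched as follows (a simplified illustration of the published algorithm; the domain names and warp values are hypothetical, and a real implementation also tracks weights and warp time limits):

```python
def bvt_pick(domains):
    """Pick the runnable domain with the smallest effective virtual
    time (EVT). In BVT, EVT = actual virtual time (AVT) minus a
    'warp' credit, so a freshly woken, latency-sensitive domain can
    'borrow' virtual time and be dispatched ahead of its peers.
    `domains` maps name -> (avt, warp)."""
    return min(domains, key=lambda d: domains[d][0] - domains[d][1])

domains = {
    "dom0":  (100, 0),    # has been running for a while
    "domU1": (105, 50),   # just woke up with warp enabled
    "domU2": (102, 0),
}
print(bvt_pick(domains))   # domU1's EVT is 55, the smallest
# → domU1
```

The warp credit is what gives BVT its low-latency wake-up property: without it, domU1 (AVT 105) would be dispatched last.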
VMware ESX Server is designed to multiplex hardware resources among the VMs. Requests from the applications supported by the guest OSs are scheduled for execution on the physical CPUs of the host system. The virtual CPU is used by the guest OS to run its different processes. The VMM manages the VM's access to such hardware resources as the CPU, memory, disk storage, network adapters, mouse, and keyboard.
When a process attempts to use a busy shared resource, it must wait before the ESX Server can allocate a physical CPU to it. This situation may arise when multiple processes are trying to use the same physical CPU. The ESX Server scheduler manages access to the physical CPUs on the host system. Processes must wait in a queue in a “ready-to-run” state before they can be scheduled on a CPU; the time spent waiting is known as “ready time.” Also, there are maintenance activities that the operating system must perform even when the guest OS is idle. Thus, even idle guest processes must be scheduled, consume CPU resources, and accumulate ready time.
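The accumulation of ready time on a contended CPU can be illustrated with a minimal FIFO model (an illustrative sketch under the assumption of one physical CPU served in arrival order; a real ESX scheduler is far more sophisticated):

```python
def ready_time(arrivals, service):
    """Compute per-process 'ready time' (time spent waiting in the
    ready-to-run queue) for one shared CPU served in FIFO order.
    `arrivals[i]` is when process i becomes ready; `service[i]` is
    how long it runs once dispatched."""
    clock, waits = 0, []
    for arrive, run in sorted(zip(arrivals, service)):
        start = max(clock, arrive)      # wait while the CPU is busy
        waits.append(start - arrive)    # accumulated ready time
        clock = start + run
    return waits

# Three guest processes contend for one physical CPU; even the last
# one, arriving at t=2, waits 3 ticks before it is scheduled.
print(ready_time([0, 1, 2], [3, 2, 1]))
# → [0, 2, 3]
```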
It should be noted that this invention is not limited to the types of VMMs described above; these are only examples of where the present invention can be used.
Accordingly, there is a need in the art for a more efficient mechanism for scheduling of resources for a system where there are multiple Virtual Machines.