To provide a secure operating environment, the x86 architecture offers a mechanism for isolating user applications from the operating system using “privilege levels.” In this model, a processor provides four privilege levels, also known as rings, which are arranged hierarchically from ring 0 to ring 3. Ring 0 is the most privileged level, with full access to the hardware and the ability to execute privileged instructions. The operating system runs in ring 0, with the operating system kernel controlling access to the underlying hardware. Rings 1, 2 and 3 operate at lower privilege levels and are prevented from executing instructions reserved for the operating system. In commonly deployed operating systems, user applications run in ring 3. Rings 1 and 2 historically have not been used by modern commercial operating systems. This architecture ensures that a compromised application running in ring 3 cannot directly execute privileged instructions; however, a compromise of the operating system running in ring 0 exposes the applications running at the less privileged levels.
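The ring check described above can be illustrated with a toy model. The following Python sketch does not reproduce actual x86 semantics; the set `PRIVILEGED`, the exception `ProtectionFault`, and the function `execute` are hypothetical names chosen only for illustration.

```python
# Toy model of ring-based protection; not real x86 behavior.
# The mnemonics stand in for instructions reserved for ring 0.
PRIVILEGED = {"hlt", "lgdt", "lidt"}


class ProtectionFault(Exception):
    """Raised when code outside ring 0 attempts a privileged instruction."""


def execute(instruction: str, ring: int) -> str:
    """Run one instruction at the given ring, or fault if it is not allowed."""
    if ring not in (0, 1, 2, 3):
        raise ValueError("x86 defines rings 0 through 3")
    if instruction in PRIVILEGED and ring != 0:
        raise ProtectionFault(f"{instruction} not permitted in ring {ring}")
    return f"{instruction} executed in ring {ring}"
```

In this sketch, `execute("hlt", 0)` succeeds, while `execute("hlt", 3)` raises `ProtectionFault`, mirroring the way a ring 3 application is prevented from executing instructions reserved for the operating system.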
The x86 architecture provides another mechanism called “virtualization” for isolating user applications from the operating system. Virtualization permits multiplexing of an underlying host machine between different virtual machines. The host machine allocates a certain amount of its resources to each of the virtual machines. Each virtual machine is then able to use the allocated resources to execute applications, including operating systems, which are referred to as guest operating systems (guest OSes). The software layer providing the virtualization is commonly referred to as a hypervisor and is also known as a virtual machine monitor (VMM), a kernel-based hypervisor, or a host operating system.
In a virtualized environment, the hypervisor runs in ring 0, the most privileged level, controlling all hardware and system functions. The virtual machines run in a less privileged ring, typically ring 3. Since a guest operating system may have been originally designed to run directly on hardware, it expects to be running in ring 0 and may attempt privileged operations that are not permitted in ring 3. When the guest operating system attempts these operations, the hardware traps the instructions and raises a fault, which typically terminates the virtual machine.
An early attempt to overcome this problem was “emulation,” in which the guest operating system instructions of a virtualized x86 machine were fully translated from a guest format to a host format by the hypervisor. Unfortunately, emulation resulted in very poor performance. As a result, binary translation was developed. In this model, the hypervisor scans the virtual machine memory, intercepts privileged calls before they are executed, and dynamically rewrites the code in memory. The guest operating system is unaware of the change and operates normally. This combination of trapping privileged instructions (trap-and-execute) and binary translation allows any x86 operating system to run unmodified on the hypervisor.
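The scan-and-rewrite step of binary translation can be sketched as follows. This is a simplified illustration that operates on textual mnemonics rather than machine code; the set `PRIVILEGED`, the function `rewrite_in_place`, and the `vmm_emulate_*` targets are hypothetical and do not correspond to any particular hypervisor.

```python
# Simplified illustration of binary translation on textual instructions.
# A real hypervisor rewrites machine code; this sketch only shows the idea.
PRIVILEGED = {"hlt", "lgdt", "lidt"}  # illustrative privileged mnemonics


def rewrite_in_place(code: list[str]) -> int:
    """Replace privileged instructions in `code` with calls into the
    hypervisor and return how many instructions were patched."""
    patched = 0
    for index, instruction in enumerate(code):
        # The mnemonic is the text before the first space, if any.
        mnemonic = instruction.partition(" ")[0]
        if mnemonic in PRIVILEGED:
            # Hypothetical emulation routine provided by the hypervisor.
            code[index] = f"call vmm_emulate_{mnemonic}"
            patched += 1
    return patched
```

Because the list is patched in place before it is executed, the guest code never reaches the privileged instruction itself, which corresponds to the guest operating system remaining unaware of the change.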
A more recently developed technique is known as paravirtualization. In paravirtualization, the guest operating system running in the virtual machine is modified to replace all the privileged instruction calls with direct calls into the hypervisor. In this model, the modified guest operating system is aware that it is running on the hypervisor and can cooperate with the hypervisor for improved scheduling and I/O, removing the need to emulate hardware devices such as network cards and disk controllers.
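The replacement of privileged instructions with direct calls into the hypervisor can be sketched as below. The classes `Hypervisor` and `ParavirtualizedGuest`, the `hypercall` method, and the `set_page_table_base` operation are assumptions made for illustration and do not represent the interface of any actual hypervisor.

```python
class Hypervisor:
    """Toy hypervisor exposing a table of named hypercalls."""

    def __init__(self) -> None:
        self.page_table_base = {}  # guest id -> page table base address
        self._table = {"set_page_table_base": self._set_page_table_base}

    def hypercall(self, guest_id: str, name: str, *args):
        handler = self._table.get(name)
        if handler is None:
            raise ValueError(f"unknown hypercall: {name}")
        return handler(guest_id, *args)

    def _set_page_table_base(self, guest_id: str, base: int) -> int:
        # The hypervisor performs the privileged update on the guest's behalf.
        self.page_table_base[guest_id] = base
        return 0  # 0 signals success in this sketch


class ParavirtualizedGuest:
    """Guest kernel modified to call the hypervisor directly."""

    def __init__(self, guest_id: str, hypervisor: Hypervisor) -> None:
        self.guest_id = guest_id
        self.hypervisor = hypervisor

    def switch_address_space(self, base: int) -> int:
        # An unmodified kernel would execute a privileged instruction here;
        # the modified kernel issues a hypercall instead.
        return self.hypervisor.hypercall(
            self.guest_id, "set_page_table_base", base
        )
```

For example, a guest created as `ParavirtualizedGuest("vm1", Hypervisor())` can call `switch_address_space(0x1000)` without any instruction being trapped, since the guest kernel knowingly delegates the operation.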
In one implementation of paravirtualization, the hypervisor is responsible for core hypervisor activities such as CPU and memory virtualization, power management, and scheduling of virtual machines. The hypervisor loads a special privileged virtual machine, called a paravirtualized machine, that runs in domain 0. The paravirtualized machine has direct access to hardware and provides device drivers and I/O management for virtual machines.
Each virtual machine contains a modified kernel where CPU and memory accesses are handled directly by the hypervisor but I/O is directed to the paravirtualized machine. Requests for I/O are passed to a “back end” process in the paravirtualized machine which manages the I/O. In this model, the guest operating system runs in ring 1 while user space runs in ring 3.
With paravirtualized machines, the back end shares memory with the guest: the guest places its requests in the shared memory, so the hypervisor does not need to translate and execute them. Unfortunately, since this shared memory resides in domain 0, both the virtual machine and the host machine are subject to compromise.
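The shared-memory exchange between a guest and the back end can be sketched with a pair of in-process queues. This is a minimal model under stated assumptions: `SharedRing`, `submit`, and `back_end_service` are hypothetical names, the dictionary `disk` stands in for a storage device, and real implementations use memory pages shared between domains rather than Python objects.

```python
from collections import deque


class SharedRing:
    """Toy stand-in for memory shared by a guest and the back end."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.requests = deque()   # written by the guest, read by the back end
        self.responses = deque()  # written by the back end, read by the guest

    def submit(self, request: tuple) -> bool:
        """Guest side: queue a request, or return False if the ring is full."""
        if len(self.requests) >= self.capacity:
            return False
        self.requests.append(request)
        return True


def back_end_service(ring: SharedRing, disk: dict) -> int:
    """Back-end side: drain pending requests and return how many were handled."""
    handled = 0
    while ring.requests:
        op, sector, data = ring.requests.popleft()
        if op == "write":
            disk[sector] = data
            ring.responses.append(("ok", sector, None))
        elif op == "read":
            ring.responses.append(("ok", sector, disk.get(sector)))
        else:
            ring.responses.append(("error", sector, None))
        handled += 1
    return handled
```

In this sketch both sides read and write the same `SharedRing` object directly, with no intermediary validating its contents, which illustrates why memory shared in this way exposes both parties if either one is compromised.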