Computer system virtualization allows multiple operating systems and processes to share the hardware resources of a host computer. Ideally, system virtualization provides resource isolation so that each operating system does not realize that it is sharing resources with another operating system and does not adversely affect the execution of the other operating system. Such system virtualization enables applications including server consolidation, co-located hosting facilities, distributed web services, application mobility, secure computing platforms, and other applications that provide for efficient use of underlying hardware resources.
Virtual machine monitors (VMMs) have been used since the early 1970s to provide a software application that virtualizes the underlying hardware so that applications running on the VMMs are exposed to the same hardware functionality provided by the underlying machine without actually “touching” the underlying hardware. As IA-32, or x86, architectures became more prevalent, it became desirable to develop VMMs that would operate on such platforms. Unfortunately, the IA-32 architecture was not designed for full virtualization: certain supervisor instructions had to be handled by the VMM for virtualization, but their use could not be intercepted and handled appropriately using existing interrupt handling techniques.
Existing virtualization systems, such as those provided by VMware and Microsoft, are relatively sophisticated and address these problems with the IA-32 architecture by dynamically rewriting portions of the hosted machine's code to insert traps wherever VMM intervention might be required, and by using binary translation to resolve the interrupts. This translation is applied to the entire guest operating system kernel, since all non-trapping privileged instructions have to be caught and resolved. Furthermore, the VMware and Microsoft solutions are generally architected as a monolithic virtualization software system that hosts each virtualized system.
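The rewriting step described above can be illustrated with a minimal sketch. It assumes a toy instruction stream of text mnemonics rather than real IA-32 machine code, and the names `SENSITIVE_NON_TRAPPING`, `CALL_VMM`, and `translate` are hypothetical labels introduced only for this illustration; it is not the method used by any particular product.

```python
# Toy model of dynamic code rewriting: instructions that would not trap on
# their own are replaced with an explicit call into the VMM.
# The mnemonic set below is an illustrative assumption, not a complete list.
SENSITIVE_NON_TRAPPING = {"POPF", "SGDT", "SIDT", "SMSW"}


def translate(block):
    """Return a rewritten copy of `block` in which every sensitive
    instruction is wrapped in a (hypothetical) CALL_VMM pseudo-instruction."""
    rewritten = []
    for instruction in block:
        mnemonic = instruction.split()[0]
        if mnemonic in SENSITIVE_NON_TRAPPING:
            # Hand control to the VMM so it can emulate the instruction.
            rewritten.append(f"CALL_VMM {instruction}")
        else:
            # Harmless instructions are passed through unchanged.
            rewritten.append(instruction)
    return rewritten


guest_block = ["MOV eax, 1", "POPF", "ADD eax, 2", "SGDT [mem]"]
translated = translate(guest_block)
```

Because every instruction in the guest kernel must be inspected this way, the cost of the scan grows with the size of the code being hosted, which is one source of the overhead discussed here.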
The complete virtualization approach taken by VMware and Microsoft carries significant processing costs, and has drawbacks that stem from assumptions made by those systems. For example, in such systems, it is generally assumed that each processing unit of native hardware can host many different virtual systems, thereby allowing disassociation of processing units and virtual processing units exposed to non-native software hosted by the virtualization system. If two or more virtual systems are assigned to the same processing unit, these systems will essentially operate in a time-sharing arrangement, with the virtualization software detecting and managing context switching between those virtual systems.
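The time-sharing arrangement can be sketched as a simple round-robin loop. This is only an illustration under stated assumptions: the function `run_time_shared`, the fixed `quantum`, and the abstract "work units" are hypothetical, and real virtualization software uses considerably more elaborate scheduling.

```python
from collections import deque


def run_time_shared(virtual_systems, quantum):
    """Round-robin several virtual systems on one processing unit.

    `virtual_systems` maps a name to its remaining work units (assumed
    abstraction). Returns the execution schedule and the number of context
    switches, i.e. the number of times the running system changed.
    """
    ready = deque(virtual_systems.items())
    schedule = []
    while ready:
        name, remaining = ready.popleft()
        ran = min(quantum, remaining)
        schedule.append((name, ran))
        if remaining - ran > 0:
            # Not finished: go to the back of the queue and wait for a turn.
            ready.append((name, remaining - ran))
    # A context switch occurs whenever consecutive slices belong to
    # different virtual systems.
    switches = sum(
        1 for prev, cur in zip(schedule, schedule[1:]) if prev[0] != cur[0]
    )
    return schedule, switches


schedule, switches = run_time_shared({"A": 3, "B": 5}, quantum=2)
```

In this sketch each switch stands in for the state save and restore that the virtualization software must perform, which is overhead that does not exist when a virtual system has a processing unit to itself.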
This problem is compounded when considering other computing resources available to a virtualization system. For example, although virtual systems generally share processing units and communication interfaces, multiple virtual systems are typically entirely separated in terms of the memory resources allocated to them, with little (if any) sharing of available resources. When data is transferred among those systems, it typically must be transferred via a virtualized network or some other operation that introduces latency into the system due to the various layers of abstraction through which the data must be processed (e.g., through a Mellanox-based InfiniBand communication protocol). This is the case even where two virtual systems are co-located on the same hardware, or at least on hardware in close proximity and connected by some type of high-speed interconnect, since the greatest level of distance possible in such a system is typically assumed.
Additionally, existing VMMs have relatively structured requirements for the definition of a partition. Accordingly, the VMM and the partition it hosts are required to be co-located, and each partition further requires some dedicated memory, a dedicated or shared processing unit, and optionally I/O devices, which may also be shared among partitions. This limits the functionality and usability of those partitions, since the usability of a partition is limited by the resources that can be provided on a particular hardware platform. This limitation is particularly apparent in the case of database systems: although such systems perform orders of magnitude better when data resides in memory, typical databases are much larger than the memory allocated to them, such that on-disk storage is required on a particular host hardware system.
For these and other reasons, improvements are desirable.