In traditional, or single-level, machine virtualization, a hypervisor controls the hardware (bare-metal) resources and runs one or more concurrent virtual machines (VMs), each VM running its own guest operating system. Nested virtualization enables a bare-metal hypervisor (level-0 or L0) to run one or more hypervisors (level-1 or L1), each of which can run its own set of VMs (level-2 or L2) [18, 7, 29, 13]. Nested virtualization has many known potential benefits [7]. It can be used to host VMs running commodity operating systems, such as Linux and Windows, that utilize hardware virtualization to host other operating systems. Hypervisors that are embedded in firmware [15, 31] could use virtualization to run other hypervisors. Infrastructure-as-a-Service (IaaS) providers could use nested virtualization to allow users to run their own hypervisors and to allow migration of VMs across different IaaS providers [45]. Nested virtualization could also allow new approaches to hypervisor-level security [35, 33, 37, 20, 21, 14, 4], hypervisor development, and testing.
Besides the above benefits, nested virtualization also opens up a new possibility. L1 hypervisors that provide different services could be co-located on the same machine. An L2 VM according to the present technology could simultaneously use these diverse L1 services. For instance, besides running on a commodity L1 hypervisor, an L2 VM could simultaneously run on another L1 hypervisor that provides an intrusion detection service, or a deduplication [46] service, or a real-time CPU or I/O scheduling service.
Unfortunately, current nested virtualization solutions restrict an L2 VM to run on only one L1 hypervisor at a time. This prevents an L2 VM from taking advantage of services from multiple L1 hypervisors.
Nested VMs were originally proposed and refined in [16, 17, 32, 5, 6]. IBM z/VM [29] was the first implementation of nested VMs using multiple levels of hardware support for nested virtualization. Ford et al. [13] implemented nested VMs in a microkernel environment. Graf and Roedel [18] and Ben-Yehuda et al. [7] implemented nested VM support in the KVM [23] hypervisor on AMD-V [1] and Intel VMX [42] platforms, respectively. Unlike IBM z/VM, these implementations rely on only a single level of hardware virtualization support. Prior nested VM platforms restrict the L2 VM to execute on a single L1 hypervisor at a time. Although one can technically live migrate [11, 19] an L2 VM from one L1 hypervisor to another, the “one-hypervisor-at-a-time” restriction still applies. None of the prior approaches allows a single L2 VM to execute simultaneously on multiple L1 hypervisors on the same physical machine.
Distributed operating systems, such as Amoeba [36, 2] and Sprite [22], aim to aggregate the resources of multiple networked machines into a single pool. ScaleMP [43] is a commercial system that provides a distributed hypervisor spanning multiple physical machines to transparently support SMP VMs. It also supports nested VMs via a feature called VM-on-VM, but does not appear to support multi-hypervisor VMs. Further, because ScaleMP is a proprietary product, very few implementation details are available. DVM [38] implements a distributed virtual machine service for the Java platform by moving system services, such as verification, security enforcement, compilation, and optimization, out of the client and into central servers. In contrast to such systems, which aggregate resources across multiple physical machines, the present technology, called Span, transparently supports nested VMs that span multiple co-located L1 hypervisors.
A related line of research concerns disaggregating the large administrative domain [25, 12, 10, 40] typically associated with a hypervisor, such as Domain 0 in Xen. The goal of these efforts is to replace a single large administrative domain with several small sub-domains (akin to privileged service VMs) that are more resilient to attacks and failures, better isolated from one another, and customizable on a per-VM basis. Thus, a VM could pick and choose the services of specific sub-domains, which run at the same level as the VM atop the common hypervisor. In contrast to these prior efforts, the present technology supports running a VM simultaneously on multiple lower-level hypervisors, each of which could offer specialized hypervisor-level services.
Because only L0 can execute in the most privileged mode, all privileged instructions executed by L1 and L2 are trapped by L0. The same hierarchical constraint would generally apply to a deeper stack of hypervisors: each hypervisor can execute with no greater privilege than its parent, and typically certain privileges are reserved to the parent or to L0 and denied to the child, which functionally distinguishes the layers.
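The hierarchical constraint described above can be illustrated with a small toy model. The following Python sketch is only an illustration under stated assumptions, not an implementation of any hypervisor: the `Layer` class, the privilege names (`hw_virt`, `map_memory`, `run_vcpu`), and the instruction labels are hypothetical and chosen for readability. It shows two properties from the text: a child layer may hold no privilege that its parent lacks, and a privileged instruction issued by L1 or L2 is trapped by L0.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layer:
    """One layer of a nested stack (hypothetical model): L0, L1, or L2."""
    name: str
    privileges: frozenset
    parent: Optional["Layer"] = None
    trap_log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A child may hold no privilege that its parent lacks.
        if self.parent is not None and not self.privileges <= self.parent.privileges:
            raise ValueError(
                f"{self.name} exceeds the privileges of {self.parent.name}"
            )

    def root(self) -> "Layer":
        """Walk up the parent chain to the bare-metal layer (L0)."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def execute_privileged(self, instruction: str) -> str:
        """Return the name of the layer that first handles the instruction.

        Only the root runs in the most privileged mode, so an instruction
        issued by any other layer is recorded as a trap at the root.
        """
        root = self.root()
        if root is not self:
            root.trap_log.append(f"{self.name}: {instruction}")
        return root.name


# Illustrative stack: privilege names are assumptions made for this sketch.
l0 = Layer("L0", frozenset({"hw_virt", "map_memory", "run_vcpu"}))
l1 = Layer("L1", frozenset({"map_memory", "run_vcpu"}), parent=l0)
l2 = Layer("L2", frozenset({"run_vcpu"}), parent=l1)

l1.execute_privileged("VMLAUNCH")   # issued by the L1 hypervisor
l2.execute_privileged("write CR3")  # issued by the L2 guest
print(l0.trap_log)  # ['L1: VMLAUNCH', 'L2: write CR3']
```

The model deliberately records only where a trap lands. What L0 does afterwards, such as emulating the instruction itself or passing control to an L1 hypervisor, is outside the scope of this sketch.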