A “virtual machine” or a “VM” refers to a specific software-based implementation of a machine in a virtualization environment, in which the hardware resources of a real computer (e.g., CPU, memory, etc.) are virtualized or transformed into the underlying support for the fully functional virtual machine that can run its own operating system and applications on the underlying physical resources just like a real computer.
Virtualization works by inserting a thin layer of software directly on the computer hardware or on a host operating system. This layer of software contains a virtual machine monitor or “hypervisor” that allocates hardware resources dynamically and transparently. Multiple operating systems run concurrently on a single physical computer and share hardware resources with each other. By encapsulating an entire machine, including CPU, memory, operating system, and network devices, a virtual machine is completely compatible with most standard operating systems, applications, and device drivers. Most modern implementations allow several operating systems and applications to safely run at the same time on a single computer, with each having access to the resources it needs when it needs them.
Virtualization allows one to run multiple virtual machines on a single physical machine, with each virtual machine sharing the resources of that one physical computer across multiple environments. Different virtual machines can run different operating systems and multiple applications on the same physical computer.
One reason for the broad adoption of virtualization in modern business and computing environments is the resource utilization advantage provided by virtual machines. Without virtualization, if a physical machine is limited to a single dedicated operating system, then during periods of inactivity by the dedicated operating system the physical machine is not utilized to perform useful work. This is wasteful and inefficient if there are users on other physical machines who are currently waiting for computing resources. To address this problem, virtualization allows multiple VMs to share the underlying physical resources so that during periods of inactivity by one VM, other VMs can take advantage of the resource availability to process workloads. This can produce great efficiencies in the utilization of physical devices, and can result in reduced redundancies and better resource cost management.
To illustrate, consider the scenario where it may be desirable to use virtualization to provide the same base disk image to a large number of users. For example, a public library may seek to provide access to computers for members of the general public. In this situation, where random users may walk in off the street to access the shared computers, there is no need to customize the computers that are accessible to the public. Instead, virtualization can be utilized to present the same base image (e.g., selected operating system/desktop and applications) on each of the computers accessible to the members of the public.
“Cloning” is a common approach that can be taken to allow the same base image to be used by multiple virtualization computing nodes. FIG. 1A illustrates this situation, where a virtual disk 106 may include a base image that is utilized by one or more virtualization nodes in the system. Each of the virtualization nodes includes a hypervisor to implement the virtualization functionality. Here, a first virtualization node 1 includes a hypervisor 104a that allows it to access the base image on virtual disk 106.
Consider the case where other virtualization nodes 2 and 3 also seek to use the base image on virtual disk 106. One possible approach is to allow “full clones” of the base image to be created for each of the other virtualization nodes. Thus, as shown in FIG. 1B, a fully cloned disk 108 will be created for virtualization node 2 and another fully cloned disk 110 will be created for virtualization node 3. The problem with this approach is that it is very heavyweight: every full clone duplicates the entire base image, so the resource consumption grows with each additional node.
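The cost of the full clone approach can be made concrete with a small calculation. The following is only an illustrative sketch: the function name and the 40 GB image size are assumptions chosen for the example and do not come from the figures.

```python
def full_clone_storage_gb(base_image_gb: int, num_clones: int) -> int:
    """Total storage when every additional node receives a full copy.

    Hypothetical helper: counts the original base image plus one
    complete duplicate per cloning node.
    """
    if base_image_gb < 0 or num_clones < 0:
        raise ValueError("sizes and counts must be non-negative")
    return base_image_gb * (1 + num_clones)


# Assumed example: a 40 GB base image (like disk 106) with two full
# clones (like disks 108 and 110 for nodes 2 and 3).
total_gb = full_clone_storage_gb(base_image_gb=40, num_clones=2)
```

Under these assumed numbers, three complete copies of the image are stored even if nodes 2 and 3 never modify a single block.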
Another possible approach is to allow the virtualization nodes to link to a common base image. FIG. 1C illustrates this approach, which is often termed the “linked clone” or the “link-based clone” approach. Here, a full clone is not created for each of the virtualization nodes. Instead, link-based clones are implemented that allow the virtualization nodes to link to the single copy of the virtual disk 106 that exists in the system. To the extent that any of the virtualization nodes needs to make changes to the data (creating a “delta” between the base image and the current data set viewed at the node), local delta disks 118 and 120 are maintained at the remote nodes 2 and 3, respectively, to track the delta between the base image and the corresponding local view of the base image.
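The relationship between a shared base image and a per-node delta disk can be sketched as a copy-on-write overlay. This is a minimal illustration under stated assumptions, not a description of any particular hypervisor: the class names `BaseImage` and `LinkedClone`, and the modeling of a disk as a sequence of blocks, are hypothetical.

```python
class BaseImage:
    """Read-only shared image, modeled as an immutable sequence of blocks."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class LinkedClone:
    """A node's view of the base image plus its own local delta."""

    def __init__(self, base):
        self.base = base
        self.delta = {}  # block index -> locally modified data

    def read(self, index):
        # Locally modified blocks come from the delta; all others
        # fall through to the single shared base image.
        if index in self.delta:
            return self.delta[index]
        return self.base.read(index)

    def write(self, index, data):
        # Writes never touch the base image; they are recorded locally.
        if not 0 <= index < len(self.base):
            raise IndexError("block index out of range")
        self.delta[index] = data


base = BaseImage([b"boot", b"kernel", b"apps"])  # stands in for disk 106
node2 = LinkedClone(base)                        # delta plays the role of disk 118
node3 = LinkedClone(base)                        # delta plays the role of disk 120
node2.write(2, b"apps-v2")
```

After the write, node 2 sees its modified block while node 3 and the base image are unchanged, and node 2 stores only the one block that differs.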
In the approach of FIG. 1C, it is noted that access to the linked base image can be directly provided to the local hypervisors, or may be routed through the hypervisor that resides at the host node for the base image. The possible problem with this linked clone approach is that a bottleneck may result from each of the remote virtualization nodes 2 and 3 needing to go through host node 1 to access the base image on virtual disk 106. The bottleneck occurs because the resources of the host node (e.g., memory and CPU resources) are used to access the locally controlled base image on behalf of the remote nodes. In some circumstances, a “bootstorm” may result when all of the remote nodes need to hit the same shared image at the same time, e.g., in the morning when an organization/company first opens for business and all users seek to boot up at once. This situation can cause great delays, resulting in excessive periods of unproductive time while remote nodes are queued up to access the same shared image.
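The queuing effect of a bootstorm can be illustrated with a deliberately simplified model. The sketch below assumes that the host node serves one remote node at a time and that every boot needs the same service time; both assumptions, along with the function name and the numbers, are hypothetical and chosen only to show how the delay grows.

```python
def boot_wait_times(num_nodes: int, service_seconds: float) -> list[float]:
    """Seconds each node waits before the host starts serving it.

    Simplified model: all nodes request the shared image at the same
    instant and the host node serves them strictly one after another.
    """
    if num_nodes < 0 or service_seconds < 0:
        raise ValueError("arguments must be non-negative")
    return [i * service_seconds for i in range(num_nodes)]


# Assumed example: 3 remote nodes, 2 seconds of host service per boot.
waits = boot_wait_times(num_nodes=3, service_seconds=2.0)
```

In this model the wait of the last node grows linearly with the number of nodes that boot together, which is why a single host node in front of the shared image becomes the limiting resource.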
Therefore, there is a need for an improved approach to implement access to a shared image in a virtualization environment.