In the world of virtual computing, multiple virtual machines (VMs or guests) can be instantiated at a software level on a single physical computer (a host). In various virtualization scenarios, a software component often called a hypervisor can act as an interface between the guests and the host operating system for some or all of the functions of the guests. In other virtualization implementations, there is no underlying host operating system running on the physical host computer. In those situations, the hypervisor acts as an interface between the guests and the hardware of the host computer. Even where a host operating system is present, the hypervisor sometimes interfaces directly with the hardware for certain services. In some virtualization scenarios, the host itself is in the form of a guest (i.e., a virtual host) running on another host. The services described herein as being performed by a hypervisor are, under certain virtualization scenarios, performed by a component with a different name, such as “supervisor virtual machine,” “virtual machine manager (VMM),” “service partition,” or “domain 0 (dom0).” The name used to denote the component(s) performing specific functionality is not important.
One common virtualization architecture is for a single host to contain a large number (e.g., dozens or even hundreds) of separate VMs. Although the hosted VMs can be configured differently from one another, in many instances several of them each contain copies of many of the same files. For example, dozens of VMs on a single host could each run the same software applications, such as a specific office suite, a given accounting package, the same development tools, etc. It is also not uncommon for multiple VMs to each have a separate copy of the same large data set.
In such instances, the duplicate copies of the files on the multiple VMs result in wasted storage space. More specifically, where multiple VMs containing the same files are running on a single host, multiple copies of the same files (including in some instances significant numbers of very large file sets) reside on the underlying storage hardware of the single host computer. In addition, scanning operations, such as scanning the files to detect malware or to identify specific content, involve a great deal of duplicated effort under these circumstances. This is so because scan operations targeting files on the multiple VMs on the host end up scanning every copy of each duplicated file, once for each VM on which it resides. This results in computing resources being used to repeat the same task multiple times.
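The cost described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each VM's files are available as an in-memory mapping of path to content and that two files count as duplicates when their SHA-256 digests match. The function name duplication_cost and the sample data are hypothetical and are not part of any hypervisor interface.

```python
import hashlib
from collections import defaultdict


def duplication_cost(vm_files):
    """Estimate redundant storage and scan work across VMs on one host.

    vm_files: dict mapping a VM name to a dict of {path: bytes content}.
    This input shape is an assumption made for the illustration.
    """
    # Group every file copy by the digest of its content.
    sizes_by_digest = defaultdict(list)
    for files in vm_files.values():
        for content in files.values():
            digest = hashlib.sha256(content).hexdigest()
            sizes_by_digest[digest].append(len(content))

    total_files = sum(len(sizes) for sizes in sizes_by_digest.values())
    unique_files = len(sizes_by_digest)
    total_bytes = sum(sum(sizes) for sizes in sizes_by_digest.values())
    # Identical content implies identical size, so one copy per digest suffices.
    unique_bytes = sum(sizes[0] for sizes in sizes_by_digest.values())

    return {
        "redundant_scans": total_files - unique_files,
        "redundant_bytes": total_bytes - unique_bytes,
    }


# Hypothetical host: three VMs share one application binary,
# and each also holds one file unique to that VM.
shared_binary = b"x" * 1000
example_host = {
    "vm1": {"/opt/suite/app.bin": shared_binary, "/home/a/notes.txt": b"aaaaaaaaaa"},
    "vm2": {"/opt/suite/app.bin": shared_binary, "/home/b/notes.txt": b"bbbbbbbbbb"},
    "vm3": {"/opt/suite/app.bin": shared_binary, "/home/c/notes.txt": b"cccccccccc"},
}

if __name__ == "__main__":
    print(duplication_cost(example_host))
```

In this hypothetical case, a scanner that visits each VM independently examines the shared binary three times and the host stores three copies of it; both figures grow with the number of VMs and the number of shared files.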
It would be desirable to address these issues.