Computer virtualization is a technique that involves encapsulating a physical computing machine platform into a virtual machine that is executed under the control of virtualization software on a hardware computing platform, or “host.” A virtual machine has both virtual system hardware and guest operating system software. Virtual system hardware typically includes at least one “virtual disk,” a single file or a set of files that appear as a typical storage drive to the guest operating system. The virtual disk may be stored on the host platform or on a remote storage device. Typically, a virtual machine uses the virtual disk in the same manner that a physical storage drive is used, to store the guest operating system, application programs, and application data.
The virtualization software, also referred to as a hypervisor, manages the guest operating system's access to the virtual disk and maps the virtual disk to the underlying physical storage resources, which reside either on the host platform or in a remote storage device such as a storage area network (SAN) or network attached storage (NAS). Because multiple virtual machines can be instantiated on a single host, allocating physical storage space for the virtual disks of every virtual machine in an organization's data center can strain the data center's physical storage capacity. For example, when provisioning a virtual disk for a virtual machine, the virtualization software may allocate all of the physical disk space for the virtual disk at the time the virtual disk is initially created, sometimes creating a number of empty data blocks that contain only zeroes (“zero blocks”). Such an allocation can be inefficient, because the virtual machine may not use the physical storage space allocated for the virtual disk for a long time, or ever. In one solution, known as “thin provisioning,” the virtualization software allocates physical storage space to a virtual disk dynamically, only when the virtual machine actually needs that space, and not necessarily when the virtual disk is initially created.
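To illustrate the difference between allocating all physical space up front and thin provisioning, the following Python sketch models a virtual disk as a mapping from virtual block numbers to physical blocks drawn from a shared pool. The names `PhysicalPool`, `VirtualDisk`, and `BLOCK_SIZE`, and the 4 KB block size, are hypothetical and chosen only for illustration; this is not the interface of any particular hypervisor.

```python
BLOCK_SIZE = 4096  # hypothetical block size, in bytes


class PhysicalPool:
    """Hypothetical pool of physical blocks shared by every virtual disk."""

    def __init__(self, capacity_blocks: int) -> None:
        self.free = list(range(capacity_blocks))  # unallocated physical block numbers
        self.data: dict[int, bytes] = {}          # physical block number -> contents

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("physical storage exhausted")
        return self.free.pop()


class VirtualDisk:
    """Maps virtual block numbers to physical blocks drawn from a shared pool."""

    def __init__(self, pool: PhysicalPool, size_blocks: int, thin: bool = True) -> None:
        self.pool = pool
        self.size_blocks = size_blocks
        self.mapping: dict[int, int] = {}  # virtual block -> physical block
        if not thin:
            # Full up-front allocation: reserve every block when the disk is
            # created and write a zero block into each one.
            for vblock in range(size_blocks):
                pblock = pool.allocate()
                pool.data[pblock] = bytes(BLOCK_SIZE)
                self.mapping[vblock] = pblock

    def write(self, vblock: int, data: bytes) -> None:
        if not 0 <= vblock < self.size_blocks:
            raise IndexError("virtual block out of range")
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        if vblock not in self.mapping:
            # Thin provisioning: allocate physical space on first write only.
            self.mapping[vblock] = self.pool.allocate()
        self.pool.data[self.mapping[vblock]] = data

    def read(self, vblock: int) -> bytes:
        if not 0 <= vblock < self.size_blocks:
            raise IndexError("virtual block out of range")
        pblock = self.mapping.get(vblock)
        # Never-written blocks of a thin disk read back as zeroes.
        return bytes(BLOCK_SIZE) if pblock is None else self.pool.data[pblock]

    def allocated_blocks(self) -> int:
        return len(self.mapping)


pool = PhysicalPool(capacity_blocks=100)
thick = VirtualDisk(pool, size_blocks=50, thin=False)
thin = VirtualDisk(pool, size_blocks=50)
thin.write(3, b"x" * BLOCK_SIZE)
print(thick.allocated_blocks(), thin.allocated_blocks(), len(pool.free))  # 50 1 49
```

In this model the fully allocated disk consumes 50 physical blocks as soon as it is created, while the thin disk consumes a single block, and only after its first write.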
Storage inefficiencies may also be caused by an accumulation of “stale” data in the virtual disk, i.e., disk blocks that were previously used but are currently unused by the guest operating system. For example, when the guest operating system deletes a file in the virtual disk, such as a temporary file created as a backup while a document is being edited, the actual data blocks corresponding to that file are generally not released. The guest operating system may itself track the freed data blocks of the deleted temporary file in its own guest file system (e.g., by clearing bits in a bitmap for the guest file system), but it is not aware that the disk from which it has deleted the file is actually a “virtual disk” that is itself a file. That file is stored in a “virtual machine”-level file system (hereinafter sometimes referred to as a “virtual machine file system”), which is implemented in, and imposes an organizational structure on, a logical unit number (LUN) of a storage device. Therefore, although deleting the temporary file may modify one portion of the virtual disk (namely, the portion that stores the guest file system's bitmap of freed data blocks), the portion of the virtual disk corresponding to the deleted file's actual data blocks is not freed in the virtual machine file system. This behavior wastes storage: such “stale” portions of the virtual disk are not used by the corresponding guest operating system, yet they are also unavailable to the virtual machine file system for other uses (e.g., reallocation as part of a different virtual disk for a different virtual machine).
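The following Python sketch shows how stale blocks can arise. It uses hypothetical classes (`VirtualDiskFile`, `GuestFileSystem`) and a hypothetical helper (`stale_blocks`): deleting a file clears bits in the guest's bitmap, but nothing informs the virtual machine file system, so the storage backing those blocks stays allocated. It is a simplified model under these assumptions, not a description of how any specific guest or virtual machine file system is implemented.

```python
BLOCK_SIZE = 4096  # hypothetical guest block size, in bytes


class VirtualDiskFile:
    """Hypothetical virtual disk file as seen by the virtual machine file system."""

    def __init__(self, size_blocks: int) -> None:
        self.size_blocks = size_blocks
        self.backed: set[int] = set()  # disk blocks backed by allocated VMFS storage

    def write(self, block: int, data: bytes) -> None:
        # The first write to a block causes the VMFS to back it with storage.
        # (Contents are not kept; only allocation matters for this sketch.)
        self.backed.add(block)


class GuestFileSystem:
    """Hypothetical guest file system that tracks free space in a bitmap."""

    def __init__(self, disk: VirtualDiskFile) -> None:
        self.disk = disk
        self.bitmap = [False] * disk.size_blocks  # True = block in use by the guest
        self.files: dict[str, list[int]] = {}

    def create(self, name: str, nblocks: int) -> None:
        chosen = [b for b, used in enumerate(self.bitmap) if not used][:nblocks]
        if len(chosen) < nblocks:
            raise OSError("guest file system full")
        for b in chosen:
            self.bitmap[b] = True
            self.disk.write(b, b"\x01" * BLOCK_SIZE)
        self.files[name] = chosen

    def delete(self, name: str) -> None:
        # Only the guest's bitmap changes. In a real guest the bitmap is itself
        # stored on the virtual disk, but updating it touches only the bitmap's
        # blocks; nothing tells the VMFS that the file's data blocks are free.
        for b in self.files.pop(name):
            self.bitmap[b] = False


def stale_blocks(fs: GuestFileSystem) -> set[int]:
    """Blocks still backed at the VMFS level but unused by the guest."""
    return {b for b in fs.disk.backed if not fs.bitmap[b]}


disk = VirtualDiskFile(size_blocks=16)
fs = GuestFileSystem(disk)
fs.create("report.doc", 2)      # occupies blocks 0-1
fs.create("report.doc.bak", 3)  # temporary backup in blocks 2-4
fs.delete("report.doc.bak")
print(sorted(stale_blocks(fs)))  # [2, 3, 4]
```

After the backup is deleted, the guest considers blocks 2 through 4 free, while the virtual disk still holds storage for all five blocks.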
The foregoing stale data phenomenon is further complicated by the difficulty of reclaiming data blocks when there are “impedance mismatches” between the guest operating system's block size, which may be 4 KB, and the virtual disk's block size, which may be 1 MB. Even if a guest operating system expressly de-allocates certain data blocks (e.g., 4 KB blocks) in its guest file system, the corresponding data blocks of the virtual disk in the virtual machine file system (e.g., 1 MB blocks) may be too large to deallocate and may also contain data belonging to other guest file system blocks that remain in use, a phenomenon commonly referred to in the art as “false sharing” caused by block size artifacts.
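The arithmetic behind false sharing can be sketched in Python using the example sizes above. For illustration only, the sketch assumes that guest blocks map linearly onto the virtual disk and that the guest file system is aligned to 1 MB boundaries, so that 256 consecutive 4 KB guest blocks fall within one 1 MB virtual machine file system block; the function names `vmfs_block_of` and `reclaimable_vmfs_blocks` are hypothetical.

```python
GUEST_BLOCK = 4 * 1024                      # 4 KB guest file system block
VMFS_BLOCK = 1024 * 1024                    # 1 MB virtual machine file system block
PER_VMFS_BLOCK = VMFS_BLOCK // GUEST_BLOCK  # 256 guest blocks per VMFS block


def vmfs_block_of(guest_block: int) -> int:
    """VMFS block holding a guest block, assuming a linear, 1 MB-aligned layout."""
    return guest_block // PER_VMFS_BLOCK


def reclaimable_vmfs_blocks(freed: set[int], in_use: set[int]) -> set[int]:
    """A VMFS block can be released only if no guest block inside it is in use."""
    touched = {vmfs_block_of(g) for g in freed}
    pinned = {vmfs_block_of(g) for g in in_use}
    return touched - pinned


freed = set(range(0, 256)) | {300}  # all of VMFS block 0, plus one block in VMFS block 1
in_use = {301}                      # a live guest block that shares VMFS block 1
print(reclaimable_vmfs_blocks(freed, in_use))  # {0}
```

Guest block 300 has been freed, but its 1 MB block cannot be released, because guest block 301 in the same 1 MB block is still in use.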