Computer virtualization is a technique that involves encapsulating a physical computing machine platform into a virtual machine (VM) that is executed under the control of virtualization software running on a hardware computing platform, or “host.” A virtual machine has both virtual system hardware and guest operating system software. Virtual system hardware typically includes at least one “virtual disk,” a single file or a set of files that appears to the guest operating system as an ordinary storage drive. The virtual disk may be stored on the host platform or on a remote storage device. Typically, a VM uses the virtual disk in the same manner as a physical storage drive: to store the guest operating system, application programs, and application data.
The virtualization software, also referred to as a hypervisor, manages the guest operating system's access to the virtual disk and maps the virtual disk to the underlying physical storage resources, which reside on the host platform or in a remote storage device such as a storage area network (SAN) or network attached storage (NAS). Because multiple virtual machines can be instantiated on a single host, allocating physical storage space for the virtual disks of every instantiated virtual machine in an organization's data center can strain the data center's physical storage capacity. For example, when provisioning a virtual disk for a virtual machine, the virtualization software may allocate all of the physical disk space for the virtual disk at the time the virtual disk is initially created, sometimes creating a number of empty data blocks containing only zeros (“zero blocks”). Such an allocation may be inefficient, however, because the virtual machine may not use the allocated physical storage space promptly, or at all. In one solution, known as “thin provisioning,” the virtualization software allocates physical storage space to a virtual disk dynamically, only when the virtual machine actually needs that space, not necessarily when the virtual disk is initially created.
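The contrast between allocating all space up front and allocating on first write can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions, not the actual implementation of any hypervisor: the 4 KiB block size and the `PhysicalStore` and `VirtualDisk` classes are hypothetical, and a virtual disk is reduced to a map from logical block numbers to physical block numbers.

```python
BLOCK_SIZE = 4096  # hypothetical block size, in bytes


class PhysicalStore:
    """Hypothetical stand-in for physical storage that hands out block numbers."""

    def __init__(self):
        self.blocks = {}  # physical block number -> block contents
        self.next_free = 0

    def allocate(self):
        pbn = self.next_free
        self.next_free += 1
        self.blocks[pbn] = bytes(BLOCK_SIZE)  # a newly allocated zero block
        return pbn


class VirtualDisk:
    """Hypothetical virtual disk mapping logical blocks to physical blocks."""

    def __init__(self, store, num_blocks, thin=True):
        self.store = store
        self.num_blocks = num_blocks  # logical size, in blocks
        self.map = {}  # logical block number -> physical block number
        if not thin:
            # Thick provisioning: allocate every block, as zero blocks, at creation time.
            for lbn in range(num_blocks):
                self.map[lbn] = store.allocate()

    def write(self, lbn, data):
        if not 0 <= lbn < self.num_blocks:
            raise IndexError("logical block out of range")
        if len(data) > BLOCK_SIZE:
            raise ValueError("data larger than one block")
        if lbn not in self.map:
            # Thin provisioning: allocate physical space only on the first write.
            self.map[lbn] = self.store.allocate()
        self.store.blocks[self.map[lbn]] = data.ljust(BLOCK_SIZE, b"\0")

    def read(self, lbn):
        pbn = self.map.get(lbn)
        # Blocks that were never written read back as zeros.
        return bytes(BLOCK_SIZE) if pbn is None else self.store.blocks[pbn]

    def physical_blocks(self):
        return len(self.map)


thick = VirtualDisk(PhysicalStore(), num_blocks=1000, thin=False)
thin = VirtualDisk(PhysicalStore(), num_blocks=1000, thin=True)
thin.write(7, b"hello")
print(thick.physical_blocks(), thin.physical_blocks())  # 1000 1
```

Both disks have the same logical size of 1000 blocks, but after a single write the thick disk holds 1000 physical blocks (almost all of them zero blocks) while the thin disk holds one.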
Thin provisioning may similarly be implemented as a storage space optimization in the underlying storage hardware, e.g., a storage array, which may use an array of rotating disks or solid state disks as its physical storage media. In such cases, a storage system controller manages the physical storage media, exposes them to the host as logical data storage units, referred to as logical unit numbers (LUNs), and thinly provisions those LUNs. That is, the storage system controller allocates physical storage space to a LUN dynamically, only when the LUN actually needs that space, not necessarily when the LUN is initially created. As a result, when a LUN is first created, its logical size is typically much greater than its physical size.
However, even with thinly provisioned virtual disks and thinly provisioned LUNs, storage inefficiencies may arise from an accumulation of “stale” data, i.e., disk blocks that were previously used and are no longer in use but remain allocated. For example, when the guest operating system deletes a file in the virtual disk, such as a temporary backup file created while a document was being edited, the actual data blocks corresponding to that file are generally not released. The guest operating system may track the freed data blocks in its own guest file system (e.g., by clearing bits in the guest file system's bitmap), but it is not aware that the disk on which it deleted the temporary file is actually a “virtual disk” that is itself a file. Therefore, although deleting the temporary file may modify one portion of the virtual disk (the portion that stores the guest file system's bitmap of freed data blocks), the virtualization software does not release the portion of the virtual disk corresponding to the deleted file's actual data blocks from the virtual disk back to the LUN. This can cause storage inefficiencies because such “stale” portions of the virtual disk are neither used by the guest operating system nor available to the virtualization software for other uses (e.g., reallocation as part of a different virtual disk for a different virtual machine).
Storage vMotion™, which live-migrates virtual machine disk files (including one or more virtual disks and other VM configuration files) from a source LUN to a destination LUN, provides another example of “stale” data accumulating in a thinly provisioned LUN. During Storage vMotion™, the actual data blocks corresponding to the virtual machine disk files are copied from the source LUN to the destination LUN, and when the copying completes, the LUN supporting the VM is atomically switched from the source LUN to the destination LUN. After the atomic switch-over, the data blocks corresponding to the virtual machine disk files in the source LUN are no longer needed. The virtualization software may itself track these data blocks and mark them as “free,” for example, by deleting the virtual machine disk files from the source LUN, but the portion of the source LUN corresponding to these freed data blocks is not released from the LUN back to the storage array. This may be acceptable if the virtualization software quickly reallocates the freed data blocks in the source LUN for other uses (e.g., by allocating a new virtual machine disk file for another virtual machine). However, when the freed data blocks remain unallocated, such “stale” portions of the LUN reduce the storage efficiency gained from thin provisioning, since the storage array manager could otherwise have reallocated them to a different thinly provisioned LUN that may be experiencing storage pressure.
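The effect of a copy-then-delete migration on a thinly provisioned source LUN can be sketched with another toy model. This is a minimal Python sketch under stated assumptions, not how Storage vMotion™ or any storage array is implemented: the `StorageArray`, `ThinLUN`, and `HostVolume` classes, the `migrate` function, and the file name `vm_disk` are all hypothetical. Both LUNs are given a logical size of 1000 blocks against a shared physical pool of only 100 blocks, reflecting that a thin LUN's logical size can exceed its physical backing.

```python
class StorageArray:
    """Hypothetical storage array whose thin LUNs draw from one physical pool."""

    def __init__(self, pool_blocks):
        self.free_pool = pool_blocks  # physical blocks not yet given to any LUN

    def create_lun(self, logical_blocks):
        return ThinLUN(self, logical_blocks)


class ThinLUN:
    """Hypothetical thinly provisioned LUN: physical space is taken on first write."""

    def __init__(self, array, logical_blocks):
        self.array = array
        self.logical_blocks = logical_blocks
        self.backed = {}  # logical block -> data; presence means physical space is allocated

    def write(self, lbn, data):
        if lbn not in self.backed:
            if self.array.free_pool == 0:
                raise OSError("storage array out of physical space")
            self.array.free_pool -= 1  # draw one block from the shared pool
        self.backed[lbn] = data

    def read(self, lbn):
        return self.backed.get(lbn, b"")  # unwritten blocks read as empty in this toy


class HostVolume:
    """Hypothetical hypervisor file system on a LUN that tracks which blocks each file uses."""

    def __init__(self, lun):
        self.lun = lun
        self.files = {}  # file name -> list of LUN blocks
        self.used = set()  # LUN blocks the hypervisor considers in use

    def _alloc(self):
        for lbn in range(self.lun.logical_blocks):
            if lbn not in self.used:
                self.used.add(lbn)
                return lbn
        raise OSError("volume full")

    def write_file(self, name, chunks):
        blocks = []
        for chunk in chunks:
            lbn = self._alloc()
            self.lun.write(lbn, chunk)
            blocks.append(lbn)
        self.files[name] = blocks

    def read_file(self, name):
        return [self.lun.read(lbn) for lbn in self.files[name]]

    def delete_file(self, name):
        # Blocks are marked free in the hypervisor's metadata only; the array
        # is never told, so the LUN keeps its physical allocation.
        self.used.difference_update(self.files.pop(name))


def migrate(name, src, dst):
    """Copy a file to the destination volume, switch over, then delete the source copy."""
    dst.write_file(name, src.read_file(name))
    # Switch-over point: from here on, the VM would use the copy on the destination LUN.
    src.delete_file(name)


array = StorageArray(pool_blocks=100)
src = HostVolume(array.create_lun(1000))
dst = HostVolume(array.create_lun(1000))
src.write_file("vm_disk", [b"blk%d" % i for i in range(30)])
migrate("vm_disk", src, dst)
print(len(src.used), len(src.lun.backed), array.free_pool)  # 0 30 40
```

After the migration the hypervisor considers all of the source volume's blocks free, yet the source LUN still holds 30 physical blocks, so only 40 of the array's 100 blocks remain available to other thinly provisioned LUNs instead of 70.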