Computer virtualization is a technique that involves encapsulating a physical computing machine platform into a virtual machine that is executed under the control of virtualization software running on a hardware computing platform, or “host.” A virtual machine has both virtual system hardware and guest operating system software. Virtual system hardware typically includes at least one “virtual disk,” a single file or a set of files that appear as a typical storage drive to the guest operating system. The virtual disk may be stored on the host platform or on a remote storage device. Typically, a virtual machine (VM) uses the virtual disk in the same manner that a physical storage drive is used, to store the guest operating system, application programs, and application data.
The virtualization software, also referred to as a hypervisor, manages the guest operating system's access to the virtual disk and maps the virtual disk to the underlying physical storage resources that reside on the host platform or in a remote storage device, such as a storage area network (SAN) or network attached storage (NAS). Because multiple virtual machines can be instantiated on a single host, allocating physical storage space for virtual disks corresponding to every instantiated virtual machine in an organization's data center can stress the physical storage space capacity of the data center. For example, when provisioning a virtual disk for a virtual machine, the virtualization software may allocate all the physical disk space for the virtual disk at the time the virtual disk is initially created, sometimes creating a number of empty data blocks containing only zeros (“zero blocks”). However, such an allocation may result in storage inefficiencies because the physical storage space allocated for the virtual disk may not be timely used (or ever used) by the virtual machine. In one solution, known as “thin provisioning,” the virtualization software dynamically allocates physical storage space to a virtual disk only when such physical storage space is actually needed by the virtual machine and not necessarily when the virtual disk is initially created.
In a similar manner, thin provisioning may be implemented as a storage space optimization technology in the underlying storage hardware, e.g., storage array, which may include an array of rotating disks or solid state disks as the physical storage media. In such cases, a storage system controller that manages the physical storage media and exposes them as logical data storage units, referred to as logical unit numbers (LUNs), to the host, thinly provisions the LUNs. That is, the storage system controller dynamically allocates physical storage space to the LUNs only when such physical storage space is actually needed by the LUNs and not necessarily when the LUNs are initially created. As a result, when the LUNs are initially created, the logical size of each of the LUNs is typically much greater than its physical size.
However, even with the use of thinly-provisioned virtual disks and thinly-provisioned LUNs, storage inefficiencies may be caused by an accumulation of “stale” data, i.e., disk blocks that were previously used and are currently unused but remain allocated. For example, deletion of a file, such as a temporary file or a swap file used by the hypervisor while supporting virtual machines, does not generally result in a release of the actual data blocks corresponding to the files. In addition, after migration of one or more virtual disks associated with a virtual machine, known as Storage vMotion™, is carried out, a set of files corresponding to the virtual disks is deleted by the hypervisor in response to an unlink command issued by a virtual machine management server. The deletion of these files does not necessarily result in a release of the actual data blocks corresponding to these files. This behavior can result in storage inefficiencies because such “stale” portions of the LUN are not utilized. U.S. patent application Ser. No. 13/181,153, filed Jul. 12, 2011, the entire contents of which are incorporated by reference herein, describes a technique that employs unmap commands to reclaim data blocks of “stale” data and continue to maintain the benefits of thin provisioning. However, the overhead associated with the execution of unmap operation in the storage system often offsets the benefits gained by the unmap operation.