A virtual machine in a virtual computing infrastructure can run on a host device that comprises physical hardware and virtualization software. One or more applications running within the virtual machine can generate data that may be stored on one or more virtual disks. Virtual disks can be implemented on a primary storage system, such as a storage array (or storage appliance) having a substantial number of disks. Current storage array capacities can range from many terabytes to several petabytes or more. However, increased primary storage system capacity comes with increased costs. These costs can include the cost of the disks, the CPUs and memory needed to manage the disks that store the data, the power required to operate and cool the disks, and the physical space required to house the disks. In addition, when the primary storage system is backed up, the original data on the primary storage system must be transmitted to a backup server, further increasing the cost of storing the data.
The size of the data stored on a primary storage system can be reduced using compression and/or deduplication. However, requiring a virtual machine to compress its own data before writing the data to a primary storage system imposes processing overhead that would reduce end-user performance. Currently, deduplication is limited to implementations on backup servers and on target storage devices used for backup. Deduplicating on a backup server or target storage incurs the cost of first transmitting all of the original data from the primary storage system to that backup server or target storage.
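By way of illustration only, deduplication in general can be sketched as storing each unique block of data once and recording an ordered list of block fingerprints from which the original data can be rebuilt. The following is a minimal sketch and not an implementation described in this document. The fixed 4096-byte block size, the use of SHA-256 as the fingerprint, and the names `deduplicate`, `restore`, `store`, and `recipe` are all assumptions introduced for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical choice)


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns a (store, recipe) pair:
      store  maps a SHA-256 hex digest to the block contents
      recipe lists the digests in the order needed to rebuild the data
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Only the first occurrence of a given block is stored.
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Three identical blocks followed by one different block.
    data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
    store, recipe = deduplicate(data)
    stored_bytes = sum(len(block) for block in store.values())
    print(f"original: {len(data)} bytes, stored: {stored_bytes} bytes")
    assert restore(store, recipe) == data
```

In this sketch the four input blocks reduce to two stored blocks. When this reduction is performed only at a backup server, all four blocks must still be transmitted from the primary storage system before any saving is realized, which is the cost noted above.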