A network storage environment may comprise one or more storage servers configured to provide host computing devices with access to data stored on storage devices accessible from the respective storage servers. In particular, a host computing device may connect to a storage server that may provide the host computing device with access to data stored on storage devices that are accessible to and/or managed by the storage server. For example, a user may desire to start up a virtual machine on a desktop host computing device using a virtual machine hosting application. The virtual machine hosting application may be configured to retrieve virtual machine data of the virtual machine from a network storage environment within which the virtual machine data may be stored. In particular, the virtual machine hosting application may access a storage server. The storage server may provide the virtual machine hosting application with access to the virtual machine data stored across one or more storage devices accessible to the storage server.
Unfortunately, storage servers may store a significant amount of duplicate (e.g., redundant) data within storage devices that the respective storage servers access. Thus, conventional storage servers may perform deduplication to mitigate storage of duplicate data. That is, a storage server may comprise deduplication functionality (e.g., programming) to detect whether writeable data from a host computing device that is to be written to a storage device is already stored by the storage server (on a storage device). It may be appreciated that writeable data and/or the like as used herein may be thought of as data that is to be written (e.g., write data). If the writeable data is not already stored by the storage server, then the storage server may store the writeable data within a destination location (e.g., storage device) of the writeable data. If the writeable data is already stored within a storage device, then the storage server may merely store a reference within the destination location that points to a source location (e.g., on another storage device or on the same storage device) already comprising the writeable data. It may be appreciated that the reference may be relatively small compared to the writeable data. For example, a user on a host computing device may send a write command associated with 2.5 GB of virtual machine writeable data to a storage server. Before storing the large amount of writeable data, the storage server may determine whether the virtual machine writeable data is already stored within one or more storage devices accessible to the storage server. If the virtual machine writeable data is already stored by the storage server, then the storage server may store a reference with a size of merely a few kilobytes pointing to a source location of the virtual machine data already stored by the storage server.
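The write path described above may be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not an implementation of any particular storage server: the names `DedupStore`, `Reference`, `write`, and `read` are hypothetical, storage locations are modeled as plain strings, and detection of already-stored data is reduced to a dictionary lookup on the data itself.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Reference:
    """Small pointer to a source location that already holds the data."""
    source_location: str


class DedupStore:
    """Hypothetical in-memory stand-in for a storage server's write path."""

    def __init__(self) -> None:
        # Destination location -> full data, or a Reference to a source location.
        self.locations: Dict[str, Union[bytes, Reference]] = {}
        # Data already stored -> location holding the full copy (assumed lookup).
        self.known: Dict[bytes, str] = {}

    def write(self, destination: str, writeable_data: bytes) -> bool:
        """Store the data, or only a reference if the data is already stored.

        Returns True when a reference was stored in place of the data.
        """
        source = self.known.get(writeable_data)
        if source is not None:
            # Already stored: keep only a small reference at the destination.
            self.locations[destination] = Reference(source)
            return True
        # Not yet stored: write the full data and remember where it lives.
        self.locations[destination] = writeable_data
        self.known[writeable_data] = destination
        return False

    def read(self, location: str) -> bytes:
        """Return the data at a location, following a reference if present."""
        entry = self.locations[location]
        if isinstance(entry, Reference):
            entry = self.locations[entry.source_location]
        return entry
```

In this sketch, a second write of identical virtual machine data to a different destination may store only a `Reference`, while a read from either destination returns the same bytes. The sketch does not handle overwriting or deleting a source location that other references still point to.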
In this way, storage of redundant data may be reduced to enhance network storage capacity utilization by the storage server (e.g., a reference with a size of a few kilobytes may be stored in place of a 2.5 GB file). Such operations may have a significant impact on large-scale network storage environments where storage servers may store data within a plurality of storage devices for a significant number of host computing devices. For example, a storage server may store over a thousand virtual machines across hundreds of storage devices, where a significant amount of virtual machine data may overlap and/or result in redundant data (e.g., many of the virtual machines may comprise similar operating system data, file system data, etc.).
The storage server may perform deduplication through a variety of deduplication techniques. In one example, the storage server may compute a signature of writeable data received from a host computing device (e.g., the storage server may compute a hash signature from the writeable data to identify the writeable data). The storage server may compare the signature of the writeable data with signatures of data already stored by the storage server (e.g., the storage server may maintain a data structure, such as an index, of signatures computed for data already stored by the storage server on storage devices). In another example, the storage server may compute the signature of the writeable data and, where the signature matches that of data already stored, perform a byte-by-byte comparison of the writeable data against the stored data, which may mitigate false positives occurring from collisions (e.g., hash function collision, signature function collision, etc.) where different data may be hashed into the same signature. In this way, the storage server may perform deduplication upon data received from the host computing device to mitigate storage of redundant data.
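The two techniques described above (signature comparison alone, and signature comparison followed by a byte-by-byte comparison) may be sketched as follows. This is an illustrative sketch only: `SignatureIndex`, `compute_signature`, `find_duplicate`, and `add` are hypothetical names, SHA-256 is an assumed choice of signature function that the description above does not specify, and stored data is held in memory rather than on storage devices.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def compute_signature(data: bytes) -> str:
    """Assumed signature function: SHA-256 hex digest of the data."""
    return hashlib.sha256(data).hexdigest()


class SignatureIndex:
    """Hypothetical index of signatures for data already stored."""

    def __init__(
        self,
        signature_fn: Callable[[bytes], str] = compute_signature,
        verify_bytes: bool = True,
    ) -> None:
        self.signature_fn = signature_fn
        self.verify_bytes = verify_bytes
        self.blocks: Dict[str, bytes] = {}     # location -> stored data
        self.index: Dict[str, List[str]] = {}  # signature -> locations

    def add(self, location: str, data: bytes) -> None:
        """Record stored data and its signature in the index."""
        self.blocks[location] = data
        self.index.setdefault(self.signature_fn(data), []).append(location)

    def find_duplicate(self, data: bytes) -> Optional[str]:
        """Return a location already holding this data, or None."""
        for location in self.index.get(self.signature_fn(data), []):
            # With verify_bytes set, a signature match is confirmed by a
            # byte-by-byte comparison, which filters out collisions.
            if not self.verify_bytes or self.blocks[location] == data:
                return location
        return None
```

A deliberately weak signature function (e.g., one based only on data length) may be passed as `signature_fn` to observe the difference: with `verify_bytes=False`, different data of the same length may be reported as a duplicate, whereas with `verify_bytes=True` the byte-by-byte comparison rejects the false positive.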