Data deduplication, the process of eliminating redundant data, is becoming an important technology. Deduplication reduces the required storage capacity because only unique data is stored. Source-side deduplication provides further benefits, such as reduced network traffic. In a typical configuration, a disk-based storage system, such as a storage-management server or a virtual tape library (VTL), can detect redundant data “extents” (also known as “chunks”) and reduce duplication by avoiding redundant storage of those extents. For example, a deduplicating storage system could divide file A into chunks a through h, detect that chunks b and e are redundant, and store the redundant chunks only once. The redundancy could occur within file A or across other files stored in the storage system.
Known techniques exist for deduplicating data objects. Typically, an object is divided into chunks using a method such as Rabin fingerprinting. Redundant chunks are detected by applying a hash function such as MD5 or SHA-1 to compute a hash value for each chunk and then comparing that value against the values for chunks already stored on the system. The hash values for stored chunks are usually maintained in an index (a “deduplication index”). Together, the hash value and the chunk size are typically used to uniquely identify a chunk. If a redundant chunk is identified, it can be replaced with a pointer to the matching stored chunk.
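The chunk, hash, and index steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular product: chunk boundaries come from a simplified polynomial rolling hash in the spirit of Rabin fingerprinting (not a true Rabin fingerprint over GF(2)), each chunk is keyed by its SHA-1 digest together with its length, and the names `chunk_boundaries`, `DedupStore`, and all size parameters are hypothetical.

```python
import hashlib

# Hypothetical parameters, chosen only for illustration.
WINDOW = 16            # bytes covered by the rolling hash
BASE = 257             # polynomial base
MOD = (1 << 31) - 1    # modulus (a Mersenne prime)
MASK = (1 << 6) - 1    # cut where the low 6 bits are zero (~1 in 64 positions)
MIN_CHUNK = 16         # never cut before the window is full
MAX_CHUNK = 256        # force a cut so chunks stay bounded


def chunk_boundaries(data: bytes) -> list[bytes]:
    """Split data into content-defined chunks with a polynomial rolling hash.

    This is a simplified Rabin-Karp style hash, not a true Rabin fingerprint
    (which uses polynomial arithmetic over GF(2))."""
    out_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks: list[bytes] = []
    start, h = 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so h covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * out_weight) % MOD
        h = (h * BASE + b) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Toy deduplicating store keyed by (SHA-1 digest, chunk size)."""

    def __init__(self) -> None:
        self.index: dict[tuple[str, int], bytes] = {}       # deduplication index
        self.files: dict[str, list[tuple[str, int]]] = {}   # file -> chunk pointers

    def store(self, name: str, data: bytes) -> int:
        """Store a file and return how many new (unique) chunks were written."""
        new = 0
        pointers = []
        for chunk in chunk_boundaries(data):
            key = (hashlib.sha1(chunk).hexdigest(), len(chunk))
            if key not in self.index:
                self.index[key] = chunk
                new += 1
            pointers.append(key)  # every chunk, redundant or not, becomes a pointer
        self.files[name] = pointers
        return new

    def restore(self, name: str) -> bytes:
        """Rebuild a file by following its chunk pointers into the index."""
        return b"".join(self.index[k] for k in self.files[name])
```

Storing the same bytes a second time writes no new chunks, and appending data to an existing file typically adds only a few, because the content-defined boundaries before the appended region do not move.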
In a client-server storage system, deduplication can be performed at the data source (client), at the data target (server), or on a deduplication appliance connected to the target server. The ability to deduplicate data at either the source or the target offers flexibility with respect to resource utilization and policy management. However, performing deduplication at the source introduces certain security risks.
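Source-side deduplication, and the network-traffic reduction it offers, can be illustrated with a toy client/server exchange in which the client first sends only chunk keys and then uploads just the chunks the server reports as missing. The sketch assumes fixed-size 1 KiB chunks for brevity, SHA-1 plus chunk length as the chunk key, and hypothetical `Client` and `Server` classes standing in for a real storage-management protocol.

```python
import hashlib

CHUNK_SIZE = 1024  # hypothetical fixed chunk size; real systems often use content-defined chunking

ChunkKey = tuple[str, int]  # (SHA-1 hex digest, chunk length)


def split(data: bytes) -> list[bytes]:
    """Fixed-size chunking, used here only to keep the sketch short."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def chunk_key(chunk: bytes) -> ChunkKey:
    return hashlib.sha1(chunk).hexdigest(), len(chunk)


class Server:
    """Target side: holds the deduplication index and per-object chunk lists."""

    def __init__(self) -> None:
        self.index: dict[ChunkKey, bytes] = {}
        self.objects: dict[str, list[ChunkKey]] = {}

    def query(self, keys: list[ChunkKey]) -> set[ChunkKey]:
        """Report which chunk keys the server does not yet hold."""
        return {k for k in keys if k not in self.index}

    def ingest(self, name: str, keys: list[ChunkKey], payload: dict[ChunkKey, bytes]) -> None:
        # The server files uploaded chunks under client-supplied keys without
        # re-hashing them; this trust in the client is one reason source-side
        # deduplication carries the security risks mentioned above.
        self.index.update(payload)
        self.objects[name] = list(keys)

    def restore(self, name: str) -> bytes:
        return b"".join(self.index[k] for k in self.objects[name])


class Client:
    """Source side: deduplicates before sending chunk data over the network."""

    def __init__(self, server: Server) -> None:
        self.server = server

    def backup(self, name: str, data: bytes) -> int:
        """Back up data and return the number of chunk payload bytes sent."""
        chunks = split(data)
        keys = [chunk_key(c) for c in chunks]
        missing = self.server.query(keys)  # only keys cross the network here
        payload = {k: c for k, c in zip(keys, chunks) if k in missing}
        self.server.ingest(name, keys, payload)
        return sum(len(c) for c in payload.values())
```

Backing up the same data twice sends no chunk payload the second time; only the list of keys is exchanged.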
Specifically, existing prior-art methods fail to give storage management systems the ability to comprehensively address fake backups, contaminated target attacks, and chunk spoofing. Given the security risks inherent in storing data at a target location after source-side deduplication, an enhanced security approach within deduplicating storage systems, together with strategies for client/server authentication, is needed to prevent these types of attacks.