1. Technical Field
The present disclosure relates generally to data storage systems. More particularly, the present disclosure relates to data storage systems supporting cloud storage system protocols and providing inline distributed deduplication services.
2. Description of the Background Art
As the amount of data being created increases, so does the demand for data storage solutions. Storing data using a cloud storage service is a solution that is growing in popularity. A cloud storage service may be publicly available or private to a particular enterprise or organization. Popular public cloud storage services include Amazon S3™, the Google File System™, and the OpenStack Object Storage (Swift) System™.
Cloud storage systems may provide “get” and “put” access to objects, where an object includes a payload of data being stored. The payload of an object may be stored in parts referred to as “chunks”. Using chunks enables the parallel transfer of the payload and allows the payload of a single large object to be spread over multiple storage servers.
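By way of illustration only, the division of a payload into chunks may be sketched as follows. The fixed chunk size and the function names `split_into_chunks` and `reassemble` are assumptions made for this example; the disclosure does not prescribe a particular chunking scheme.

```python
from typing import Iterable, List

# Hypothetical chunk size, kept tiny so the example is easy to follow.
# Real systems typically use much larger chunks.
CHUNK_SIZE = 4


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a payload into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Rebuild the original payload by concatenating chunks in order."""
    return b"".join(chunks)


if __name__ == "__main__":
    chunks = split_into_chunks(b"abcdefghij")
    print(chunks)              # three chunks; the last one is shorter
    print(reassemble(chunks))  # the original payload
```

Because each chunk is an independent unit, the chunks of one object may be transferred in parallel and placed on different storage servers, then concatenated in order when the object is retrieved.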
With the increasing need for archival data storage, it is highly desirable for a cloud storage system to be efficient in storing large data sets. Deduplication is one technique for substantially improving the efficiency of a cloud storage system. Deduplication identifies identical data chunks in a storage system so as to avoid storing duplicate copies of the same chunk. Distributed deduplication is a more advanced form of deduplication that also avoids transferring duplicate copies of the same chunk over the network.
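The distinction between deduplication and distributed deduplication may be illustrated with a minimal sketch. It assumes that chunks are identified by a SHA-256 fingerprint of their content and that the storage server can be asked whether it already holds a given fingerprint. The `ChunkStore` class, its methods, and `put_chunk` are hypothetical names introduced for this example and are not taken from any particular cloud storage system.

```python
import hashlib
from typing import Dict


def fingerprint(chunk: bytes) -> str:
    """Content-derived identifier: identical chunks yield identical fingerprints."""
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Toy stand-in for a storage server (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytes] = {}
        self.bytes_received = 0  # counts payload bytes sent "over the network"

    def has(self, fp: str) -> bool:
        return fp in self._chunks

    def put(self, fp: str, chunk: bytes) -> None:
        self.bytes_received += len(chunk)
        # Deduplication: a chunk already held is not stored a second time.
        self._chunks.setdefault(fp, chunk)

    def chunk_count(self) -> int:
        return len(self._chunks)


def put_chunk(store: ChunkStore, chunk: bytes) -> str:
    """Distributed deduplication: ask before sending, so duplicates are never transferred."""
    fp = fingerprint(chunk)
    if not store.has(fp):
        store.put(fp, chunk)
    return fp


if __name__ == "__main__":
    store = ChunkStore()
    put_chunk(store, b"hello")
    put_chunk(store, b"hello")  # duplicate: neither transferred nor stored again
    print(store.chunk_count(), store.bytes_received)
```

In this sketch, only the short fingerprint crosses the network for a duplicate chunk. A system that deduplicates only at the server would still receive the full chunk before discarding it.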
For reasons of data security and privacy for users, it is also highly desirable to provide end-to-end encryption of data chunks in a cloud storage system. End-to-end encryption requires chunks to be stored and transmitted only when they are encrypted.
However, it is challenging to store and transmit only encrypted chunks in a distributed system while, at the same time, retaining the ability to determine whether two chunks are identical for purposes of data deduplication. The present disclosure provides an advantageous solution to this challenge.