Information that is used to access a stored digital item is referred to herein as the “access key” of the stored item. In typical file systems, stored items are retrieved based on (a) the location at which the items are stored, and (b) a name or identifier of the items. For example, if a file named “foo.txt” is located in a directory named “c:\myfiles\text”, then applications may use the pathname “c:\myfiles\text\foo.txt” as the access key to retrieve the file from the file system. Because conventional access keys are based on the location of the items being retrieved, the access keys change when the items are moved. In addition, each copy of an item has a different access key, because each copy is stored at a different location.
In contrast to conventional file systems, Content Addressable Storage (CAS) systems allow applications to retrieve items from storage based on a hash value that is generated from the content of the items. Because CAS systems perform storage-related operations on items based on the hash values generated for the items, and the hash values are based on the content of the items rather than where the items are stored, the applications that request the operations may do so without knowing the number or location of the stored copies of the items. For example, a CAS system may store multiple copies of an item X at locations A, B and C. An application that desires to retrieve item X would do so by sending to the CAS system a hash value that is based on the contents of item X. Based on that hash value, the CAS system would provide to the application a copy of item X retrieved from one of the locations A, B, and C. Thus, the application would obtain item X without knowing where item X was actually stored, how many copies of item X existed, or the specific location from which the retrieved copy was actually obtained.
Storing a digital item, such as a file or a message, often involves making a call to a “chunk storage system”. A chunk storage system is a storage system that performs storage operations without understanding the format or content of the digital information itself. Such storage systems are referred to as chuck storage systems because the systems treat all forms of digital items as if those items were merely opaque chunks of data. For example, the same chunk storage system may be used by word processing applications, image management applications, and calendaring systems to respectively store documents, images and appointments. However, from the perspective of the chunk storage system, only one type of item is being stored: opaque chunks of digital information.
Chunk storage systems may be implemented as CAS systems. For example, a chunk storage system may generate a hash value for a chunk by applying a cryptographic hash function (e.g. MD5, SHA-1 or SHA2) to the chunk. The chunk store may then store the chunk, and maintain an index that associates the hash value with the location at which the chunk is stored. When an application subsequently requests retrieval of the chunk, the application provides the hash value to the chunk storage system. The chunk storage system uses the index to locate the chunk associated with the hash value, and provides the chunk thus located to the requesting application.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.