The present invention relates generally to content addressable storage and more particularly to improved access to data in content addressable storage systems.
Storing large quantities of data is critical to business. File systems must be backed up, volumes of records must be kept to satisfy regulatory requirements, and large collections of data must be stored, among other business applications. This important data must be stored in a manner that is resilient to hardware failure. Various systems have been employed which address the particular needs of each of these data collections.
Previously, content addressable storage (CAS) has been used for building high capacity storage systems. CAS systems generate an address for a block of data by hashing the data block contents. This allows duplicate copies of the data block to be readily identified so that the CAS system need only store one copy of the data block. The reduction in storage requirements makes CAS systems useful for high capacity storage.
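By way of illustration only, the content-derived addressing described above may be sketched as follows. The class name `ContentAddressableStore`, its method names, and the choice of SHA-256 as the hash function are assumptions made for this sketch and are not taken from any particular CAS system.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory CAS: a block's address is the hash of its contents."""

    def __init__(self):
        self._blocks = {}  # address (hex digest) -> block contents

    def put(self, data: bytes) -> str:
        # The address is derived from the contents alone (SHA-256 is an assumption).
        address = hashlib.sha256(data).hexdigest()
        # A duplicate block hashes to the same address, so it is stored only once.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def block_count(self) -> int:
        return len(self._blocks)


store = ContentAddressableStore()
a1 = store.put(b"quarterly report")
a2 = store.put(b"quarterly report")  # duplicate contents
a3 = store.put(b"annual report")
```

In this sketch the two writes of identical contents return the same address, and the store holds two blocks rather than three.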
However, since CAS systems typically store immutable objects, they only allow writing data organized in Directed Acyclic Graphs (DAGs). That is, once a parent block points to (e.g., contains the address of) a child block, it is not possible for the child block to point to the parent block. Ignoring the sharing aspect (in which a child is pointed to by many parents), these DAGs may be informally referred to as “trees.” Using CAS to implement a storage system that allows stored data to be modified (e.g., allows a file to be overwritten) presents two important challenges: (1) how to efficiently utilize the storage (e.g., minimize the use of parent blocks to point to a set of changing data/leaf/child blocks), and (2) how to allow concurrent modifications to different parts of the tree (e.g., avoid excessive locking) in order to offer better performance. Accordingly, improved systems and methods of storing and retrieving data in content addressable storage systems are required.
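The DAG constraint and the cost of modification may be illustrated with a minimal sketch. The helper `put_block`, the block encoding, and the use of SHA-256 are hypothetical choices for this example only. Because a parent's address is computed over the addresses of its children, a child would need to know the parent's address before that address exists, so it cannot point back to the parent. For the same reason, overwriting a leaf yields a new leaf address and forces a new parent block to be written.

```python
import hashlib

blocks = {}  # address -> (payload, tuple of child addresses)


def put_block(payload: bytes, children=()) -> str:
    """Store an immutable block; its address covers the payload and child addresses."""
    children = tuple(children)
    # Hypothetical encoding: payload followed by the comma-joined child addresses.
    body = payload + b"|" + ",".join(children).encode("ascii")
    address = hashlib.sha256(body).hexdigest()
    blocks.setdefault(address, (payload, children))
    return address


# Version 1: a parent block pointing to two leaf blocks.
leaf_a = put_block(b"contents of A")
leaf_b = put_block(b"contents of B")
root_v1 = put_block(b"directory", [leaf_a, leaf_b])

# "Overwriting" B: the old leaf is untouched; a new leaf and a new parent are written.
leaf_b2 = put_block(b"new contents of B")
root_v2 = put_block(b"directory", [leaf_a, leaf_b2])
```

Here the unchanged leaf is shared by both versions of the parent, while each modification of a leaf costs at least one additional parent block. In a deeper tree the rewrite would propagate through every ancestor up to the root, which relates to the first challenge noted above.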