The present invention relates to Content Addressable Storage (CAS) and its use with legacy systems, and more particularly but not exclusively, to replication techniques in which the source system is a CAS system and the destination system that replicates the source is a legacy system or any other system that does not support CAS.
Content Addressable Storage, CAS, also referred to as associative storage, is a mechanism for storing information that can be retrieved based on the content rather than on the storage location. CAS is typically used for storage and retrieval of fixed content and for archiving or permanent storage.
In Content Addressable Storage, the system records a content address, a key that uniquely identifies the information content. A hash of the content is typically used as the key, and the quality of the system depends on the quality of the hash function. Too weak a hash function may lead to collisions between different content items, whereas too strong a hash function leads to inefficiency in data storage.
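By way of illustration only, content addressing may be sketched as follows. The sketch assumes SHA-256 as the hash function and an in-memory dictionary as the store; the class name ContentStore and its methods are hypothetical and are not taken from any particular CAS product.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative assumption)."""

    def __init__(self):
        # Maps content address (hex digest) -> stored bytes.
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The content address is derived from the data itself,
        # not from the location at which the data is kept.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blocks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
a1 = store.put(b"hello world")
a2 = store.put(b"hello world")    # same content, same address
a3 = store.put(b"other content")  # different content, different address
```

In this sketch the two writes of identical content yield a single stored item, which is the property on which the remainder of the discussion relies.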
A typical CAS storage space has access nodes, through which input and output are handled, and storage nodes for permanent storage of the data. CAS metadata allows for content addressing and retrieval within the system.
Often the storage space needs to be backed up, and thus a replica of the source space may be constructed at a destination location. The source and destination spaces are often not located physically together, and there may be bandwidth limitations and latency involved in communication between the two. For the purpose of replication, nodes at the source storage space are required to provide consistent copies of input data to the destination, and any system has to allow for failures at one location or the other in a way that takes account of the latency within the system. Two operations on the same data in close succession may otherwise result in inconsistent replication.
It is furthermore desirable to avoid unnecessary data transfer in view of limitations on bandwidth. If the destination system is a CAS system then the amount of data transferred can be reduced simply by transferring the data only the first time it is used and subsequently transferring merely the hash information. Since the destination system is a CAS system, the hash is sufficient information for it to be able to perform the replication.
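The saving described above may be sketched as follows. This is a minimal sketch under stated assumptions: SHA-256 is assumed as the hash, the destination is assumed to answer a query as to whether it already holds a given hash, and the names CasDestination and replicate are hypothetical. Failure handling and latency are not modelled.

```python
import hashlib


class CasDestination:
    """Hypothetical CAS-capable replication target."""

    def __init__(self):
        self.blocks = {}  # content address -> data
        self.volume = {}  # logical address -> content address

    def has(self, digest: bytes) -> bool:
        return digest in self.blocks

    def write_hash(self, lba: int, digest: bytes) -> None:
        # Data already present: only the mapping is updated.
        self.volume[lba] = digest

    def write_data(self, lba: int, digest: bytes, data: bytes) -> None:
        self.blocks[digest] = data
        self.volume[lba] = digest


def replicate(dest: CasDestination, lba: int, data: bytes) -> int:
    """Replicate one write; return the payload bytes sent over the link."""
    digest = hashlib.sha256(data).digest()
    if dest.has(digest):
        # Content seen before: the hash alone is sufficient.
        dest.write_hash(lba, digest)
        return len(digest)
    # First occurrence: the data itself must be transferred.
    dest.write_data(lba, digest, data)
    return len(digest) + len(data)


dest = CasDestination()
block = b"x" * 4096
first = replicate(dest, 0, block)   # hash plus full data
second = replicate(dest, 7, block)  # hash only
```

In this sketch the first write of a 4096-byte block sends 4128 bytes (the block plus a 32-byte hash), whereas the second write of the same content to a different address sends only the 32-byte hash.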
If the destination system is not itself a CAS system then the hash is of no help. The data is stored independently each time and needs to be transferred independently each time since the non-CAS system has no way of identifying two identical data items stored at different locations.
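The contrasting case may be sketched as follows, again by way of illustration only. The class LegacyDestination and the function replicate_to_legacy are hypothetical names standing in for any location-addressed target that accepts only an address and the data to be written there.

```python
class LegacyDestination:
    """Hypothetical location-addressed target with no notion of content hashes."""

    def __init__(self):
        self.volume = {}  # logical address -> data

    def write(self, lba: int, data: bytes) -> None:
        # Each address holds its own independent copy of the data.
        self.volume[lba] = bytes(data)


def replicate_to_legacy(dest: LegacyDestination, lba: int, data: bytes) -> int:
    """Replicate one write; return the payload bytes sent over the link."""
    # A hash would be meaningless to the destination, so the full
    # data is sent on every write, even when the content repeats.
    dest.write(lba, data)
    return len(data)


legacy = LegacyDestination()
payload = b"x" * 4096
sent = replicate_to_legacy(legacy, 0, payload)
sent += replicate_to_legacy(legacy, 7, payload)
```

Here the identical 4096-byte block written to two addresses is transferred twice, 8192 bytes in total, and is held twice at the destination.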