The present invention, in some embodiments thereof, relates to replication techniques with content addressable storage and, more particularly, but not exclusively, to a system and method for reducing the latency and the amount of data transfer during replication in a content addressable storage system.
Content addressable storage (CAS), also referred to as associative storage, is a mechanism for storing information so that it can be retrieved based on its content rather than its storage location. CAS is typically used for storage and retrieval of fixed content and for archiving or permanent storage.
In content addressable storage, the system records a content address, a key that uniquely identifies the information content. The key is typically a hash value computed from the data content, so the quality of the system depends on the quality of the hash function: too weak a hash function may lead to collisions, in which different content items receive the same key.
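To illustrate content addressing, the following minimal Python sketch keys each stored item by the SHA-256 digest of its bytes. SHA-256 and the class name `ContentAddressableStore` are assumptions made for this example; the text does not name a particular hash function. Because the key is derived from the data, identical content always maps to the same address and is stored only once. With a weak hash, two different items could share a key, and this sketch would silently keep only the first.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory CAS: each item is stored under the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The content address is computed from the data itself, not from a location.
        key = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so a repeated put stores nothing new.
        self._items.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        # Retrieval is by content address only.
        return self._items[key]

    def __len__(self) -> int:
        return len(self._items)


store = ContentAddressableStore()
k1 = store.put(b"hello")
k2 = store.put(b"hello")
print(k1 == k2, len(store))  # True 1
```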
A typical CAS storage space has access nodes, through which input and output are handled, and storage nodes, for permanent storage of the data; CAS metadata allows for content addressing and retrieval within the system.
Often the storage space needs to be backed up, so a replica of the source space is constructed at a destination location. The source and destination spaces are often not physically co-located, and communication between the two may be subject to bandwidth limitations and latency. For the purpose of replication, nodes in the source storage space must provide consistent copies of input data to the destination, and the system must tolerate a failure at either location while taking account of the latency between them. Otherwise, two operations on the same data issued close together in time may be replicated inconsistently, for example by being applied at the destination in a different order than at the source.
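The ordering hazard can be made concrete with a small sketch. One common remedy, shown here as a hypothetical illustration rather than the method of the invention, is for the source to tag each write with a sequence number and for the destination to apply writes strictly in that order, buffering any that arrive early. The names `Write`, `OrderedReplica`, and the `seq` field are assumptions; the sketch does not address failure recovery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    seq: int       # hypothetical: order in which the source applied the write
    address: int   # logical block address written
    data: bytes


class OrderedReplica:
    """Hypothetical destination that applies writes strictly in source sequence order."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self._next_seq = 0
        self._pending: dict[int, Write] = {}

    def receive(self, w: Write) -> None:
        # Messages may arrive out of order over a high-latency link, so buffer them.
        self._pending[w.seq] = w
        # Apply every buffered write whose predecessors have all been applied.
        while self._next_seq in self._pending:
            ready = self._pending.pop(self._next_seq)
            self.blocks[ready.address] = ready.data
            self._next_seq += 1


# Two close-together writes to the same block, delivered in reverse order.
first = Write(seq=0, address=7, data=b"old")
second = Write(seq=1, address=7, data=b"new")
replica = OrderedReplica()
replica.receive(second)
replica.receive(first)
print(replica.blocks[7])  # b'new'
```

A destination that simply applied each write on arrival would end with `b'old'` in block 7, diverging from the source.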
It is furthermore desirable to avoid unnecessary data transfer in view of limitations on bandwidth.
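One widely used way to avoid re-sending data the destination already holds is to exchange content hashes before payloads. The sketch below is a hypothetical illustration of that general idea, not the technique claimed here; the `Destination` class, the `replicate` function, SHA-256, and the per-block query are all assumptions. It counts only payload bytes: the digests and the query round trips also consume bandwidth and add latency.

```python
import hashlib


class Destination:
    """Hypothetical replication target that indexes received blocks by SHA-256 digest."""

    def __init__(self) -> None:
        self.by_hash: dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.by_hash

    def store(self, digest: str, data: bytes) -> None:
        self.by_hash[digest] = data


def replicate(blocks: list[bytes], dest: Destination) -> int:
    """Send each block's digest first and its payload only if the destination lacks it.

    Returns the number of payload bytes actually transferred.
    """
    sent = 0
    for data in blocks:
        digest = hashlib.sha256(data).hexdigest()
        if not dest.has(digest):
            dest.store(digest, data)
            sent += len(data)
    return sent


dest = Destination()
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
print(replicate(blocks, dest))  # 8192: the repeated block is sent only once
print(replicate(blocks, dest))  # 0: everything is already at the destination
```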
It is noted that a CAS storage space as considered here may be a system that is internally a CAS system but appears to external applications as a standard data storage block. That is to say, CAS metadata such as hashes are not generally available outside that block.
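A minimal sketch of such a system, under stated assumptions, is a volume that exposes only reads and writes by logical block address (LBA) while storing data internally by content hash. The class name `CasBackedVolume`, the fixed block size, SHA-256, and the zero-filled reads of unwritten blocks are hypothetical choices for illustration. Callers see addresses and bytes only; the hash metadata stays private, and overwritten content is never reclaimed in this simplified version.

```python
import hashlib


class CasBackedVolume:
    """Hypothetical volume exposing read/write by LBA, backed internally by CAS.

    The content hashes are internal metadata and are never returned to callers.
    """

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self._lba_to_hash: dict[int, bytes] = {}     # CAS metadata: address -> content hash
        self._hash_to_data: dict[bytes, bytes] = {}  # storage: content hash -> block data

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        digest = hashlib.sha256(data).digest()
        # Identical content written at different addresses is stored only once.
        self._hash_to_data.setdefault(digest, data)
        self._lba_to_hash[lba] = digest

    def read(self, lba: int) -> bytes:
        digest = self._lba_to_hash.get(lba)
        # In this sketch, unwritten blocks read back as zeros.
        if digest is None:
            return bytes(self.block_size)
        return self._hash_to_data[digest]


vol = CasBackedVolume(block_size=8)
vol.write(0, b"abcdefgh")
vol.write(5, b"abcdefgh")  # same content at a second address
print(vol.read(5))         # b'abcdefgh'
print(vol.read(3))         # b'\x00\x00\x00\x00\x00\x00\x00\x00'
```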