Data storage is a critical component of computing. A computing device includes a storage area that stores data for access by the operating system and applications. In a distributed environment, additional data storage may be a separate device that the computing device accesses during regular operations. In an enterprise environment, the data stored in the storage area of the computing device or in the additional data storage is often copied to one or more offsite storage devices as part of a global disaster recovery (DR) strategy, which protects the entire organization by keeping one or more copies of the data at offsite locations. Traditionally, backup applications are used to copy data to tapes, which are then physically shipped to offsite locations. This labor-intensive process is error-prone, introduces security risks, and is extremely slow for data recovery. A network-based alternative is to transfer stored data over a computer network. In this kind of environment, an onsite storage may be referred to as a source storage, and an offsite storage may be referred to as a target storage. For data protection purposes, it is important to make regular copies of data from a source storage to a target storage; this process may be referred to as data replication.
Data deduplication is a set of techniques for eliminating duplicate copies of repeating data. It improves storage utilization and can also be applied when copying data across a network to reduce the amount of data to be transferred. Thus, data deduplication can be used together with data replication. Yet it remains challenging to replicate data effectively from a deduplicated storage system.
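To illustrate how deduplication reduces the data transferred during replication, the following is a minimal sketch under assumed details; it is not the method of this disclosure. It assumes fixed-size chunking, SHA-256 fingerprints, and an in-memory content-addressed store. The names `DedupStore`, `chunk`, `fingerprint`, `replicate`, and `CHUNK_SIZE` are hypothetical. Each unique chunk is stored once, each file is recorded as a list of chunk fingerprints, and replication sends only the chunks whose fingerprints the target does not already hold.

```python
import hashlib

# Hypothetical, deliberately tiny chunk size for illustration; practical
# systems typically use kilobyte-scale or content-defined (variable) chunks.
CHUNK_SIZE = 4


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(c: bytes) -> str:
    """Identify a chunk by the SHA-256 hash of its contents."""
    return hashlib.sha256(c).hexdigest()


class DedupStore:
    """Hypothetical deduplicated store: each unique chunk is kept exactly once."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes
        self.files = {}   # file name -> ordered list of fingerprints ("recipe")

    def write(self, name: str, data: bytes) -> None:
        recipe = []
        for c in chunk(data):
            fp = fingerprint(c)
            self.chunks.setdefault(fp, c)  # store the chunk only if it is new
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])


def replicate(source: DedupStore, target: DedupStore, name: str) -> int:
    """Copy file `name` from source to target, sending only missing chunks.

    Returns the number of chunk bytes that had to be transferred.
    """
    recipe = source.files[name]
    missing = {fp for fp in recipe if fp not in target.chunks}
    sent = 0
    for fp in missing:
        target.chunks[fp] = source.chunks[fp]
        sent += len(source.chunks[fp])
    target.files[name] = list(recipe)  # the recipe itself is small metadata
    return sent
```

For example, replicating a 16-byte file `b"ABCDABCDABCDWXYZ"` to an empty target transfers only its two unique 4-byte chunks (8 bytes). A later file `b"ABCDQRST"` transfers only the new `QRST` chunk (4 bytes), because `ABCD` already exists on the target.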