As technology advances, data storage is becoming increasingly important and data storage capacities are increasing rapidly. Correspondingly, the size of data storage arrays and the demand for storage have increased rapidly. Ever-increasing amounts of data are required to be highly available and protected from corruption or damage that may be caused by any of a variety of factors, such as natural disasters and power failures. As a result, increasingly complex data storage clusters are used to satisfy the demands for data storage and retrieval. The data held in these storage clusters is routinely backed up to prevent data loss.
To ensure that operations can continue after a disaster, all data of a local site may be replicated to one or more remote sites. Replicating the local data to remote sites can consume large amounts of bandwidth and can require, at each remote site, an amount of storage identical to that of the local site. Storage costs thus grow in proportion to the number of remote sites, making the practice of maintaining multiple remote sites particularly costly. Replication can therefore be expensive in terms of both storage resources and bandwidth. Moreover, because all of the data is replicated to the remote sites, important and unimportant data alike are copied. As a result, storage and bandwidth are consumed in storing and transmitting data that is of reduced or little importance.
Thus, a need exists to more efficiently replicate data while reducing bandwidth and storage requirements at remote sites.