Technical Field
The present disclosure relates to data de-duplication and, more specifically, to global data de-duplication in a cloud storage-based environment serving a plurality of data centers.
Background Information
Many large organizations may utilize cloud storage (“cloud”) as a common, global repository for enterprise data that may be accessed and shared, i.e., served, across geographically separated data centers. These organizations typically employ well-known data de-duplication techniques to reduce or eliminate storage of duplicate data at the data center level. In addition to data center level de-duplication, an organization may employ a “global de-duplication” technique that de-duplicates across data streams deposited into the cloud over networks, such as wide area network (WAN) links, from different offices of the data centers. An implementation of this global de-duplication technique may leverage the global repository to synchronize de-duplication metadata, e.g., fingerprints, across the offices. However, such an implementation becomes problematic as the number of global offices increases, because synchronizing the metadata increases network traffic across the WAN links from the data centers. In addition, any disruption in WAN connectivity between the offices may lead to stale de-duplication metadata.
One solution to this global de-duplication problem is to have each data center perform a local data de-duplication procedure to reduce the amount of data that is transmitted over the network for storage in the cloud. However, such a solution does not achieve optimal performance, as redundant data that originates from different data centers may still be stored within the cloud. Another solution may be to install a data de-duplication engine within the cloud; however, computational costs within cloud storage environments are substantial, rendering such a solution unacceptably expensive and sometimes impractical.
Thus, there is a need for a cost-effective technique to achieve global data de-duplication in a cloud storage environment that serves a plurality of data centers.
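The limitation of purely local de-duplication described above may be illustrated with a minimal sketch. The sketch is illustrative only and is not part of the disclosed technique. It assumes that each data stream has already been divided into chunks and that a SHA-256 digest serves as the fingerprint. The function names and the sample data are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Return a fingerprint for a chunk (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(chunk).hexdigest()


def local_dedup(chunks):
    """De-duplicate within one data center, keeping the first copy of each chunk."""
    unique = {}
    for chunk in chunks:
        unique.setdefault(fingerprint(chunk), chunk)
    return unique


# Hypothetical chunked data streams from two data centers.
data_center_a = [b"report-2023", b"logo.png", b"report-2023"]
data_center_b = [b"logo.png", b"budget.xlsx", b"logo.png"]

uploads_a = local_dedup(data_center_a)
uploads_b = local_dedup(data_center_b)

# With local de-duplication only, each data center uploads its own unique
# chunks, so a chunk shared by both data centers is stored in the cloud twice.
chunks_stored_local_only = len(uploads_a) + len(uploads_b)

# With global de-duplication, fingerprints are compared across data centers,
# so the shared chunk is stored only once.
chunks_stored_global = len({**uploads_a, **uploads_b})

print(chunks_stored_local_only, chunks_stored_global)
```

In this hypothetical example, the chunk common to both data centers survives local de-duplication at each site and is therefore stored twice in the cloud, whereas a global comparison of fingerprints would store it once.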