Organizations can store data on local data storage systems (on-premises) and may also utilize remote data storage systems (off-premises) for purposes such as backup and disaster recovery. For example, an online retailer may store customer data on a plurality of locally-hosted servers, while storing backup copies of the data in a remote data center in the “cloud.” A storage environment that combines on-premises and off-premises storage systems is known as a “hybrid” storage system or “hybrid cloud.”
Many multi-system storage environments implement data deduplication technologies to improve storage capacity utilization by reducing the amount of duplicate data stored across storage devices. Data deduplication systems reduce the total amount of physical storage that is required to store data by ensuring that duplicate data is not stored multiple times. However, current multi-system storage environments implement similarity-based deduplication systems that assume that all of the repository data is on-premises, and these similarity-based deduplication systems are not capable of supporting hybrid clouds.
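By way of illustration only, the general principle stated above (duplicate data is not stored multiple times) can be sketched as follows. This is a minimal sketch that assumes exact-match hashing over fixed-size chunks; it is not the similarity-based deduplication referred to above, and it does not address hybrid clouds. The names `DedupStore`, `CHUNK_SIZE`, `write`, `read`, and `physical_bytes` are hypothetical and chosen only for this example.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the effect is easy to see.
CHUNK_SIZE = 4


class DedupStore:
    """Toy exact-match deduplicating store (illustrative assumption only)."""

    def __init__(self):
        # Maps a chunk's SHA-256 digest to the single stored copy of that chunk.
        self.chunks = {}

    def write(self, data):
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is referenced, not stored again.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def physical_bytes(self):
        """Number of bytes actually held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

In this sketch, writing a 16-byte input in which the same 4-byte chunk appears three times stores only two distinct chunks, while the returned recipe still allows the full input to be reconstructed.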