Data centers are facilities used to house computer systems and associated components, such as telecommunications and storage systems. Because the data they hold must be stored very securely, data centers generally include redundant or backup power supplies, redundant communication connections, extensive environmental controls (e.g., air conditioning, fire suppression), and various security measures (e.g., vaults, alarms, guards). Large data centers are often industrial-scale operations and are increasingly used by large corporations, enterprises, government organizations, and the like.
The need for data redundancy in data protection and disaster recovery (DR) systems means that data must be copied (replicated) among different server computers and storage devices. In large-scale environments, different data centers (sites) are often used to store the backup data, and these data centers can be separated by significant distances so that a problem affecting one data center, such as a natural disaster, does not affect the backup sites. When sensitive data is replicated to remote data centers for disaster recovery purposes, however, those sites are not always as secure as the main (production) sites. For example, in federal IT environments, the production site is staffed and tightly controlled, while the remote DR site is often unstaffed or understaffed unless a disaster occurs. A DR site may have guards or security apparatus on the perimeter of the site, but typically not inside the data center, and the overall security of the site is often considerably lower than that of the production site. Oftentimes, more than one remote location is available for recovery in case a disaster affects more than one site. In fact, certain products, such as the RecoverPoint platform from EMC Corp. of Hopkinton, Mass., can replicate the same data to up to four remote sites. Although this greatly improves data availability for recovery purposes, it also adds a potential security weakness if the additional remote (DR) sites are not as well protected as the primary or production site. Thus, present disaster recovery networks that rely on widely distributed, large-scale data centers are vulnerable to data access attacks if the available DR sites are not as tightly protected and controlled as the main production site.
One simple existing approach to protecting the remote site data is to split the data when it is replicated to the remote sites, so that if one site is compromised, an attacker will not gain access to all of the data through that attack. The data is split at the source and different pieces of the data are sent to different remote sites. If a failover to a remote site is required, the data is reconstructed by sending the pieces from the other remote sites to the selected site. This approach, however, has several flaws. First, splitting the data may not be sufficient to prevent reconstruction of the full data from one part of it. Without clear knowledge of the content, and not just the bits, this method cannot be proven to be sufficient. Second, each data part should be sent to more than one site for redundancy, in case a remote site also fails in the disaster, which makes the scheme more complex to manage. Third, in case of a disaster at the production site, the data must be transmitted to the chosen failover site from all the other sites, which delays recovery and increases the RTO (Recovery Time Objective). Fourth, splitting the data does not save storage space at the remote sites. Even if the data is split into four parts for transmission to four remote sites, every site must still have storage capacity for the full data in case it is required to serve as the failover site.
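The split-and-reconstruct approach described above can be illustrated with a minimal Python sketch. The function names `split_data` and `reconstruct` are hypothetical and do not correspond to any product API, and the sketch assumes simple contiguous byte-range splitting, which is only one possible way of dividing the data among sites.

```python
"""Hypothetical sketch of the naive 'split at the source' replication approach.

Assumption: data is divided into contiguous byte ranges, one per remote site.
"""


def split_data(data: bytes, num_sites: int) -> list[bytes]:
    """Split data into num_sites contiguous pieces; the last may be shorter."""
    piece_len = -(-len(data) // num_sites)  # ceiling division
    return [data[i * piece_len:(i + 1) * piece_len] for i in range(num_sites)]


def reconstruct(pieces: list[bytes]) -> bytes:
    """Rebuild the full data at the failover site; every piece is required."""
    return b"".join(pieces)


if __name__ == "__main__":
    # Illustrative, fictitious record replicated to four remote sites.
    record = b"SSN=123-45-6789;NAME=Jane Doe;ACCT=99887766"
    pieces = split_data(record, 4)
    for site, piece in enumerate(pieces):
        # Each piece is readable plaintext, so one compromised site leaks a fragment.
        print(f"site {site}: {piece!r}")
    # Failover requires shipping the pieces from all other sites to one site.
    assert reconstruct(pieces) == record
```

Running the sketch shows the first flaw directly: each site holds an unprotected fragment of the record, so whether one fragment reveals something useful depends entirely on the content. It also shows the third flaw, since `reconstruct` cannot succeed until the pieces from every other site have arrived at the failover site.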
What is needed, therefore, is a system that distributes data among remote sites in such a way that if one site is compromised, an attacker will not gain access to all of the data through this one attack, but instead would need to gain control over at least two remote sites (or three sites in some extremely sensitive situations).
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain and RecoverPoint are trademarks of EMC Corporation.