1. The Field of the Invention
The present invention relates to data replication, and more particularly, to the proactive replication of data using a hidden or secret relocation algorithm that determines the target location and timing of the relocation in a cryptographically secure way.
2. Related Technology
Computing technology has transformed the way we work and play. Modern computer networking technologies and infrastructures allow for different applications and users to electronically access data even over vast distances relatively quickly using readily-available computer systems. Such computer systems may include, for example, desktop computers, laptop computers, Personal Digital Assistants (PDAs), digital telephones, or the like.
This high level of data availability allows for numerous useful services to be offered over the Internet or other networks. Indeed, the level of data availability is considered a critical performance component of many, if not most, network services. Customers often expect little, if any, interruptions in access to the data offered by given network services. However, there are cases in which data may be destroyed, thereby potentially causing significant, if not permanent, interruption in services that rely on access to that data.
For example, the computer system that stores the data may malfunction, causing the stored data to be corrupted. Perhaps a user inadvertently deleted the stored data or saved a different item over it. Perhaps a disgruntled or malicious person intentionally destroyed the data. Alternatively, the storage device that stores the data may be physically damaged or destroyed. Regardless of the failure mechanism, such destruction of data may be catastrophic depending on the importance and reconstructability of the data lost.
One conventional mechanism for guarding against such failure is to make multiple replicas of the data, and to store at least some of the replicas on different computer systems or even in different geographically remote locations. The goal of such replication is to continue data availability even if one of the data replicas becomes inaccessible or is destroyed. Should one of the replicas be destroyed, the data may still be accessed via another of the replicas. In cases in which a minimum number of replicas is desired in order to provide a high degree of assurance that not all replicas will be destroyed, the recovery algorithm of the replication system may generate further replicas in order to compensate for any lost replicas.
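By way of illustration only, the replenishing behavior of such a recovery algorithm may be sketched as follows. The function name `replenish`, its parameters, and the host names are hypothetical and are not drawn from any particular replication system. The sketch assumes that a new replica can be created only by copying from a surviving replica.

```python
from typing import Iterable, Set


def replenish(live_hosts: Set[str], candidate_hosts: Iterable[str],
              min_replicas: int) -> Set[str]:
    """Return the set of replica hosts after topping up to min_replicas.

    Hypothetical sketch: a real system would copy the data from a
    surviving replica to each newly chosen host.
    """
    if not live_hosts:
        # No surviving replica remains to copy from, so recovery is impossible.
        raise RuntimeError("all replicas lost; data cannot be recovered")
    hosts = set(live_hosts)
    for host in candidate_hosts:
        if len(hosts) >= min_replicas:
            break
        hosts.add(host)  # adding a host already in the set has no effect
    return hosts
```

For example, `replenish({"host-a"}, ["host-a", "host-b", "host-c"], 3)` returns a set of three hosts, whereas calling the function with an empty set of surviving hosts raises an error, since there is nothing left from which to copy.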
The use of multiple replicas for guarding against such failure provides significant security against many failure mechanisms. For example, if data is inadvertently deleted or intentionally destroyed, the data may still be accessed from other replicas. If a computer system fails or the storage device is destroyed, the data may still be accessed from a replica on another computer system.
The use of multiple replicas assumes that there is a high degree of independence between potential failure mechanisms for at least some of the replicas. For example, if the anticipated failure mechanism was that the computer system fails, independence from this failure mechanism may be accomplished by storing a replica in another computer system. If the anticipated failure mechanism was a geographically related problem such as a power outage, intentional physical destruction or natural disaster, independence may be accomplished by storing the replicas at geographically remote distances. If the anticipated failure mechanism was an intentional destruction of the data by an antagonist (also colloquially referred to as a “hacker”), then independence may be hoped for if the antagonist is not aware of all of the replicas.
However, it is possible that individuals or organizations might perform a malicious, sophisticated, and concerted attack against all copies of the data substantially simultaneously. If such a malicious attacker were to destroy all of the data replicas before the system could respond by recreating other copies, then the data might be lost forever. The loss of data would occur even though the system had a recovery mechanism to recreate replicas once one was lost, since all of the replicas would be lost before the recovery mechanism could succeed in creating further replicas.
One critical piece of information that might be required in order to facilitate such a concerted attack is the location of each of the replicas. In accordance with the principles of the present invention, a replication system is described which guards against such attacks by moving the data around using a cryptographically secure algorithm such that the location of the replicas is either unknown to any user (even potentially system administrators), or is known to only a small group. Even if one were to know the location of one or more of the replicas at a given time, that information could not thereafter be used to determine the current location of the replicas. Accordingly, a concerted attack against all of the replicas would be more likely to fail, thus making the replication system more likely to survive such an attack.
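By way of illustration only, one possible way to derive a relocation target and a relocation time that are unpredictable without a secret is to apply a keyed hash to a per-replica counter. The following is a minimal sketch under stated assumptions and is not asserted to be the algorithm of the present invention. The function name `next_move`, the `epoch` counter, the node names, and the `max_delay_s` bound are all hypothetical. HMAC-SHA256 is used here merely as an example of a keyed pseudorandom function.

```python
import hashlib
import hmac
from typing import Sequence, Tuple


def next_move(secret_key: bytes, replica_id: str, epoch: int,
              nodes: Sequence[str], max_delay_s: int = 3600) -> Tuple[str, int]:
    """Derive a (target node, delay in seconds) pair from a keyed hash.

    Hypothetical sketch: the same key, replica and epoch always give the
    same result, but the result cannot feasibly be predicted without the key.
    """
    message = f"{replica_id}:{epoch}".encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).digest()
    # The first 8 bytes select the node; the next 8 bytes select the delay.
    # Reducing a 64-bit value modulo a small range leaves negligible bias.
    node_index = int.from_bytes(digest[:8], "big") % len(nodes)
    delay_s = int.from_bytes(digest[8:16], "big") % max_delay_s
    return nodes[node_index], delay_s
```

In this sketch, a party holding `secret_key` can recompute where and when a replica moves for any epoch, while an observer who learns only a past location gains no practical ability to compute the next one. Incrementing `epoch` after each move yields the next target and delay.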
There are conventional replication systems that do move replicas around periodically. However, such replication systems move replicas around in order to perform what is called “software rejuvenation” or for other housekeeping purposes unrelated to obscuring the location of the replicas. Software rejuvenation is performed by gracefully terminating an application, and then restarting the application with a clean internal state. Such rejuvenation is performed in order to counter an effect called “software aging” in which the performance of software degrades over time. When it is time for one of the replicas of a software application to be rejuvenated, that replica is terminated and then restarted in a clean state. After terminating and before restarting, the replica may be moved to another location when, for example, the old location may be contributing to, or may be more susceptible to, software aging.
Replication systems that use software rejuvenation have not conventionally attempted to obfuscate the target location and movement times for a replica. This is not surprising, since obfuscating the target location would not advance the purpose of performing software rejuvenation. Accordingly, the occasional movement in replication systems due to software rejuvenation provides little, if any, protection against a concerted attack against all replicas, since the locations of the replicas are not hidden, even though the replicas are occasionally moved. If an antagonist could determine the old location of the replicas, the antagonist may often be able to determine the new location of the replicas.