Computer data is vital to today's organizations, and a significant part of protection against disasters is focused on data protection. As solid-state memory has advanced to the point where cost of memory has become a relatively insignificant factor, organizations can afford to operate with systems that store and process terabytes of data.
Conventional data protection systems include tape backup drives, for storing organizational production site data on a periodic basis. Such systems suffer from several drawbacks. First, they require a system shutdown during backup, since the data being backed up cannot be used during the backup operation. Second, they limit the points in time to which the production site can recover. For example, if data is backed up on a daily basis, there may be several hours of lost data in the event of a disaster. Third, the data recovery process itself takes a long time.
Another conventional data protection system uses data replication, by creating a copy of the organization's production site data on a secondary backup storage system, and updating the backup with changes. The backup storage system may be situated in the same physical location as the production storage system, or in a physically remote location. Data replication systems generally operate either at the application level, at the file system level, or at the data block level.
Current data protection systems try to provide continuous data protection, which enable the organization to roll back to any specified point in time within a recent history. Continuous data protection systems aim to satisfy two conflicting objectives, as best as possible; namely, (i) minimize the down time, in which the organization production site data is unavailable, during a recovery, and (ii) enable recovery as close as possible to any specified point in time within a recent history.
Continuous data protection typically uses a technology referred to as “journaling,” whereby a log is kept of changes made to the backup storage. During a recovery, the journal entries serve as successive “undo” information, enabling rollback of the backup storage to previous points in time. Journaling was first implemented in database systems, and was later extended to broader data protection.
One challenge to continuous data protection is the ability of a backup site to keep pace with the data transactions of a production site, without slowing down the production site. The overhead of journaling inherently requires several data transactions at the backup site for each data transaction at the production site. As such, when data transactions occur at a high rate at the production site, the backup site may not be able to finish backing up one data transaction before the next production site data transaction occurs. If the production site is not forced to slow down, then necessarily a backlog of un-logged data transactions may build up at the backup site. Without being able to satisfactorily adapt dynamically to changing data transaction rates, a continuous data protection system chokes and eventually forces the production site to shut down.
Another challenge in continuous data replication systems is to avoid entering a loop of repetitive actions that can negatively impact a production site or other system. For example, a system can constantly transition between a first state in which the system has reacted to a detected problem and a second state in which the system has recovered from the problem. If the problem keeps occurring the system can become stuck in a loop.