The present invention relates to computer-data backup and recovery technology and, in particular, to improvements to disaster-recovery (DR) systems that allow such systems to back up data more efficiently and to recover it more efficiently after a catastrophic data-loss event.
Current disaster-recovery systems are configured to back up and restore data on a component-by-component, or server-to-server, basis. This is true even when an entire datacenter is protected by a single disaster-recovery system. Database transactions and other types of information are captured from each information source, such as a source database that logs database transactions to a local area of persistent storage, and are replicated to a target recovery database that is usually located at a physically distinct site. Similarly, when recovering from a catastrophic loss, previously backed-up data is replicated from each target database to a recovery database through which users can again access the data originally copied from the source.
Known DR systems may in practice require an enormous number of component-to-component connections, each of which captures or restores data associated with a single information source. In modern datacenters that comprise thousands of databases and other types of information repositories, the complexity of such component-to-component DR systems makes them resource-intensive, difficult to configure, and burdensome to manage.
In particular, because each source component may require independent, customized backup and recovery procedures, a network or system administrator has little control over a DR system's aggregate, datacenter-level resource utilization and task prioritization. For example, if a recovery operation requires 500 components to be restored, known DR systems must manage each restoration task independently, making it difficult to prioritize those tasks or to efficiently manage the consumption of network bandwidth and other resources required to recover the lost data of an entire datacenter.
In one example, a datacenter may comprise one thousand database servers that are continuously backed up to target recovery servers distributed across three other sites. After a catastrophic failure that has affected a subset of the database servers and a subset of the network connections between source-server/target-server pairs, recovery operations may be hampered by the need to determine, one by one, which source databases require access to recovery servers still at remote sites, which recovery servers are still operational, and which recovery servers have the connectivity required for recovery. These problems are further aggravated by the lack of any mechanism for managing all recovery operations, and all communications between source-server/recovery-server pairs, through a dedicated connection.
Another problem with known DR systems is loss of database transactions and other stored information that had been created or revised immediately before a catastrophic loss. Because a finite transfer time is required to replicate a source transaction to a remote backup, transactions-in-transit that did not have time to reach a target database may be inaccessible during a recovery operation.
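The timing argument above can be illustrated with a minimal Python sketch. It assumes, for simplicity, a single constant transfer time for every transaction and uses hypothetical names (`Transaction`, `split_at_failure`) that do not come from any particular DR product. A transaction committed at the source is treated as recoverable only if its replica arrives at the target no later than the moment of failure; anything committed within the final transfer-time window before the failure is still in transit and is lost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical record of a transaction committed at a source database."""
    txn_id: int
    commit_time: float  # seconds; when the transaction was committed at the source


def split_at_failure(transactions, failure_time, transfer_time):
    """Partition source transactions into (replicated, in_transit).

    Simplifying assumption: every transaction takes the same constant
    transfer_time to reach the target database.
    """
    replicated, in_transit = [], []
    for txn in transactions:
        if txn.commit_time > failure_time:
            continue  # committed after the failure; outside the loss window
        arrival = txn.commit_time + transfer_time
        if arrival <= failure_time:
            replicated.append(txn)  # replica reached the target in time
        else:
            in_transit.append(txn)  # still in flight when the failure occurred
    return replicated, in_transit


# Usage: failure at t = 10.0 s with a 0.5 s transfer time.
txns = [Transaction(1, 0.0), Transaction(2, 9.0), Transaction(3, 9.8)]
replicated, in_transit = split_at_failure(txns, failure_time=10.0, transfer_time=0.5)
# replicated -> transactions 1 and 2; in_transit -> transaction 3
```

In this toy model, the window of exposure is exactly `transfer_time` wide: shortening the transfer time narrows the window but cannot eliminate it, which is the limitation the paragraph describes.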
These technical flaws are rooted in the architecture of current disaster-recovery technology, which is inherently limited by its topology to backing up and restoring each data source individually through a distinct communications line, and which has no way to restore data that could not be stored successfully in a backup database from which lost data can be recovered.