Modern enterprise database systems store massive amounts of business data, often including mission-critical data that needs to be backed up. In most modern enterprise database systems, the computing infrastructure is physically distributed, sometimes across widely separated geographic locations. In legacy backup scenarios, a production environment is backed up by periodically taking an interval-spaced series of snapshots of the production system and replicating them to a geographically remote location for restoration in the event of a failure. Often, a restore operation using such a legacy backup scenario requires a suspension of at least some database services (e.g., services that write data to the production database), thus causing a period of at least partial ‘down time’.
One technique to reduce the period of down time is to maintain a separate copy of the production database (e.g., one or more interval-spaced snapshots), and to capture changes continuously (e.g., in a stream of redo-log changes) to be applied to the separate copy. This can potentially reduce the duration of the aforementioned down time and the loss of data. However, when the separate copy is stored at a remote site (e.g., a distant location relative to the production database), there can potentially be a large number of transactions in-flight between the time that a primary database transaction is performed and the time that the corresponding redo log change for the transaction is captured at the remote site. This leads to a potentially large data loss in the event of a failure.
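The exposure described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not any database's actual mechanism: the class `AsyncRedoShipping`, its methods, and the abstract "tick" time unit are all hypothetical names introduced for illustration. It assumes each redo record arrives at the remote site a fixed number of ticks after its transaction commits on the primary.

```python
from dataclasses import dataclass, field


@dataclass
class AsyncRedoShipping:
    """Toy model of continuous (asynchronous) redo capture at a remote site."""

    lag_ticks: int  # assumed transit time of a redo record, in abstract ticks
    committed: list = field(default_factory=list)  # (commit_tick, txn_id) pairs

    def commit(self, tick: int, txn_id: int) -> None:
        # The primary acknowledges the commit without waiting for the remote site.
        self.committed.append((tick, txn_id))

    def captured_by(self, tick: int) -> list:
        # A redo record reaches the remote site lag_ticks after its commit.
        return [txn for (t, txn) in self.committed if t + self.lag_ticks <= tick]

    def lost_if_primary_fails_at(self, tick: int) -> list:
        # Committed on the primary but still in flight when the failure occurs.
        captured = set(self.captured_by(tick))
        return [txn for (t, txn) in self.committed if t <= tick and txn not in captured]


model = AsyncRedoShipping(lag_ticks=3)
for tick in range(1, 11):
    model.commit(tick, txn_id=tick)  # one transaction per tick

lost = model.lost_if_primary_fails_at(10)  # transactions 8, 9 and 10
```

In this model the number of lost transactions grows with both the transaction rate and the transit lag, which is why a distant remote site widens the window of potential data loss.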
The aforementioned legacy techniques, singly or in combination, are still deficient at least in the sense that the restored/patched database can be only as up-to-date as the last operation captured in the last redo log file. While these legacy techniques have the potential for completely restoring a destination system to a recent state, that potential is dependent on the state/recency of the database to be patched with the redo log entries, and is further dependent on the recency of the transmission of all of the redo log entries.
One approach to address these deficiencies is to capture redo log events synchronously; that is, to force the production database to wait, after a transaction, until the corresponding redo log change has been captured at the remote site. This introduces yet another deficiency inasmuch as the latency in communication between the production database and the remote site can be substantial, thus impacting throughput in the production system. To address this deficiency, an intermediate server (e.g., a server located relatively nearer to the production database) can be introduced, and the synchronous redo log can be captured continuously at the intermediate server without introducing undue latency. The synchronously captured redo log can be applied at the remote site to an appropriately recent snapshot backup, and thus, even in the event of a failure of the primary database system, the intermediate server holds the last synchronously-captured transaction.
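The intermediate-server arrangement can likewise be sketched in a few lines. This is an illustrative model only: `IntermediateServer`, `commit`, and the `redo-N` record strings are hypothetical names, and real redo transport involves networking, durability, and failure handling that are omitted here. The sketch assumes the primary does not acknowledge a commit until the intermediate server confirms capture, while forwarding to the remote site happens separately, off the commit path.

```python
class IntermediateServer:
    """Hypothetical nearby server that captures redo synchronously."""

    def __init__(self):
        self.redo = []      # every record captured so far, in commit order
        self.forwarded = 0  # how many of those records have reached the remote site

    def capture(self, record):
        self.redo.append(record)
        return True  # acknowledgement returned to the primary

    def forward(self, remote_redo, max_records):
        # Asynchronous shipping to the remote site, off the commit path.
        batch = self.redo[self.forwarded:self.forwarded + max_records]
        remote_redo.extend(batch)
        self.forwarded += len(batch)

    def unforwarded_tail(self):
        return self.redo[self.forwarded:]


def commit(primary_log, intermediate, record):
    # Synchronous capture: the commit is acknowledged only after the
    # intermediate server confirms it holds the redo record.
    if not intermediate.capture(record):
        raise RuntimeError("redo record not captured; commit not acknowledged")
    primary_log.append(record)


primary_log, remote_redo = [], []
intermediate = IntermediateServer()
for txn_id in range(1, 6):
    commit(primary_log, intermediate, f"redo-{txn_id}")

intermediate.forward(remote_redo, max_records=3)  # remote site lags by two records

# Primary fails here: combine what the remote site has with the tail
# still held by the intermediate server.
recoverable = remote_redo + intermediate.unforwarded_tail()
```

Under these assumptions, every acknowledged transaction is recoverable even though the remote site itself is behind, because the records it has not yet received are still held by the intermediate server.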
Individually, none of the aforementioned technologies has the desired capabilities for zero or near-zero data loss database backup and recovery. Therefore, there is a need for an improved approach.