1. Field
This disclosure relates generally to fault detection and recovery, and, more particularly, to a system and method for restoring a database, database record or associated metadata to a desired consistent order state.
2. Background
Performance requirements often demand persistent availability of computing resources. Computing systems designed to meet these demands are called “high-availability” computing systems. High-availability systems utilize a number of diverse strategies to operate for long periods of time with a low rate of failure. When failures do occur, systems and strategies may be utilized to maintain the appearance of normal operation to external systems, recover a lost state or lost information, and restore normal operation as quickly as possible.
Traditionally, there are two primary replication and recovery solutions for high-availability computing systems. The first traditional replication and recovery solution relies on internal application logic to maintain persistent order state in memory in case of failure. However, this makes the system highly vulnerable to data corruption. If the host computer system crashes, there may be state and information losses upon recovery, leaving the application in an inconsistent and unpredictable condition. These outages are, of course, inevitable, and Information Technology (IT) personnel must often forgo maintaining persistence and instead undertake a time-intensive restoration of the application to a “clean” operating state, risking the loss of critical information.
The second traditional replication and recovery solution relies on active applications that are hosted on separate computer systems, tasked with providing data replication and recovery services. These applications are placed on the data flow path and intercept inbound data before passing it along to the supported computer system. While this solution avoids the pitfalls of relying on internal application logic, it necessarily introduces latency into the system, which may create unacceptable data bottlenecks, especially for systems that must handle large volumes of information. Furthermore, both traditional replication and recovery solutions typically require large amounts of processing time to recover the order state of the supported system.