Over the last decade, small computer systems, typically referred to as "personal computers" or "workstations", have increasingly been used for significant commercial applications. The data processed on these computers may be extremely important to a company, and faulty data or faults in the computers can lead to unacceptable disruptions of operations, financial loss, or data loss in critical PC applications.
A fault tolerant architecture provides a system with redundant resources. If one resource fails, another can be assigned in its place, allowing the application to continue processing without disruption or with minimal disruption. The goal of fault tolerant design is to improve dependability by enabling a system to perform its intended function in the presence of a given number of faults. A fault tolerant system is not necessarily highly dependable, nor does high dependability necessarily require fault tolerance. The deterministic goal for a fault tolerant system is that no single fault can cause system failure.
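By way of illustration only, the assignment of a redundant resource in place of a failed one may be sketched as follows. The names Resource, RedundantSystem, and ResourceFailed are hypothetical and are introduced solely for this sketch; they do not describe any particular prior system.

```python
class ResourceFailed(Exception):
    """Raised when a resource cannot complete a request (hypothetical name)."""


class Resource:
    """A stand-in for any replicated component, such as a processor or a disk."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def process(self, request):
        if not self.healthy:
            raise ResourceFailed(self.name)
        return f"{self.name} handled {request}"


class RedundantSystem:
    """Tries each redundant resource in order until one succeeds."""

    def __init__(self, resources):
        self.resources = list(resources)

    def process(self, request):
        for resource in self.resources:
            try:
                return resource.process(request)
            except ResourceFailed:
                continue  # assign the next redundant resource in its place
        # Every resource has failed, so the system as a whole fails.
        raise ResourceFailed("all redundant resources have failed")
```

With two resources, a fault in either one alone does not cause the call to fail, which corresponds to the single-fault goal stated above; only the failure of both raises an error.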
Error recovery is an important aspect of a fault tolerant system. "Error recovery" is correction of the system to an acceptable state for continued operation. System recovery schemes restore system operation to a previous correct state or a recovery point. For example, a processor is rolled back to a recovery point by restoring registers and memories to the saved state and invalidating cache memories, forcing cache data to be restored from disk.
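The rollback of a processor to a recovery point may be sketched, under simplifying assumptions, as follows. The Processor class, its register names, and its memory size are hypothetical; an actual processor would save and restore this state in hardware or in low-level software.

```python
import copy


class Processor:
    """A toy model of processor state (all names are illustrative assumptions)."""

    def __init__(self):
        self.registers = {"pc": 0, "acc": 0}
        self.memory = [0] * 8
        self.cache = {}  # address -> cached value
        self._recovery_point = None

    def save_recovery_point(self):
        # Snapshot registers and memory; the cache is deliberately not saved.
        self._recovery_point = copy.deepcopy((self.registers, self.memory))

    def write(self, address, value):
        self.memory[address] = value
        self.cache[address] = value
        self.registers["pc"] += 1

    def roll_back(self):
        if self._recovery_point is None:
            raise RuntimeError("no recovery point has been saved")
        # Restore registers and memory to the saved state.
        self.registers, self.memory = copy.deepcopy(self._recovery_point)
        # Invalidate the cache so that stale entries are never reused.
        self.cache.clear()
```

In this sketch, any write performed after save_recovery_point() is undone by roll_back(), and the cache is emptied so that subsequent reads must be satisfied from the restored state.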
Database Management Systems (DBMSs) use a form of error recovery in relation to transactions. A transaction is a series of processing steps having a beginning and an end. A transaction may be "committed" (made permanent) or "aborted" (records in the database returned to their original state). At least one DBMS allows a user to roll back a number of transactions.
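The commit and abort behavior of a transaction may be illustrated with a short sketch using the sqlite3 module of the Python standard library. The accounts table, the transfer function, and the make_demo_db helper are assumptions made for this example and are not drawn from any particular DBMS discussed above.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction.

    Returns True if the transaction was committed, False if it was aborted.
    """
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False  # the records were returned to their original state
```

When the transfer would overdraw the source account, the first update has already been applied inside the transaction, yet the abort returns both records to their prior values.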
One important aspect of error recovery is recovery of data on a hard disk or other mass storage medium after a failure. A typical failure could be a power outage during a write operation, in which the new data has been only partially written to the hard disk and the previous data has been partially overwritten by the write operation, or an operator error that causes faulty data to be written to the hard disk. In either case, the user may wish to return to a previous known state to continue the application.
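The effect of a power outage during a write operation may be modeled, purely for illustration, as follows. The function interrupted_write is hypothetical and assumes an in-place overwrite of equal length that proceeds byte by byte from the start of the region; actual disk hardware may fail in other ways.

```python
def interrupted_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Return the on-disk contents if power fails after `bytes_written` bytes.

    Assumes `new` overwrites `old` in place, front to back (a simplification).
    """
    if len(old) != len(new):
        raise ValueError("this sketch assumes a same-size, in-place overwrite")
    if not 0 <= bytes_written <= len(new):
        raise ValueError("bytes_written is outside the region being written")
    # The first part holds new data; the remainder still holds previous data.
    return new[:bytes_written] + old[bytes_written:]
```

For any interruption strictly between the start and the end of the write, the result matches neither the previous data nor the new data, which is why a return to a previous known state may be desired.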
Therefore, a need has arisen in the industry for a fault tolerant system having an effective and cost-efficient method of recovering from an error affecting the hard disk drive.