1. Field of the Invention
The present invention most generally relates to digital data processing systems and more particularly relates to recovery from disk failures in transaction oriented digital data processing systems.
2. Description of the Prior Art
It has been known for some time to utilize large scale digital data processing systems for real time posting of transactions from a large number of terminals or work stations to a single or small number of related data bases. Real time banking, airline reservations, and theater ticketing are examples of applications for such data processing systems.
One particularly difficult problem with such transaction based systems is the need to protect against loss of a portion of the data base and the corresponding transactions through unrecoverable disk errors and/or data base corruption. Most often such problems result from failures of the storage hardware and/or associated switching equipment or from errors in new software.
The simplest form of data recovery employs redundant storage of the data base. U.S. Pat. No. 4,084,231, issued to Capozzi et al., utilizes redundant storage hardware for that purpose. Unfortunately, such complete redundancy is too costly for any but the smallest data storage capacities. Because of the cost, many applications employ such redundancy only for particularly critical data. U.S. Pat. No. 5,089,958, issued to Horton et al., teaches redundant storage of machine state values.
A somewhat more cost effective means of utilizing redundancy is through the storage of only data changes rather than storage of the complete data base. U.S. Pat. No. 4,020,466, issued to Cordi et al., provides a copy back store at each level of a hierarchical memory system to save changes to the main storage at that level. Whereas this approach is less costly than complete redundancy, it is still too costly for very large scale systems. “Recovery Techniques For Database Systems”, by Joost B. M. Verhofstad, Computing Surveys, Vol. 30, No. 3, June 1978, provides a theoretical analysis for the various common approaches to recovery of data bases following hardware failure. A specific data recovery capability is postulated in “The Recovery Manager of the System R Database Manager”, by Jim Gray et al., Computing Surveys, Vol. 13, No. 2, June 1981. At section 2.9, Gray et al. recommend that failures of the data base storage media be accommodated through the use of periodic dumps to mass storage along with a simple audit trail to sequentially record each transaction. Not addressed by Gray et al. is the extraordinary length of time required to actually achieve data base recovery in this manner.
Improvements to the audit trail approach are suggested in “Audit Trail Compaction for Database Recovery”, John Kaunitz et al., Communications of the ACM, Volume 27, Number 7, July 1984. Though no particular implementation is taught, Kaunitz et al. do postulate that recovery time could be reduced by compaction of the audit trail information and by elimination of redundant and unnecessary entries.
Though the prior art does show the posting of audit trail entries to a periodic data base dump for the purposes of recovery from media failures, no accommodation of the extensive recovery time is shown, except for rudimentary compaction of the audit trail entries. For very large scale systems, rapid recovery is necessary to prevent the system from swamping due to the continuing real time transaction inputs during the recovery period.
The present invention overcomes the disadvantages found in the prior art by providing the apparatus for and method of efficiently taking periodic data base dumps and maintaining an audit trail for rapid recovery from data base media failures.
In the preferred mode, and not to be deemed limiting of the present invention, four basic factors directly contribute to improved efficiency. Two of these occur during normal operation of the audit trail recording process and the other two are found during data base recovery following a storage medium failure.
Unlike the prior art systems, the preferred mode of the present invention does not simply save all transactions in serial fashion in the order processed in a single audit trail storage facility. Instead, the audit trail information is segregated according to the physical storage facility or logical file to which it relates. In this manner, all audit trail data for a given disk drive, for example, is stored together and is separated from the audit trail information which relates to different physical disk drives. In this fashion, recovery from a physical disk drive or logical file failure necessitates access only to the audit trail information corresponding to that physical disk drive or logical file. No audit trail data relating to other disk drives or logical files need be accessed. Furthermore, data compaction of the audit trail transaction data provides smaller audit trail storage requirements during normal operation and quicker recovery as is explained below.
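By way of illustration only, and not to be deemed limiting, the segregation of audit trail information by storage unit might be sketched as follows. The names AuditEntry, SegregatedAuditTrail, record, and entries_for, as well as the use of a global sequence number to preserve time order, are assumptions made for this sketch and do not appear in the description above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    seq: int      # assumed global sequence number preserving time order
    unit: str     # physical disk drive or logical file the change relates to
    page: int     # page (or record) number within that unit
    data: bytes   # audit payload


class SegregatedAuditTrail:
    """Keeps one audit list per storage unit instead of a single serial log."""

    def __init__(self):
        self._by_unit = defaultdict(list)
        self._next_seq = 0

    def record(self, unit, page, data):
        entry = AuditEntry(self._next_seq, unit, page, data)
        self._next_seq += 1
        self._by_unit[unit].append(entry)
        return entry

    def entries_for(self, unit):
        # Recovery of one unit reads only that unit's entries.
        return list(self._by_unit.get(unit, []))


trail = SegregatedAuditTrail()
trail.record("disk0", 7, b"a")
trail.record("disk1", 3, b"b")
trail.record("disk0", 9, b"c")
disk0_entries = trail.entries_for("disk0")
```

In this sketch, a failure of "disk0" requires reading only the two entries recorded against it; the entry for "disk1" is never accessed.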
A second normal run time feature is directed to the data base dump process. In addition to segregating data base dumps by physical data base storage facility or logical file, the timing of these dumps is directly determined by activity at the individual disk drives or logical files. In the preferred mode for a given physical disk drive or logical file, this is determined by the rate of filling of the associated audit trail information storage space. Whenever the audit trail storage area becomes filled, the corresponding physical disk drive or logical file is dumped and that audit trail storage area is released to be refilled. As a result, the most active physical disk drives or logical files are dumped the most often. This prevents unnecessary dumping of relatively inactive physical disk drives or logical files and ensures that dumping is only performed when necessary.
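The activity-driven dump timing described above may be illustrated by the following minimal sketch. It assumes, purely for illustration, that the capacity of an audit storage area is counted in entries and that the dump is performed by a caller-supplied function; the names AuditArea, capacity, and dump_unit are hypothetical.

```python
class AuditArea:
    """Fixed-capacity audit storage area for one disk drive or logical file."""

    def __init__(self, unit, capacity, dump_unit):
        self.unit = unit
        self.capacity = capacity    # assumed: capacity measured in entries
        self.dump_unit = dump_unit  # hypothetical callback that dumps the unit
        self.entries = []
        self.dump_count = 0

    def append(self, entry):
        self.entries.append(entry)
        if len(self.entries) >= self.capacity:
            # Area is filled: dump the unit, then release the area for refilling.
            self.dump_unit(self.unit)
            self.dump_count += 1
            self.entries.clear()


dumped = []
busy = AuditArea("disk0", capacity=2, dump_unit=dumped.append)
idle = AuditArea("disk1", capacity=2, dump_unit=dumped.append)
for i in range(5):
    busy.append(("update", i))
idle.append(("update", 0))
```

Here the active unit is dumped twice while the relatively inactive unit is not dumped at all, since its audit storage area has not yet filled.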
At recovery time, only the data base dump and audit trail information associated with the failed physical disk drive or logical file are accessed. Because the audit trail data has been sufficiently compacted and segregated during online operation of the present invention, it can be readily retrieved and stored in audit memory in time sequential order. As each file's records or pages are read into a data base memory buffer from the data base dump tape, the associated audit trail data is fetched from audit memory, the required changes are made by sequentially applying the audits to the data base memory buffer with the last change being applied last to reflect the latest state of the subject file's records or pages, and the updated records or pages are written from the data base memory buffer to the output device (e.g. spare disk drive). Thus the complete recovery can be accomplished in essentially the time required to read the magnetic dump tape.
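The recovery pass described above may be sketched, under stated assumptions, as a single sequential read of the dump with the audits applied in a memory buffer. The function name recover_unit, the tuple layout of the audit entries (sequence number, page, byte offset, new bytes), and the use of a dictionary to stand in for the output device are all assumptions of this sketch.

```python
from collections import defaultdict


def recover_unit(dump_pages, audits):
    """Rebuild one failed unit from its dump and its own audit entries.

    dump_pages: iterable of (page_number, bytes) in dump-tape order.
    audits: iterable of (seq, page_number, offset, new_bytes) for this unit only.
    Returns a dict page_number -> recovered bytes (stand-in for the output device).
    """
    # "Audit memory": audits grouped by page, each group in time sequential order.
    audit_memory = defaultdict(list)
    for seq, page, offset, new_bytes in sorted(audits, key=lambda a: a[0]):
        audit_memory[page].append((offset, new_bytes))

    output = {}
    for page, image in dump_pages:        # one sequential pass over the dump
        buffer = bytearray(image)         # data base memory buffer
        for offset, new_bytes in audit_memory.get(page, []):
            # Audits are applied in order, so the last change is applied last.
            buffer[offset:offset + len(new_bytes)] = new_bytes
        output[page] = bytes(buffer)      # write to e.g. a spare disk drive
    return output


dump = [(0, b"AAAA"), (1, b"BBBB")]
audits = [(2, 0, 0, b"zz"), (1, 0, 0, b"xy"), (3, 1, 2, b"Q")]
recovered = recover_unit(dump, audits)
```

Because each page is read from the dump once and updated in memory before being written out, the elapsed time in this sketch is dominated by the sequential read of the dump, consistent with the description above.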
A further improvement may be provided by storage of the changed words of the data base entries and file indices rather than the transaction inputs. This speeds the recovery process by providing a simple substitution of the changed words of the data base entries rather than requiring the processing needed to actually post each transaction. However, this approach requires that the compacted audit trail information be time ordered to permit exclusion of the obsolete entries and provide actual data base changes only for the most recent transactions.
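As a hedged illustration of this further improvement, the following sketch stores changed words rather than transaction inputs and uses time ordering to discard obsolete entries. The entry layout (sequence number, page, word index, new word) and the function names compact_changed_words and apply_changed_words are assumptions made for the example, not details taken from the description.

```python
def compact_changed_words(audits):
    """Keep only the most recent value for each (page, word_index).

    audits: iterable of (seq, page, word_index, new_word); seq gives time order.
    Returns a dict (page, word_index) -> new_word.
    """
    latest = {}
    for seq, page, word_index, new_word in sorted(audits, key=lambda a: a[0]):
        # A later entry for the same word overwrites the obsolete earlier one.
        latest[(page, word_index)] = new_word
    return latest


def apply_changed_words(page_number, words, compacted):
    """Substitute changed words into one page without reposting transactions."""
    result = list(words)
    for (page, word_index), new_word in compacted.items():
        if page == page_number:
            result[word_index] = new_word
    return result


audits = [(1, 0, 2, 10), (3, 0, 2, 30), (2, 0, 1, 20), (4, 1, 0, 40)]
compacted = compact_changed_words(audits)
page0 = apply_changed_words(0, [0, 0, 0, 0], compacted)
```

In this example the first change to word 2 of page 0 is superseded by the later one, so only three of the four audit entries survive compaction, and recovery reduces to a simple substitution of words.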
As can be readily seen, each of these enhancements greatly reduces the time required to recover from the failure of a physical disk or logical file and provides reduced process and hardware requirements during normal operation and the recovery process.