1. Field of the Invention
The present invention relates to a system for maintaining a backup of a computer disk drive so that data is not lost if the drive fails. More specifically, the invention relates to a method and apparatus that enable the user to restore either a single file or the entire contents of a disk, as they existed on that disk at any prior point in time, by creating a virtual copy of the disk at that point in time.
2. Description of the Prior Art
Computers have become an integral part of both business and personal life. Each day, countless transactions are made either utilizing computers or being completely controlled by them. While today's computers are faster and more efficient than prior methods of recordkeeping and data transfer, older techniques employing paper records had one distinct advantage: backup in the case of lost data. Duplicate records could be stored in any one of a number of places, and in various forms. Similarly, if the equipment or personnel creating, reading or interpreting the data were to fail or be unable to complete a task, continuation of the task was easily accomplished by transferring the past work to substitute machines or people. This is not always possible with an electronic computing device.
One particular difficulty with computers is the storage medium. The vast majority of data used by computers is stored on magnetic media. While great care is taken to ensure the accuracy of these media, some flaws may be present. Additionally, continued use of such media leads to faults or other malfunctions related to the electromechanical methods of reading data from and writing data to these storage devices.
A second difficulty, closely related to the first, is the inability to recover lost data once a fault has occurred. With paper records, the older versions of a document, for example, could always be retrieved. This is not the case with magnetic media if: (1) the data has been overwritten on a disk, or (2) the disk is inaccessible due to physical faults. Additionally, a general shortcoming of prior art backup systems is that backups are kept for only a short period of time. The physical space required to keep a constant stream of new backup disks or tapes would soon outgrow even the largest storage facility. This limitation dictates the re-use of backup media, and the continuous trail of past data is lost.
Prior inventors have proposed several solutions to this problem, from both a hardware and a software vantage point. While many of these devices are concerned with the failure of an electronic component, i.e., the processing device or the volatile memory, rather than the more permanent storage media, the techniques apply in most respects to permanent media as well. The solution most frequently proposed in this area is the use of redundant components which may be brought into service when a primary component fails. Transparent and immediate error detection and switchover means are therefore an important adjunct to the use of these redundant devices. Redundant data backup is also well documented in the prior art, where the backup device contains a duplicate copy or "mirror" of the information used by or stored within the primary device. Frequently, however, the prior art offers no teaching on how to restore data from the redundant device to the primary device once the primary device has come back "on line."
Redundant hardware components are disclosed in Grieg, et al., U.S. Pat. No. 4,607,365, which teaches a system that automatically selects secondary components as needed because of faults in the system. Similarly, Yoshida, et al., U.S. Pat. No. 4,727,516, discloses a system having redundant memory arrays: if a defective memory cell is detected, a sister cell is used in its place. Redundant processors are disclosed in U.S. Pat. Nos. 4,484,275 and 4,378,588, issued to Katzman, et al., which use the redundancy to ensure that if a processor or data path is interrupted, secondary parallel processors are available to take the load.
Even a single chip may have redundant components to be used in the event of a failure of its primary systems, as in Chesley, U.S. Pat. No. 4,191,996, which discloses such multiple processors and memory circuits.
Mirrored data, which is stored on an alternate device, is discussed in Varaiya, et al., U.S. Pat. No. 4,754,397, which discloses a multiple disk drive apparatus that mirrors data storage operations, or "writes," to a number of disks. Should a disk fail, a redundant drive is selected. Richer, U.S. Pat. No. 4,351,023, discloses the use of redundant controllers in a similar manner. One controller serves in a primary role, the other as backup. In the event of a shutdown, the state of the first controller is transferred to the second, which has been "tracking" the functions of the first controller on a delayed basis. After the first controller is restored, it resumes its primary status, and the data is transferred back to it. While the patent is not explicit about the transfer of data from the second controller to the first, such a transfer appears necessary if the state of the machine is to be continued intact.
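The mirroring idea described for Varaiya can be illustrated with a small model. The following is a minimal sketch under stated assumptions, not the apparatus of any cited patent: the class name `MirroredDiskSet` and its methods are hypothetical, disks are modeled as in-memory byte arrays, and a "failure" is simply marking a disk as unusable. Every write is applied to all healthy disks, and reads are served by the first healthy disk, so the loss of one disk does not lose data.

```python
class MirroredDiskSet:
    """Hypothetical model of mirrored writes with failover to a redundant drive."""

    def __init__(self, count, size):
        self.size = size
        self.disks = [bytearray(size) for _ in range(count)]  # identical mirrors
        self.failed = set()  # indexes of disks that have been taken out of service

    def write(self, offset, data):
        # Reject writes that would run past the end of the simulated disk.
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside disk bounds")
        # Mirror the write to every disk that is still healthy.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[offset:offset + len(data)] = data

    def read(self, offset, length):
        # Serve the read from the first healthy disk (failover is implicit).
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return bytes(disk[offset:offset + length])
        raise OSError("all mirrored disks have failed")

    def fail(self, index):
        self.failed.add(index)
```

For example, after `m = MirroredDiskSet(2, 16)`, `m.write(0, b"ledger")` and `m.fail(0)`, the call `m.read(0, 6)` still returns `b"ledger"` from the second disk. As the background notes, a model like this keeps only the current contents; it offers no way back to earlier states of the data.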
Hess, et al., U.S. Pat. No. 4,581,701, discloses an apparatus which uses redundant hardware to provide a backup in the event of a failure. More importantly, however, the apparatus monitors and stores the status of the host through the use of a continuously updated memory buffer. Upon failure, the status contained in the buffer is transferred from one unit to another, permitting continued operation without interruption.
Error detection to facilitate the automatic use of secondary devices is also well developed. Hendrie, et al., U.S. Pat. No. 4,750,177, Reid, U.S. Pat. No. 4,453,215, Samson, et al., U.S. Pat. No. 4,654,857, and Wolff, et al., U.S. Pat. No. 4,486,826, all disclose methods and apparatus for detecting errors in a system and selecting secondary devices as replacements.
The most pertinent art, however, relates to the ability to store and retrieve a prior state or status of the machine after a failure. Ziehm, et al., U.S. Pat. No. 4,521,847, discloses a system which permits the restoration of the state and status of a processing device from a non-volatile memory source. The state and status of the processor and memory are continually updated in the non-volatile memory area. After a power shutdown or other malfunction, the host machine may be restored to its last prior operating state by transferring the stored information from the non-volatile memory.
Others have taught that, if the host's operation is halted, its processing and memory functions may be assumed by a node of a network, and the data may then be recovered from the node at a later time. Rawlings, et al., U.S. Pat. No. 4,156,907, discloses a data communications subsystem in which, in the event of a shutdown, both processing and memory functions are transferred from the host to each of several nodes. This information may then be uploaded back to the host after service is restored.
The primary focus of most of the prior art is thus the use of redundant components during a failure of the host, or the restoration of the prior state of the system after the failed primary components are replaced. A secondary focus is error detection and the subsequent switchover to such a redundant system.
There is thus a need for a device which will enable the user to not only keep a constant backup of a data storage device, such as a magnetic disk, but also keep the backup in the form of a readily retrievable continuous record of the state of that storage device.
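To make the idea of a "readily retrievable continuous record" concrete, the following is a minimal, generic sketch of a time-stamped block journal. It is an illustration under stated assumptions only and is not presented as the method of the present invention: the class `BlockJournal`, its methods, and the use of integer timestamps and block numbers are all hypothetical. Each write is recorded with its time instead of overwriting the previous contents, so the contents of any block, or a virtual copy of the whole disk, can be reconstructed as of any earlier time.

```python
import bisect


class BlockJournal:
    """Hypothetical journal of block writes, kept so any prior disk state is recoverable."""

    def __init__(self):
        # block number -> (sorted list of write times, list of data written at those times)
        self._history = {}

    def write(self, time, block, data):
        times, values = self._history.setdefault(block, ([], []))
        # Writes are assumed to arrive in time order; older versions are never discarded.
        if times and time < times[-1]:
            raise ValueError("writes must be journaled in time order")
        times.append(time)
        values.append(data)

    def read_as_of(self, time, block):
        # Find the last write to this block at or before the requested time.
        times, values = self._history.get(block, ([], []))
        i = bisect.bisect_right(times, time)
        return values[i - 1] if i else None  # None: block not yet written at that time

    def snapshot(self, time):
        # A "virtual copy" of the disk: every block's contents as of the given time.
        result = {}
        for block in self._history:
            data = self.read_as_of(time, block)
            if data is not None:
                result[block] = data
        return result
```

For instance, after writing `b"v1"` to block 0 at time 1, `b"x"` to block 7 at time 3, and `b"v2"` to block 0 at time 5, `snapshot(2)` yields only block 0 with `b"v1"`, while `snapshot(9)` yields block 0 with `b"v2"` and block 7 with `b"x"`. The cost of this design is that storage grows with every write, which is exactly the space problem noted above for conventional backup media.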
A second capability not found in the prior art is the physical replacement of a faulty device, and the restoration of its data onto the substitute, without halting the system. In a network system especially, where many users share the data storage space, the loss of a physical device means more than just lost data: there is a considerable cost in "downtime" while the unit is repaired or replaced. No system is known which enables the rapid and easy replacement of physical devices while maintaining a portion of the functionality of the network.