In the field of electronic data processing, it has become apparent over the years that errors can occur in both hardware and software. Such errors can be disastrous if they are not detected, corrected or accounted for in a timely manner. Many systems have been invented for preventing errors, but none of them is foolproof.
In general, after an error has been detected, one of two basic approaches can be implemented: correct the data where it resides, or copy a correct version of the data from elsewhere. Systems for copying data in response to errors have been known for some time in the computer field. For example, U.S. Pat. No. 3,866,182 issued to H. Yamada et al. discloses a system for transferring information between memory banks upon detection of an error in one of the memory banks. Here, one of the memory banks serves as an operating device while another serves as a standby device. After an error occurs, data preserved in the standby device can be reloaded into the operating memory. A separate memory-to-memory transfer circuit is required. It appears that the operating memory cannot be used while the standby memory is being accessed.
It is difficult to predict and impossible to prevent the generation of errors. The best that can be done is to handle erroneous information in such a way as not to affect the general operation of a computer system. To that end, systems have been developed to treat errors separately. In the field of hardware error handling, systems have been devised to deal only with correct data, not with the errors embedded in it.
For example, U.S. Pat. No. 4,394,728 issued to J. A. Comfort et al. discloses a hardware system for accessing multiple common resources. It is an access control system that allows two or more devices to access a common resource (a memory device, in the Comfort patent). This reference is relevant to the discussion of error handling because it incorporates the concept of secondary memories, which serve as ready-standby devices in the event of data transfer faults or system failure. Each secondary memory device is operatively connected to a principal device and is called into action when the principal device fails. The reference thus deals with memory in a hardware system.
U.S. Pat. No. 3,882,455 issued to D. A. Heck et al. teaches a digital communication system having a facility for duplicating data residing in a central processor, instruction storage, process storage and peripheral controllers. This facility is a hardware system operating under the direction of a fault recovery program, which handles malfunctions in one part of a storage area by making a copy of the original in another storage area. The storage areas contain data in a plurality of peripheral control units, and the duplication process is handled on a unit-by-unit basis. Each peripheral control unit is designated as either active or standby. When a complete set of units in a complex, sometimes known as a physical device, can be placed in service, the fault recovery program does so. Anything less than a complete set of units, however, cannot be installed in the system.
The aforementioned hardware system is used in switching networks and requires a significant amount of specialized hardware for its operation.
In the past it has been difficult to guarantee data integrity in computer systems due to errors that may occur as a result of hardware or software failures.
In particular, problems associated with hardware or software failures that occur either when data is written (i.e., stored) or when previously written data is read at a later time (i.e., retrieved) have simply not been solved by hardware error handling systems. The general problem of protecting data during read and/or write operations has not been adequately addressed.
In hardware systems, error handling has tended to be expensive and relatively inflexible. For example, hardware systems often require that the data residing on an entire physical device be preserved by copying. Moreover, hardware error handling systems require a direct correspondence between the original (source) data and the copied (target) data.
It would be advantageous to provide a facility for copying data (i.e., mirroring) on a logical device basis, rather than on a physical device basis.
Further, it would be advantageous for the mirroring operation to be controllable by a general user at will. That is, a user should not need special privileges, and the user should not depend on some extraneous automatic hardware or software process, in order to use the error handling system; nor should a general user require assistance from a privileged user.
It would further be advantageous to provide a system of error handling that does not require dedicated duplicate devices with special hardware associated therewith.
Moreover, it would be advantageous to be able to duplicate data simultaneously from one logical device to another in real time, instruction by instruction.
It would also be advantageous to read from a single logical device unless a data failure is detected and, in that case, to perform another read operation on another logical device. That is, a second read operation should be required only when the first read operation is unsuccessful. It would also be advantageous to provide a system of error handling that operates successfully either synchronously or asynchronously.
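The desired behavior described above (each write duplicated on every logical device as part of the same operation, reads served by a single device with a second read attempted only on failure, and continued operation when a device fails) can be illustrated with a minimal sketch. It assumes hypothetical names (LogicalDevice, MirrorSet, DeviceError), uses in-memory dictionaries in place of real devices, and uses a CRC-32 checksum as a stand-in for whatever data-failure detection a real system would employ. Only the synchronous case is shown, and it does not resynchronize a mirror that missed a write; it is not the method of any particular patent.

```python
import zlib


class DeviceError(Exception):
    """Raised by a hypothetical logical device when an I/O operation fails."""


class LogicalDevice:
    """Hypothetical in-memory stand-in for a logical device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block number -> (data, crc32 of data at write time)
        self.failed = False   # set True to simulate a device-level failure

    def write(self, block, data):
        if self.failed:
            raise DeviceError(f"{self.name}: write failed")
        self.blocks[block] = (data, zlib.crc32(data))

    def read(self, block):
        if self.failed or block not in self.blocks:
            raise DeviceError(f"{self.name}: read failed")
        data, crc = self.blocks[block]
        # Detect a data failure by re-checking the stored checksum.
        if zlib.crc32(data) != crc:
            raise DeviceError(f"{self.name}: checksum mismatch on block {block}")
        return data


class MirrorSet:
    """Mirrors writes across logical devices; reads one device, falls back on failure."""

    def __init__(self, devices):
        self.devices = list(devices)

    def write(self, block, data):
        # Duplicate the write on every mirror within the same operation.
        ok = 0
        for dev in self.devices:
            try:
                dev.write(block, data)
                ok += 1
            except DeviceError:
                pass  # keep operating on the remaining mirrors
        if ok == 0:
            raise DeviceError("write failed on all mirrors")
        return ok  # number of mirrors that accepted the write

    def read(self, block):
        # Read from one device; try the next one only if that read fails.
        errors = []
        for dev in self.devices:
            try:
                return dev.read(block)
            except DeviceError as exc:
                errors.append(str(exc))
        raise DeviceError("; ".join(errors))


# Usage: corrupt the first mirror's copy and read through the mirror set.
a, b = LogicalDevice("A"), LogicalDevice("B")
m = MirrorSet([a, b])
m.write(0, b"payroll")
a.blocks[0] = (b"garbled", a.blocks[0][1])  # corrupt A's data, keep the old checksum
print(m.read(0))  # served from B after the read on A fails its checksum
```

Ordering the devices in a list keeps the common path to a single read; the cost of a second read is paid only when the first device reports a failure.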
It would further be advantageous to provide for continuous operation even upon failure of one or more logical devices.