Many businesses employ a data processing center in which one or more memory devices (e.g., data storage disks) store their business-critical data. The number of memory devices employed by a business varies as a function of its data storage demands. As will be more fully described below, however, the frequency of data corruption incidents increases with the number of memory devices used to store data.
FIG. 1 shows a data processing center in which a computer system 10 is coupled to a host node 12. Host node 12, in turn, is coupled to data-storage systems 14–20. Each of data-storage systems 14–18 includes memory devices 24–28, respectively, for storing data. Each memory device may include several components (e.g., data storage disks).
Memory device 24 stores a primary data mirror. The primary data mirror is the working data volume for the center shown in FIG. 1. Host node 12 may take form in a computer system (e.g., a server computer system) that receives requests from client computer system 10 or other client computer systems (not shown) to read data from or write data to the primary data mirror.
Memory devices 26 and 28 store mirrored copies of the primary data mirror. The mirrors closely track changes to the primary data mirror. Thus, when host node 12 writes data to the primary data mirror in response to a request from client computer system 10, the same data is written to each of the mirrors in memory devices 26 and 28. As such, each mirror is maintained as a real-time copy of the primary data mirror.
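The mirrored write path described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of host node 12: the class names `MemoryDevice` and `HostNode` and the block-address model are hypothetical and are introduced solely for the example.

```python
class MemoryDevice:
    """Hypothetical stand-in for a memory device (e.g., 24, 26, or 28)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block address -> stored bytes

    def write(self, address, data):
        self.blocks[address] = data

    def read(self, address):
        return self.blocks[address]


class HostNode:
    """Hypothetical host node that applies every write to all copies."""

    def __init__(self, primary, mirrors):
        self.primary = primary
        self.mirrors = list(mirrors)

    def write(self, address, data):
        # Write to the primary data mirror, then to each mirrored copy,
        # so that every mirror stays a real-time copy of the primary.
        self.primary.write(address, data)
        for mirror in self.mirrors:
            mirror.write(address, data)


device_24 = MemoryDevice("24")  # holds the primary data mirror
device_26 = MemoryDevice("26")  # holds a mirrored copy
device_28 = MemoryDevice("28")  # holds a mirrored copy
host = HostNode(device_24, [device_26, device_28])
host.write(0, b"Dnew")
```

After the call to `host.write`, all three devices hold the same data at address 0.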
The mirrors of memory devices 26 and 28 are typically provided as backup solutions in the event of failure of memory device 24. If memory device 24 suddenly becomes unusable or inaccessible, host node 12 can service read or write requests from client computer system 10 using a mirror in memory device 26 or 28. For example, if memory device 24 becomes inaccessible due to hardware or software failure, host node 12 can respond to a request from client computer system 10 for data of the primary data mirror by returning data from the mirror of memory device 26.
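The failover behavior described above can be illustrated with a short sketch. The names `DeviceUnavailable`, `MemoryDevice`, and `read_with_failover` are hypothetical, and the ordering of devices (primary first, then mirrors) is an assumption made for the example.

```python
class DeviceUnavailable(Exception):
    """Raised when a memory device cannot be accessed."""


class MemoryDevice:
    """Hypothetical memory device that may be inaccessible."""

    def __init__(self, blocks, accessible=True):
        self.blocks = dict(blocks)
        self.accessible = accessible

    def read(self, address):
        if not self.accessible:
            raise DeviceUnavailable()
        return self.blocks[address]


def read_with_failover(devices, address):
    # Try the primary first, then each mirror in turn, and return
    # the data from the first device that is accessible.
    for device in devices:
        try:
            return device.read(address)
        except DeviceUnavailable:
            continue
    raise DeviceUnavailable("no accessible copy of the data")


device_24 = MemoryDevice({0: b"D"}, accessible=False)  # failed primary
device_26 = MemoryDevice({0: b"D"})
device_28 = MemoryDevice({0: b"D"})
result = read_with_failover([device_24, device_26, device_28], 0)
```

Here the read is serviced from the mirror in memory device 26 because memory device 24 is inaccessible. Note that this protects only against a device that is known to have failed, not against a device that silently returns wrong data.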
Failure of memory device 24 is one problem facing businesses that employ large-scale data processing systems. Data corruption is another problem. Data corruption has many sources. To illustrate, suppose host node 12 receives new data Dnew from client computer system 10 coupled thereto. This new data Dnew must replace existing data Dold within the primary data mirror. Improper operation of hardware or software may result in existing data Dold not getting overwritten with the new data Dnew. For example, the new data Dnew may inadvertently get written to a disk track in a storage disk of memory device 24 adjacent to the disk track that stores the existing data Dold (mis-tracking). When this happens, two tracks of the storage disk contain invalid or corrupted data, but host node 12 believes the existing data Dold has been properly overwritten with the new data Dnew. If host node 12 receives a subsequent request from client computer system 10 to read the new data Dnew thought to be stored in the primary data mirror, Dold will be returned rather than Dnew. A different manifestation of improper operation of software or hardware may result in new data Dnew not getting written to the disk at all, even though the write is reported as having completed successfully. Yet another manifestation of improper operation of software or hardware may be experienced when one or more bits in new data Dnew are corrupted in the course of data transmission and handling (bit-flipping), resulting in corrupted data getting written over one of the copies of Dold.
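The three corruption scenarios described above (mis-tracking, a write that never reaches the disk, and bit-flipping) can be modeled in a few lines. This is a toy model under stated assumptions: a disk is represented as a Python list of tracks, and the function names `write_mistracked`, `write_lost`, and `flip_bit` are hypothetical.

```python
def write_mistracked(tracks, track, data):
    # Mis-tracking: the data lands on the adjacent track, so the
    # intended track still holds Dold and the neighbor is clobbered.
    tracks[track + 1] = data


def write_lost(tracks, track, data):
    # The write is reported as successful, but nothing is written.
    return True


def flip_bit(data, bit_index):
    # Bit-flipping: invert one bit of the data before it is stored.
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


# Mis-tracking leaves two tracks with invalid data.
tracks = [b"Dold", b"other"]
write_mistracked(tracks, 0, b"Dnew")

# A lost write leaves Dold in place even though success is reported.
lost_tracks = [b"Dold"]
reported_ok = write_lost(lost_tracks, 0, b"Dnew")

# A single flipped bit changes the data that gets stored.
flipped = flip_bit(b"Dnew", 0)
```

In each case the host has no indication that the stored data differs from what it intended to write.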
Redundant storage places multiple copies of the same data in multiple data storage memories. The chance of more than a single copy of the data getting corrupted in the ways described above is vanishingly small, and the alternative data copies could be used to correct the corruption described above if the corruption can be detected. Client computer system 10 may perform a checking algorithm on the data returned by host node 12. If client computer system 10 recognizes that the data returned is invalid, it sends a second request for the same data. Unfortunately, host node 12 will once again return Dold in response to the second request.
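The retry problem described above can be made concrete with a small sketch. The text does not specify the checking algorithm, so the use of CRC-32 here, and the assumption that the client knows the checksum it expects, are illustrative choices only; the names `host_read` and `client_read` are hypothetical.

```python
import zlib

# Assumed state: the write of Dnew was lost on the primary,
# while the mirror received Dnew correctly.
primary = {0: b"Dold"}
mirror = {0: b"Dnew"}


def host_read(address):
    # The host services every read from the primary data mirror,
    # because it has no indication that the primary is stale.
    return primary[address]


def client_read(address, expected_crc, attempts=2):
    # The client checks each response and re-requests on a mismatch.
    responses = []
    for _ in range(attempts):
        data = host_read(address)
        responses.append(data)
        if zlib.crc32(data) == expected_crc:
            return data, responses
    return None, responses


data, responses = client_read(0, zlib.crc32(b"Dnew"))
```

Both requests return Dold, so the client never obtains valid data even though a correct copy exists in the mirror.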