Many businesses employ a data system in which one or more memory devices (e.g., data storage disks) store critical data. The number of memory devices employed in a data system varies as a function of the data storage demands. As will be more fully described below, however, the frequency of data corruption incidents increases with the number of memory devices used to store data.
FIG. 1 shows a data system in which a computer system 10 is coupled to a host node 12. Host node 12, in turn, is coupled to data-storage systems 14 and 16. Data-storage systems 14 and 16 include memory devices 24 and 26, respectively, for storing data. Each of the memory devices 24 and 26 may include several components (e.g., data storage disks). For purposes of explanation, each of memory devices 24 and 26 contains a single data storage disk, it being understood that the term memory device should not be limited thereto.
The data storage disks 24 and 26 store mirrors M0 and M1, respectively, of a mirrored data volume V. Mirror M0 is the working data volume for the system shown in FIG. 1 in that host node 12 reads data from or writes data to the mirror M0 in response to a read or write request from client computer system 10 or other client computer systems (not shown). Host node 12 may take form in a computer system (e.g., a server computer system).
Mirror M1 closely tracks data changes to mirror M0. When host node 12 writes new data to mirror M0, the same data is also written to mirror M1 in disk 26 via a separate transaction (hereinafter referred to as a mirroring write transaction). As such, mirror M1 is maintained as a real or near real-time copy of mirror M0. Mirror M1 in disk 26 is typically provided as a backup solution if mirror M0 in disk 24 is rendered inaccessible as the result of hardware or software failure. Thus, if disk 24 suddenly becomes inaccessible, host node 12 can continue to service read or write requests from client computer system 10 using mirror M1 in disk 26.
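The mirroring and failover behavior described above may be illustrated by the following minimal Python sketch. The class names Disk and HostNode, the block-ID addressing, and the accessible flag are hypothetical and are introduced for illustration only; the sketch is not intended to describe any particular implementation of host node 12.

```python
class Disk:
    """Hypothetical in-memory stand-in for a data storage disk."""

    def __init__(self):
        self.blocks = {}        # block ID -> bytes
        self.accessible = True  # set to False to simulate a failure

    def write(self, block_id, data):
        if not self.accessible:
            raise OSError("disk inaccessible")
        self.blocks[block_id] = data

    def read(self, block_id):
        if not self.accessible:
            raise OSError("disk inaccessible")
        return self.blocks[block_id]


class HostNode:
    """Hypothetical host node that keeps mirror M1 in step with mirror M0."""

    def __init__(self, m0, m1):
        self.m0 = m0  # working mirror (e.g., on disk 24)
        self.m1 = m1  # backup mirror (e.g., on disk 26)

    def write(self, block_id, data):
        # Each mirror is written in its own transaction; an inaccessible
        # mirror is skipped so that requests can still be serviced.
        for mirror in (self.m0, self.m1):
            if mirror.accessible:
                mirror.write(block_id, data)

    def read(self, block_id):
        try:
            return self.m0.read(block_id)
        except OSError:
            return self.m1.read(block_id)  # fall back to mirror M1
```

In this sketch, setting the accessible flag of the first disk to False causes subsequent reads and writes to be served from the second disk only.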
Failure of disk 24 is one problem facing businesses that employ large scale data storage systems. Data corruption is another problem. Data corruption has many sources. Data corruption can occur, for example, when host node 12 fails to properly overwrite old data with new data. To illustrate, suppose host node 12 seeks to overwrite old data Dold in mirror M0 with new data Dnew in response to a write request received from computer system 10. As a result of improper operation of hardware or software, new data Dnew is inadvertently written to a track in disk 24 near the disk track that stores the old data Dold. This type of data corruption is often referred to as mis-tracking. Another example of data corruption may occur when one or more bits in new data Dnew are inadvertently flipped just before the new data Dnew is written to disk 24. This type of data corruption is often referred to as bit-flipping and often occurs while data is handled in transit to its ultimate storage location. As a result of bit-flipping, the track that stores old data Dold is overwritten with bad data. Yet another type of data corruption can occur when new data Dnew is not written to disk 24 at all even though host node 12 believes the new data Dnew to have been written. When any of these types of errors occur, one or more instances of data corruption may occur on disk 24. While corruption may occur to disk 24 as a result of writing new data Dnew, the new data Dnew may be properly written to disk 26 via the mirroring write transaction.
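The three types of data corruption described above may be illustrated by the following Python sketch, in which each mirror is modeled as a dictionary that maps a track number to the data stored on that track. The function name mirrored_write, the fault parameter and its values, and the choice of the adjacent track are assumptions made for illustration only. The sketch further assumes that the data is at least one byte long.

```python
def mirrored_write(m0, m1, track, data, fault=None):
    """Write data to both mirrors; `fault` injects a hypothetical error on M0 only."""
    if fault == "mis-track":
        # New data lands on a neighboring track; the old data is left in place.
        m0[track + 1] = data
    elif fault == "bit-flip":
        # One bit is flipped in transit, so bad data overwrites the old data.
        corrupted = bytearray(data)
        corrupted[0] ^= 0x01
        m0[track] = bytes(corrupted)
    elif fault == "lost-write":
        # Nothing is written, yet no error is reported to the caller.
        pass
    else:
        m0[track] = data
    # The separate mirroring write transaction completes properly.
    m1[track] = data
```

In each faulty case the function returns normally, which reflects the point that the mirror on disk 24 may end up holding corrupted data while the mirror on disk 26 holds the new data.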
Host node 12 may not be aware that the disk 24 contains corrupted data. To illustrate this effect, suppose host node 12 receives a first request to read data identified by some name (e.g., a filename or block ID number). Host node 12 accesses and reads data stored in disk 24 on one or more tracks corresponding to the name of the data sought. The tracks, however, contain data corrupted as a result of mis-tracking. Host node 12 may lack the ability to determine whether the data read from disk 24 is corrupted. As such, host node 12 may unwittingly return a copy of the corrupted data to computer system 10.
Client computer system 10, however, may be able to detect data corruption. Client computer system 10 may perform a checking algorithm on the data returned by host node 12 to identify data corruption. If computer system 10 recognizes that the data returned is corrupted, the computer system may send a second request for the same data. Unfortunately, host node 12 will once again return the same corrupted data from disk 24 in response to the second request.
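The client-side check and repeated request described above may be sketched in Python as follows. The checking algorithm is not specified above; the sketch assumes, purely for illustration, that each record carries a CRC-32 checksum of its payload in its first four bytes. The names make_record, is_corrupted, client_read, and host_read are hypothetical.

```python
import zlib


def make_record(payload):
    """Prefix the payload with its CRC-32 (hypothetical record layout)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_corrupted(record):
    """Return True if the stored CRC-32 does not match the payload."""
    stored = int.from_bytes(record[:4], "big")
    return zlib.crc32(record[4:]) != stored


def client_read(host_read, name, attempts=2):
    """Request the data up to `attempts` times; return it only if it checks out."""
    for _ in range(attempts):
        record = host_read(name)  # the host always reads from the same mirror
        if not is_corrupted(record):
            return record[4:]
    return None  # every attempt returned corrupted data
```

Because the host in this sketch reads from the same mirror on every request, a second request for a corrupted record fails the check in the same way as the first.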