Many businesses employ a data system in which one or more memory devices (e.g., data storage disks) store critical data. The number of memory devices employed in a data system varies as a function of the data storage demands. As will be more fully described below, however, the frequency of data corruption incidents increases with the number of memory devices used to store data.
FIG. 1 shows a data system in which a computer system 10 is coupled to a host node 12. Host node 12, in turn, is coupled to data-storage systems 14–18. Data-storage systems 14–18 include memory devices 24–28, respectively, for storing data. Each of the memory devices 24–28 may include several components (e.g., data storage disks). For purposes of explanation, each of memory devices 24–28 contains a single data storage disk, it being understood that the term memory device should not be limited thereto.
The data storage disks 24–28 store a mirrored data volume. For purposes of explanation, the mirrored volume includes three mirrored data volumes or mirrors. One mirror is designated as a primary. Disk 24 stores the primary. Disks 26 and 28 store the mirrored copies of the primary. The primary is the working data volume for the system shown in FIG. 1 in that host node 12 reads data from or writes data to the primary in response to a read or write request from client computer system 10 or other client computer systems (not shown). Host node 12 may take the form of a computer system (e.g., a server computer system).
The mirrors in disks 26 and 28 act as redundant backups to the primary. When host node 12 writes new data to the primary, the same data is written to each of the mirrors in disks 26 and 28. Host node 12 does not consider a write of data complete until the data has been written to all available mirrors, including the mirrors in disks 26 and 28. As such, each mirror is maintained as a real-time copy of the primary. The mirrors of disks 26 and 28 are typically provided as backup solutions if the primary in disk 24 is rendered inaccessible as the result of hardware or software failure. Thus, if disk 24 suddenly becomes inaccessible, host node 12 can continue to service read or write requests from client computer system 10 using a mirror in disk 26 or 28. Because all mirrors contain identical data, the mirrors are also used to improve read performance of the data volume by directing concurrent read operations to different mirrors so that the reads execute in parallel.
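The mirroring behavior described above can be illustrated with a minimal sketch. The class names Disk and MirroredVolume, the in-memory block dictionary, and the round-robin read policy are assumptions made for illustration only; they are not taken from the system of FIG. 1.

```python
class Disk:
    """Hypothetical in-memory stand-in for one data storage disk."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}       # block ID -> data
        self.available = True  # False models an inaccessible disk


class MirroredVolume:
    """Sketch of a host node's view of a primary plus its mirrors."""

    def __init__(self, primary, mirrors):
        self.primary = primary
        self.mirrors = list(mirrors)
        self._next = 0  # round-robin cursor (an assumed read policy)

    def _available(self):
        return [d for d in [self.primary, *self.mirrors] if d.available]

    def write(self, block_id, data):
        # The write is treated as complete only after every available
        # mirror holds the new data.
        targets = self._available()
        if not targets:
            raise OSError("no available mirror")
        for disk in targets:
            disk.blocks[block_id] = data
        return len(targets)

    def read(self, block_id):
        # Prefer the primary; fall back to a mirror if it is inaccessible.
        targets = self._available()
        if not targets:
            raise OSError("no available mirror")
        return targets[0].blocks[block_id]

    def read_balanced(self, block_id):
        # Spread reads across mirrors; all mirrors hold identical data.
        targets = self._available()
        if not targets:
            raise OSError("no available mirror")
        disk = targets[self._next % len(targets)]
        self._next += 1
        return disk.name, disk.blocks[block_id]
```

In this sketch, marking the primary as unavailable causes read to be served from the first available mirror, which corresponds to host node 12 continuing to service requests after disk 24 becomes inaccessible.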
Failure of disk 24 is one problem facing businesses that employ large scale data storage systems. Data corruption is another problem. Data corruption has many sources. Data corruption can occur, for example, when host node 12 fails to properly overwrite old data with new data. To illustrate, suppose host node 12 seeks to overwrite old data Dold in the primary with new data Dnew in response to a write request received from computer system 10. As a result of improper operation of hardware or software, new data Dnew is inadvertently written to a track in disk 24 near the disk track that stores the old data Dold. This type of data corruption is often referred to as mis-tracking. Another example of data corruption may occur when one or more bits in new data Dnew are inadvertently flipped just before the new data Dnew is written to disk 24. This type of data corruption is often referred to as bit-flipping and often occurs while data is in transit to its ultimate storage location. As a result of bit-flipping, the track that stores old data Dold is overwritten with bad data. Yet another type of data corruption can occur when new data Dnew is not written to the disk at all even though host node 12 believes the new data Dnew has been written. When any of these types of errors occurs, one or more instances of data corruption may occur on disk 24.
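The three corruption modes described above can be contrasted in a short sketch. The disk is modeled as a dictionary mapping a track number to its contents, and the function names, the one-track offset, and the choice of which bit to flip are all hypothetical. Each function returns True to model a write that the host believes succeeded.

```python
def write_mistracked(disk, track, data, offset=1):
    """Mis-tracking: the data lands on a nearby track instead."""
    disk[track + offset] = data  # intended track still holds old data
    return True


def write_bit_flipped(disk, track, data, bit=0):
    """Bit-flipping: one bit changes before the data reaches the disk."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    disk[track] = bytes(corrupted)  # old data is overwritten with bad data
    return True


def write_lost(disk, track, data):
    """Lost write: nothing is stored, yet success is reported."""
    return True


# Track 5 holds the old data; track 6 holds unrelated data.
disk = {5: b"Dold", 6: b"other"}
write_mistracked(disk, 5, b"Dnew")
# Track 5 is stale and track 6 has been clobbered.
```

In the mis-tracking case, two tracks end up wrong: the intended track retains Dold, and the neighboring track loses whatever it previously stored.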
Host node 12 may not be aware that disk 24 contains corrupted data. To illustrate this effect, suppose host node 12 receives a first request to read data identified by some name (e.g., a block ID or a filename). Host node 12 accesses and reads data stored in disk 24 on one or more blocks corresponding to the name of the data sought. The blocks, however, contain data corrupted as a result of mis-tracking. Host node 12 may lack the ability to determine whether the data read from disk 24 is corrupted. As such, host node 12 may unwittingly return a copy of the corrupted data to computer system 10.
Client computer system 10, however, may be able to detect data corruption. Client computer system 10 may perform a checking algorithm on the data returned by host node 12 to identify data corruption. If computer system 10 recognizes that the data returned is corrupted, the computer system may send a second request for the same data. Unfortunately, host node 12 is likely to once again return the same corrupted data from disk 24 in response to the second request.
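The retry problem described above can be sketched as follows. The text does not specify the checking algorithm, so this sketch assumes a CRC-32 comparison (Python's zlib.crc32) against a checksum the client is presumed to already hold. HostNode and client_read are hypothetical names.

```python
import zlib


class HostNode:
    """Hypothetical host that always serves reads from the primary."""

    def __init__(self, primary):
        self.primary = primary  # block ID -> data on the primary disk

    def read(self, block_id):
        # The host cannot tell whether the stored data is corrupted.
        return self.primary[block_id]


def client_read(host, block_id, expected_crc, attempts=2):
    """Request a block, re-requesting it if the checksum does not match.

    Returns (data, attempts_used); data is None if every attempt
    returned corrupted data.
    """
    for attempt in range(1, attempts + 1):
        data = host.read(block_id)
        if zlib.crc32(data) == expected_crc:
            return data, attempt
    return None, attempts


# The client expects Dnew, but a lost write left Dold on the primary.
host = HostNode({1: b"Dold"})
data, used = client_read(host, 1, zlib.crc32(b"Dnew"))
# data is None: the second request returned the same corrupted data.
```

Because every request in this sketch is served from the same primary copy, repeating the request cannot yield different data, however many attempts the client makes.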