1. Field of the Invention
The present system and method relate generally to data management for computer systems, and in particular to a system and method for detecting and correcting data corruption on storage media.
2. Description of the Background Art
Conventionally, many computer systems utilize disk storage devices for permanent data storage. Originally, data storage demands grew proportionately with the number of disk storage devices. Over the past decade, greater data storage demands have led to a disproportionate increase in the number of disk storage devices.
A decrease in price of disk storage devices has helped computer system administrators add disk storage devices to satisfy the increased data storage demands. However, as the number of disk storage devices grew, data corruption across multiple disk storage devices became harder to detect and correct.
Disk storage devices are examples of storage media. A storage medium is any component attached to a computer system that stores data. Specifically, disk storage devices are examples of storage media for permanent data storage. Other types of storage media can include random access memory (RAM) and volatile and non-volatile cache memory. As with disk storage devices, data corruption can occur in RAM and cache memory.
FIG. 1 is a block diagram of a conventional computer system 100 with a typical physical configuration that includes a central processing unit (CPU) 110, a memory 120, a cache 130, a disk controller 150, and a disk 160. Exemplary storage media can include the memory 120, the cache 130, and the disk 160 for data storage. Typically, variations of the disk 160 can include a plurality of magnetic, optical, or other type of storage media for permanent data storage. Data can be stored on the memory 120, the cache 130, or the disk 160, all of which can be coupled to a system bus 140 to communicate with one another. However, volatile cache and the memory 120 may lose stored data when the computer system 100 experiences an electrical power loss. Conversely, non-volatile cache (NVRAM) and the disk 160 do not suffer data loss when the computer system 100 loses electrical power.
The disks 160 and disk data can be connected to multiple computer systems. The multiple computer systems include a primary computer system and at least one stand-by secondary computer system. If the primary computer system is unavailable, clustering or high availability (H/A) software ensures the availability of the disks 160 and disk data by transferring control (failover) to the stand-by secondary computer system. NVRAM will not lose data, but unlike the disks 160, NVRAM cannot be used for H/A or failovers. This diminishes the usefulness of NVRAM for preserving and assuring data correctness.
FIG. 2 shows an exemplary logical configuration of the computer system 100 (FIG. 1). In contrast with the physical configuration of the computer system 100, such as the physical arrangement of the CPU 110 (FIG. 1) and the memory 120 (FIG. 1), the logical configuration is a user-conceptualized representation of the data and the computer system 100. As an example, a user can view the logical configuration of a data file as a group of data blocks on the disk 160 (FIG. 1) in one location. In reality, the physical configuration of the data blocks is not grouped at one location on the disk 160. Instead, the data blocks can be allocated randomly throughout the disk 160.
The exemplary logical configuration includes an application program 210, an operating system 220, and a storage 240. The operating system 220 further includes an optional file system 225, a volume manager 230, and optional storage device drivers 235. Often, the storage 240 includes the disk 160 variations. The application program 210 can be a computer program that generates system operations for the operating system 220. Typically, the operating system 220 instructs the CPU 110 to execute the system operations as instructions. When the operating system 220 generates instructions that require interaction with the storage 240 via the file system 225, the volume manager 230 maps the logical configuration of data that is represented on the storage 240 to the physical configuration of data on the disk 160 through the storage device drivers 235.
Some system operations are data read or data write operations, which require the CPU 110 to interact with the memory 120, the cache 130, the disk controller 150, or the disk 160. Referring to FIGS. 1 and 2, for example, the CPU 110 can execute a read instruction for a data read operation of data already stored on the disk 160. Alternatively, the read instruction may require the CPU 110 to read the data from the memory 120 or the cache 130. If data is not present in the memory 120 or the cache 130, a request is made for the data by the volume manager 230, which communicates through the storage device drivers 235 to the disk controller 150 and subsequently to the disk 160. Once the proper data is found, the data is returned to the application program 210 that initiated the data read operation.
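The lookup order described above can be sketched as follows. This is a minimal illustration only: the dictionaries standing in for the memory 120, the cache 130, and the disk 160, and the function name read_block, are hypothetical and are not part of the described system.

```python
def read_block(location, memory, cache, disk):
    """Return (data, source), checking memory and cache before the disk."""
    # Assumption: each storage medium is modeled as a dict keyed by location.
    for name, tier in (("memory", memory), ("cache", cache)):
        if location in tier:
            return tier[location], name
    # Not in memory or cache: fall through to the disk, which stands in for
    # the volume manager / device driver / disk controller path.
    return disk[location], "disk"


memory = {1: b"in-memory"}
cache = {2: b"in-cache"}
disk = {1: b"in-memory", 2: b"in-cache", 3: b"on-disk"}

hit_memory = read_block(1, memory, cache, disk)
hit_cache = read_block(2, memory, cache, disk)
hit_disk = read_block(3, memory, cache, disk)
```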
The diagrams in FIG. 3A and FIG. 3B show further variations of the mapping of the storage 240 (FIG. 2) to the disk 160 (FIG. 1). Exemplary embodiments of the physical configuration can include three disks 310 coupled to one another as shown in FIG. 3A or three groups of disks 320 coupled to one another as shown in FIG. 3B. Alternatively, other variations in the number of disks 310 or groups of disks 320 can be utilized. As with the mapping of the storage 240 to the disk 160, the volume manager 230 (FIG. 2) maps the logical configuration of data that is represented on the storage 240 to the physical configuration of data on the three disks 310 and three groups of disks 320.
Preferably, the data returned from a data read operation is error free. When an error occurs, however, the operating system 220 (FIG. 2) and the volume manager 230 correct the errors if they can be detected. Disadvantageously, as disk storage devices grow in number and size, more errors from system operations may occur that can be outside the ability of the operating system 220 or the volume manager 230 to detect or correct. Examples of these errors include bit flipping, mistracking, miscaching, and input/output (I/O) status errors.
Bit-flipping errors occur from data corruption on the disk 160 (FIG. 1), the disk controller 150 (FIG. 1), the CPU 110 (FIG. 1), or any other component of the computer system 100 (FIG. 1). The corruption causes bits in the data block to randomly flip, such as 0 to 1 or 1 to 0. For example, in a data block with “10110,” the last two bits can be flipped to result in an erroneous value of “10101.”
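The bit-flipping example above can be reproduced with a short sketch. The helper flip_bits is a hypothetical name used only for illustration; it models the effect of the corruption, not its cause.

```python
def flip_bits(block: str, positions: list[int]) -> str:
    """Return a copy of a bit string with the bits at the given indices inverted."""
    bits = list(block)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)


original = "10110"
# Flip the last two bits (indices 3 and 4), as in the example above.
corrupted = flip_bits(original, [3, 4])

# The same corruption expressed as an XOR with a mask of the flipped bits.
corrupted_int = 0b10110 ^ 0b00011
```

Nothing in the corrupted block itself marks it as wrong, which is why such an error can pass undetected unless additional information about the original data is kept.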
Mistracking errors occur during a data write operation when the data block is written to a wrong location of a component of the computer system 100. For example, the data block can be written to an incorrect location I on the disk 160 instead of a correct location C. Then, during a subsequent data read operation for the data in location C, the data returned is incorrect.
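A mistracking error can be modeled in a few lines. In this sketch the disk is a hypothetical dictionary keyed by location, and the actual parameter is an assumed way of injecting the fault; neither is taken from the described system.

```python
def write_block(disk, intended, data, actual=None):
    """Write data to the disk; 'actual' models a mistracked write location."""
    # With no fault injected, the write lands at the intended location.
    target = intended if actual is None else actual
    disk[target] = data


disk = {"C": b"old data", "I": b"unrelated data"}

# The write is meant for location C but mistracks to location I.
write_block(disk, "C", b"new data", actual="I")

read_from_c = disk["C"]  # stale contents: the new data never arrived here
read_from_i = disk["I"]  # unrelated data has been silently overwritten
```

The single faulty write damages two locations: location C keeps stale data, and location I loses its original contents.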
An example of a reporting error is a miscaching error. Typically, the CPU 110 receives a report that data is present in the cache 130 (FIG. 1). However, in one type of miscaching error, the cache 130 may lose the data and still report the presence of the data. Therefore, when a read of the cache 130 is attempted, no data can be found. In another type of miscaching error, the data returned from the cache 130 is incorrect.
Another type of reporting error is the I/O status error, which incorrectly reports the status of the disk 160 (FIG. 1) to the CPU 110. For example, valid data can exist in a location V. However, with an I/O status error, the disk 160 or disk controller 150 (FIG. 1) can report invalid data at location V to the CPU 110. The result is an erroneous indication that location V holds invalid data, so the valid data at location V can subsequently be overwritten by other data.
When errors occur such as those described above, a simple solution known in the art to ensure valid data availability is mirroring. Mirroring is the duplication of data during real-time system operations on any storage media. For example, during a data write operation on the disk 160, a copy of the disk 160, called a mirror disk, receives the same data. Therefore, when data on the disk 160 is corrupted, the mirror disk is available to provide an uncorrupted copy of the data. Unfortunately, mirroring is not error-free. If mistracking or any of the errors described above occur during a synchronized data write operation, then it is difficult to discern which of the disk 160 and the mirror disk contains the valid data.
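The limitation of mirroring can be illustrated with a minimal sketch. The dictionaries representing the disk 160 and the mirror disk, and the function names, are hypothetical.

```python
def mirrored_write(primary, mirror, location, data):
    """Write the same data to both copies, as a mirror does."""
    primary[location] = data
    mirror[location] = data


def compare_mirrors(primary, mirror):
    """Return the locations whose contents differ between the two copies."""
    locations = primary.keys() | mirror.keys()
    return sorted(loc for loc in locations if primary.get(loc) != mirror.get(loc))


primary, mirror = {}, {}
mirrored_write(primary, mirror, 7, b"10110")
mirrored_write(primary, mirror, 8, b"00001")

# Assumed fault for illustration: one copy is silently corrupted after the write.
mirror[7] = b"10101"

mismatches = compare_mirrors(primary, mirror)
```

The comparison reports that location 7 differs, but it carries no information about which copy is the valid one; that decision requires something beyond the two copies themselves.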
Another solution known in the art for avoiding errors is to establish a data striping solution that allocates data evenly across a set of disks 160 in combination with a backup. Data striped across a set of disks 160 behaves as one disk 160. The backup is a copy of the data striped across the set of disks 160. If any disk in the set of disks 160 fails, the missing data can be retrieved from the backup.
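One way to picture striping is a round-robin placement of logical blocks across the set of disks. The round-robin layout, the block granularity, and the names below are assumptions chosen for illustration; the text does not specify a particular striping scheme.

```python
NUM_DISKS = 3  # assumed set size, matching the three disks 310 of FIG. 3A


def locate(logical_block: int, num_disks: int = NUM_DISKS) -> tuple[int, int]:
    """Map a logical block number to (disk index, offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


def stripe(blocks, num_disks: int = NUM_DISKS):
    """Distribute blocks round-robin across num_disks lists."""
    disks = [[] for _ in range(num_disks)]
    for n, block in enumerate(blocks):
        disks[n % num_disks].append(block)
    return disks


def read_striped(disks, logical_block: int):
    """Read one logical block back through the same mapping."""
    disk_index, offset = locate(logical_block, len(disks))
    return disks[disk_index][offset]


blocks = [b"b0", b"b1", b"b2", b"b3", b"b4", b"b5", b"b6"]
disks = stripe(blocks)
backup = list(blocks)  # the backup is a plain copy of the striped data

block_four = read_striped(disks, 4)
```

To a reader going through read_striped, the three lists behave as one disk, while the backup holds the same blocks in logical order.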
Unfortunately, even this solution does not solve bit-flipping errors. After a bit-flipping error, the data on the set of disks 160 can be compared with the data on the backup. However, there is no known application-independent, general method to determine the correct data between the set of disks 160 and the backup. Typically, an application-dependent method to determine correct data between the set of disks 160 and the backup involves the use of a checksum within an application such as an Oracle® database server or a Microsoft® Exchange server. The checksum can be used to perform the comparison, but the checksum computation imposes a non-trivial overhead on the operating system 220 (FIG. 2) when performing I/O. The non-trivial overhead involves extra computations, which waste valuable computer processing time.
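The checksum-based comparison can be sketched as follows. CRC-32 from the Python standard library is used here purely as a stand-in; the checksum algorithms and record formats actually used by the applications named above are not described in this text, and the function names are hypothetical.

```python
import zlib


def make_record(payload: bytes) -> tuple[bytes, int]:
    """Pair a payload with a checksum computed when the data is written."""
    return payload, zlib.crc32(payload)


def is_valid(record: tuple[bytes, int]) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    payload, checksum = record
    return zlib.crc32(payload) == checksum


def pick_valid(disk_record, backup_record):
    """Return the payload of the first copy whose checksum matches, else None."""
    for record in (disk_record, backup_record):
        if is_valid(record):
            return record[0]
    return None


backup_record = make_record(b"10110")
# Assumed fault: the disk copy suffers a bit-flipping error after the write,
# so its payload no longer matches the checksum stored with it.
disk_record = (b"10101", backup_record[1])

recovered = pick_valid(disk_record, backup_record)
```

Every write and every verified read recomputes the checksum over the whole payload, which is the extra computation referred to above.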
Similarly, data stored among the different components of the computer system 100 (FIG. 1) such as the memory 120 (FIG. 1), the cache 130 (FIG. 1), or the disk 160 can also result in data inconsistencies. For example, data inconsistencies that occur between the memory 120 and the cache 130 are typically solved by cache coherence protocols such as snooping protocols and directory-based protocols. However, data inconsistencies among all the components of the computer system 100 are not solved by cache coherence protocols.
Therefore, what is needed is a technique that permits the logical configuration of a computer system to detect and correct errors that are not handled by existing techniques.