This invention relates to data storage for computers, and more particularly to method and apparatus for diagnosing and repairing data stored in a system including redundant information.
Relatively early in the development of computer systems, disk drives became a fundamental device for storage. Accordingly, computer operating systems and application programs have been developed assuming that memory will rely on input/output ("I/O") to a disk drive. The demand for storage has also skyrocketed. As a result, a number of separate physical devices may be required to accommodate the total amount of storage required for a system.
The result, described briefly below, is that a number of strategies have developed for placing data onto physical disk drives. Indeed, there are a variety of ways of mapping data onto physical disks, as is generally known in the art.
It would be highly inefficient, however, to have to change the operating system and/or application programs every time a change is made to the physical storage system. As a result, there has been a conceptual separation of the application's view of data storage and the actual physical storage strategy.
FIG. 1 illustrates this concept. The application/operating system's view of the storage system contemplates three separate storage devices: logical volume A 10, logical volume B 11, and logical volume C 12. Thus, as far as the operating system can discern, the system consists of three separate storage devices 10-12. Each separate storage device may be referred to as a "logical volume," "logical disk," or "virtual disk." These names reflect the fact that the application's (or operating system's) logical view of the storage device structure may not correspond to the actual physical storage system implementing the structure.
In FIG. 1, the data is physically stored on the physical storage devices 14-16. In this particular example, although there are three physical devices 14-16 and three logical volumes 10-12, there is not a one-to-one mapping of the logical volumes to physical devices. The data in logical volume A 10 is actually stored on physical devices 14-16, as indicated at 10a, 10b and 10c. Logical volume B is stored entirely on physical device 14, as indicated at 12a, 12b. Finally, logical volume C is stored on physical device 14 and physical device 16 as indicated at 11a, 11b.
In this particular example, the boxes 10a-10c, 11a-11b and 12a-12b represent contiguous segments of storage within the respective physical devices 14-16. These contiguous segments of storage may, but need not, be of the same size.
Array management software running on a general purpose processor (or some other mechanism such as a custom hardware circuit) 13 translates requests from a host computer (not shown) (made assuming the logical volume structure 10-12) into requests that correspond to the way in which the data is actually stored on the physical devices 14-16. In practice, the array management software 13 may be implemented as a part of a unitary storage system that includes the physical devices 14-16, may be implemented on a host computer, or may be done in some other manner.
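The translation performed by the array management software 13 may be pictured as a table lookup. The following is a minimal Python sketch under stated assumptions: the names Extent, VOLUME_MAP and translate, and all block counts and offsets, are hypothetical and are not taken from FIG. 1. It shows only how a logical block address might be resolved to a contiguous segment on a physical device.

```python
from typing import NamedTuple, Tuple

class Extent(NamedTuple):
    device: int   # physical device number (e.g. 14, 15 or 16)
    start: int    # first physical block of the contiguous segment
    length: int   # number of blocks in the segment

# Hypothetical layout: one logical volume spread over three devices.
# The segments need not be the same size.
VOLUME_MAP = {
    "A": [Extent(14, 0, 100), Extent(15, 0, 50), Extent(16, 200, 100)],
}

def translate(volume: str, logical_block: int) -> Tuple[int, int]:
    """Map a logical block of a volume to (device, physical_block)."""
    if logical_block < 0:
        raise ValueError("negative logical block")
    remaining = logical_block
    for extent in VOLUME_MAP[volume]:
        if remaining < extent.length:
            return extent.device, extent.start + remaining
        remaining -= extent.length
    raise ValueError("logical block beyond end of volume")
```

In this sketch the host addresses the volume as a single linear range of blocks, and only the lookup table needs to change if the physical layout changes.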
The physical storage devices shown in FIG. 1 are disk drives. Disk drives include one or more disks of a recording medium (such as a magnetic recording medium or an optical recording medium). Information can be written to and read from the storage medium for storage purposes. The recording medium is typically in the form of a disk that rotates. The disk generally includes a number of tracks on which the information is recorded and from which the information is read. In a disk drive that includes multiple disks, the disks are conventionally stacked so that corresponding tracks of each disk overlie each other. In this case, specification of a single track on which information is stored within the disk drive includes not only specification of an individual track on a disk, but also which of the multiple disks the information is stored on.
Data on each physical device 14-16 may be stored according to one or more formats. Similarly, the request for data from the operating system or application program may correspond to one or more such formats. For example, large disk storage systems employed with many IBM mainframe computer systems implement a count, key, data ("CKD") record format on the disk drives. Similarly, programs on such computers may request and expect to receive data according to the CKD record format. In the CKD format, the record includes at least three parts. The first part is a "count," which serves to identify the record and indicates the lengths of the (optional) key field and the data portion of the record. The key field is an optional field that may include information about the record. The "data" portion of the record includes the actual user data stored by the record. The term "data" refers to any information, including formatting information of a record. "Actual user data" refers to the data actually desired for use by the host computer, such as the information in the data field of a CKD record.
Disk drives that do not employ a CKD record format typically use a fixed block architecture ("FBA") format. In an FBA storage system, each track of a disk is divided into a number of blocks, each having the same size.
Of course, it is possible to use an FBA disk drive system to store data formatted according to the CKD record format. In this case, the array management software 13 must perform the necessary translations between the CKD and FBA formats. One mechanism for performing this function is described in U.S. Pat. No. 5,664,144, entitled "System and method for FBA formatted disk mapping and variable length CKD formatted data record retrieval," issued on Sep. 2, 1997.
In a system including an array of physical disk devices, such as disk devices 14-16 of FIG. 1, each device typically performs error detection and/or correction for the data stored on the particular physical device. Accordingly, each individual physical disk device detects when it does not have valid data to provide and, where possible, corrects the errors. Even where error correction is permitted for data stored on the physical device, however, a catastrophic failure of the device would result in the irrecoverable loss of data.
Accordingly, storage systems have been designed which include redundant storage capacity. A variety of ways of storing data onto the disks in a manner that would permit recovery have developed. A number of such methods are generally described in the RAIDbook, A Source Book For Disk Array Technology, published by the RAID Advisory Board, St. Peter, Minn. (5th Ed., February, 1996). These systems include "RAID" storage systems. RAID stands for Redundant Array of Independent Disks.
FIG. 2A illustrates one technique for storing redundant information in a RAID system. Under this technique, a plurality of physical devices 21-23 include identical copies of the data. Thus, the data M1 can be "mirrored" onto a portion 21a of physical device 21, a portion 22a of physical device 22 and a portion 23a of physical device 23. In this case, the aggregate portions of the physical disks that store the duplicated data 21a, 22a and 23a may be referred to as a "mirror group." The number of places in which the data M1 is mirrored is generally selected depending on the desired level of security against irrecoverable loss of data.
FIG. 2A shows three physical devices 21-23 which appear to be located in close proximity, for example within a single storage system unit. For very sensitive data, however, one or more of the physical devices that hold the mirrored data may be located at a remote facility.
"RAID 1" is an example of data redundancy through mirroring of data. In a RAID 1 architecture, a number of different mechanisms may be used for determining how to access and update data to improve, for example, performance of the storage system. In any event, a RAID 1 architecture certainly has the ability to recover lost data. Unfortunately, the RAID 1 architecture multiplies the cost of physical storage by the number of "mirrors" included in the mirror group.
FIG. 2B illustrates a solution that requires less added storage. In FIG. 2B, data is stored at locations 24a-24d. In this particular example, the physical device 23 includes parity information P1 at 25a, 25b. The parity information is generated by a simple exclusive-OR ("XOR") of the corresponding bits of data. Thus, the parity information P1 would be generated by XORing the corresponding bits of the data D1 and data D2.
While "parity" redundancy is used in the illustrative examples of the present application, this is not intended as limiting. The invention may be applied, based on the disclosure herein, to other schemes that use more than a single bit to record error detection or correction information. For example, aspects of the invention may be applied to a RAID 2 system that uses Hamming codes for error correction.
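The XOR relationship described above may be illustrated with a short Python sketch. The function name xor_blocks and the sample byte values are assumptions made for illustration only. The sketch shows that parity is the bitwise XOR of equal-length data segments, and that XORing the parity with the surviving segment regenerates the missing one.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bitwise XOR of equal-length byte strings."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Illustrative data segments (values chosen arbitrarily).
d1 = b"\x0f\xf0"
d2 = b"\xff\x00"
p1 = xor_blocks(d1, d2)         # parity for D1 and D2
recovered = xor_blocks(p1, d2)  # regenerates D1 if D1 is lost
```

Because XOR is its own inverse, the same routine serves both to generate parity and to rebuild a lost segment.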
A variety of mechanisms are known for distributing the parity information on the physical devices. In the example shown in FIG. 2B, all of the parity information is stored on a single physical device 23. In other cases, the parity information may be distributed across the physical devices.
In the event that parity data is not all stored in the same physical device, the parity segments may be allocated to physical storage devices in units the size of the smallest writable segment of disk memory, or more. Indeed, parity for an entire logical volume may be allocated to a single physical storage device before parity is designated for storage on a different physical device.
For both mirror groups and redundancy groups in a disk system, data is updated in discrete portions. In a disk system, there is a smallest unit of data that may be written to or read from the disk. In an FBA architecture, this unit is a single "block" of data, having a fixed size. This size may be, for example, 512 bytes. In a CKD architecture, the smallest unit of data that may be written is a CKD record (of variable length). In a random access memory, the smallest unit is often a byte or 16 (or 32) bit word. In any event, "write unit" will refer to the smallest unit of data that may be read from or written to the disk storage system. The amount of parity information written on a physical storage unit, before storing parity on a different physical storage unit, may or may not correspond to the size of the write unit.
Within a given disk array, there is no need for all of the data to follow the same redundancy rule.
FIG. 3 illustrates this concept. In FIG. 3, a first group of storage segments on physical devices 30-32 form a mirror group 34. In the mirror group 34, the entire contents of a single logical volume (LV-A) are mirrored on three different physical devices 30-32.
In FIG. 3, a single logical volume is stored on the fourth physical device 33, without any redundancy information, as indicated at 36.
Finally, a last group of data segments 35 on all four physical devices 30-33 implement a parity redundancy scheme. In this particular example, the parity information is stored in segments of memory on two different physical devices 32-33, as indicated at 37a and 37b. 
The data segments in the mirror group 34 and parity group 35 may each be referred to as part of their corresponding "redundancy group." Mirror group 34 and parity group 35 both include redundant information, although stored in different ways: the former as a copy of the information, the latter as parity information from which a copy of the information may be derived.
According to one embodiment of the present invention, a method of determining if a data coherence problem exists in a storage system is disclosed. According to this embodiment, a data unit format value, stored in each copy of a plurality of corresponding copies of a data unit, is compared with a known correct value. Based on the comparison, copies which do not include correct data are identified. The data unit may be stored on one of a plurality of mirrors in the storage system and may be a fixed block size. The method may include the step of repairing data units determined not to be correct. The data unit format value may vary, depending on the intended physical location of the data unit in the respective mirror. For example, the data unit format value may include a logical block address.
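As a rough illustration of the comparison described in this embodiment, the following Python sketch assumes that each copy of a data unit carries a stored logical block address and that the expected address for each mirror is known in advance. The names DataUnit and find_incorrect_copies, and the field layout, are hypothetical and chosen only for this example.

```python
from typing import List, NamedTuple

class DataUnit(NamedTuple):
    lba: int         # format value stored inside the data unit (assumed field)
    payload: bytes   # actual user data; not examined by this check

def find_incorrect_copies(copies: List[DataUnit],
                          expected_lbas: List[int]) -> List[int]:
    """Return indices of copies whose stored LBA differs from the expected LBA.

    expected_lbas holds one known-correct value per mirror, since the value
    may differ depending on where the unit is meant to reside in each mirror.
    """
    if len(copies) != len(expected_lbas):
        raise ValueError("one expected value is required per copy")
    return [i for i, (unit, expected) in enumerate(zip(copies, expected_lbas))
            if unit.lba != expected]

# Illustrative use: the second mirror holds a unit with the wrong address.
copies = [DataUnit(100, b"x"), DataUnit(7, b"x"), DataUnit(300, b"x")]
bad = find_incorrect_copies(copies, [100, 200, 300])
```

In this sketch the check never reads the user data, so a misplaced or unwritten unit can be flagged from its format value alone.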
According to another embodiment of the present invention, a storage system is disclosed. According to this embodiment, the storage system includes a plurality of storage devices, storing a redundancy group. The storage system includes a data coherence tester, that comprises means for comparing a data unit format value, stored in each corresponding copy of a data unit, with a correct value known in advance. The storage system may include means for determining which of the copies do not have correct data.
According to another embodiment of the present invention, a method of determining if a data coherence problem exists in a storage system is disclosed. According to this embodiment, a plurality of corresponding copies of a data unit are provided. The method determines whether a data coherence problem exists among the copies, but without an exhaustive examination of the actual user data in the data unit. The step of determining whether a data coherence problem exists may comprise a step of comparing error code information stored in corresponding copies of the data unit. The data unit itself may be stored on one of a plurality of mirrors in the storage system.
According to another embodiment of the present invention, a storage system is disclosed. According to this embodiment, the storage system comprises a plurality of storage devices to store a redundancy group. The storage system further includes a data coherence tester, to identify data coherence problems among corresponding copies of one of the data units stored on the storage devices. The data coherence tester includes means for determining whether a data coherence problem exists, without an exhaustive examination of the actual user data in the data unit.
According to another embodiment of the present invention, a method of determining if a data coherence problem exists between a plurality of copies of a data unit is disclosed. According to this embodiment, data unit composition information in each corresponding copy of a data unit is examined to determine if one of the copies does not contain up to date information. The method includes a step of comparing actual user data stored in the copies, if a data coherence problem is not identified when the data unit composition information is examined. The data unit may be stored on a plurality of mirrors in the storage system. The method may include a step of identifying at least one of the copies of the data unit that has up to date information. The data unit composition information may be a mask field intended to indicate invalid mirrors of the corresponding data, or may comprise an error code, such as a cyclic redundancy code.
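The two-stage examination described in this embodiment may be sketched in Python as follows. The sketch assumes that the data unit composition information is a CRC stored with each copy; the names Copy and check_coherence and the returned strings are hypothetical. The standard library function zlib.crc32 stands in for whatever error code a given system actually uses.

```python
import zlib
from typing import List, NamedTuple

class Copy(NamedTuple):
    crc: int         # error code stored with the copy when it was written
    payload: bytes   # actual user data

def check_coherence(copies: List[Copy]) -> str:
    """Examine stored error codes first; compare user data only if they agree."""
    if not copies:
        raise ValueError("at least one copy is required")
    # Stage 1: inexpensive check of the composition information.
    if len({c.crc for c in copies}) > 1:
        return "composition-mismatch"
    # Stage 2: exhaustive comparison of the actual user data.
    first = copies[0].payload
    if any(c.payload != first for c in copies[1:]):
        return "user-data-mismatch"
    return "coherent"

good = Copy(zlib.crc32(b"abc"), b"abc")
stale = Copy(zlib.crc32(b"abd"), b"abd")   # older contents, different code
corrupt = Copy(good.crc, b"abX")           # same stored code, altered data
```

The ordering reflects the design choice in the text: the cheaper metadata comparison runs first, and the full data comparison is reached only when the metadata reveals no problem.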
According to another embodiment of the present invention, a storage system is disclosed, which includes a plurality of storage devices and a data coherence tester. The data coherence tester is to identify data coherence problems among corresponding copies of a data unit. The data coherence tester may comprise means for examining data unit composition information to determine if a copy does not contain up to date information and means for comparing actual user data of the copies of the data unit, where the means for examining does not identify the copy as having data that is not up to date.
According to another embodiment of the present invention, a method of determining if a data coherence problem exists between a plurality of mirrors of segments of data, each segment of data including a plurality of data units, is disclosed. According to this embodiment, a respective copy of a data unit from each respective segment of data is read and examined to determine if subsequent data units in the respective segments of data do not have up to date information. The examination may include comparing a count field of a CKD record for each of the respective copies.
According to another embodiment of the present invention, a storage system is disclosed which includes a plurality of storage devices storing a plurality of mirrors of data. This embodiment includes a data coherence tester that includes means for examining a respective copy of a data unit to determine if subsequent data units in the respective segments of data do not have up to date information.
According to another embodiment of the present invention, a method of resolving a data coherence problem existing among a plurality of mirrored copies of data units is disclosed. According to this embodiment, the mirrored copies of the data units are examined to identify which have a data coherence problem; for those that have a data coherence problem, at least one mirrored copy is identified that has up to date information; and the copies not having up to date information are updated.
According to another embodiment of the present invention, a storage system is disclosed, which includes a plurality of storage devices storing a plurality of mirrored copies of a plurality of data units. This embodiment includes a data coherence tester to examine the mirrored copies of the data units to identify those that have a data coherence problem and, for the identified copies, to identify at least one which has up to date information.
According to another embodiment of the present invention, a method of resolving a data coherence problem existing among a plurality of mirrored copies of a plurality of data units in the storage system is disclosed. According to this embodiment, a time stamp associated with each of the copies of the data units is provided, the time stamp indicating an increment of time sufficiently small to resolve between old data and up to date data for most updates to the copies. According to this method, the copy with the most recent time stamp is considered to be up to date. The time stamp may distinguish time in about two second increments or less. The method may include a step of determining that a data coherence problem exists among copies of the data unit.
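A minimal Python sketch of time stamp based resolution follows. It assumes each mirrored copy carries a numeric time stamp in seconds; the names MirrorCopy and resolve_by_timestamp are hypothetical. The step of overwriting the older copies is shown only as one possible way of updating them and is an assumption of the sketch.

```python
from typing import List, NamedTuple

class MirrorCopy(NamedTuple):
    timestamp: float   # time of last update, in seconds (assumed field)
    payload: bytes     # actual user data

def resolve_by_timestamp(copies: List[MirrorCopy]) -> List[MirrorCopy]:
    """Treat the copy with the most recent time stamp as up to date and
    return a list in which every mirror holds that copy."""
    if not copies:
        raise ValueError("at least one copy is required")
    newest = max(copies, key=lambda c: c.timestamp)
    return [newest for _ in copies]

# Illustrative use: the second mirror received the latest update.
mirrors = [MirrorCopy(10.0, b"old"), MirrorCopy(12.0, b"new"),
           MirrorCopy(10.0, b"old")]
repaired = resolve_by_timestamp(mirrors)
```

If two copies with different contents carry the same time stamp, this sketch cannot tell them apart and simply keeps the first, which is why the granularity of the time stamp matters.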
According to another embodiment of the present invention, a storage system is disclosed which includes a plurality of storage devices that include a plurality of mirrored copies of data units. This embodiment includes a data coherence resolver, to resolve a data coherence problem based on a time stamp associated with each of the mirrored copies. The time stamp indicates an increment of time sufficiently small to resolve between old data and up to date data for most updates to the copies. The data coherence resolver may further comprise means for identifying data coherence problems among the mirrored copies of the data units.
According to another embodiment of the present invention, a method of diagnosing a data coherence problem in a storage system that stores a plurality of data segments on physical storage devices is disclosed. According to this embodiment, a first copy of one of the data segments is provided. A corresponding copy of the data segment is generated using redundant information stored in the storage system. The first copy and the corresponding copy are compared to determine when a data coherence problem exists between the first copy and the redundant information. The data segments may be a part of a parity redundancy group. The method may also include a step of identifying which among the one data segment and the redundant information is not up to date, when a data coherence problem is found. This may involve comparing a time stamp in the first copy with a time stamp in the corresponding copy. The data segment or redundant information may be repaired. The method may also include a step of determining whether the corresponding copy is a viable data segment. The determination of whether the corresponding copy is a viable data segment may be made by generating an error code for a data unit in the corresponding copy and comparing the generated error code with the error code stored in the corresponding copy. In another embodiment, the determination of whether the corresponding copy is a viable data segment may include the steps of determining an expected value for a field in a data unit in the corresponding copy, and comparing the expected value with the value stored in the corresponding copy.
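The comparison at the core of this embodiment may be sketched in Python for the parity case. The sketch assumes a single XOR parity segment protecting equal-length data segments; the names xor_all, regenerate and parity_coherent are hypothetical. It covers only the generate-and-compare step, not the later steps of deciding which side is out of date or of repairing it.

```python
from typing import List

def xor_all(segments: List[bytes]) -> bytes:
    """Bitwise XOR of a non-empty list of equal-length segments."""
    if not segments:
        raise ValueError("at least one segment is required")
    result = bytearray(len(segments[0]))
    for segment in segments:
        if len(segment) != len(result):
            raise ValueError("segments must have the same length")
        for i, byte in enumerate(segment):
            result[i] ^= byte
    return bytes(result)

def regenerate(other_data: List[bytes], parity: bytes) -> bytes:
    """Build the corresponding copy of a segment from the redundant information."""
    return xor_all(other_data + [parity])

def parity_coherent(first_copy: bytes, other_data: List[bytes],
                    parity: bytes) -> bool:
    """True when the stored segment matches the copy regenerated from parity."""
    return regenerate(other_data, parity) == first_copy

# Illustrative values: parity computed over two data segments.
d1, d2 = b"\x01\x02", b"\x10\x20"
parity = xor_all([d1, d2])
```

A mismatch here indicates only that the stored segment and the redundant information disagree; further information, such as a time stamp, is needed to decide which of them is out of date.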
According to another embodiment of the present invention, a storage system is disclosed, which includes a plurality of storage devices each storing a data segment, the data segments being in a redundancy group that includes redundant information. The storage system according to this embodiment also includes a data coherence tester, coupled to the storage devices. The data coherence tester generates a corresponding copy of one data segment and compares the corresponding copy with a first copy of the one data segment, to determine when a data coherence problem exists between the first copy and the redundant information.
According to another embodiment of the present invention, a storage system is disclosed which comprises a plurality of storage devices, means for generating a corresponding copy of one data segment stored on one of the storage devices, and means for comparing a first copy of the one data segment with the corresponding copy, to determine when a data coherence problem exists between the first copy and redundant information stored in the storage system.
According to another embodiment of the present invention, a data verification process (including methods such as those described above) is initiated independent of any catastrophic failure of the storage system.
According to another embodiment of the present invention, a storage system is disclosed that includes a data coherence tester that initiates a data verification process independent of the occurrence of any catastrophic failure of the system.
According to another embodiment of the present invention, a method of data verification is disclosed. According to this method, the verification process proceeds while operating the storage system in a normal mode of operation.
According to another embodiment of the present invention, a storage system is disclosed which includes a data coherence tester that tests coherence of data among a plurality of storage devices during normal operation of the storage system.