1. Field of the Invention
The present invention pertains to optical storage media for computers, and in particular is concerned with error recovery and data reconstruction schemes for optical disks, which are particularly useful in automated storage libraries.
2. Description of the Related Art
Several techniques for error recovery and reconstruction of data stored on disks are presently known. Reliable techniques include disk mirroring and the use of redundant arrays of inexpensive disks (RAID). Each technique has its own merits and disadvantages, but no known technique is particularly well suited to optical disks because of their unique data format. Additionally, optical disks are often dispersed among several storage areas that are accessed by one or more robotic mechanisms controlled by a central computer. These storage areas are referred to as automatic media storage libraries, or automated libraries. A concept analogous to RAID, called redundant arrays of inexpensive libraries (RAIL), allows multiple storage libraries to be controlled by one central computer; however, known RAID techniques for error recovery and data reconstruction are inefficient when applied to optical media stored in redundant libraries due to the large number of disks involved.
Referring to FIG. 1, a prior art example of a disk mirroring system is shown. File 13 is written by an application program 12, and then sent by an operating system 10 to a control program 20. The control program simultaneously sends identical copies of the file to the drives 22 and 24. The copy of the file stored on drive 22 is "mirrored" by the copy stored on drive 24. In this example, the two drives 22 and 24 represent the minimum configuration for mirroring. Mirroring can be used to improve performance in normal operation. For example, when a file is to be read from mirrored disks 22 and 24, the control program 20 reads alternate file clusters simultaneously from each of the drives and passes them by way of the operating system to an application program. For example, clusters 14, 16 and 18 are read from the copy of file 13 on disk 22 while clusters 15 and 17 are read from the copy of file 13 on disk 24. If three or more drives are used for mirroring, read performance increases further; however, the trade-off is that the cost also increases with the addition of each disk. Mirroring is especially useful when a read failure occurs, such as one caused by a media surface defect or by a read head "crash". In either case, the whole intact file 13 can be recovered from the other disk. Further, if the failure is due to a media defect, then control program 20 may repair the disk experiencing the error by rewriting the missing data from the copy of file 13 stored on the undamaged disk. The primary disadvantage of mirroring is cost. A dedicated disk drive must be available to replicate each disk drive of interest.
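The alternating read and the repair step described above can be sketched as follows. This is a minimal illustration only, not the control program 20 itself: the function name read_mirrored, the representation of each mirrored copy as a list of clusters, and the use of None to mark an unreadable cluster are all assumptions introduced for the example, and the reads are performed sequentially rather than simultaneously.

```python
def read_mirrored(copy_a, copy_b):
    """Return the file's clusters, alternating reads between two mirrored copies.

    Each copy is a list of clusters (bytes); None marks an unreadable cluster
    (a hypothetical stand-in for a media defect).
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("mirrored copies must hold the same number of clusters")
    clusters = []
    for i in range(len(copy_a)):
        # Even positions are read from copy A, odd positions from copy B.
        primary, mirror = (copy_a, copy_b) if i % 2 == 0 else (copy_b, copy_a)
        data = primary[i]
        if data is None:
            # Read failure: recover the cluster from the other copy.
            data = mirror[i]
            if data is None:
                raise IOError(f"cluster {i} is unreadable on both copies")
            primary[i] = data  # repair the damaged copy by rewriting the cluster
        clusters.append(data)
    return clusters
```

In this sketch a damaged cluster on one copy is both returned to the caller and rewritten in place, mirroring the recovery and repair behavior described for a media defect.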
Another technique used for error recovery and data reconstruction, less expensive than mirroring, is known as Redundant Array of Inexpensive Disks (RAID). A research group at the University of California, Berkeley, in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson et al., Proc. ACM SIGMOD, June 1988, describes this technique. A RAID 1 architecture is essentially the disk mirroring described above. A RAID 2 architecture uses "bit striping" in which each bit of each word of data, plus error detection and correction (EDC) bits for each word, is stored on a separate disk drive. For example, a 32-bit data word might have 7 EDC bits added using the known "Hamming code" technique. The resulting 39-bit word is then written one bit per disk drive onto 39 disk drives. If one of the 39 disk drives fails, a control program may use the remaining 38 bits of each stored 39-bit word to reconstruct each 32-bit data word. A significant drawback of RAID 2 architecture is that a large number of disk drives is required, and that, in this example, seven of the drives must be dedicated to EDC data.
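The figure of 7 EDC bits for a 32-bit word can be checked with a short calculation. The sketch below assumes an extended Hamming code (single-error correction plus one overall parity bit for double-error detection); the text above says only "Hamming code", so that reading, along with the function names, is an assumption made for illustration.

```python
def hamming_check_bits(data_bits):
    """Smallest r satisfying 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def extended_word_length(data_bits):
    """Stored bits per word: data + Hamming check bits + one overall parity bit."""
    return data_bits + hamming_check_bits(data_bits) + 1
```

Under this assumption a 32-bit word needs 6 Hamming check bits plus 1 overall parity bit, giving the 7 EDC bits and the 39-bit stored word of the example.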
U.S. Pat. No. 4,092,732 to Ouchi describes a RAID 3 architecture. RAID 3 architecture is the predecessor to more recent generations of RAID architecture, including the widely implemented RAID 5 architecture. RAID 3 architecture relies on the error detection and correction internal to each disk drive, so that it is not necessary to store the extra error detection and correction bits required in RAID 2 architecture.
Referring to FIG. 2, an example of a direct descendant of a RAID 3 type architecture is shown. A file 33 is written by an application program 32 and eventually passed to control program 40 for storage on a disk. The control program simultaneously sends one-half of the file clusters (34, 35, and 36) to drive 42 and the other half (37, 38, and 39) to drive 43. Drive 44, which does not receive file data, is reserved as the "parity" drive. The control program 40 performs an Exclusive OR (XOR) operation on the data being written to drives 42 and 43. The result is parity data, which is written to drive 44. An XOR operation results in a "0" bit whenever two identical bits are compared, and a "1" bit whenever two dissimilar bits are compared. For example, cluster 34 of file 33 contains the byte 50 comprising the bits "1100", and cluster 37 contains the byte 52 comprising the bits "1010". An XOR operation on these bytes yields the parity bits 53, "0110". This parity information written to the parity drive can be used to reconstruct the data in the event of a read failure by XORing the undamaged bits with the parity bits. Three is the minimum number of drives for a RAID 3 architecture, but there is no maximum limit. However, there is a marked decrease in performance as the number of drives over which parity must be computed increases. If there are more than three drives in an array, the first two are XORed and that result is XORed with the next drive, and so on until all the drives containing data have been XORed and the final result written to a parity disk. An obvious advantage of RAID 3 architecture is that for "N" disk drives, only one additional parity drive is required. Thus "N+1" disk drives define a "redundancy group". On the other hand, the disadvantage of RAID 3 architecture is the performance overhead required to read each bit of data, perform multiple XOR operations, and write the result to a parity disk.
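The parity calculation and the reconstruction step can be illustrated with a short sketch. The function names parity and reconstruct are introduced here for illustration only, and blocks are modeled as Python integers rather than as data on drives 42, 43 and 44.

```python
from functools import reduce
from operator import xor


def parity(data_blocks):
    """XOR the blocks from all data drives, one after another, into a parity block."""
    return reduce(xor, data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Recover the one missing data block by XORing the survivors with the parity."""
    return reduce(xor, surviving_blocks, parity_block)
```

With the values from the example, parity([0b1100, 0b1010]) gives 0b0110, and reconstruct([0b1010], 0b0110) returns the lost value 0b1100. Because XOR is its own inverse, the same chained operation serves for both computing parity and rebuilding data, for any number of data drives.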
U.S. Pat. No. 4,761,785 to Clark et al. describes a RAID 5 architecture in detail. A RAID 5 architecture revises the RAID 3 scheme by distributing the data and parity information across all available disk drives. Typically, a redundancy group of "N+1" drives is divided into a plurality of equally sized address areas known as "blocks". Each drive generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one drive containing parity for the remainder of the stripe. This technique is commonly referred to as "parity striping". Further stripes each have a parity block, the parity blocks being distributed on different storage units. In this way, no single unit is burdened with all of the parity update activity. A limitation of RAID 5 architecture is that a change in a data block requires a considerable performance overhead, as both the old parity block and the old data block must be read and XORed, and the result XORed with the new data. Then both the result (the new parity block) and the new data must be written to disk drives. Any requests to read or write new data during the period when parity is being updated must wait until updating is completed. This performance overhead is commonly referred to as the RAID write penalty.
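The read-modify-write sequence behind the RAID write penalty can be sketched as follows. The helper names are hypothetical, blocks are modeled as bytes objects, and the simple round-robin rule in parity_drive is only one possible way of distributing parity blocks; it is an assumption for illustration and is not taken from the '785 patent.

```python
def xor_blocks(a, b):
    """Bytewise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data, old_parity, new_data):
    """New parity for a single-block update: (old parity XOR old data) XOR new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


def parity_drive(stripe_index, n_drives):
    """Hypothetical round-robin placement of the parity block for a given stripe."""
    return stripe_index % n_drives
```

Each small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), which is the overhead the paragraph above refers to.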
A method for parity protecting distributed files in a parallel network is disclosed in U.S. Pat. No. 5,130,992 to Frey et al. This invention is useful for spreading parity among a large number of devices; however, the '992 invention involves updating a parity block whenever data is written in a file that is parity protected. The disadvantage of immediate updating is the overall impact on system performance as the data is read, a parity operation is performed, and the new data is written. Another method, which builds on RAID 5 techniques to reconstruct redundancy information during normal or "online" operation, is described in U.S. Pat. No. 5,235,601 to Stallmo et al. Applying such an "online" technique to a large number of disks (e.g., in a storage library) would also have the inherent disadvantage of slowing throughput, because a parity update would be required for numerous disks after every write operation.
A method of reconstructing data without the necessity of continuously updating a parity redundancy is disclosed in U.S. Pat. No. 5,124,987 to Milligan et al. The '987 invention involves writing new or modified data to non-permanent electronic memory, in the form of "logical tracks". A background process compares the new data on a "logical track" to any old data stored in a previous write and removes the old data. The data is periodically written to physical tracks which serve as redundancy drives. By using logical memory to perform periodic reads, the '987 method improves performance over conventional RAID techniques that use only disk memory for storing redundancy information. However, the tradeoff is the inherent risk of relying on volatile memory for data reconstruction: the contents of volatile electronic memory will be lost if power is interrupted.
RAID techniques have typically been used with magnetic drives, such as Direct Access Storage Device (DASD) units, which are often used as conventional hard drives in personal computers. RAID architecture is not typically employed in magneto-optical units, because these units have pre-erased sectors. The pre-erased sectors are usually created before shipping, and a formatting operation is employed to ensure that the pre-erase has been completed. The pre-erase creates all 0's throughout the disk. This is done because of an inherent inability of a laser beam used in magneto-optical technology to write 0's and 1's simultaneously. Typically, a full revolution of the disk is required for a "write pass" (writing 1's) and another revolution is required for an "erase pass" (writing 0's). To avoid the erase pass, the disk is pre-erased before any writing of data is allowed. As data accumulates, the area devoted to pre-erased sectors decreases, but their presence presents a problem for calculating parity. The pre-erased sectors contain logical blocks of data that would be treated as "0"'s in a conventional parity striping algorithm. The resulting overhead involved with creating parity data for a block of "0"'s is completely unnecessary, since the data will not change as long as the sector remains erased. Alternatively, the pre-erased sectors might send a conventional RAID parity striping program into error recovery mode, which reduces performance, may itself induce errors, and therefore should be avoided if possible.
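The reason parity work on pre-erased blocks is wasted can be shown with a short sketch: XORing with a block of all 0's leaves the running parity unchanged. The sketch below is an illustration of that property only and is not presented as the method of any cited patent; the function name, the use of None to mark a pre-erased block, and the assumption that a pre-erased block reads back as all 0's are introduced here for the example.

```python
def parity_skipping_erased(blocks, block_size):
    """XOR parity over a stripe, ignoring blocks marked None (pre-erased).

    Assumes a pre-erased block is logically all 0's, so skipping it gives the
    same parity as reading it and XORing it in.
    """
    result = bytearray(block_size)
    for block in blocks:
        if block is None:
            continue  # an all-zero block cannot change the parity
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)
```

Under the stated assumption, the parity computed while skipping erased blocks is identical to the parity computed with those blocks read as zeros, so the reads and XOR operations on erased blocks contribute nothing.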