A disk array is a collection of hard disk drives (HDDs) managed as a unit. Disk arrays can provide a better data I/O rate and data availability for application programs than single large capacity disks.
In "A Case for Redundant Arrays of Inexpensive Disks," report no. UCB/CSD/87/391, December 1987, Patterson et al. defined five levels of RAID. In each RAID level, redundant information is provided so that if one of the HDDs is unavailable, the data on that HDD can be reconstructed from one or more of the other HDDs in the array. RAID-1, often referred to as disk mirroring or data duplexing, stores identical images of user data on two or more member HDDs. In RAID level 3, 4, and 5 systems, redundancy is provided using parity data.
In RAID level 4 and 5 systems, blocks of data are stored on each HDD in the array, and parity is calculated based on a group of blocks of data on each disk drive. A parity stripe or segment consists of a set of corresponding data blocks on each disk drive and a parity block calculated from those data blocks. Data can be striped at many levels: by blocks, tracks, multiple tracks, cylinders, and so forth. In RAID-5, parity is rotated amongst all the disk drives, which makes the workload on the disks in the array uniform. Other RAID levels are also known, including RAID-0, where data is striped on a set of HDDs but the array does not include any parity or other redundant information.
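The parity stripe described above can be illustrated with a minimal sketch. It assumes byte-wise XOR parity, which is the usual choice for RAID parity but is not stated in the text; the helper name `xor_blocks` and the 4-byte block size are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one parity stripe (hypothetical 4-byte blocks).
data_blocks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity block is calculated from the corresponding data blocks.
parity = xor_blocks(data_blocks)

# Simulate losing the HDD that holds block 1, then rebuild that block
# from the surviving data blocks plus the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Under this assumption, XOR-ing the surviving blocks with the parity block yields the missing block, which is how a single unavailable HDD can be tolerated.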
Customers of storage arrays are most concerned with reliability, access times, and cost per megabyte of data stored. RAID systems provide a way of addressing the reliability issue and access requirements. Access time is improved by caching data. A cache is a random access memory often included as part of a storage subsystem to further increase the I/O speed. A cache stores information that either has recently been requested from the disk or that needs to be written to the disk.
Data compression techniques provide a solution for improving the cost per megabyte of data storage. However, there are problems with implementing compression in RAID systems where data is always stored in the same location (home address) even as it continues to be modified. Although a good compression algorithm yields space savings in general, the amount of compression achieved is dependent on the actual data values. After a piece of data is updated, it may not compress as well as it did before the update, so it may not fit back into the space that had been allocated for it. This creates a problem for any storage system where data is assigned a home address.
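The home-address problem can be made concrete with a short sketch. It uses Python's standard `zlib` module purely as a stand-in compressor; the compression method, the constant `TRACK_SIZE`, and the sample data are assumptions for illustration and are not taken from the text.

```python
import os
import zlib

TRACK_SIZE = 4096  # hypothetical uncompressed size of one piece of data

# Original contents are highly repetitive, so they compress very well.
original = b"A" * TRACK_SIZE
allocated = len(zlib.compress(original))  # space reserved at the home address

# After an update the contents look random and barely compress at all.
updated = os.urandom(TRACK_SIZE)
needed = len(zlib.compress(updated))

# True only if the updated data still fits in the originally allocated space.
fits_in_place = needed <= allocated
```

Here the updated data needs far more space than was originally reserved, so it cannot be written back to the same fixed location.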
In a RAID level 5 system, parity information is updated for a write operation from the logical combination of the old data, the new data, and the old parity. While RAID-5 provides many benefits for increasing concurrent accesses, a write penalty is incurred. Rather than requiring only one array access for writing the new data, a write operation in RAID-5 requires four array access operations: reading the old data, reading the old parity, writing the new data, and writing the new parity.
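The four accesses of a RAID-5 write can be sketched as follows. The sketch assumes that the "logical combination" is a byte-wise XOR, which is the usual choice but is not stated in the text; the class `Raid5Stripe`, its in-memory storage, and the access counter are hypothetical illustration devices.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class Raid5Stripe:
    """One parity stripe held in memory; counts simulated array accesses."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = bytes(len(self.data[0]))
        for block in self.data:
            self.parity = xor(self.parity, block)
        self.accesses = 0

    def small_write(self, index, new_data):
        old_data = self.data[index]      # access 1: read old data
        self.accesses += 1
        old_parity = self.parity         # access 2: read old parity
        self.accesses += 1
        # New parity combines old parity, old data, and new data.
        new_parity = xor(xor(old_parity, old_data), new_data)
        self.data[index] = new_data      # access 3: write new data
        self.accesses += 1
        self.parity = new_parity         # access 4: write new parity
        self.accesses += 1


stripe = Raid5Stripe([b"\x01\x02", b"\x04\x08", b"\x10\x20"])
stripe.small_write(1, b"\xff\x00")
```

After the update the parity still equals the XOR of all current data blocks, but the single logical write cost four array accesses.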
In Rosenblum et al., "The Design and Implementation of a Log-Structured File System," Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991, a log structured file system was proposed in which modified data blocks are re-written to the disk sequentially in a log-like structure. With each write operation, information about the data being written is also written. This information is used in managing the system.
A log structured array (LSA) applies some of the same principles as a log structured file system to an array system. There are many benefits to using an LSA over a home address based RAID system. An LSA can accommodate the size changes in data produced through data compression, since data is not given a fixed location on the disk. Therefore, in an LSA, data can be stored on disks in a compressed form. Also, since an LSA writes all modifications to disk sequentially in a log-like structure, it solves the RAID-5 write penalty problem described previously. There is no longer a need to read the old data and old parity, since data blocks for an entire segment are written together.
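Why writing an entire segment together avoids reading old data and old parity can be shown with a brief sketch. It assumes byte-wise XOR parity and that all data blocks of the segment are already buffered in memory; the function name `write_full_segment` and the returned access count are hypothetical.

```python
from functools import reduce


def write_full_segment(data_columns):
    """Return (columns_to_write, array_accesses) for a full-segment write.

    Parity is computed entirely from the new data held in memory, so no
    old data or old parity has to be read from the array first.
    """
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_columns))
    columns = list(data_columns) + [parity]
    return columns, len(columns)  # one write per column, zero reads


columns, accesses = write_full_segment([b"\x01\x02", b"\x04\x08", b"\x10\x20"])
```

In this example three data blocks plus one parity block are written with four array accesses in total, whereas updating the same three blocks one at a time with four accesses each would take twelve.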
Application programs and system software running on a host computer read and write data using logical devices independent of the physical location of the data blocks on a storage device, such as an HDD. Programs access data blocks from the storage system using logical cylinder, logical head, and logical record addresses. The storage system controller translates the logical address to the physical address at which the data block is stored. The host computer is unaware of the manner in which requested data blocks are stored on and accessed from the physical storage devices. The typical unit of data management within the controller is a logical track. A combination of a logical cylinder and logical head address represents the logical track address.
The log structured array consists of N+P+S physical disk drives, where N is the number of HDDs' worth of physical space available for customer data, P is the number of HDDs' worth of space available for parity data, and S is the number of spare HDDs provided. Each HDD is divided into large consecutive areas called segment columns. Typically, a segment column is as large as a logical cylinder. Corresponding segment columns from the N+P+S HDDs constitute a segment. The array has as many segments as there are segment columns on an HDD in the array. In a RAID-5 configuration, one of the segment columns of a segment contains the parity of the remaining data segment columns of the segment. A segment directory is stored as part of each segment, providing information on each logical track in the segment.
An LSA allows a logical track to be updated to a different location on disk. Since in an LSA the location of a logical track changes over time, a directory called the main LSA directory has an entry for each logical track providing its current location in the disk array.
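The role of the main LSA directory can be sketched with a toy model. The class `LogStructuredArray`, the use of a Python list as a stand-in for the disk log, and the (logical cylinder, logical head) tuples used as track identifiers are all assumptions made for illustration, not a description of an actual controller.

```python
class LogStructuredArray:
    """Toy LSA: an append-only log plus a directory of current locations."""

    def __init__(self):
        self.log = []        # stand-in for disk: (track_id, data) in write order
        self.directory = {}  # main LSA directory: logical track -> log position

    def write_track(self, track_id, data):
        # An update is appended at a new location instead of overwriting
        # the old copy, so the new data may have any size.
        self.log.append((track_id, data))
        self.directory[track_id] = len(self.log) - 1

    def read_track(self, track_id):
        position = self.directory[track_id]
        return self.log[position][1]


lsa = LogStructuredArray()
lsa.write_track((0, 1), b"v1")
lsa.write_track((0, 2), b"other track")
lsa.write_track((0, 1), b"v2 is longer than v1")
```

After the second write to track (0, 1), the directory entry points at the newest copy, and the older copy remains in the log as stale data.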
An LSA directory or portions of it reside in main memory. If a failure or error corrupts or obliterates the contents of memory, this will cause the information in the directory and consequently the data tracked by the directory to be lost.
Examples of very severe failures include loss of power to the subsystem or a hardware failure in a component of the subsystem. These failures are categorized as catastrophic failures that obliterate the LSA directory completely.
A straightforward method to recover the LSA directory from catastrophic failures is to rebuild it from scratch using every single segment directory on the disks. This process involves first reading each segment's time stamp of when it was last written to disk and making an ordered list of all the segments according to the time stamps, then reading the segment directory of each segment, one at a time, from disk in the order described by the segment list. For each entry in the segment directory, the LSA directory entry for the corresponding logical track is built up or updated using the segment directory's information. The LSA directory recovery is completed when every entry of every segment directory has been examined. While this provides a simple solution, this recovery process can be very expensive in terms of processing time.
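The rebuild-from-scratch procedure can be sketched as follows. The data layout (a mapping from segment id to a time stamp and a list of directory entries), the function name `rebuild_lsa_directory`, and the choice to process segments from oldest to newest so that later entries overwrite earlier ones are assumptions made for illustration.

```python
def rebuild_lsa_directory(segments):
    """Rebuild the main LSA directory from all segment directories.

    segments maps segment_id -> {"timestamp": int,
                                 "directory": [(track_id, offset), ...]}.
    """
    # Step 1: order all segments by the time they were last written.
    ordered = sorted(segments, key=lambda seg_id: segments[seg_id]["timestamp"])

    # Step 2: read each segment directory in that order; an entry from a
    # newer segment replaces the entry built from an older segment.
    lsa_directory = {}
    for seg_id in ordered:
        for track_id, offset in segments[seg_id]["directory"]:
            lsa_directory[track_id] = (seg_id, offset)
    return lsa_directory


segments = {
    7: {"timestamp": 300, "directory": [("T1", 0)]},
    2: {"timestamp": 100, "directory": [("T1", 0), ("T2", 1)]},
    5: {"timestamp": 200, "directory": [("T3", 0)]},
}
recovered = rebuild_lsa_directory(segments)
```

Every segment directory is read in full even though only the newest entry for each logical track survives, which illustrates why this approach is costly in processing time.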
Additionally, there are other errors or failures that cannot be corrected by the foregoing method. These errors, such as microcode logical errors (MLEs), affect the LSA directory, happen more frequently, and require speedy recoveries. These kinds of errors are not as massive and fatal as catastrophic failures and tend to be more local. However, the chance of a separate MLE occurring is greater than that of a catastrophic failure. There is a need for a relatively quick way of recovering the main LSA directory in the event of an MLE.
One or more of the foregoing problems is solved, or one or more of the foregoing goals is achieved, by the present invention.