A disk array is a collection of hard disk drives (HDDs) managed as a unit. Disk arrays can provide higher data I/O rates and better data availability for application programs than single large-capacity disks.
In "A Case for Redundant Arrays of Inexpensive Disks", Report No. UCB/CSD/87/391, December 1987, Patterson et al. defined five levels of RAID. In each RAID level, redundant information is provided so that if one of the HDDs is unavailable, the data on that HDD can be reconstructed from one or more of the other HDDs in the array. RAID-1, often referred to as disk mirroring or data duplexing, stores identical images of user data on two or more member HDDs. In RAID level 3, 4 and 5 systems, redundancy is provided using parity data.
In RAID level 4 and 5 systems, blocks of data are stored on each HDD in the array, and parity is calculated based on a group of blocks of data on each disk drive. A parity stripe or segment consists of a set of corresponding data blocks on each disk drive and a parity block calculated from those data blocks. Data can be striped at many levels: by blocks, tracks, multiple tracks, cylinders, and so forth. In RAID-5, parity is rotated among all the disk drives, which makes the workload on the disks in the array uniform. Other RAID levels are also known, including RAID-0, where data is striped on a set of HDDs but the array does not include any parity or other redundant information.
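The parity relationship within a stripe can be illustrated with a short sketch. It assumes byte-wise XOR parity over equal-sized blocks, which is the usual choice for RAID-4 and RAID-5 although the text above does not name the operation. The function names `xor_blocks` and `reconstruct` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One parity stripe: a data block from each of three drives plus a parity block.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(stripe)

# Simulate losing the drive that held the second data block.
recovered = reconstruct([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, the same routine serves both to compute the parity and to rebuild any one missing block of the stripe.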
Customers of storage arrays are most concerned with reliability, access times, and cost per megabyte of data stored. RAID systems provide a way of addressing the reliability issue and access requirements. Access time is improved by caching data. A cache is a random access memory often included as part of a storage subsystem to further increase the I/O speed. A cache stores information that either has recently been requested from the disk or that needs to be written to the disk.
Data compression techniques provide a solution for improving the cost per megabyte of data storage. However, there are problems with implementing compression in RAID systems where data is always stored in the same location (home address) even as it continues to be modified. Although a good compression algorithm yields space savings in general, the amount of compression achieved is dependent on the actual data values. After a piece of data is updated, it may not compress as well as it did before the update, so it may not fit back into the space that had been allocated for it. This creates a problem for any storage system where data is assigned a home address.
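The following sketch illustrates why an updated record may no longer fit at its home address. It uses `zlib` from the Python standard library purely as a stand-in compressor (the text does not name an algorithm), and the record contents and the 4096-byte record size are made-up values.

```python
import random
import zlib

RECORD_SIZE = 4096  # hypothetical uncompressed record size in bytes

# A highly repetitive record compresses very well.
original = b"A" * RECORD_SIZE

# The updated record has the same logical size but pseudo-random content.
rng = random.Random(0)  # fixed seed so the example is repeatable
updated = bytes(rng.getrandbits(8) for _ in range(RECORD_SIZE))

# Space reserved at the home address when the record was first written.
slot_size = len(zlib.compress(original))
# Space the record needs after the update.
new_size = len(zlib.compress(updated))

fits_in_place = new_size <= slot_size
```

Here `fits_in_place` is false: the updated record needs far more space than the slot that was sized for the original, even though its uncompressed length is unchanged.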
In a RAID level 5 system, parity information is updated for a write operation from the logical combination of the old data, the new data, and the old parity. While RAID-5 provides many benefits for increasing concurrent accesses, a write penalty is incurred. Rather than requiring only one array access for writing the new data, a write operation in RAID-5 requires four array access operations: reading the old data, reading the old parity, writing the new data, and writing the new parity.
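The four accesses of a RAID-5 update can be traced in a small sketch. It assumes XOR parity, so that the new parity equals the old parity XOR the old data XOR the new data. `CountingDisk` and `small_write` are hypothetical names, and the in-memory "disks" each hold a single block.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class CountingDisk:
    """In-memory stand-in for one drive that counts its accesses."""

    def __init__(self, block):
        self.block = block
        self.accesses = 0

    def read(self):
        self.accesses += 1
        return self.block

    def write(self, block):
        self.accesses += 1
        self.block = block


def small_write(data_disk, parity_disk, new_data):
    """Update one data block and its parity by read-modify-write."""
    old_data = data_disk.read()        # access 1: read old data
    old_parity = parity_disk.read()    # access 2: read old parity
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_disk.write(new_data)          # access 3: write new data
    parity_disk.write(new_parity)      # access 4: write new parity


d0 = CountingDisk(b"\x0f\x0f")
d1 = CountingDisk(b"\xf0\x01")
p = CountingDisk(xor(d0.block, d1.block))

small_write(d0, p, b"\x33\x44")
total_accesses = d0.accesses + d1.accesses + p.accesses
```

The untouched drive `d1` is never accessed, yet the parity remains consistent with both data blocks after the update.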
In Rosenblum et al., "The Design and Implementation of a Log Structured File System", Proceedings of the 13th ACM Symposium on Operating Systems Principles, October 1991, a log structured file system was proposed in which modified data blocks are rewritten to the disk sequentially in a log-like structure. Information for managing the system is also written with each write operation.
A log structured array (LSA) applies some of the same principles as a log structured file system to an array system. There are many benefits to using an LSA over a home address based RAID system. An LSA can accommodate the size changes in data produced through data compression, since data is not given a fixed location on the disk. Therefore, in an LSA, data can be stored on disks in a compressed form. Also, since an LSA writes all modifications to disk sequentially in a log-like structure, it solves the RAID-5 write penalty problem described previously. There is no longer a need to read the old data and old parity, since data blocks for an entire segment are written together.
Application programs and system software running on a host computer read and write data blocks using logical devices independent of the physical location of the data on the storage device (such as an HDD). Programs access data blocks from the storage system using logical cylinder, logical head, and logical record addresses. The storage system controller translates the logical address to the physical address at which the data block is stored. The host computer is unaware of the manner in which requested data blocks are stored on and accessed from the physical storage devices. The typical unit of data management within the controller is a logical track. A combination of a logical cylinder and logical head address represents the logical track address.
The log structured array consists of N+P+S physical disk drives, where N is the number of HDDs' worth of physical space available for customer data, P is the number of HDDs' worth of space used for parity data, and S is the number of spare HDDs provided. Each HDD is divided into groups of consecutive sectors called segment columns. Typically, a segment column is as large as a logical cylinder. Corresponding segment columns from the N+P+S HDDs constitute a segment. The array has as many segments as there are segment columns on an HDD in the array. An example of the layout for such a system is shown in FIG. 2. In a RAID-5 configuration, one of the segment columns of a segment contains the parity of the remaining data segment columns of the segment.
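The arithmetic behind this layout can be sketched as follows. All numeric values (drive counts, sectors per HDD, sectors per segment column) are hypothetical, and the function names are illustrative only. The sketch does not model which drive holds parity or spare space in a given segment, since that depends on the layout of FIG. 2.

```python
def segment_count(sectors_per_hdd, sectors_per_column):
    """The array has as many segments as there are segment columns on one HDD."""
    return sectors_per_hdd // sectors_per_column


def segment_column_sectors(segment, sectors_per_column):
    """Range of consecutive sectors that a given segment occupies on every HDD."""
    start = segment * sectors_per_column
    return range(start, start + sectors_per_column)


def customer_data_fraction(n, p, s):
    """Share of raw capacity left for customer data in an N+P+S array."""
    return n / (n + p + s)


# Hypothetical geometry: 6 data + 1 parity + 1 spare drives' worth of space.
N, P, S = 6, 1, 1
SECTORS_PER_HDD = 1_000_000
SECTORS_PER_COLUMN = 1_500

segments = segment_count(SECTORS_PER_HDD, SECTORS_PER_COLUMN)
extent = segment_column_sectors(2, SECTORS_PER_COLUMN)
fraction = customer_data_fraction(N, P, S)
```

With these made-up numbers the array holds 666 segments, segment 2 covers sectors 3000 through 4499 on each drive, and three quarters of the raw capacity is available for customer data.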
In an LSA, data blocks such as logical tracks are updated to different locations on the disks. Since, in an LSA, the location of a logical track changes over time, a directory called an LSA directory has an entry for each logical track providing its current location in the disk array.
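A minimal sketch of such a directory is shown below. It assumes that a logical track is keyed by its (logical cylinder, logical head) pair and that a location records a segment, a segment column, a starting sector, and a length. The class and field names are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LogicalTrack = Tuple[int, int]  # (logical cylinder, logical head)


@dataclass(frozen=True)
class TrackLocation:
    segment: int   # segment number (hypothetical field)
    column: int    # segment column, i.e. which HDD within the segment
    offset: int    # starting sector within the segment column
    length: int    # sectors occupied (may vary with compression)


class LsaDirectory:
    """Maps each logical track to its current location in the array."""

    def __init__(self) -> None:
        self._entries: Dict[LogicalTrack, TrackLocation] = {}

    def update(self, track: LogicalTrack,
               location: TrackLocation) -> Optional[TrackLocation]:
        """Record the new location and return the one it supersedes, if any."""
        previous = self._entries.get(track)
        self._entries[track] = location
        return previous

    def lookup(self, track: LogicalTrack) -> Optional[TrackLocation]:
        return self._entries.get(track)


directory = LsaDirectory()
first = TrackLocation(segment=0, column=1, offset=64, length=9)
second = TrackLocation(segment=5, column=0, offset=0, length=11)

directory.update((12, 3), first)
superseded = directory.update((12, 3), second)  # the track moved on rewrite
```

The location returned by `update` identifies the space the track has vacated in its earlier segment.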
LSA segments are categorized as one of the following: FREE, meaning the segment contains no valid data and is ready to be opened; OPEN, meaning the segment is available to hold logical tracks being written to the disks ("destaged") and is in the process of being filled with logical tracks being destaged; CLOSING, meaning the segment contains some valid data, but no further destage data can be assigned to it, and it is in the process of being closed and written to the disks; and CLOSED, meaning all of its data has been written to the disks.
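The four categories can be modeled as a small state machine. The sketch below assumes the segments move through the states in the order FREE, OPEN, CLOSING, CLOSED, and that a closed segment returns to FREE only once its space has been reclaimed. The transition table and the `transition` helper are illustrative assumptions rather than a description of a specific controller.

```python
from enum import Enum


class SegmentState(Enum):
    FREE = "free"        # no valid data, ready to be opened
    OPEN = "open"        # being filled with destaged logical tracks
    CLOSING = "closing"  # accepts no more data, being written to disk
    CLOSED = "closed"    # all of its data has been written to disk


# Assumed lifecycle: each state may only move to the next one in the cycle.
ALLOWED = {
    SegmentState.FREE: {SegmentState.OPEN},
    SegmentState.OPEN: {SegmentState.CLOSING},
    SegmentState.CLOSING: {SegmentState.CLOSED},
    SegmentState.CLOSED: {SegmentState.FREE},
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = SegmentState.FREE
for nxt in (SegmentState.OPEN, SegmentState.CLOSING, SegmentState.CLOSED):
    state = transition(state, nxt)
```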
The logical tracks in a logical cylinder may be destaged (written to disk) together to enhance the performance of a sequential access. A logical cylinder is called a "neighborhood." Other groupings of logically sequential data may also be categorized as a neighborhood. A group of logical tracks in a logical cylinder destaged together is called a "neighborhood in destage."
Destaging a neighborhood involves assigning it to an open segment. The open segment remains available to accept other neighborhoods in destage until it is deemed full enough to close. All of the data blocks and parity that constitute a segment are written to disk before the segment is considered closed. Each logical track in the open segment has an entry in the segment's segment directory that describes the track's location in the segment. The segment directory is written on the disk as part of the segment at segment closing time.
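The destage step can be sketched as follows. The sketch assumes a neighborhood is a list of (logical track, length in sectors) pairs, that tracks are packed one after another in the segment, and that a segment is "full enough" once a configurable fraction of its capacity is used. `OpenSegment`, `close_threshold`, and the capacity figures are hypothetical.

```python
class OpenSegment:
    """An open segment that accepts neighborhoods in destage."""

    def __init__(self, segment_id, capacity_sectors, close_threshold=0.9):
        self.segment_id = segment_id
        self.capacity = capacity_sectors
        self.close_threshold = close_threshold  # assumed fill policy
        self.used = 0
        self.segment_directory = {}  # logical track -> (offset, length)

    def can_accept(self, neighborhood):
        needed = sum(length for _, length in neighborhood)
        return self.used + needed <= self.capacity

    def destage(self, neighborhood):
        """Assign every track of a neighborhood in destage to this segment."""
        if not self.can_accept(neighborhood):
            raise ValueError("neighborhood does not fit in this segment")
        for track, length in neighborhood:
            # Each track gets a segment directory entry giving its location.
            self.segment_directory[track] = (self.used, length)
            self.used += length

    def full_enough_to_close(self):
        return self.used >= self.close_threshold * self.capacity


segment = OpenSegment(segment_id=4, capacity_sectors=128)

# Three tracks of logical cylinder 7, keyed by (cylinder, head).
neighborhood = [((7, 0), 40), ((7, 1), 35), ((7, 2), 50)]
segment.destage(neighborhood)

ready_to_close = segment.full_enough_to_close()
```

In this sketch the segment directory is only held in memory; in the scheme described above it would be written to disk as part of the segment when the segment closes.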
Closed LSA segments written to the storage device have "live" tracks and "holes." Live tracks are tracks that have not been updated since being assigned to the segment and contain current, valid data. Holes refer to the space vacated by tracks that were assigned to the segment but subsequently were updated and assigned to a different open segment, as well as fragmented space which was left vacant at the time the segment was closed.
Garbage collection is the process of reclaiming "holes" in closed segments on the storage devices. A garbage collection procedure is started when the number of free segments falls below a threshold. The process of garbage collecting a segment involves reading the segment's directory from disk, scanning each directory entry, and comparing the track's address as indicated by the entry with the address as indicated by the corresponding LSA directory entry. If the two entries match, then the track still resides in the segment and is considered "live." All live tracks are then read from the disk into memory and are written back to disk in other segments. Segments that have been garbage collected become free (or available) segments.
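The liveness check and the relocation of live tracks can be sketched as below. For brevity the sketch tracks only directory entries, not the track data itself, and it assumes a track's address is a (segment, slot) pair. The function names, the slot numbering, and the sample directories are all hypothetical.

```python
def needs_garbage_collection(free_segment_count, threshold):
    """Garbage collection starts when free segments fall below a threshold."""
    return free_segment_count < threshold


def find_live_tracks(segment_id, segment_directory, lsa_directory):
    """A track is live if the LSA directory still points into this segment."""
    return [
        track
        for track, slot in segment_directory.items()
        if lsa_directory.get(track) == (segment_id, slot)
    ]


def garbage_collect(victim, target, segment_dirs, lsa_directory, free_segments):
    """Move the victim's live tracks to the target segment and free the victim."""
    target_dir = segment_dirs[target]
    for track in find_live_tracks(victim, segment_dirs[victim], lsa_directory):
        slot = len(target_dir)                 # next unused slot in the target
        target_dir[track] = slot
        lsa_directory[track] = (target, slot)  # track now lives in the target
    segment_dirs[victim] = {}
    free_segments.add(victim)


# Segment 0 holds tracks A, B, C; B was later rewritten into segment 1.
segment_dirs = {0: {"A": 0, "B": 1, "C": 2}, 1: {"B": 0}, 2: {}}
lsa_directory = {"A": (0, 0), "B": (1, 0), "C": (0, 2)}
free_segments = set()

if needs_garbage_collection(len(free_segments), threshold=1):
    garbage_collect(0, 2, segment_dirs, lsa_directory, free_segments)
```

After the call, tracks A and C have moved to segment 2, track B is untouched in segment 1, and segment 0 is free. The stale entry for B in segment 0 is the "hole" that is reclaimed.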
While a RAID-5 disk storage system with one drive of parity data can protect against data loss from one disk drive, sometimes data can still be lost. If two corresponding sectors of data on two drives are damaged, both sectors are lost even though both drives are still operational.
In the case that a lost sector happens to contain segment directory data, rather than regular data, the impact is greater. In the event a failure or error occurs that corrupts or obliterates the contents of a main LSA directory, a customer may lose data without knowing which piece of data is lost or knowing that data is lost at all. When the main LSA directory is lost, segment directories are required to recover the main LSA directory. If a segment directory is not available, the main LSA directory cannot be recovered.
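The dependence of recovery on the segment directories can be illustrated with a sketch. The text does not say how recovery orders the segments, so the sketch assumes, purely for illustration, that each closed segment carries a sequence number reflecting the order in which it was closed and that later segments supersede earlier ones. A directory value of `None` stands for a segment directory that cannot be read. All names are hypothetical.

```python
class DamagedSegmentDirectory(Exception):
    """Raised when a segment directory needed for recovery is unreadable."""


def rebuild_lsa_directory(closed_segments):
    """Rebuild the main LSA directory from per-segment directories.

    closed_segments is an iterable of
    (sequence_number, segment_id, segment_directory) tuples, where
    segment_directory maps a logical track to its slot in the segment,
    or is None if the directory could not be read.
    """
    lsa_directory = {}
    # Replay oldest first so that newer locations overwrite older ones.
    for _, segment_id, segment_directory in sorted(
        closed_segments, key=lambda entry: entry[0]
    ):
        if segment_directory is None:
            raise DamagedSegmentDirectory(
                f"segment {segment_id}: directory unavailable"
            )
        for track, slot in segment_directory.items():
            lsa_directory[track] = (segment_id, slot)
    return lsa_directory


healthy = [
    (2, 5, {"B": 0}),           # closed later: B was rewritten here
    (1, 3, {"A": 0, "B": 1}),   # closed first
]
rebuilt = rebuild_lsa_directory(healthy)

damaged = healthy + [(3, 8, None)]  # one segment directory was lost
```

Calling `rebuild_lsa_directory(damaged)` raises `DamagedSegmentDirectory`, mirroring the statement above that the main directory cannot be recovered when a segment directory is unavailable.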
In addition, a segment directory is also used in the garbage collection process. A segment directory with a lost sector (a "damaged segment directory") will prevent the segment from being garbage collected and reused. The segment cannot be garbage collected because the segment directory is needed to validate the logical tracks in the segment as being live by comparing the segment directory entries to the main LSA directory entries. As a result, disk space utilization is reduced because the segment space cannot be reclaimed through garbage collection. Further, if it is not known that data has been lost from the segment directory, repeated attempts may be made to include the segment in the garbage collection procedure, only for garbage collection on the segment to stop each time after failing to read the segment directory.
One or more of the foregoing problems is solved, or one or more of the foregoing goals is achieved, by using the present invention.