1. Field of the Invention
This invention relates in general to fault tolerant arrays of hard disks that are known as redundant arrays of inexpensive disks (RAID), and more particularly to a method and apparatus for providing write recovery of faulty data in a non-redundant array disk system.
2. Description of Related Art
Modern mass storage subsystems continue to provide increasing storage capacities to meet user demands from host computer system applications. Further, it is very important that a computer storage system perform reliably. For example, some real-time computer storage systems are used to control complex and sometimes dangerous production processes. Failure within storage systems of this type may have adverse consequences both for the products being produced and for the health and safety of the surrounding environment. As another example, computer storage systems are often used in mission-critical roles. Once again, failure within these types of storage systems may have extremely serious consequences. Of course, even in cases where the failure of a computer system is not dangerous, failure may still be inconvenient and/or expensive.
Various storage device configurations and geometries are commonly applied to meet the demands for higher storage capacity while maintaining or enhancing reliability of the mass storage subsystems. A popular solution to these mass storage demands for increased capacity and reliability is the use of multiple smaller storage modules configured in geometries that permit redundancy of stored data to assure data integrity in case of various failures. In many such redundant subsystems, recovery from many common failures can be automated within the storage subsystem itself due to the use of data redundancy, error codes, and so-called “hot spares” (extra storage modules which may be activated to replace a failed, previously active storage module). These subsystems are typically referred to as redundant arrays of inexpensive (or independent) disks (or more commonly by the acronym RAID). The 1987 publication by David A. Patterson, et al., from the University of California at Berkeley, entitled A Case for Redundant Arrays of Inexpensive Disks (RAID), reviews the fundamental concepts of RAID technology.
There are several “levels” of standard geometries defined in the Patterson publication. RAID 0 offers disk striping without parity. The multiple disks provide quick reads and writes for large files without the data redundancy protection provided by parity. However, Level 0 is not considered true RAID. A RAID level 1 system comprises one or more disks for storing data and an equal number of additional “mirror” disks for storing copies of the information written to the data disks. Subsequent RAID levels, e.g., RAID 2, 3, 4 and 5, segment the data into portions for storage across several data disks. One or more additional disks are utilized to store error check or parity information. RAID Level 6 is like RAID 5 but with additional parity information written that permits data recovery if two drives fail. This configuration requires extra parity drives, and write performance is slower than a similar implementation of RAID 5. Some RAID implementations use different levels on separate banks of drives in an attempt to provide better overall application performance on the host system. While this approach can provide some performance benefits, it raises the complexity of data management, and creates the possibility of large performance and/or cost penalties when data sets optimized for one level must be relocated to a different level when the capacity of a given bank is exceeded. Many other varieties of RAID levels exist, with many being proprietary. In each case, however, the general goal is to provide protection against storage system failures.
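The role of parity in the levels described above can be illustrated with a minimal sketch. It assumes a single-parity stripe in which the parity block is the byte-wise exclusive-OR (XOR) of the data blocks, as is commonly done in RAID 5. The function name xor_blocks and the sample block contents are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks of one stripe (illustrative contents only).
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Suppose the drive holding data[1] fails: XOR the surviving data
# blocks with the parity block to regenerate the missing block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Because XOR is its own inverse, any single missing block of the stripe can be regenerated from the remaining blocks. A non-redundant array has no parity block and therefore no such source of replacement data.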
RAID storage subsystems typically utilize an array controller that shields the user or host system from the details of managing the redundant array. The controller makes the subsystem appear to the host computer as one or more highly reliable, high-capacity disk drives. In fact, the RAID controller may distribute the data supplied by the host computer system across a plurality of small independent drives, together with redundancy and error checking information, so as to improve subsystem reliability. Frequently, RAID subsystems provide large cache memory structures to further improve the performance of the RAID subsystem. The cache memory is associated with the control module such that the storage blocks on the disk array are mapped to blocks in the cache. This mapping is also transparent to the host system. The host system simply requests blocks of data to be read or written and the RAID controller manipulates the disk array and cache memory as required.
Each of these configurations (geometries or RAID levels) may be preferred over the others in particular applications depending upon performance or reliability requirements. It is vital to proper operation of the RAID storage subsystem that the configuration information be maintained. Each disk drive of the disk array must be in a known address and/or physical position with respect to the various interface and control busses. The order of the various disk drives in their respective groupings is critical to proper operation of the RAID subsystem. Furthermore, many RAID storage subsystems permit a plurality of groupings of disk drives to be simultaneously operable within the subsystem. Each grouping may be operating under a different RAID geometry to satisfy the needs of a particular application.
Initial implementations of RAID were in the form of software device drivers that had to be added to the host computer system. It quickly became apparent that the overhead involved in managing the RAID was significant, and made the computer run much slower. Because storage demands weigh heavily on a processor, executing all the read and write operations on the disk system results in a huge number of I/O interrupts. If these were to be processed by a host CPU, then the host would be doing little else. To ease this burden, storage vendors and motherboard designers have contemplated alternative methods of processing I/O.
One alternative to host-based I/O control is to implement an I/O processor directly on the storage controller to handle most of the I/O to its connected drives. This is called controller-based RAID. In host-based I/O control, all the RAID functions are handled directly by the file system and device drivers of the host operating system. With a RAID controller, however, most of the RAID functions are passed on to the RAID controller to manage. There is still I/O between the host CPU and the controller, but its volume is significantly reduced in controller-based systems. This hardware RAID controller concept provides greater performance while maintaining all the benefits of the RAID technology. Thus, a RAID controller organizes the disk drives into the RAID configuration and controls the interaction between the individual disk drives.
As generally described above, a RAID system may consist of a host, a RAID controller, and non-redundant RAID array(s) and/or redundant RAID array(s). Data transfer between the host and any such array is initiated via a host Read Request or Write Request. A host Read or Write Request causes a READ or WRITE command, respectively, to be issued by the RAID controller to one or more disks (built to SCSI interface, for example) in a designated array.
When the command is completed normally (with no error), the drive returns a GOOD completion status (or, simply, status) to the RAID controller, which in turn presents a GOOD status to the host. A write operation is generally assumed successful if the disk receiving the WRITE command returns a GOOD status after the command is executed. However, even when no write error occurs at a given data block location on the disk, a subsequent read operation at that block may or may not be successful.
For a redundant RAID configuration, when an unreadable data block is discovered, the RAID controller generally attempts to retry the read, and if the retry is unsuccessful, the RAID controller tries to re-write the block with data regenerated from the member drives (as in a RAID 5 or RAID 6) or from the mirrored copy (as in a RAID 1). After the write, the data is verified with a READ command with the FUA (Force Unit Access) bit set to “1”, indicating that data must be read from the media. If the re-write is not successful, a block reassignment is attempted. However, this error recovery procedure is not used for a non-redundant array or a redundant array having one drive offline since “replacement” data is not available during a read operation.
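The recovery sequence just described for a redundant array (retry the read, re-write with regenerated data, verify with a forced media read, then reassign) can be sketched as follows. This is a simplified simulation under stated assumptions, not a controller implementation: the SimulatedDisk class, the recover_block function, the regenerate callback, and the outcome labels are all hypothetical names introduced for illustration.

```python
class ReadError(Exception):
    """Raised by the simulated disk for an unrecovered read error."""


class SimulatedDisk:
    """Hypothetical in-memory disk; not a real device interface."""

    def __init__(self, bad_blocks=(), rewrite_fixes=True):
        self.blocks = {}
        self.bad = set(bad_blocks)
        self.rewrite_fixes = rewrite_fixes  # does a re-write repair the block?
        self.reassigned = set()

    def read(self, lba, fua=False):
        # fua mirrors the Force Unit Access bit; this simulation has no
        # cache, so every read already goes to the simulated media.
        if lba in self.bad:
            raise ReadError(lba)
        return self.blocks[lba]

    def write(self, lba, data):
        # The write itself always completes without error.
        self.blocks[lba] = data
        if self.rewrite_fixes:
            self.bad.discard(lba)

    def reassign(self, lba):
        # Relocate the logical block to a spare physical location.
        self.bad.discard(lba)
        self.reassigned.add(lba)


def recover_block(disk, lba, regenerate, retries=1):
    """Return (data, outcome) for one block of a redundant array."""
    for _ in range(1 + retries):          # initial read plus retries
        try:
            return disk.read(lba), "read-ok"
        except ReadError:
            pass
    data = regenerate(lba)                # from parity or mirror copy
    disk.write(lba, data)                 # re-write in place
    try:
        disk.read(lba, fua=True)          # verify from the media
        return data, "rewritten"
    except ReadError:
        pass
    disk.reassign(lba)                    # re-write failed: relocate
    disk.write(lba, data)
    disk.read(lba, fua=True)              # raises if still unreadable
    return data, "reassigned"


disk = SimulatedDisk(bad_blocks={7}, rewrite_fixes=False)
disk.blocks[7] = b"old"
result = recover_block(disk, 7, regenerate=lambda lba: b"new")
```

The sketch depends on the regenerate callback. In a non-redundant array, or a redundant array with one drive offline, no such source of replacement data exists at read time, which is why the procedure cannot be applied there.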
When a data block needs to be updated, a WRITE command is issued without prior knowledge that the disk may already have determined the block to be unreadable. Re-writing the block may or may not solve the problem; if it does not, the disk may have recommended relocating the block. On a Write Request, the controller will issue a WRITE command to the disk as usual. Unless a write error occurs, the controller will neither verify the data nor perform a block reassignment. Consequently, the data may remain unreadable at the same physical location.
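The conventional write path described above can be sketched to show how a faulty location goes unnoticed. The FaultyDisk class and the handle_write_request function are hypothetical; the status strings are simplified stand-ins for SCSI completion statuses, and the simulated disk is assumed to accept every write while leaving a known-bad location unreadable.

```python
GOOD = "GOOD"
CHECK_CONDITION = "CHECK CONDITION"


class FaultyDisk:
    """Hypothetical disk whose writes succeed even at unreadable blocks."""

    def __init__(self, unreadable):
        self.unreadable = set(unreadable)
        self.media = {}

    def write(self, lba, data):
        # No write error is reported, even at a faulty location.
        self.media[lba] = data
        return GOOD

    def read(self, lba):
        if lba in self.unreadable:
            return CHECK_CONDITION, None   # unrecovered read error
        return GOOD, self.media.get(lba)


def handle_write_request(disk, lba, data):
    """Conventional path: issue WRITE and pass the status to the host."""
    status = disk.write(lba, data)
    # On GOOD status there is no verifying read and no block
    # reassignment, so a faulty location is not detected here.
    return status


disk = FaultyDisk(unreadable={42})
write_status = handle_write_request(disk, 42, b"updated data")
read_status, read_data = disk.read(42)
```

In this sketch the host is presented with a GOOD status for the write, yet the subsequent read of the same block fails, which corresponds to the data remaining unreadable at the same physical location.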
There are disks on which a write operation may complete without any error indication even though the data cannot subsequently be read successfully. For example, the disk may use a No-ID™ architecture, and as a result, write errors do not usually occur (other than a possible “No Sector Found” error). However, when an unrecovered read error occurs, the controller is not able to re-write or relocate data with a non-redundant RAID array because of the absence of replacement data. Therefore, in some cases, data at a faulty location may never be recovered even when replacement (or updated) data becomes available for a write or relocate operation.
It can be seen that there is a need for a method and apparatus for write recovery of faulty data in a non-redundant array disk system.