The need to store digital files, documents, pictures, images and other data continues to increase rapidly. In connection with the electronic storage of data, systems incorporating more than one storage device have been devised. In general, using a number of storage devices in a coordinated fashion in order to store data can increase the total storage volume of the system. In addition, data can be distributed across the multiple storage devices such that data will not be irretrievably lost if one of the storage devices (or in some cases more than one storage device) fails. An additional advantage that can be achieved by coordinating the operation of a number of individual storage devices is improved data access and/or storage times.
Storage systems that provide at least some integration of individual storage devices, such as JBOD (Just a Bunch of Disks), SBOD (Switched Bunch of Disks) or RAID (Redundant Array of Independent Disks) systems, have been developed. These storage systems are typically deployed as a number of individual disk drives or other storage devices within an enclosure to present an integrated component to the user. In addition to the individual storage devices, the enclosure may contain one or more power supplies and one or more cooling devices. Integrated storage systems may also include one or more storage system controllers that can be used to control the distribution of data across the individual storage devices in a given storage system.
A RAID controller is a storage device that provides users with mass storage space, quick data access, and/or data protection. It can achieve data protection with either mirroring or parity data schemes. RAID level 1 provides a mirroring scheme, meaning every byte in one drive has a redundant copy in another drive. RAID levels 3, 4, and 5 provide parity protection and tolerate a single drive failure without losing any customer data. These systems can also rebuild (reconstruct) the data from the failed drive onto another drive if a spare drive is available. After a drive has failed and before the rebuild completes, the array is said to be in a critical condition, meaning another drive failure can cause the entire array to be unreadable. After the rebuild is complete, the array is said to be fault tolerant again, meaning that it believes it can sustain a drive failure without losing any customer data. Other RAID levels use redundant data storage such that a single drive failure does not lead to complete data loss. For example, some RAID levels may allow 2 or 3 failures before the array becomes critical. However, once the array becomes critical, its behavior is no different from that of a single-parity array. Unfortunately, cases exist under these storage schemes where a single failure can still cause data to be lost, even in a fault tolerant array.
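The parity protection described above can be illustrated with a minimal sketch, assuming byte-wise XOR parity over one stripe of equal-sized blocks. The function names `xor_blocks` and `rebuild` and the sample block contents are hypothetical and serve only as an illustration; they are not taken from any particular controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Reconstruct the one missing block of a stripe from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# One stripe: three data blocks, each on a different drive (illustrative values).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# The drive holding data[1] fails; its block is rebuilt from the rest.
rebuilt = rebuild([data[0], data[2]], parity)
assert rebuilt == data[1]
```

With a single parity block, only one missing block per stripe can be recovered this way, which is why the array is critical until the rebuild completes.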
Disk drive vendors have attempted to minimize data losses resulting from bad areas on a disk by using a data reallocation technique. Typically, a certain number of sectors on a given drive are reserved for these reallocation purposes, and are “invisible” to the outside world. The disk drive sectors are accessed through a parameter called a Logical Block Address (LBA). The LBA defines where a particular sector can be found on a given disk. When a block of corrupted data or a bad block of data is found on a disk drive, the data that should have been stored in that block is recovered and written to another disk or another location on the same disk. Several protocols can be used to reconstruct data from a defective memory location. Two of these are automatic write reallocation, controlled by the AWRE (Automatic Write Reallocation Enabled) setting, and automatic read reallocation, controlled by the ARRE (Automatic Read Reallocation Enabled) setting.
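As a rough illustration of the reallocation idea, the following sketch models a drive whose user-visible LBAs map to physical sectors by default, with a hidden pool of spare sectors behind them. The `Drive` class, its fields, and the sector counts are assumptions made for this example and do not describe any real drive firmware.

```python
class Drive:
    """Toy drive: user-visible LBAs plus a hidden pool of spare sectors."""

    def __init__(self, user_sectors, spare_sectors):
        self.user_sectors = user_sectors
        # Physical sectors 0..user_sectors-1 back the user LBAs by default;
        # the spares follow them and are never exposed as LBAs.
        self.free_spares = list(range(user_sectors, user_sectors + spare_sectors))
        self.remap = {}   # LBA -> spare physical sector
        self.media = {}   # physical sector -> stored data

    def physical(self, lba):
        """Translate a user LBA to the physical sector that currently backs it."""
        if not 0 <= lba < self.user_sectors:
            raise ValueError("LBA out of range")
        return self.remap.get(lba, lba)

    def reallocate(self, lba, recovered_data):
        """Point an LBA at a spare sector and store the recovered data there."""
        if not self.free_spares:
            raise RuntimeError("no spare sectors left")
        spare = self.free_spares.pop(0)
        self.remap[lba] = spare
        self.media[spare] = recovered_data
        return spare

    def write(self, lba, data):
        self.media[self.physical(lba)] = data

    def read(self, lba):
        return self.media[self.physical(lba)]


drive = Drive(user_sectors=100, spare_sectors=4)
drive.write(7, b"payload")
drive.reallocate(7, b"payload")   # LBA 7 is now backed by a hidden spare sector
assert drive.read(7) == b"payload"
```

The caller keeps using LBA 7 throughout; only the drive's internal map changes, which is why the spare sectors stay invisible to the outside world.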
When AWRE is enabled, a disk drive is allowed to automatically relocate bad data detected during write operations. For example, if a write command is issued to write data to a sector and there is a failure, the disk drive may map the data to be written to a new location. Typically, the data is written or rewritten to a new location that is not user accessible. Of course, the drive can rewrite the data to a new location only if the source used to reconstruct the data from the defective block is valid (e.g., the original data in the buffer is valid or the data is recovered from the backup media). The valid data is then written to a reserve location.
When ARRE is enabled, the disk drive is allowed to automatically relocate bad data detected during read operations. The reallocation of the data is performed in a similar fashion to the AWRE protocol. Namely, the defective stripe of data is reconstructed and written to a new location that is not user accessible. Again, the success of this reconstruction depends upon the backup data and whether or not it is valid.
The success of both reconstruction processes depends upon the information used to reconstruct the data from the defective location. The successful use of the reconstruction information further depends on mapping information. If the mapping information is lost, or backup memory locations are also defective, then data may be lost permanently. Unfortunately, most current systems store the mapping information in the controller memory. This memory is not fault tolerant, even though the system believes the array is fault tolerant. Furthermore, mapping information is typically stored in the buffers of the drive controller memory during reconstruction. In the event of a power outage, if no alternative power supplies are available, the mapping data may be lost and as a result the relocated data on the drive may be unrecoverable.
Another drawback with current reconstruction techniques is that they are not very economical in their redistribution of data. For example, when a stripe of data containing a defective block is detected, the entire stripe of data is rewritten. Thus, an entire stripe of data needs to be reconstructed and allocated to a new stripe on the drive. It is inefficient to rewrite an entire stripe of data when only one block of that stripe may be defective. This process uses more reserve disk space than is necessary, and a number of potentially good blocks of data are rewritten. The unnecessary rewriting of data also uses processing resources that otherwise could be used to perform other functions.
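The cost difference between rewriting a whole stripe and rewriting only the defective block can be made concrete with a small calculation. The function `rewrite_cost` and the figure of 128 blocks per stripe are hypothetical values chosen for illustration, not parameters of any particular system.

```python
def rewrite_cost(blocks_per_stripe, defective_blocks):
    """Compare reserve blocks consumed by whole-stripe and per-block rewrites."""
    whole_stripe = blocks_per_stripe   # every block in the stripe is relocated
    per_block = defective_blocks       # only the defective blocks are relocated
    return {
        "whole_stripe": whole_stripe,
        "per_block": per_block,
        "good_blocks_rewritten": whole_stripe - per_block,
    }


# Assumed example: a 128-block stripe containing one defective block.
cost = rewrite_cost(blocks_per_stripe=128, defective_blocks=1)
assert cost["good_blocks_rewritten"] == 127
```

Under these assumed numbers, the whole-stripe approach consumes 128 reserve blocks to repair a single defect, and 127 of the blocks it rewrites were not known to be bad.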
Another disadvantage in using current storage systems is that they do not maintain a record of their status with great accuracy. Currently, when a controller determines that a memory location is defective, the controller marks that location as bad and saves such information in the bad LBA map. If the number of bad LBAs reaches a particular threshold, the controller will treat that drive as a bad drive, and mark the array that the bad drive belongs to as critical. If a spare drive is available (e.g., a dedicated spare or a global spare), the controller will start a rebuild of the entire array on the backup drive.
Typically, it can take a long time for any particular drive to reach the threshold where it becomes marked as a bad drive. During this period of time, other memory locations on the drives may well become unreadable. In most redundant storage schemes, all of the disk drives that exist in the array are used to redundantly store a single stripe of data. If more of these memory locations within the same stripe become defective than the redundancy scheme can tolerate, then a data loss can occur without any drive actually reaching the bad LBA threshold. The odds of having the right combination of memory locations become defective, thus resulting in a permanent data loss, increase as the drive continues to be used. Since it typically takes a very long time for the number of defective memory locations to reach the critical threshold, a permanent loss of data may occur even if the array is marked as fault tolerant.
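The gap between the bad LBA threshold and actual data loss can be shown with a simplified model. The sketch below assumes that LBA n on every drive belongs to stripe n, that the array tolerates one lost block per stripe (single parity), and that the threshold is 10 bad LBAs; the names `failed_drives` and `unrecoverable_stripes`, the threshold value, and the sample data are all hypothetical.

```python
from collections import Counter

BAD_LBA_THRESHOLD = 10  # hypothetical per-drive limit before a drive is marked bad


def failed_drives(bad_lbas_by_drive, threshold=BAD_LBA_THRESHOLD):
    """Drives whose bad LBA count has reached the threshold."""
    return [d for d, bad in bad_lbas_by_drive.items() if len(bad) >= threshold]


def unrecoverable_stripes(bad_lbas_by_drive, tolerated_failures=1):
    """Stripes with more bad blocks than the redundancy scheme can tolerate.

    Simplification: LBA n on every drive is assumed to belong to stripe n.
    """
    hits = Counter(lba for bad in bad_lbas_by_drive.values() for lba in bad)
    return sorted(lba for lba, count in hits.items() if count > tolerated_failures)


# Each drive has only a few bad LBAs, but two drives share bad LBA 42.
bad_map = {
    "drive0": {5, 42},
    "drive1": {42, 900},
    "drive2": {17},
}

assert failed_drives(bad_map) == []          # no drive reaches the threshold
assert unrecoverable_stripes(bad_map) == [42]  # yet stripe 42 cannot be rebuilt
```

In this model the array would still be reported as fault tolerant, because no drive has been marked bad, even though one stripe has already lost more blocks than single parity can recover.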