The present invention relates to a technique for avoiding the abandonment of a disk in a mirrored disk storage system when a disk sector becomes unreadable. The technique may be used to improve reliability in distributed databases or other high reliability storage systems that make use of multiple physical disks.
Data processing systems increasingly require access to greater and greater amounts of storage space. Many high end systems, such as web servers and database systems, are used for storing information to which constant access is critical to the operation of an enterprise. It is thus extremely important that the users of these systems not be denied service for extended periods of time, and it is quite desirable that as many errors as possible be fixed without requiring user intervention or, worse, having to shut down or remove the system from active service.
It is now quite common for various data recovery techniques to be implemented, even right on the disk hardware itself. One common failure occurs when a sector on a single track goes bad. A bad sector is one that does not reliably accept or provide data, generally due to a failure of the underlying electromagnetic media. To accommodate sector failures, modern disk units typically provide at least one spare sector per track. This permits remapping of sectors without losing a significant amount of the remaining disk space that is still good. A disk controller contained in the disk unit itself and/or associated closely with the disk contains firmware to perform such remapping of sectors.
In a typical scenario, a disk may be arranged in 512 byte sectors. Each sector on the disk is identified by the disk controller referencing its physical location, such as by track number and/or sector number. However, most disk controllers now assign a unique Logical Block Address (LBA) to most if not all of the physical disk sectors. Read and write commands to the disk thus access a sector by calling out its location via the LBAs and not by the physical locations. The LBAs typically run contiguously from zero up to some number, m. The disk controller generally assigns LBAs to physical sectors in such a way that sequentially accessing LBAs results in the fastest possible access time.
As alluded to previously, the disk controller typically reserves several spare sectors, generally spreading the spare sectors across the disk rather than concentrating them in one area. If a sector becomes inaccessible, the disk controller maintains a mapping table whereby each sector's LBA can be reassigned to one of the spare sectors. In this manner, the user of the disk (typically the operating system or other application software) need not keep track of which sectors are defective—all LBAs in the range 0-m are always “guaranteed” to be good.
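The mapping table described above can be illustrated with a minimal sketch. It is not taken from any actual controller firmware: the class name `RemapTable`, its methods, and the policy of choosing the physically nearest spare are all assumptions made for illustration.

```python
class RemapTable:
    """Hypothetical model of a disk controller's LBA-to-physical-sector map."""

    def __init__(self, total_sectors, spare_sectors):
        spare_set = set(spare_sectors)
        # Spare physical sectors are held in reserve and have no LBA.
        self.spares = sorted(spare_set)
        # Remaining physical sectors receive contiguous LBAs 0..m in order.
        data_sectors = [p for p in range(total_sectors) if p not in spare_set]
        self.mapping = dict(enumerate(data_sectors))

    def physical(self, lba):
        """Return the physical sector currently backing this LBA."""
        return self.mapping[lba]

    def reassign(self, lba):
        """Point the LBA at a spare sector (assumed policy: nearest spare)."""
        if not self.spares:
            raise RuntimeError("no spare sectors left")
        bad = self.mapping[lba]
        nearest = min(self.spares, key=lambda s: abs(s - bad))
        self.spares.remove(nearest)
        self.mapping[lba] = nearest
        return nearest


# Ten physical sectors with spares at positions 4 and 9 give LBAs 0 through 7.
table = RemapTable(total_sectors=10, spare_sectors=[4, 9])
table.reassign(2)  # physical sector 2 is defective; LBA 2 moves to spare 4
```

In this sketch the caller continues to address LBA 2 exactly as before; only the controller-side table changes.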
If the application software then attempts to write to an LBA and the disk controller determines that the sector to which the LBA points is defective, the disk controller reassigns the LBA (unbeknownst to the application) to a spare sector. The only possible effect on the application is a slight delay in redirecting the access. Subsequent reads or writes to that LBA generally will not be as fast as before, since the replacement sector will not be adjacent to the sectors pointed to by the previous and following LBAs. If the sector is physically nearby, however, the increase in access time will not be that great. This is one reason for scattering spare sectors throughout the disk rather than grouping them all in one region.
Read failures cannot be handled quite as smoothly, however. If a read attempt to a particular LBA results in an error, the data in that physical sector is now irretrievably lost. In that case, the disk controller notifies the application that an unrecoverable read error has occurred, and reassigns that LBA to a spare sector as in the case of a write error. Thus, future accesses to that LBA will be valid again, in furtherance of the abstraction that all LBAs in the range 0-m are guaranteed to be good. After receiving an unrecoverable read error, if the application has archived the data for the inaccessible sector elsewhere, the application can take steps to rewrite the block. The data will now be stored in a new, non-defective sector, and the application can then continue as if the error had not occurred.
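The read-error sequence just described might be sketched as follows. This is a simplified simulation under stated assumptions: `SimulatedDisk`, `UnrecoverableReadError`, and `read_with_recovery` are hypothetical names, the archive is modeled as a plain dictionary, and the content of a freshly reassigned sector before it is rewritten (here `None`) is not specified by the text.

```python
class UnrecoverableReadError(Exception):
    """Raised when the data behind an LBA cannot be read back."""


class SimulatedDisk:
    """Hypothetical disk whose controller remaps LBAs on failure."""

    def __init__(self, num_lbas):
        self.data = {lba: None for lba in range(num_lbas)}
        self.bad = set()  # LBAs whose current physical sector is defective

    def write(self, lba, block):
        # A write to a defective sector is transparently redirected to a spare.
        self.bad.discard(lba)
        self.data[lba] = block

    def read(self, lba):
        if lba in self.bad:
            # The old data is lost. Reassign the LBA to a spare so that
            # later accesses succeed, then report the error to the caller.
            self.bad.discard(lba)
            self.data[lba] = None  # assumption: spare holds no valid data yet
            raise UnrecoverableReadError(lba)
        return self.data[lba]


def read_with_recovery(disk, lba, archive):
    """Read a block; on an unrecoverable error, rewrite it from the archive."""
    try:
        return disk.read(lba)
    except UnrecoverableReadError:
        block = archive[lba]    # only possible if a copy was archived
        disk.write(lba, block)  # lands in the new, non-defective sector
        return block
```

The recovery path works only when an archived copy exists; without one, the sketch fails with a `KeyError`, which mirrors the data loss the paragraph describes.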
The rate of unrecoverable read errors varies among disks and typically with age and environmental conditions such as temperature and vibration. It is rarely zero. Thus disk applications must be able to tolerate occasional data loss. Fundamentally, disk data must typically be able to be regenerated in some fashion, usually by retrieving an archived copy of the data. High performance storage systems use multiple redundant disks to minimize the frequency of retrieval from archives.
One such multi-disk storage system that achieves fault tolerance is known as Redundant Array of Inexpensive Disks (RAID). One commonly used RAID technique is so-called RAID Level 1, which consists of an array of an even number of disks, half of which mirror the other half. For example, in a system with 8 total disks, disks numbered 2, 4, 6, and 8 contain exact copies of the data on disks numbered 1, 3, 5, and 7. In another possible implementation of RAID 1, each disk is partitioned, with one half of each disk mirroring the other half of another disk.
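The pairing in the eight-disk example can be expressed as a one-line rule. The function name `mirror_of` is hypothetical, and the sketch assumes the 1-based numbering used in the example, with each odd-numbered disk paired with the next even-numbered disk.

```python
def mirror_of(disk_number):
    """Return the partner of a disk under the odd/even pairing in the example."""
    if disk_number < 1:
        raise ValueError("disks are numbered from 1")
    # Odd disks (1, 3, 5, 7) pair with the next disk; even disks with the previous.
    return disk_number + 1 if disk_number % 2 == 1 else disk_number - 1


pairs = {d: mirror_of(d) for d in (1, 3, 5, 7)}
```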
When a disk sector in an array system fails or becomes inaccessible, the storage system redirects all accesses destined for the failed disk to the mirror. After the failed disk has been replaced, and its contents restored as necessary from the mirror, the storage system can then redirect accesses destined for the failed disk to its replacement instead of to the mirror. In this fashion the storage system remains operational despite occasional disk failures.
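The conventional failover behavior described above, in which the whole disk is abandoned in favor of its mirror, can be sketched as follows. The class `MirroredPair` and its method names are assumptions for illustration, and each disk is modeled as a dictionary from LBA to block.

```python
class MirroredPair:
    """Hypothetical two-disk mirror that abandons a disk on failure."""

    def __init__(self):
        self.disks = {"primary": {}, "mirror": {}}
        self.failed = set()

    def write(self, lba, block):
        # Writes go to every disk that is still in service.
        for name, disk in self.disks.items():
            if name not in self.failed:
                disk[lba] = block

    def read(self, lba):
        # Reads are served by the first disk that is still in service.
        for name in ("primary", "mirror"):
            if name not in self.failed:
                return self.disks[name][lba]
        raise IOError("both copies unavailable")

    def fail(self, name):
        # The entire disk is taken out of service, not just one sector.
        self.failed.add(name)
        self.disks[name] = {}

    def replace(self, name):
        # Restore the replacement's contents from the surviving copy.
        survivor = "mirror" if name == "primary" else "primary"
        if survivor in self.failed:
            raise IOError("no surviving copy to restore from")
        self.disks[name] = dict(self.disks[survivor])
        self.failed.discard(name)
```

Note that between `fail` and `replace` the pair holds only one copy of the data, so a second fault in that window cannot be masked.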
Some have recognized that disk sector repair methods may be used in connection with mirrored disk arrays. For example, in Kari, et al., “Analysis of Repair Algorithms for Mirrored Disk Systems”, IEEE Transactions on Reliability, vol. 46, no. 2, June 1997, pp. 193-200, it was recognized that most modern disk units are arranged for handling sector faults internally, by their associated disk controllers remapping LBAs, and that a faulty sector can be replaced using one of the spare sectors. However, there is no particular discussion in that paper of how to handle unrecoverable read errors in connection with such disk repair processes, while avoiding the need to replace an entire disk with its mirror.