1. Field of the Invention
The present invention relates to systems for providing fault-tolerance for disk drives in computer systems. More specifically, the present invention relates to a method and an apparatus for using acoustic signals to identify one or more disk drives that are likely to fail in a computer system.
2. Related Art
As computer systems grow increasingly powerful, they are able to manipulate larger volumes of data and execute larger and more sophisticated computer programs. In order to accommodate these larger volumes of data and larger programs, computer systems are using larger amounts of disk storage. For example, some existing server systems currently support more than 15,000 disk drives.
Ensuring the reliability of disk storage in these systems is critically important for most applications. Allowing data to be corrupted or lost can have a devastating effect on businesses that rely on the data. For example, airlines rely on the integrity of data stored in their reservation systems for most of their day-to-day operations, and would essentially cease to function if this data became lost or corrupted.
About one percent of disk drives within a computer system fail each year. This has motivated system designers to develop techniques to mitigate the loss of data caused by disk drive failures. For example, disk drives are often organized into “RAID” arrays to ameliorate the effects of a drive failure by providing data redundancy.
Although these redundancy-based techniques can help prevent the loss of data, a failed disk drive must be replaced quickly to maintain system reliability. If a second disk drive fails before the first failed disk drive can be replaced, data can be lost.
Note that disk drives can fail in a number of ways. A failure in the electrical circuitry of a disk drive is typically instantaneous and catastrophic. On the other hand, more common mechanical failures often develop over an extended period of time. For example, one of the most common disk drive failures is a failure of a spindle bearing. Spindle bearing failures typically take place over an extended period of time as frictional forces gradually wear away at the spindle bearing. In many cases, a spindle bearing can change from being fully functional to completely failed over several hours, or even days.
Some software techniques attempt to detect incipient failures by analyzing read/write errors and retry attempts. While these techniques can be effective in some situations, a disk drive needs to be very close to failure before the software can detect the impending failure. This leaves very little time to replace the failing disk drive.
What is needed is a method and an apparatus for identifying disk drives that are likely to fail without the problems described above.
One embodiment of the present invention provides a system that facilitates determining whether a disk drive is likely to fail. The system operates by monitoring at least three acoustic signals emitted from a two-dimensional array of disk drives and then comparing characteristics of each acoustic signal with baseline acoustic signals. These baseline acoustic signals reflect normal operation of the two-dimensional array of disk drives. If the acoustic signals differ by a predetermined amount from the baseline acoustic signals, the system identifies one or more disk drives in the two-dimensional array of disk drives that are likely to fail.
In a variation of this embodiment, monitoring the acoustic signals involves monitoring signals from microphones arranged non-linearly on a periphery of the two-dimensional array of disk drives.
In a further variation, comparing characteristics of each acoustic signal with baseline acoustic signals involves calculating a power spectral density for each acoustic signal and then subtracting the baseline power spectral density from the power spectral density for each acoustic signal. Alternatively, the comparison can involve subtracting the power spectral density for each acoustic signal from the baseline power spectral density.
In a further variation, calculating the power spectral density involves performing a Fourier transform on each acoustic signal.
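The comparison described above can be illustrated with a minimal sketch. It assumes a simple periodogram (squared magnitude of a discrete Fourier transform) as the power spectral density estimate; the function names, the sample rate, and the synthetic test tones are hypothetical and are not taken from the specification.

```python
import cmath
import math


def power_spectral_density(samples, sample_rate):
    """Return (freqs, psd) for a real signal using a naive DFT periodogram.

    Assumption: an unwindowed, two-sided periodogram scaled by
    1 / (n * sample_rate); only bins from 0 Hz up to the Nyquist
    frequency are returned.
    """
    n = len(samples)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        # k-th discrete Fourier transform coefficient.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        freqs.append(k * sample_rate / n)
        psd.append(abs(coeff) ** 2 / (n * sample_rate))
    return freqs, psd


def residual_psd(signal_psd, baseline_psd):
    """Subtract the baseline PSD from the measured PSD, bin by bin."""
    return [s - b for s, b in zip(signal_psd, baseline_psd)]


# Hypothetical data: a 4 Hz tone stands in for normal operation, and an
# added 10 Hz tone stands in for the signature of a degrading drive.
sample_rate = 64
n = 64
baseline = [math.sin(2 * math.pi * 4 * i / sample_rate) for i in range(n)]
measured = [
    b + 0.5 * math.sin(2 * math.pi * 10 * i / sample_rate)
    for i, b in enumerate(baseline)
]

freqs, baseline_psd = power_spectral_density(baseline, sample_rate)
_, measured_psd = power_spectral_density(measured, sample_rate)
residual = residual_psd(measured_psd, baseline_psd)

# The largest residual falls in the bin of the added tone.
peak_bin = max(range(len(residual)), key=lambda k: residual[k])
```

In practice a fast Fourier transform and an averaged estimate would likely replace the naive loop; the direct form is used here only to keep the sketch self-contained.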
In a further variation, the system limits the frequency range of the power spectral density to a predetermined frequency range that is associated with failing disk drives.
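As a rough illustration of restricting the comparison to a frequency range, the sketch below keeps only the bins inside a band and checks whether any of them departs from the baseline by more than a threshold. The band edges, the threshold, and the sample values are hypothetical; the specification states only that the range and the amount are predetermined.

```python
def band_limited(freqs, values, f_lo, f_hi):
    """Keep only the (frequency, value) pairs with f_lo <= frequency <= f_hi."""
    return [(f, v) for f, v in zip(freqs, values) if f_lo <= f <= f_hi]


def deviates_from_baseline(freqs, residual, f_lo, f_hi, threshold):
    """Return True if any in-band residual exceeds the threshold in magnitude.

    The absolute value is used so that the result does not depend on
    whether the baseline was subtracted from the measurement or the
    measurement from the baseline.
    """
    in_band = band_limited(freqs, residual, f_lo, f_hi)
    return any(abs(v) > threshold for _, v in in_band)


# Hypothetical residual PSD values (measured minus baseline) per bin.
freqs = [0.0, 100.0, 200.0, 300.0, 400.0]
residual = [0.9, 0.01, 0.02, 0.5, 0.0]

# The large deviation at 0 Hz is ignored because it lies outside both bands.
flag_high_band = deviates_from_baseline(freqs, residual, 250.0, 450.0, 0.1)
flag_low_band = deviates_from_baseline(freqs, residual, 50.0, 250.0, 0.1)
```

Limiting the band in this way keeps out-of-band noise from triggering a false indication, at the cost of missing failure signatures that fall outside the chosen range.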
In a further variation, identifying that one or more disk drives are likely to fail involves correlating the acoustic signals to determine one or more disk drives within the two-dimensional array of disk drives that may fail.
In a further variation, correlating the acoustic signals involves localizing failing disk drives by applying a barycentric coordinate technique to the acoustic signals.
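One plausible reading of a barycentric coordinate technique is sketched below: the estimated source location is the weighted average of the microphone positions, and the drive nearest to that estimate is reported. This passage does not say how the weights are derived, so using each microphone's deviation from baseline as its weight is an assumption, as are the function names, coordinates, and drive labels.

```python
import math


def barycentric_estimate(mic_positions, weights):
    """Return the weighted average of the microphone positions.

    Assumption: each weight is a non-negative measure of how strongly
    that microphone's signal deviates from its baseline. The function
    works for coordinates of any dimension.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    dim = len(mic_positions[0])
    return tuple(
        sum(w * p[d] for w, p in zip(weights, mic_positions)) / total
        for d in range(dim)
    )


def nearest_drive(estimate, drive_positions):
    """Return the label of the drive closest to the estimated location."""
    return min(
        drive_positions,
        key=lambda name: math.dist(estimate, drive_positions[name]),
    )


# Hypothetical layout: three non-collinear microphones on the periphery
# of a small two-dimensional array of drives.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
deviations = [1.0, 6.0, 1.0]  # the second microphone deviates most
drives = {"d00": (1.0, 1.0), "d10": (3.0, 1.0), "d01": (1.0, 3.0)}

estimate = barycentric_estimate(mics, deviations)
suspect = nearest_drive(estimate, drives)
```

Because the estimate is a convex combination of the microphone positions, it always lies inside the triangle they form, which is one reason a non-linear arrangement of the microphones matters.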
In a further variation, the system monitors at least four acoustic signals emitted from a three-dimensional array of disk drives and compares characteristics of each acoustic signal with baseline acoustic signals. These baseline acoustic signals reflect normal operation of the three-dimensional array of disk drives. If the acoustic signals differ by the predetermined amount from the baseline acoustic signals, the system identifies one or more disk drives in the three-dimensional array of disk drives that are likely to fail.