Personal computers (PCs) have become increasingly powerful in recent years and are used for a variety of applications in industry, business and education. These varied uses impose different requirements on the various subsystems that form the PC. As applications become more complex, the storage requirements for PCs increase. Thus, it is now common for PCs to include hard disks having a storage capacity of as much as 16.8 gigabytes, and capacities continue to increase.
Information is stored on the disks in a plurality of concentric circular tracks by an array of transducers, or heads (usually one per disk surface), mounted for movement on an electronically controlled actuator mechanism. The storing of information on the disks is also referred to as "writing", and the subsequent retrieval of information from the disks is referred to as "reading".
Over time, hard disks tend to develop a number of defects. Some defects are attributable to user-manageable causes such as radiation, temperature, moisture, pressure, impact and vibration. Other defects are attributable to mechanical failure of one or more components of the disk drive assembly, such as the spindle or the arm.
Currently, there are computer programs for testing computer peripheral storage media, particularly rotating magnetic storage media, to determine whether there are areas that are bad or marginal with respect to storing data with integrity. Many of these programs accomplish the task by repeatedly writing and reading areas of a storage medium to determine the reliability of those areas. If an area does not meet a selected threshold of reliability, the area is marked bad and its data is relocated if possible. These programs are designed to test the disk drive prior to its sale, prior to its incorporation into the computer system, or both. They tend to be customized for a particular make and model of disk and are typically not applicable to disks generally.
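By way of illustration only, the repeated write-and-read testing described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the method of any particular test program: the SimulatedMedium class, the scan_blocks function, the four test patterns and the max_failures threshold are all hypothetical names and values introduced here, and the medium is simulated in memory rather than accessed through a real drive interface.

```python
from typing import Callable, Iterable, List, Sequence


class SimulatedMedium:
    """Hypothetical in-memory stand-in for a storage medium.

    Blocks listed in stuck_blocks silently ignore writes, which is one
    simple way to imitate a defective area.
    """

    def __init__(self, block_count: int, stuck_blocks: Iterable[int]) -> None:
        self.data = [b"\x00"] * block_count
        self.stuck = set(stuck_blocks)

    def write(self, block: int, payload: bytes) -> None:
        if block not in self.stuck:
            self.data[block] = payload

    def read(self, block: int) -> bytes:
        return self.data[block]


def scan_blocks(
    write: Callable[[int, bytes], None],
    read: Callable[[int], bytes],
    block_count: int,
    patterns: Sequence[bytes] = (b"\x00", b"\xff", b"\xaa", b"\x55"),
    max_failures: int = 1,
) -> List[int]:
    """Write and read back each pattern once per block.

    Returns the blocks whose mismatch count exceeds max_failures
    (an assumed reliability threshold).
    """
    failures = [0] * block_count
    for pattern in patterns:
        for block in range(block_count):
            write(block, pattern)
            if read(block) != pattern:
                failures[block] += 1
    return [b for b, count in enumerate(failures) if count > max_failures]


medium = SimulatedMedium(block_count=8, stuck_blocks={3})
bad_blocks = scan_blocks(medium.write, medium.read, block_count=8)
print(bad_blocks)  # prints [3]
```

With the simulated medium shown, block 3 fails three of the four passes and is therefore reported, while the remaining blocks pass. A practical program would additionally relocate any recoverable data from a reported block before marking it bad.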
U.S. Pat. No. 5,422,890 discloses a system and method that captures and characterizes error information during disk tests. The system is capable of dynamically determining whether the disk under test has exceeded acceptable error rates based on an actual number of bytes read. The system saves error log information, including specific sector addresses, error rates, error types and data patterns.
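The general idea of judging an error rate against the number of bytes actually read, while keeping a log of where errors occurred, might be sketched as follows. This sketch is not taken from the cited patent and does not purport to reproduce its method: the ErrorRateTracker class, its fields and the threshold of one error per thousand bytes are hypothetical and chosen only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ErrorRateTracker:
    """Hypothetical running error-rate check, normalized by bytes read."""

    max_errors_per_byte: float          # assumed acceptable error rate
    bytes_read: int = 0
    log: List[Tuple[int, str]] = field(default_factory=list)

    def record_read(
        self, sector: int, byte_count: int, error_type: Optional[str] = None
    ) -> None:
        """Account for one read; log the sector and error type on failure."""
        self.bytes_read += byte_count
        if error_type is not None:
            self.log.append((sector, error_type))

    @property
    def error_rate(self) -> float:
        """Errors per byte actually read so far (0.0 before any read)."""
        return len(self.log) / self.bytes_read if self.bytes_read else 0.0

    def exceeded(self) -> bool:
        """True once the observed rate is above the acceptable rate."""
        return self.error_rate > self.max_errors_per_byte


tracker = ErrorRateTracker(max_errors_per_byte=1e-3)
tracker.record_read(sector=0, byte_count=512)
tracker.record_read(sector=1, byte_count=512, error_type="crc")
after_one_error = tracker.exceeded()    # 1 error in 1024 bytes
tracker.record_read(sector=2, byte_count=512, error_type="seek")
after_two_errors = tracker.exceeded()   # 2 errors in 1536 bytes
print(after_one_error, after_two_errors)  # prints False True
```

Because the rate is recomputed after every read, the decision can be made while the test is still running rather than only at its end.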
Other software-only monitors are known. However, they are limited to timing signals between a host microprocessor and the drive controller. These signals are predominantly sensitive to variations in disk rotation speed which, because rotation speed is tightly regulated, furnish no practical early warning of trouble. When the disk spindle has serious bearing wear or a lack of lubricant, the drive controller increases power to overcome the resultant mechanical grinding. As a result, disk failure is hastened in a manner that is not readily detectable.
In the manufacture of disk drives, it is not unusual for tens of thousands of disk drive units to be fabricated daily. With such high numbers of disk drives being made, a certain number of units will fail to meet the design specifications due to faulty components, improper assembly, contamination, and other factors familiar to those of skill in the art. While disk drive manufacturers make every effort to minimize these defective units and assembly errors, a small percentage of defective units will inevitably be built. When a defect is introduced into a unit at an early stage in the manufacturing process, the fault may not be detected until a much later stage of the process. Such a delay in the detection of defective assemblies can result in significant labor costs when taken over the large numbers of units being manufactured.
U.S. Pat. No. 5,557,193 discloses a method and apparatus for predicting failure of a disk drive based upon electrical power consumption. This system is capable of determining when a disk drive may fail and entrap the stored data. Like other systems that detect dynamic anomalies as opposed to media failures, it requires new hardware and embedded code to be added to the disk drive at the factory during the manufacturing process.
Another example of the "factory-installed" approach to disk drive failure prediction is S.M.A.R.T. (Self-Monitoring Analysis and Reporting Technology). This is a voluntary standard covering sensing and reporting of hard drive dynamic performance. It is a combination of Compaq's IntelliSafe and IBM's Predictive Failure Analysis (PFA). One of the drawbacks of S.M.A.R.T. is that special, customized hardware is needed to allow users to employ it effectively.
Declining disk drive costs reduce the need for sophisticated evidence before making a disk drive replacement decision. When S.M.A.R.T. was originally conceived, disk drive storage was relatively expensive and a decision to replace a suspect disk drive required detailed evidence of potential failure. The cost of disk drive storage has since fallen dramatically, from $0.92 a megabyte in 1993 to about $0.09 a megabyte in 1998, and is expected to drop even further in the future due at least in part to increased competition in the marketplace.
The foregoing known methods of predicting disk drive failure using factory-installed components are disadvantageous for a number of reasons. First, there is a high cost of operation. The drive assemblies require additional hardware, which necessarily increases drive costs at a time when the drive industry is suffering strong price erosion due to vigorous competition. Second, the factory-installed approach has limited application. Drives already shipped cannot be tested without a return trip to the factory. Moreover, absent an industry-wide agreement, competing drives cannot be monitored against each other. Third, there is an increased risk of error due to the possibility of failure of the additional hardware. Fourth, the factory-installed systems are difficult to maintain because, if there is a sensor or other hardware problem, the drive must be sent back to the factory.
Accordingly, there is a need for a generic disk failure prediction system that overcomes the above-mentioned problems, provides a reliable indication of the state of the disk, and alerts appropriate personnel when the disk becomes faulty.