1. Field of the Invention
This invention relates in general to disk drive reliability, and more particularly to a method and apparatus for providing predictive failure analysis using the MR head resistance.
2. Description of Related Art
Most companies today are concerned about system availability of their local area networks. With mission-critical applications now commonplace on PC servers, companies cannot afford to incur unplanned outages. Further, computer users today have great expectations of data storage reliability. Many users do not even consider the possibility of losing data due to a hard disk drive failure. Even though continual improvements in technology make data loss uncommon, it is not impossible. Computer system failures on the whole are aggravating. Production is delayed, customers upset, users dismayed, and in general, nothing can be accomplished until the system is operational, and data restored. Even though disk drive reliability has been constantly improving, failures still occur.
Because of the consequences of losing data, disk drive reliability is very important. Reliability has been measured in the industry with Mean Time Between Failure (MTBF), a term or claim that is easy to advertise (a higher number is better), difficult to explain, and nearly impossible to prove or guarantee. Unlike other hard disk drive performance parameters, reliability cannot be measured until after there has been field experience with the product. Analysis of actual failures is needed for accurate numbers.
Historically, there are four ways to manage hardware maintenance. First, hardware maintenance can be managed by doing nothing until something fails. Then the defective part can be replaced. This is cost-effective if unplanned down time, lost data, and all of the other unpleasantness of a disk drive failure are acceptable.
Alternatively, preventive maintenance can be practiced. This method requires the replacement of all parts that typically fail, before they fail. This is somewhat effective in reducing unscheduled down time (parts do not always fail on schedule), but has a high cost in replacing parts that would not have failed.
The third alternative is to use redundancy. In this method, if one disk drive is needed, two or more are used, with one serving as the primary drive and the others serving as mirrored backups. Redundant Array of Independent Disks (RAID) is another example of redundancy. Redundancy incurs additional expense because of the extra hardware and software required, and may lower the performance of the system.
Finally, there is a fourth maintenance solution. Condition monitoring may be used to provide predictive failure analysis (PFA). PFA condition monitoring is an improved method that can provide early warning of impending failure, and allow scheduled replacement of the failing device.
As with any electrical/mechanical device, there are two basic failure types. First, there is the on/off type of failure, e.g., a cable breaks, a component burns out, a solder connection fails. These are all examples of unpredictable catastrophic failures. As assembly and component processes have improved, these types of defects have been reduced but not eliminated. PFA cannot provide warning for on/off unpredictable failures.
The second type of failure is the gradual performance degradation of components. PFA has been developed to monitor performance of the disk drive, analyze data from periodic internal measurements, and recommend replacement when specific thresholds are exceeded. The thresholds have been determined by examining the history logs of disk drives that have failed in simulated customer operation.
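The threshold comparison described above can be sketched as follows. This is a minimal illustration only: the attribute names, the limit values, and the `exceeded_thresholds` helper are hypothetical and are not taken from any actual drive firmware or from failure history logs.

```python
from typing import Dict, List

# Hypothetical upper limits for a few monitored attributes (illustrative values only).
THRESHOLDS: Dict[str, float] = {
    "channel_noise": 0.8,
    "spindle_current_ma": 450.0,
}


def exceeded_thresholds(measurements: Dict[str, float],
                        thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the sorted names of attributes whose measured value exceeds its limit."""
    return sorted(
        name
        for name, value in measurements.items()
        if name in thresholds and value > thresholds[name]
    )


# One periodic internal measurement set (made-up numbers).
sample = {"channel_noise": 0.5, "spindle_current_ma": 470.0}
print(exceeded_thresholds(sample))
```

A non-empty result would correspond to a recommendation to replace the drive; in practice the limits would come from the history logs of failed drives rather than from fixed constants.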
Typically, predictive failure analysis in disk drives involves the measurement of several attributes, including head flying height, to predict failures. The disk drive, upon sensing degradation of an attribute, such as flying height, sends a notice to the host that a failure may occur. Upon receiving notice, users can take steps to protect their data.
PFA is an attractive solution to disk drive maintenance. PFA minimizes exposure to data loss, and at a much lower cost than redundancy. PFA only calls for preventive replacement of a disk drive when that drive's performance is degraded. Accordingly, PFA provides a new level of data protection and allows for scheduled replacement of the drive.
PFA may monitor performance in two ways. PFA may be a measurement-driven process or a symptom-driven process. The measurement-driven process automatically performs a suite of self-diagnostic tests which measure changes in the disk drive's component characteristics. Various magnetic parameters of the head and disk are measured, as well as figures of merit for the channel electronics. Head fly height on all data surfaces, channel noise, signal coherence, signal amplitude, and writing parameters may also be monitored. Unlike conventional error monitors, this method provides for direct detection of specific mechanisms that can precede a disk drive failure.
The measurement of the current driving the spindle motor has also been used to detect changes in the health of the drive. For example, an increase in the current required to drive the spindle motor may indicate drag at one or more head/disk interfaces.
However, because of the subtleties involved at the head/disk interface, further details of the head/disk interface provided by magnetoresistive (MR) head resistance monitoring have not been previously used in the predictive failure analysis. Yet qualitative and quantitative monitoring of the head/disk interface would provide an indicator of system performance over time and could identify specific mechanisms that can precede a disk drive failure. Further, the qualitative and quantitative monitoring of the head/disk interface is applicable to contact recording, where identification of wear at the head-disk interface is important.
It can be seen then that there is a need for a method and apparatus that provides qualitative and/or quantitative monitoring of the head/disk interface during normal file operations.
It can also be seen that there is a need for a method and apparatus that quantifies magnetic head wear.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method and apparatus for providing predictive failure analysis using the resistance of a sensor.
The present invention solves the above-described problems by correlating the resistance of the sensor with the stripe height in the sensor to provide a quantitative measurement of the sensor wear.
A method and apparatus in accordance with the principles of the present invention obtains a baseline measurement of resistance for at least one sensor of a disk drive, periodically obtains subsequent measurements of resistance for the at least one sensor of the disk drive, and processes the subsequent measurements and the baseline measurement to identify a detrimental change to the at least one sensor.
Other embodiments of a method and apparatus in accordance with the principles of the invention may include alternative or optional additional aspects. One such aspect of the present invention is that the processing further includes comparing a subsequent resistance measurement for the at least one sensor to the baseline measurement of resistance for the at least one sensor to detect a head/disk interface problem and flagging the file for corrective action when the head/disk interface problem is detected.
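The baseline-and-compare processing described above might be sketched as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the `ResistanceMonitor` class, its method names, and the 5% fractional-change limit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResistanceMonitor:
    """Tracks per-head sensor resistance against a stored baseline (hypothetical sketch)."""
    max_fractional_change: float = 0.05  # assumed limit, not taken from the source
    baseline: Dict[int, float] = field(default_factory=dict)
    flagged: List[int] = field(default_factory=list)

    def record_baseline(self, head: int, resistance_ohms: float) -> None:
        """Store the baseline resistance measurement for one head."""
        self.baseline[head] = resistance_ohms

    def check(self, head: int, resistance_ohms: float) -> bool:
        """Compare a subsequent measurement to the baseline.

        Returns True and flags the head for corrective action when the
        fractional change from baseline exceeds the assumed limit.
        """
        base = self.baseline[head]
        change = abs(resistance_ohms - base) / base
        problem = change > self.max_fractional_change
        if problem and head not in self.flagged:
            self.flagged.append(head)
        return problem


monitor = ResistanceMonitor()
monitor.record_baseline(head=0, resistance_ohms=40.0)
print(monitor.check(head=0, resistance_ohms=41.0))  # 2.5% change, within the assumed limit
print(monitor.check(head=0, resistance_ohms=44.0))  # 10% change, exceeds the assumed limit
print(monitor.flagged)
```

Keeping one baseline per head allows each sensor to be judged against its own starting value rather than against a population average.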
Another aspect of the present invention is that the processing further includes determining a change in stripe height based upon the difference between the baseline measurement of resistance and the subsequent measurement of resistance.
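One way to illustrate the relation between resistance change and stripe height change is the simple model below. It assumes, as a simplification not stated in this section, that the resistance of the sensor stripe is inversely proportional to its stripe height, with an optional fixed lead resistance in series; temperature effects are ignored. The function name and parameters are hypothetical.

```python
def stripe_height_change(baseline_ohms: float,
                         current_ohms: float,
                         baseline_height_um: float,
                         lead_ohms: float = 0.0) -> float:
    """Estimate the reduction in stripe height (micrometers) from a resistance change.

    Assumed model: R = lead_ohms + k / h, so that
    h_now = h_baseline * (R_baseline - lead_ohms) / (R_now - lead_ohms).
    A positive return value means the stripe has become shorter (wear).
    """
    r0 = baseline_ohms - lead_ohms
    r1 = current_ohms - lead_ohms
    if r0 <= 0 or r1 <= 0:
        raise ValueError("sensor resistance must exceed the lead resistance")
    current_height_um = baseline_height_um * r0 / r1
    return baseline_height_um - current_height_um


# Made-up numbers: a rise from 40 ohms to 50 ohms on a 1.0 um stripe
# corresponds to roughly 0.2 um of wear under this model.
print(stripe_height_change(40.0, 50.0, 1.0))
```

Under this model a rising resistance indicates a shrinking stripe height, which is what makes the resistance usable as a quantitative wear indicator.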
These and various other advantages and features of novelty which characterize the invention are pointed out with particularity in the claims annexed hereto and form a part hereof. However, for a better understanding of the invention, its advantages, and the objects obtained by its use, reference should be made to the drawings which form a further part hereof, and to accompanying descriptive matter, in which there are illustrated and described specific examples of an apparatus in accordance with the invention.