1. Field of the Invention
The present invention relates to hard disk drives. More particularly, the present invention relates to a functional test for determining whether a hard disk drive has experienced an early-life failure, or may fail in the near future.
2. Description of the Related Art
Hard disk drives store large volumes of data on one or more disks mounted on a spindle assembly. Disk drives employ a disk control system for interfacing with a host (e.g., a computer) to control the reading and writing of data on a disk. Each disk includes at least one disk surface which is capable of storing data. On each disk surface, user data is stored in concentric circular tracks between an outside diameter and an inside diameter of the disk.
As a result of the manufacturing process, defective data sites may exist on the disk surfaces of the disk drive. These defective data sites are termed “prior defects”. A defect discovery procedure is performed to locate these defects and mark them out as defective locations on the disk surface which are not available for use. A typical defect discovery procedure includes writing a known data pattern to the disk surface and subsequently reading the data pattern from the disk surface. Defective data sites are identified by comparing the data pattern read from the disk surface with the known data pattern written to the disk surface.
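The write, read, and compare loop of a defect discovery procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not an actual drive procedure: the `SimulatedSurface` class, its 512-byte sites, the single-bit corruption model, and the alternating `PATTERN` are all hypothetical stand-ins for real media access.

```python
from typing import List, Set

PATTERN = bytes([0xA5, 0x5A] * 256)  # hypothetical 512-byte known data pattern


class SimulatedSurface:
    """Toy disk surface: writes to sites in `bad_sites` are stored with one bit flipped."""

    def __init__(self, num_sites: int, bad_sites: Set[int], site_size: int = 512):
        self.sites = [bytearray(site_size) for _ in range(num_sites)]
        self.bad_sites = set(bad_sites)

    def write(self, site: int, data: bytes) -> None:
        stored = bytearray(data)
        if site in self.bad_sites:
            stored[0] ^= 0x01  # model a media flaw corrupting the recorded data
        self.sites[site] = stored

    def read(self, site: int) -> bytes:
        return bytes(self.sites[site])


def discover_defects(surface: SimulatedSurface, pattern: bytes = PATTERN) -> List[int]:
    """Write the known pattern to every site, read it back, and return mismatching sites."""
    defects = []
    for site in range(len(surface.sites)):
        surface.write(site, pattern)
        if surface.read(site) != pattern:
            defects.append(site)
    return defects
```

For example, `discover_defects(SimulatedSurface(16, {3, 11}))` reports sites 3 and 11, which would then be entered in the prior defect list.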
Following the defect discovery procedure, defective data sites are recorded in a prior defect list, which is stored in a table. The prior defect list is used during formatting of the disk surface to generate a defect management table. Within the defect management table, the defective data sites may be mapped to data sector locations (cylinder number, head number, and data sector number). Once identified in the defect management table, the defective data sectors may not be used for storing data.
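A minimal sketch of mapping prior-defect sites to (cylinder, head, sector) entries in a defect management table follows. It assumes a fixed number of heads and a fixed number of sectors per track; real drives use zoned recording, so sectors per track vary by zone. The names `to_chs` and `DefectManagementTable` are hypothetical and used only for illustration.

```python
from typing import Iterable, NamedTuple, Set


class CHS(NamedTuple):
    cylinder: int
    head: int
    sector: int


def to_chs(site: int, heads: int, sectors_per_track: int) -> CHS:
    """Map a linear physical sector index to (cylinder, head, sector).

    Assumes every track holds the same number of sectors (a simplification).
    """
    per_cylinder = heads * sectors_per_track
    return CHS(site // per_cylinder,
               (site // sectors_per_track) % heads,
               site % sectors_per_track)


class DefectManagementTable:
    """Hypothetical table built from the prior defect list at format time."""

    def __init__(self, prior_defects: Iterable[int], heads: int, sectors_per_track: int):
        self.entries: Set[CHS] = {to_chs(s, heads, sectors_per_track) for s in prior_defects}

    def is_usable(self, cylinder: int, head: int, sector: int) -> bool:
        """Sectors listed in the table may not be used for storing data."""
        return CHS(cylinder, head, sector) not in self.entries
```

With 4 heads and 100 sectors per track, site 450 falls in cylinder 1, head 0, sector 50, so that location is excluded from use.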
Defective data sites encountered after formatting the disk surface are known as “grown defects”. Grown defects often occur in locations adjacent to defective data sites found during defect discovery. Grown defects are also listed in a table, similar to the one used for “prior defects”. The number of sites marked out on a disk drive as “defective data sites” is used as a measure of the quality of the disk drive. Upon interrogation by a host, the disk drive will report the defect list generated in the defect management table.
Defects such as “prior defects” and “grown defects” are known as hard sector errors. A hard sector error is essentially permanent in nature, so the sector cannot be recovered. A disk may also contain transient or “soft” errors. A transient error is defined as an error or defect which clears over a period of time. For example, a transient error may occur due to a thermal asperity on the disk surface. A retry mode may be entered in which the command is retried a number of times, allowing sufficient time to pass for the transient error to clear. Transient errors are also logged on the drive as they occur.
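The retry mode described above can be sketched as a loop that classifies each failing read as soft or hard. This is a minimal illustration, assuming a hypothetical `read_sector` callback that raises a hypothetical `MediumError` on failure; the retry count and delay are placeholder values, not figures from the text.

```python
import time
from typing import Callable, List, Optional, Tuple


class MediumError(Exception):
    """Raised by the (hypothetical) read callback when a sector read fails."""


def read_with_retry(read_sector: Callable[[int], bytes], lba: int,
                    max_retries: int = 8, delay_s: float = 0.01,
                    error_log: Optional[List[Tuple[int, str]]] = None) -> bytes:
    """Retry a failing read so a transient error has time to clear.

    A read that succeeds after one or more retries is logged as "soft";
    a read that still fails after max_retries retries is logged as "hard"
    and the last MediumError is re-raised.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    log = error_log if error_log is not None else []
    for attempt in range(max_retries + 1):
        try:
            data = read_sector(lba)
        except MediumError:
            if attempt == max_retries:
                log.append((lba, "hard"))
                raise
            time.sleep(delay_s)  # give the transient condition time to clear
            continue
        if attempt > 0:
            log.append((lba, "soft"))
        return data
    raise RuntimeError("unreachable")


def make_flaky_reader(failures_before_success: int) -> Callable[[int], bytes]:
    """Simulated drive: the first N reads fail, later reads return zeros."""
    calls = 0

    def read_sector(lba: int) -> bytes:
        nonlocal calls
        calls += 1
        if calls <= failures_before_success:
            raise MediumError(f"transient error at LBA {lba}")
        return bytes(512)

    return read_sector
```

A read that clears after three failed attempts is returned normally and logged once as a soft error.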
A common problem encountered by disk drive manufacturers is the improper diagnosis of disk drive failures in customer systems. In many instances, functional disk drives improperly diagnosed as defective by customers are unnecessarily returned to the manufacturer, resulting in downtime for the customer as well as extra expense to the manufacturer to diagnose the drive. The problem of improper diagnosis is particularly acute in drives that are relatively new (e.g., fewer than 600 power-on hours).
Test suites presently exist for testing the condition of a disk drive. These test suites exhaustively test all locations on the surface of the disk for failures. Unfortunately, these test suites require extensive run time (often 30 minutes or longer), and often require special expertise to activate proprietary modes within the disk drive. Therefore, such test suites are rarely used by end customers to diagnose drive problems. Additionally, since such test suites read, write, and verify essentially all storage sites on the disk, such tests will become even slower as disk capacities increase in the future. Finally, test suites may affect customer data stored on the disk drive.
It is desirable to have a functional test for customers to quickly and easily diagnose early-life disk drives in customer systems when a disk failure is suspected, in order to prevent the customer return of properly functioning disk drives. The time required for performing the functional test should be independent of the capacity of the hard disk drive. The functional test should utilize both historical performance parameters (such as “hard” and “soft” errors and “prior” and “grown” defects) continually logged on the disk, and active read/write/verify operations to the most susceptible/critical data sites on the disk in order to determine the operating condition of the disk. Finally, the functional test should not disturb customer data while testing the surface of the disk.
The present invention provides a method of functionally testing a potentially defective disk drive having data sites on a disk for recording data thereon. During operation, the disk drive continuously logs operational problems by storing a plurality of historical performance parameters.
The method begins by performing an analysis of the stored historical performance parameters. A performance threshold is defined for each of the plurality of stored historical performance parameters. Next, the stored historical performance parameters are retrieved, and each is compared against its associated performance threshold. If the value of any historical performance parameter exceeds its associated performance threshold, the disk drive is marked as a failed disk drive.
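The threshold comparison on the historical parameters can be sketched as below. The parameter names and the values in `EXAMPLE_THRESHOLDS` are placeholders chosen for illustration; the text does not specify them. Treating a parameter that was not reported as zero is also an assumption.

```python
from typing import Dict, List

# Placeholder thresholds for illustration only; the text does not give values.
EXAMPLE_THRESHOLDS: Dict[str, int] = {
    "grown_defect_count": 50,
    "uncorrected_read_errors": 0,
    "uncorrected_write_errors": 0,
    "uncorrected_verify_errors": 0,
    "reassignments": 20,
}


def exceeded_parameters(params: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return the names of logged parameters whose value exceeds its threshold.

    A parameter missing from `params` is treated as 0 (an assumption).
    """
    return [name for name, limit in thresholds.items() if params.get(name, 0) > limit]


def historical_verdict(params: Dict[str, int], thresholds: Dict[str, int]) -> str:
    """'FAILED' if any threshold is exceeded, otherwise 'PASSED' (go on to read/write tests)."""
    return "FAILED" if exceeded_parameters(params, thresholds) else "PASSED"
```

A drive reporting 12 grown defects and 2 uncorrected read errors would be marked failed on the read-error parameter alone.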
If none of the performance thresholds is exceeded, the method next performs a set of non-destructive read/write tests to selected regions of the disk. A performance threshold is defined for each of the non-destructive read/write tests. Next, the set of non-destructive read/write tests is run, generating a set of results. The result of each non-destructive read/write test is then compared against its associated performance threshold. If the result of any non-destructive read/write test exceeds its associated performance threshold, the disk drive is marked as a failed disk drive.
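The run-then-compare step for the read/write tests can be sketched as a small runner. It assumes, hypothetically, that each test is a callable returning an error count; the `ReadWriteTest` structure and the test names are illustrative, not taken from the text.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class ReadWriteTest(NamedTuple):
    name: str
    run: Callable[[], int]  # hypothetical: returns the number of errors observed
    threshold: int          # largest error count the test may report and still pass


def run_read_write_tests(tests: List[ReadWriteTest]) -> Tuple[str, Dict[str, int]]:
    """Run every test, then mark the drive FAILED if any result exceeds its threshold."""
    results = {test.name: test.run() for test in tests}
    failed = any(results[test.name] > test.threshold for test in tests)
    return ("FAILED" if failed else "PASSED"), results
```

This sketch runs every test before deciding so that the report is complete; stopping at the first exceeded threshold would be equally consistent with the method and somewhat faster.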
In one embodiment of the present invention, the functional testing method retrieves a power-on time parameter value from the disk drive and compares it against a user defined threshold value; if the power-on time parameter value exceeds the user defined threshold value, the functional test is terminated. In a preferred embodiment of the present invention, the user defined threshold value is set to 600 hours.
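The power-on time gate reduces to a single comparison. In this sketch the 600-hour default comes from the preferred embodiment; the function name is hypothetical, and how the power-on hours are read from the drive is not shown.

```python
POWER_ON_HOURS_THRESHOLD = 600  # preferred-embodiment value; user-adjustable


def power_on_gate(power_on_hours: float, threshold: float = POWER_ON_HOURS_THRESHOLD) -> str:
    """Return 'terminate' when the drive is past the early-life window,
    otherwise 'proceed' to the historical-parameter and read/write stages."""
    if power_on_hours > threshold:
        return "terminate"
    return "proceed"
```

Because the text terminates only when the value exceeds the threshold, a drive at exactly 600 hours is still tested.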
The functional testing method of the present invention issues commands to the disk drive that are compliant with the SCSI-3 specifications. The historical performance parameters used by the present invention include: counts of soft error rates and reassignments; counts of corrected and uncorrected errors encountered during read, write, and verify operations to the disk drive; and the number of entries found in the grown defect list (GLIST).
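The corrected and uncorrected error counts are commonly exposed through the SCSI LOG SENSE command. The sketch below assumes the write, read, and verify error counter log pages (02h, 03h, 05h) and their "total errors corrected" (0003h) and "total uncorrected errors" (0006h) parameter codes from the SCSI Primary Commands standard; the text itself does not name these commands. The code only builds the command descriptor block and decodes a returned page. Sending the CDB to a device requires an operating-system pass-through interface (for example, SG_IO on Linux), which is not shown, and the sample page in the test is synthetic. The grown defect count would come from a separate command such as READ DEFECT DATA, also not shown.

```python
import struct
from typing import Dict

LOG_SENSE_OPCODE = 0x4D
WRITE_ERROR_COUNTER_PAGE = 0x02
READ_ERROR_COUNTER_PAGE = 0x03
VERIFY_ERROR_COUNTER_PAGE = 0x05
TOTAL_ERRORS_CORRECTED = 0x0003    # parameter code within the error counter pages
TOTAL_UNCORRECTED_ERRORS = 0x0006


def log_sense_cdb(page_code: int, alloc_len: int = 0xFFFF, pc: int = 0b01) -> bytes:
    """Build a 10-byte LOG SENSE CDB. PC=01b requests cumulative values."""
    return bytes([
        LOG_SENSE_OPCODE,
        0x00,                                       # PPC=0, SP=0
        ((pc & 0x03) << 6) | (page_code & 0x3F),    # PC | PAGE CODE
        0x00,                                       # SUBPAGE CODE
        0x00,                                       # reserved
        0x00, 0x00,                                 # PARAMETER POINTER
        (alloc_len >> 8) & 0xFF, alloc_len & 0xFF,  # ALLOCATION LENGTH
        0x00,                                       # CONTROL
    ])


def parse_log_page(data: bytes) -> Dict[int, int]:
    """Decode a log page into {parameter code: unsigned big-endian value}."""
    page_length = struct.unpack_from(">H", data, 2)[0]
    end = min(4 + page_length, len(data))
    params: Dict[int, int] = {}
    offset = 4
    while offset + 4 <= end:
        code, _control, length = struct.unpack_from(">HBB", data, offset)
        params[code] = int.from_bytes(data[offset + 4:offset + 4 + length], "big")
        offset += 4 + length
    return params
```

Each decoded count could then be fed into the threshold comparison on historical parameters.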
The set of non-destructive read/write tests includes: a read/write test of a known pattern to all disk drive heads in a non-customer data area; a verification test of the first 100 megabytes of data on the disk drive; and a series of random inner diameter zone region/outer diameter zone region read and seek tests. The series of random inner/outer diameter read and seek tests includes a random read operation to an inner diameter zone region logical block address (LBA), followed by a random read operation to an outer diameter zone region LBA, followed by a seek to a random inner diameter zone region LBA, followed by a seek to a random outer diameter zone region LBA. The inner diameter zone region of the disk drive is defined by the range of logical block addresses (LBAs) from (maximum LBA/8)*5 to the maximum LBA. The outer diameter zone region of the disk drive is defined by the range of logical block addresses from 0 to (maximum LBA/8). The present invention provides a function for obtaining the data capacity and maximum logical block address (LBA) of the disk drive under test.
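The zone definitions and the random inner/outer diameter sequence can be sketched as follows. The region bounds follow the text, using integer division and inclusive endpoints. Decoding SCSI READ CAPACITY (10) data (last LBA in bytes 0-3, block length in bytes 4-7) is assumed here as one way to obtain the maximum LBA and capacity; the text does not name the command. Interpreting "100 megabytes" as decimal megabytes is also an assumption. The function names are hypothetical, and issuing the actual reads and seeks is not shown.

```python
import random
import struct
from typing import List, Tuple

Region = Tuple[int, int]  # inclusive (first LBA, last LBA)


def parse_read_capacity_10(data: bytes) -> Tuple[int, int, int]:
    """Decode READ CAPACITY (10) data into (max LBA, block length, capacity in bytes)."""
    max_lba, block_len = struct.unpack(">II", data[:8])
    return max_lba, block_len, (max_lba + 1) * block_len


def zone_regions(max_lba: int) -> Tuple[Region, Region]:
    """Outer diameter: 0..max_lba/8; inner diameter: (max_lba/8)*5..max_lba."""
    eighth = max_lba // 8
    return (0, eighth), (eighth * 5, max_lba)


def verify_span_blocks(block_len: int, megabytes: int = 100) -> int:
    """Blocks covering the first `megabytes` MB (decimal MB assumed), rounded up."""
    return (megabytes * 1_000_000 + block_len - 1) // block_len


def random_id_od_sequence(max_lba: int, rounds: int, rng: random.Random) -> List[Tuple[str, int]]:
    """Per round: read random ID LBA, read random OD LBA, seek random ID LBA, seek random OD LBA."""
    (od_lo, od_hi), (id_lo, id_hi) = zone_regions(max_lba)
    ops: List[Tuple[str, int]] = []
    for _ in range(rounds):
        ops.append(("read", rng.randint(id_lo, id_hi)))
        ops.append(("read", rng.randint(od_lo, od_hi)))
        ops.append(("seek", rng.randint(id_lo, id_hi)))
        ops.append(("seek", rng.randint(od_lo, od_hi)))
    return ops
```

Because the number of rounds is fixed rather than scaled with the LBA range, the run time of this part of the test does not grow with drive capacity.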
The amount of time required to perform the functional test of the present invention is independent of the capacity of the drive. In one embodiment of the present invention, the amount of time required to functionally test a disk drive is approximately two minutes.
In one embodiment of the present invention, the functional testing method retrieves a manufacturer name from the disk drive, compares the retrieved manufacturer name against one or more user supplied manufacturer names, and terminates the functional test if the retrieved manufacturer name does not match any of the one or more user supplied manufacturer names.
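The manufacturer check can be sketched using standard SCSI INQUIRY data, where bytes 8 to 15 carry the vendor identification as space-padded ASCII. Using INQUIRY as the source of the manufacturer name and comparing names case-insensitively are assumptions; "ACME" in the test is a made-up vendor name.

```python
from typing import Iterable


def vendor_from_inquiry(inquiry_data: bytes) -> str:
    """Vendor identification from standard INQUIRY data: bytes 8-15, ASCII, space padded."""
    return inquiry_data[8:16].decode("ascii", errors="replace").strip()


def manufacturer_allowed(inquiry_data: bytes, allowed_names: Iterable[str]) -> bool:
    """True if the drive's vendor matches any user-supplied name (case-insensitive, an assumption)."""
    vendor = vendor_from_inquiry(inquiry_data).upper()
    return any(vendor == name.strip().upper() for name in allowed_names)
```

When `manufacturer_allowed` returns `False`, the functional test would be terminated before any historical or read/write checks are run.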
In one embodiment of the present invention, the functional test resides in software. In an alternate embodiment, the functional test resides in firmware within the disk drive. The functional test of the present invention operates on SCSI disk drives.