1. Field of the Invention
The present invention relates generally to disk fault correction techniques for storage devices and, more particularly, to a method of logging commands and error condition codes associated with drive errors.
2. Description of Related Art
The vast majority of personal computer (PC) systems available today come equipped with a peripheral data storage device such as a hard disk (HD) drive. Hard disks are comprised of rigid platters, made of aluminum alloy or a mixture of glass and ceramic, covered with a magnetic coating. Platters vary in size and hard disk drives generally come in two form factors, 5.25 in or 3.5 in. Typically, two or more platters are stacked on top of each other with a common spindle that turns the whole assembly at several thousand revolutions per minute. There is a gap between the platters, making room for a magnetic read/write head, mounted on the end of an actuator arm. There is a read/write head for each side of each platter, mounted on arms which can move them radially. The arms are moved in unison by a head actuator, which contains a voice coil, an electromagnetic coil that can move a magnet very rapidly.
Each platter is double-sided and divided into tracks. Tracks are concentric circles around the central spindle. Tracks physically above each other on the platters are grouped together into a cylinder. Cylinders are further divided into sectors. Depending on the disk drive vendor, a sector is typically comprised of 512 bytes of user data, followed by a number of cross-check bytes, a number of error correction code (ECC) bytes and other vendor specific diagnostic information. Thus, these devices are complex electro-mechanical devices and, as such, can suffer performance degradation or failure due to a single event or a combination of events.
There are generally two classes of failures that can occur in disk drives. The first class is the hard or catastrophic type of failure, which causes the drive to quickly and unpredictably fail. These failures can be caused by static electricity, handling damage, or thermal-related solder problems. The second class of failures results from the gradual decay of other electrical and/or mechanical components within the drive after it is put in service.
Drive failure prediction techniques for this second class of failures are discussed in U.S. Pat. No. 5,828,583 to Bush et al. and U.S. Pat. No. 5,761,411 to Teague et al.
With respect to the first class of failures, the nature of these failures makes them very difficult to predict. However, even if prediction is impractical, there is still a need to understand the root cause of the hard failure to determine if there is a design or manufacturing defect present in the drives.
Typically, the determination of a hard failure is performed in a lab of the manufacturer after the defective drive has been returned. If the drive is at all operable, certain tests can be performed on the drive to exercise a wide array of operations in an attempt to recreate the failure. Laboratory equipment, such as an ATA bus analyzer, is used to capture information pertaining to the operations and the sequence of operations in order to have a history of information for diagnosing the failure. If the failure can be recreated, the root cause of the problem can be understood.
However, the hard failures are very difficult to recreate. Certain hard failures render the drive fully inoperable. Other hard failures may render the platter or media inoperable while the electronics still function. Still other failures may never be recreated because of certain environmental conditions that are not known.
Therefore, there is a need for an improved means for diagnosing hard failures in disk drives.
According to a preferred embodiment, the present invention includes a method, apparatus and computer system for logging errors of a storage device. The storage device is capable of executing commands received from a host processor and detecting errors in the performance of those commands. The storage device also includes a non-volatile memory or media for storing data and other information as described herein. As commands are received by the storage device, a list of previously executed commands is maintained by the storage device. When an error is detected by the storage device, a set of error conditions, such as the ATA task file read registers, is stored in the non-volatile memory along with the command list to create an error log. The error log is a useful source of diagnostic information for errors that are difficult to replicate on the storage device.
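The logging step described above can be illustrated with a minimal sketch. It is not the claimed implementation: the class and field names, the command history depth of five, and the use of a Python list to stand in for non-volatile memory are all assumptions made for illustration. The register set shown follows the conventional ATA task file read registers, and an error is taken to be signalled by the ERR bit (bit 0) of the status register.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

COMMAND_HISTORY_DEPTH = 5  # assumed depth; the text does not fix a number
STATUS_ERR_BIT = 0x01      # ERR bit of the ATA status register


@dataclass
class TaskFile:
    """Snapshot of the ATA task file read registers after a command."""
    error: int = 0
    sector_count: int = 0
    sector_number: int = 0
    cylinder_low: int = 0
    cylinder_high: int = 0
    device_head: int = 0
    status: int = 0


@dataclass
class ErrorLogEntry:
    """One error log: the preceding commands plus the error conditions."""
    commands: List[int]
    task_file: TaskFile


class StorageDevice:
    """Hypothetical drive model that keeps a command list and error logs."""

    def __init__(self) -> None:
        # Bounded list of previously received command opcodes.
        self.recent_commands: Deque[int] = deque(maxlen=COMMAND_HISTORY_DEPTH)
        # Stand-in for the drive's non-volatile memory or media.
        self.nonvolatile: List[ErrorLogEntry] = []

    def execute(self, opcode: int, run: Callable[[int], TaskFile]) -> TaskFile:
        # Record the command as it is received, before executing it.
        self.recent_commands.append(opcode)
        task_file = run(opcode)
        if task_file.status & STATUS_ERR_BIT:
            # On error, store the error conditions with the command list.
            self.nonvolatile.append(
                ErrorLogEntry(list(self.recent_commands), task_file)
            )
        return task_file


def demo_run(opcode: int) -> TaskFile:
    """Hypothetical executor: opcode 0x20 fails, everything else succeeds."""
    if opcode == 0x20:
        return TaskFile(error=0x40, status=0x51)
    return TaskFile(status=0x50)


drive = StorageDevice()
for op in (0xEC, 0x30, 0x30, 0x20):
    drive.execute(op, demo_run)
```

In this sketch the failing command itself is the last entry of the stored command list, so the log shows both what failed and what led up to it.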
The storage device is responsive to a retrieve error log command for providing the error logs to a host computer when the command is received. A predetermined number of error logs are maintained by the storage device in a circular buffer with a pointer providing an indication as to which error log is most recent. Each command has associated with it a time stamp indicating the time when the command was received by the storage device, and each error has a time stamp associated with it indicating when the error was detected by the storage device. Additionally, since the storage device is capable of operating in a number of states, such as SLEEP, STAND-BY, ACTIVE/IDLE, and OPERATING, the state at the time of the error is also stored in each error log. The storage device also maintains a running count of the number of errors detected.
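The circular buffer, most-recent pointer, time stamps, device state and running error count described above can be sketched as follows. This is an illustrative sketch only: the names, the capacity of five logs, the integer time stamps and the newest-first retrieval order are assumptions, not details taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_ERROR_LOGS = 5  # assumed; the text says only "a predetermined number"


@dataclass
class ErrorLog:
    # (opcode, time stamp when the command was received)
    commands: List[Tuple[int, int]]
    # Time stamp when the error was detected.
    error_timestamp: int
    # Device state at the time of the error, e.g. "ACTIVE/IDLE".
    device_state: str


class ErrorLogBuffer:
    """Fixed-size circular buffer of error logs with a most-recent pointer."""

    def __init__(self, capacity: int = MAX_ERROR_LOGS) -> None:
        self.slots: List[Optional[ErrorLog]] = [None] * capacity
        self.newest = -1      # index of the most recent log; -1 means empty
        self.error_count = 0  # running count of all errors ever detected

    def record(self, log: ErrorLog) -> None:
        # Advance the pointer, wrapping around and overwriting the oldest log.
        self.newest = (self.newest + 1) % len(self.slots)
        self.slots[self.newest] = log
        self.error_count += 1

    def retrieve(self) -> List[ErrorLog]:
        """Model of the retrieve error log command: logs, newest first."""
        if self.newest < 0:
            return []
        size = len(self.slots)
        ordered = [self.slots[(self.newest - i) % size] for i in range(size)]
        return [log for log in ordered if log is not None]


logs = ErrorLogBuffer(capacity=3)
for t in (10, 20, 30, 40):
    logs.record(ErrorLog([(0x20, t - 1)], error_timestamp=t,
                         device_state="ACTIVE/IDLE"))
```

Because the running count is kept separately from the buffer, a host can tell that errors have been overwritten: here four errors were counted but only three logs remain retrievable.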