1. Technical Field
This application generally relates to analyzing drive errors in data storage systems.
2. Description of Related Art
Computer systems may include different resources used by one or more host processors. Resources and host processors in a computer system may be interconnected by one or more communication connections. These resources may include, for example, data storage devices such as those included in the data storage systems manufactured by EMC Corporation. These data storage systems may be coupled to one or more servers or host processors and provide storage services to each host processor. Multiple data storage systems from one or more different vendors may be connected and may provide common data storage for one or more host processors in a computer system.
A host processor may perform a variety of data processing tasks and operations using the data storage system. For example, a host processor may perform basic system I/O operations in connection with data requests, such as data read and write operations.
Host processor systems may store and retrieve data using a storage device containing a plurality of host interface units, disk drives, and disk interface units. The host systems access the storage device through a plurality of channels provided therewith. Host systems provide data and access control information through the channels to the storage device, and the storage device provides data to the host systems also through the channels. The host systems do not address the disk drives of the storage device directly, but rather access what appears to the host systems as a plurality of logical disk units. The logical disk units may or may not correspond to the actual disk drives. Allowing multiple host systems to access a single storage device unit lets the host systems share the data stored in the device. Additional software on the data storage systems may also be used to facilitate sharing of the data on the device.
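To illustrate how a logical disk unit need not correspond one-to-one to a physical disk drive, the following is a minimal Python sketch. It assumes a hypothetical `LogicalUnit` class that simply concatenates extents taken from one or more drives; the class name, the extent layout, and the drive names are illustrative only, and real data storage systems may instead use striping, RAID, or other mappings.

```python
from typing import List, Tuple

# An extent is (drive_name, start_block, length_in_blocks); hypothetical layout.
Extent = Tuple[str, int, int]


class LogicalUnit:
    """Host-visible logical disk built from extents on physical drives (hypothetical)."""

    def __init__(self, name: str, extents: List[Extent]):
        self.name = name
        self.extents = extents

    def resolve(self, lba: int) -> Tuple[str, int]:
        """Translate a logical block address to (drive_name, physical_block)."""
        if lba < 0:
            raise ValueError(f"negative LBA {lba}")
        offset = lba
        # Walk the extents in order until the offset falls inside one of them.
        for drive, start, length in self.extents:
            if offset < length:
                return drive, start + offset
            offset -= length
        raise ValueError(f"LBA {lba} is beyond the end of {self.name}")
```

For example, a logical unit built from blocks 1000 to 1099 of `disk0` followed by blocks 0 to 99 of `disk1` appears to the host as one 200-block disk: logical block 5 resolves to `("disk0", 1005)` and logical block 150 resolves to `("disk1", 50)`.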
Disk drives are used as primary data storage devices in data storage systems and other modern computer systems and networks. While very reliable, today's disk drives occasionally fail. In addition to causing computer system downtime, such disk drive failures can result in the loss of some or all of the data stored on the drive. Accordingly, disk drives commonly perform Predictive Failure Analysis (PFA) using Self-Monitoring, Analysis and Reporting Technology (SMART) to predict disk drive failure caused by the gradual decay of the drive's electrical and/or mechanical components. The primary goal of PFA is to predict when disk drive failure is imminent so that the data stored on the drive can be archived.
PFA is generally performed during operation of the disk drive by monitoring key drive attributes that are indicative of the health of the disk drive. Additionally, PFA can be implemented by performing periodic self-diagnostic tests on the disk drive. Present methods of performing PFA in disk drives predict imminent disk drive failure based upon errors associated with a single attribute (e.g., read errors, seek errors, or fly-height errors). In these methods, errors corresponding to the single attribute are monitored and compared to a threshold value. When the errors exceed the threshold, a warning of imminent disk drive failure is provided to the user.
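The single-attribute threshold scheme described above can be sketched in Python as follows. This is a minimal illustration, not any particular drive's SMART implementation: the `AttributeMonitor` class, the `check_drive` helper, and the threshold values are hypothetical, and error counts are assumed to be accumulated by some external monitoring loop.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AttributeMonitor:
    """Accumulates errors for one drive attribute (hypothetical structure)."""
    name: str        # e.g. "read_errors", "seek_errors", "fly_height_errors"
    threshold: int   # hypothetical per-attribute limit
    count: int = 0

    def record_error(self, n: int = 1) -> None:
        self.count += n

    def failure_imminent(self) -> bool:
        # Each attribute is judged on its own: warn only when its
        # accumulated error count exceeds its own threshold.
        return self.count > self.threshold


def check_drive(monitors: Iterable[AttributeMonitor]) -> List[str]:
    """Return the names of attributes whose errors exceed their threshold."""
    return [m.name for m in monitors if m.failure_imminent()]
```

For instance, with a read-error threshold of 50 and a seek-error threshold of 20, recording 51 read errors and 19 seek errors makes `check_drive` return `["read_errors"]`. Because each attribute is compared against its own threshold independently, a drive whose counts are all just below their respective thresholds produces no warning in this sketch.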