This invention relates generally to the status of computer storage devices, and more particularly to a standardized mechanism for predicting storage device failures.
Corporations and other enterprises need to monitor the performance and status of elements of their computer networks to prevent data loss and to maximize resource efficiency. The computer industry is addressing this need through Web-Based Enterprise Management (WBEM), an industry initiative to develop a standardized, nonproprietary means for accessing and sharing management information in an enterprise network. The WBEM initiative is intended to solve the problem of collecting end-to-end management and diagnostic data in enterprise networks that may include hardware from multiple vendors, numerous protocols and operating systems, and a legion of distributed applications.
The founding companies of the WBEM initiative developed a prototype set of environment-independent specifications describing how to access any type of management instrumentation. An industry-wide initiative known as the Common Information Model (CIM) was started by a consortium of companies, including Microsoft and Compaq, who voluntarily ceded control of their developed work to the Distributed Management Task Force (DMTF, an industry standards body previously known as the Desktop Management Task Force). The CIM specification describes the modeling language, naming, and mapping techniques used to collect and transfer information from data providers and other management models. Windows Management Instrumentation (WMI) is one implementation of the CIM. WMI provides a standardized technology for accessing management information in enterprise environments.
One component of WMI is the Windows Driver Model (WDM) provider for kernel component instrumentation. The WDM provider, which resides in user mode, interfaces with a kernel-mode component. That kernel-mode component provides services that allow WDM-enabled drivers to implement WMI, and it also acts as the interface between those drivers and the user-mode WDM provider. WMI uses the WDM provider to publish information, configure device settings, and supply event notification from device drivers.
Among the elements that need to be monitored are hardware storage devices, such as hard disk drives, floppy disk drives, tape storage devices, CD-ROMs, DVD-ROMs and RAM disks. Prediction of storage device failures can be based on a variety of factors, for example, temperature, the height of the head above the platter, and the number of retries required to perform a read or write operation. Hardware storage devices communicate via device drivers. In the past, prediction and reporting of actual or potential storage device failures was the responsibility of the manufacturer/vendor of the hardware device or the developer of the device driver. A manufacturer/vendor that wanted to include storage failure prediction/reporting was responsible for developing the storage device failure application, as well as the details of an application programming interface (API) that other vendors of management applications could use.
Leaving the responsibility of storage device failure prediction to individual manufacturers/vendors of storage devices or developers of device drivers causes several problems. First, writing such an application is a time-consuming task for each vendor, which can have several negative consequences. For example, a vendor may opt not to include a storage device failure prediction application, or may include one that adds to the cost of the device and is more prone to bugs than a single shared mechanism for reporting storage device failure predictions would be. Another problem is that the end result is often an inconsistent user interface and an inconsistent API set for obtaining the information.
Some devices are Self-Monitoring, Analysis and Reporting Technology (SMART) system devices. Currently, SMART system devices include some SCSI (Small Computer System Interface) devices and some ATA/ATAPI (AT Attachment and AT Attachment Packet Interface, the interface first used by the IBM PC AT system) devices. ATA and ATAPI hardware interfaces can be used to communicate with an IDE (Integrated Device Electronics) device. SMART ATA/ATAPI devices follow the SMART command set of the Information Technology - AT Attachment with Packet Interface specification, which is known in the art. SMART SCSI devices follow the Informational Exceptions Control page specification as defined in the SCSI specification, which is known in the art. SMART devices employ a technology that monitors and predicts device performance and reliability. They run various diagnostic tests to detect device problems, with the object of increasing productivity and protecting data.
Typical enterprise consumers already have an infrastructure to manage hundreds of servers and thousands of personal computers (PCs). These consumers would like the management application that they are currently using to seamlessly integrate with the storage failure prediction application. In order to accomplish this seamless integration, the vendors of each storage device should be able to propagate storage device failure information to all prominent management application vendors. The existing infrastructure may only be able to report imminent failures in SCSI and ATA/ATAPI devices.
Accordingly, a need exists for a standardized mechanism for predicting and reporting storage device failures. The standardized mechanism should be capable of use with all storage devices, including currently supported SMART devices, currently supported non-SMART devices, and devices that are not currently supported, such as CD-ROMs, DVD-ROMs, tape storage devices, or RAM disks.
The present invention is directed to a computerized method and system for a standardized way of predicting and reporting storage device failures for any type of storage device. The system includes one or more device drivers, one or more storage management drivers, and one or more management applications. Each of the device drivers interfaces with a hardware storage device. The interface between the hardware storage device and the device driver includes status information which is used for the prediction of storage device failures. The management application is responsible for reporting the storage device failures. The storage management driver receives storage device failure status from each of the device drivers and propagates the storage device failure status to the management applications.
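The three roles described above can be pictured with a minimal Python sketch. It is illustrative only: real device drivers and storage management drivers are kernel-mode components, and every class, method, and parameter name here (FailureStatus, DeviceDriver, read_hardware, collect_and_propagate, and so on) is a hypothetical stand-in rather than part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FailureStatus:
    """Failure status for one storage device (hypothetical structure)."""
    device_name: str
    failure_predicted: bool


class DeviceDriver:
    """Stands in for a driver that interfaces with one hardware storage device."""

    def __init__(self, device_name: str, read_hardware: Callable[[], bool]):
        self.device_name = device_name
        self._read_hardware = read_hardware  # assumed hook returning True on predicted failure

    def failure_status(self) -> FailureStatus:
        return FailureStatus(self.device_name, self._read_hardware())


class ManagementApplication:
    """Stands in for an application that reports failures to a user."""

    def __init__(self, name: str):
        self.name = name
        self.received: List[FailureStatus] = []

    def report(self, status: FailureStatus) -> None:
        self.received.append(status)


class StorageManagementDriver:
    """Receives status from every device driver and propagates it to every application."""

    def __init__(self, drivers: List[DeviceDriver],
                 applications: List[ManagementApplication]):
        self._drivers = drivers
        self._applications = applications

    def collect_and_propagate(self) -> None:
        for driver in self._drivers:
            status = driver.failure_status()
            for application in self._applications:
                application.report(status)
```

The point of the sketch is the shape of the data flow: management applications never talk to a device driver directly, so adding a new kind of storage device only requires a new driver behind the same storage management driver.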
In accordance with other aspects of the invention, the device drivers and the storage management drivers reside in kernel mode and the management applications reside in user mode.
In accordance with still other aspects of the invention, a WMI extensions to WDM provider is also included. The storage management driver propagates the storage device failure status information to the management applications via the WMI extensions to WDM provider. A portion of the WMI extensions to WDM provider resides in kernel mode and a portion resides in user mode. Alternatively, a failure prediction agent residing in user mode may be included in lieu of the WMI extensions to WDM provider.
In accordance with yet other aspects of the invention, storage device failure information is transmitted from a device driver to the storage management driver. The storage management driver then determines whether storage device failure status information should be propagated. If the storage management driver determines that storage device failure status information should be propagated, it propagates the storage device failure information to the management applications.
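The text does not state the criterion the storage management driver uses to decide whether to propagate. As one possible reading, the following Python sketch assumes a simple policy: forward a status only when it differs from the last status seen for that device, and forward a first report only if it predicts a failure. The class name PropagationGate and its methods are hypothetical.

```python
from typing import Callable, Dict, List

# A subscriber receives (device_name, failure_predicted).
Subscriber = Callable[[str, bool], None]


class PropagationGate:
    """Decides whether a status from a device driver should be propagated."""

    def __init__(self, subscribers: List[Subscriber]):
        self._subscribers = subscribers
        self._last_status: Dict[str, bool] = {}

    def on_status(self, device_name: str, failure_predicted: bool) -> bool:
        previous = self._last_status.get(device_name)
        self._last_status[device_name] = failure_predicted

        # Assumed policy: a first report is propagated only if it predicts
        # a failure; later reports are propagated only when the status changes.
        if previous is None:
            should_propagate = failure_predicted
        else:
            should_propagate = previous != failure_predicted

        if should_propagate:
            for notify in self._subscribers:
                notify(device_name, failure_predicted)
        return should_propagate
```

Other policies, such as always propagating or rate-limiting repeated warnings, would fit the same description equally well.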
In accordance with still further aspects of the invention, a management application transmits a request for storage failure status information to at least one of the device drivers via the storage management driver. The storage device failure status information is determined and propagated to the management application via the storage management driver.
In accordance with another aspect of the invention, a display is included for a user to view the storage device failure status information.
In accordance with still other aspects of the invention, a device driver may include a failure prediction filter driver. A failure prediction filter driver can perform statistical analysis to determine whether to report a storage failure. Alternatively or in addition, if the hardware device itself can determine whether a failure is predicted, the filter driver may send standard or proprietary commands directly to the hardware device. In this context, both kinds of command are commands that do not conform to the SMART specification: standard commands follow a standard other than SMART, and proprietary commands are specific to the hardware device.
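The statistical analysis is not specified further. As a minimal sketch of the idea, the Python class below assumes one simple statistic: the fraction of recent I/O operations that needed at least one retry, compared against a threshold. The class name, the window size, and the threshold value are all hypothetical choices made for illustration.

```python
from collections import deque


class RetryRateFilter:
    """Predicts failure when too many recent operations required retries."""

    def __init__(self, window: int = 100, threshold: float = 0.2):
        # Keep only the most recent `window` observations.
        self._samples = deque(maxlen=window)
        self._threshold = threshold

    def record(self, retries: int) -> None:
        """Record one completed read or write and its retry count."""
        self._samples.append(1 if retries > 0 else 0)

    def failure_predicted(self) -> bool:
        # Avoid a verdict until a full window of operations has been seen.
        if len(self._samples) < self._samples.maxlen:
            return False
        retry_rate = sum(self._samples) / len(self._samples)
        return retry_rate > self._threshold
```

A filter of this kind lets a device with no built-in failure prediction participate in the same reporting path as a SMART device.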
In accordance with yet other aspects of the invention, a method is provided for uniform prediction and reporting of storage device failures. The method "queries" at least one storage device for status information using a procedure that is uniform for a variety of storage devices. The "querying" of a storage device can be at the request of a storage management driver or at the request of a management application. A determination is made whether a storage failure error should be reported based on the storage device status information. If it is determined that a storage failure should be reported, the storage failure error is reported.
In accordance with further aspects of the invention, the variety of storage devices about which storage failures are reported includes those SCSI devices and ATA/ATAPI devices that support SMART. The variety of devices can also include other storage devices, such as RAM disks, CD-ROMs, DVD-ROMs, tape storage devices, and other types of disk drives that do not follow SMART standards. The method of "querying" the device is dependent on the type of device. For example, a SMART SCSI device is "queried" by examining the sense codes returned by an Input/Output (I/O) operation, such as a read or write, and a SMART ATA/ATAPI device is "queried" using a SMART Read Status command.
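A uniform "query" that dispatches on device type might look like the Python sketch below. The function and type names are hypothetical. The constants reflect common readings of the public specifications, not anything stated in this text: SCSI additional sense code 0x5D indicates that a failure prediction threshold was exceeded, and an ATA SMART status command signals an exceeded threshold by returning 0xF4 and 0x2C in the LBA Mid and LBA High registers. They should be checked against the specifications before being relied on.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed constants taken from the SCSI and ATA specifications.
SCSI_ASC_FAILURE_PREDICTION = 0x5D
ATA_THRESHOLD_EXCEEDED = (0xF4, 0x2C)  # (LBA Mid, LBA High)


@dataclass
class ScsiIoResult:
    """Sense data returned with a completed read or write."""
    additional_sense_code: int


@dataclass
class AtaSmartStatus:
    """Registers returned by the SMART status command."""
    lba_mid: int
    lba_high: int


def query_scsi(result: ScsiIoResult) -> bool:
    return result.additional_sense_code == SCSI_ASC_FAILURE_PREDICTION


def query_ata(status: AtaSmartStatus) -> bool:
    return (status.lba_mid, status.lba_high) == ATA_THRESHOLD_EXCEEDED


# One entry per supported device type; other device types would add entries here.
QUERY_BY_TYPE: Dict[str, Callable] = {"scsi": query_scsi, "ata": query_ata}


def failure_predicted(device_type: str, raw_status) -> bool:
    """Uniform entry point: the caller does not need to know how each type is queried."""
    try:
        query = QUERY_BY_TYPE[device_type]
    except KeyError:
        raise ValueError(f"unsupported device type: {device_type}")
    return query(raw_status)
```

Callers see one function regardless of device type, which is the uniformity the method describes; the type-specific details stay inside the table of query functions.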
In accordance with still further aspects of the invention, the method of predicting and reporting storage device failures can be performed repeatedly. This repeated performance can be based on a timed interval. Predicting and reporting storage device failures can also be performed based on a request, such as at boot or based on a user request.
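Repetition on a timed interval can be sketched as a simple polling loop. This Python example is an assumption-laden illustration: the function name poll_for_failures, its parameters, and the fixed cycle count are hypothetical, and a real kernel-mode component would use a timer facility rather than sleeping. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def poll_for_failures(check: Callable[[], bool],
                      report: Callable[[], None],
                      interval_seconds: float,
                      cycles: int,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `check` once per interval and call `report` whenever it predicts failure.

    Returns the number of times a failure was reported.
    """
    reported = 0
    for cycle in range(cycles):
        if check():
            report()
            reported += 1
        # Wait between checks, but not after the final one.
        if cycle < cycles - 1:
            sleep(interval_seconds)
    return reported
```

A request-driven check, such as one made at boot or by a user, corresponds to calling `check` once outside the loop.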