1. Field of the Invention
The present invention relates to the maintenance and repair of computer systems, and more particularly to the use of failure analysis to diagnose and correct computer system failures.
2. Background of the Related Art
Failure analysis is the process of analyzing a system, such as a computer system, to determine the cause of a failure or to prevent a failure from occurring or recurring. Predictive Failure Analysis (PFA) is a technology developed by IBM for anticipating the failure of components of a computer system. According to PFA, key physical parameters of a hardware device (e.g., the head flying height of a hard disk drive) are measured and compared against predefined thresholds to predict whether failure of the device is imminent. The hardware device can generate an alert in advance of a likely failure (e.g., up to 48 hours beforehand). This advance notice gives the system administrator time to either hot-swap the component (if applicable) or schedule downtime for the component to be changed or refreshed.
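By way of illustration only, the threshold comparison described above may be sketched as follows. This is a minimal sketch and not IBM's PFA implementation: the names `Threshold` and `predict_failure`, the parameter names, and all numeric limits are hypothetical and chosen solely to show the comparison of a measured parameter against a predefined threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one measured parameter (hypothetical structure)."""
    parameter: str
    low: Optional[float] = None   # alert if the measurement falls below this
    high: Optional[float] = None  # alert if the measurement rises above this

    def violated_by(self, value: float) -> bool:
        # A measurement outside either configured bound counts as a violation.
        if self.low is not None and value < self.low:
            return True
        if self.high is not None and value > self.high:
            return True
        return False


def predict_failure(measurements: Dict[str, float],
                    thresholds: List[Threshold]) -> List[str]:
    """Return the names of parameters whose measurements violate a threshold."""
    alerts = []
    for threshold in thresholds:
        value = measurements.get(threshold.parameter)
        # Parameters that were not measured are skipped rather than flagged.
        if value is not None and threshold.violated_by(value):
            alerts.append(threshold.parameter)
    return alerts


if __name__ == "__main__":
    # Illustrative limits only; real devices define their own thresholds.
    limits = [
        Threshold("head_flying_height_nm", low=5.0),
        Threshold("single_bit_memory_errors", high=100.0),
    ]
    sample = {"head_flying_height_nm": 4.2, "single_bit_memory_errors": 3.0}
    for name in predict_failure(sample, limits):
        print(f"PFA alert: {name} is outside its predefined threshold")
```

In this sketch, a non-empty result corresponds to the alert that the hardware device would raise in advance of a likely failure.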
Cost considerations limit the extent to which failure analysis tools and methods can be implemented on some computer systems. For example, the cost constraints of desktop workstation blades in a blade server environment may limit or preclude the use of the hardware required to implement some predictive failure analysis functions, such as counting single-bit memory errors, parity errors on hard disk drive memory reads, or memory bit drop-outs in a flash device with more than 100,000 write/erase cycles. Each PFA function requires additional hardware, with its associated cost. For workstation blades that must remain cost competitive with stand-alone workstations, this additional cost is prohibitive.