The present invention relates to problem isolation, and more specifically, this invention relates to monitoring for, and detection of, performance problems in hardware.
Hardware systems, such as virtual tape servers, network systems, etc., often include multiple hardware components that function in a very similar manner. For example, a virtual tape server such as the IBM TS7700 includes a disk cache subsystem that is installed with a large number of physical disk drive media (DDMs).
Due to the nature of virtual tape servers, if one or more DDMs are defective (e.g., suffering from faulty microcode, manufacturing problems, mechanical breakdown, etc.), the problem is not easily identified and/or avoided once the defective DDM is installed in a virtual tape system in the field.
By analogy, similar problems may occur in deployments of any type of hardware. Determining performance problems in hardware systems, such as those described above, is often best performed using fine-tuned values and/or thresholds. However, establishing performance thresholds can be problematic: any device whose performance falls below the threshold will likely be classified as under-performing, even when the device is not actually under-performing. Furthermore, an incorrectly set threshold may lead to a device being mis-categorized, which could itself cause the device to function improperly and/or not at all. As a result, device thresholds often must be fine-tuned frequently, which itself introduces a number of additional problems.