1. Field of the Invention
The present invention generally relates to computing systems. More particularly, the present invention is directed to the quantitative measurement of the autonomic/self-managing capabilities of computing systems.
2. Related Art
Autonomic computing (AC) describes the self-management capability of a computing system in which the components anticipate computing system needs and resolve problems with minimal human intervention. Today, most major hardware and software vendors invest heavily in AC features. To this extent, it is important to quantify the AC capability of computing systems.
Disturbance injection (e.g., the injection of a fault) is a technique commonly employed by testing organizations to evaluate the availability of autonomic systems. An illustrative benchmarking system 10 employing disturbance injection in accordance with the prior art is depicted in FIG. 1. The benchmarking system 10 includes a benchmark driver 12 and a system under test (SUT) 14. The benchmark driver 12 subjects the SUT 14 to a workload 16 designed to be representative of typical system use and receives responses 18 from the SUT 14. Benchmark results 20 are derived from how quickly the SUT 14 can satisfy the imposed workload 16, as measured by the benchmark driver 12. Disturbances (faults) 22 are injected into the SUT 14 by the benchmark driver 12 to evaluate the ability of the SUT 14 to “self-heal.”
An illustrative disturbance injection methodology 24 in accordance with the prior art is illustrated in FIG. 2. The disturbance injection methodology 24 will be described below with reference to components of the benchmarking system 10 illustrated in FIG. 1. As shown, during an “injection slot” 26, one or more disturbances 22 are injected into the SUT 14 by the benchmark driver 12, while the workload 16 is applied to the SUT 14. A disturbance 22 may comprise, for example, a software fault, an operator fault, a high-level hardware failure, etc. Each injection slot 26 comprises a plurality of different time periods including a startup interval 28, an injection interval 30, a detection interval 32, a recovery interval 34, and a keep interval 36. During the startup interval 28, the SUT 14 is run with the workload 16 applied until a steady state condition is achieved. During the injection interval 30, the SUT 14 is run at the steady state condition for a predetermined period of time, after which a disturbance 22 is injected into the SUT 14 by the benchmark driver 12. The detection interval 32 is the amount of time between the injection of the disturbance 22 into the SUT 14 and the initiation of a (scripted) recovery procedure by the benchmark driver 12. The recovery interval 34 represents the amount of time required by the SUT 14 to execute the recovery procedure. During the keep interval 36, the SUT 14 continues to run (steady state). The impact of the injected disturbance 22 on the SUT 14 is evaluated at the end of the keep interval 36. The disturbance 22 is removed (optionally) at the end of the keep interval 36.
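The sequence of intervals within an injection slot 26 can be sketched as a simple timeline. The following Python fragment is only an illustration: the class name `InjectionSlot`, the event labels, and the durations in seconds are hypothetical and do not come from the prior art methodology, which specifies only the order and meaning of the intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionSlot:
    """Durations (in seconds) of the five intervals of one injection slot.

    All names and units here are assumptions made for illustration.
    """
    startup: float    # run under workload until steady state is reached
    injection: float  # run at steady state before the disturbance is injected
    detection: float  # from injection until the recovery procedure is initiated
    recovery: float   # time the SUT needs to execute the recovery procedure
    keep: float       # continued steady-state run before the impact is evaluated

    def timeline(self):
        """Return (event, offset) pairs, with offsets measured from slot start."""
        t = 0.0
        events = [("workload_applied", t)]
        t += self.startup
        events.append(("steady_state_reached", t))
        t += self.injection
        events.append(("disturbance_injected", t))
        t += self.detection
        events.append(("recovery_initiated", t))
        t += self.recovery
        events.append(("recovery_completed", t))
        t += self.keep
        events.append(("impact_evaluated", t))
        return events


# Hypothetical durations chosen only to make the example concrete.
slot = InjectionSlot(startup=300, injection=600, detection=120, recovery=45, keep=300)
for event, offset in slot.timeline():
    print(f"{offset:7.1f} s  {event}")
```

In this sketch each event offset is the running sum of the preceding intervals, which mirrors the strictly sequential ordering of the intervals described above.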
There are three types of AC systems, each of which provides different responses to disturbances:
    1. Non-autonomic: manual disturbance detection and manual recovery initiation. For example, an operator of a database system is informed by the help desk that numerous complaints related to a particular process have been received. In response, the operator terminates the undesirable process in the database system.
    2. Fully autonomic: automatic disturbance detection and automatic recovery initiation. For example, an autonomic manager determines that there is an undesirable process in a system and terminates the process automatically without any human intervention.
    3. Partially autonomic: automatic disturbance detection and manual recovery initiation. For example, an autonomic manager determines that there is an undesirable process in a system and sends out an alert/message. A human operator learns of the problem by receiving the alert/message on a console or pager. In response, the operator locates the undesirable process based on the information provided in the alert/message and terminates the process.
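The three types listed above differ only in whether disturbance detection and recovery initiation are automatic or manual. A minimal sketch of that classification follows; the names `AutonomicLevel` and `classify` are hypothetical, and the combination of manual detection with automatic recovery initiation is rejected because it is not among the three types described.

```python
from enum import Enum


class AutonomicLevel(Enum):
    NON_AUTONOMIC = "non-autonomic"
    PARTIALLY_AUTONOMIC = "partially autonomic"
    FULLY_AUTONOMIC = "fully autonomic"


def classify(auto_detection: bool, auto_recovery_initiation: bool) -> AutonomicLevel:
    """Map the two properties onto the three system types described above."""
    if auto_detection and auto_recovery_initiation:
        return AutonomicLevel.FULLY_AUTONOMIC
    if auto_detection:
        # Detected automatically, but an operator must initiate recovery.
        return AutonomicLevel.PARTIALLY_AUTONOMIC
    if not auto_recovery_initiation:
        return AutonomicLevel.NON_AUTONOMIC
    # Manual detection with automatic recovery is not one of the three types.
    raise ValueError("combination not covered by the three described types")


print(classify(auto_detection=True, auto_recovery_initiation=False).value)
```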
With the traditional fault injection method, a disturbance 22 is injected into the SUT 14 during steady state operation. After injection of the disturbance 22, the benchmark driver 12 waits a predetermined amount of time (i.e., the detection interval 32), based on the type of disturbance 22, before initiating the recovery procedure. Thus, the only variable in the traditional fault injection method is the length of the recovery interval 34.
There are several problems with the traditional approach described above, including, for example:
    Problem 1: There is no flexibility in handling a partially autonomic system that provides alerts/messages to an operator regarding a detected problem and information on how to fix the detected problem. This type of partially autonomic system is predominant, for example, in many database systems where alerts/messages are communicated to a database administrator via a pager or other communication device. The use of a fixed detection interval 32 (e.g., derived from the Mean Time To Recover (MTTR), the average time that it takes to repair a failure) will not work in this type of situation, as the automatic provision of an alert/message will significantly shorten the time needed to detect a problem. To this extent, use of a fixed detection interval 32 in the presence of partially autonomic features will not provide an accurate and/or repeatable measurement of AC capability.
    Problem 2: If the system is a fully autonomic self-healing system, the benchmark driver 12 has no control over the timing of the detection of or recovery from a problem. An example is RAID5 disk fault tolerance in database systems, where the disk sub-system automatically detects a disk failure and automatically bypasses the failed disks.
Accordingly, a need exists for an improved method for quantitatively measuring the autonomic capabilities of systems having different degrees of automation (i.e., non-autonomic, fully autonomic, and partially autonomic).