1. Field of the Invention
The present invention relates to mechanisms for testing computer systems. More specifically, the present invention relates to a method and an apparatus for testing a computer system by injecting faults into the computer system while the computer system is running.
2. Related Art
The need for reliable computing systems has led to the development of “highly available” computer systems that continue to function when one or more of their subsystems and/or components fail.
In order to ensure that highly available computer systems operate properly, it is necessary to perform rigorous testing. This testing is complicated by the fact that highly available computer systems typically include a large number of components and subsystems that are subject to failure. Furthermore, an operating system for a highly available computer system contains a large number of pathways to handle error conditions that must also be tested.
Some types of testing can be performed manually, for example by unplugging a computer system component, disconnecting a cable, or pulling out a computer system board while the computer system is running. However, the outcome of this type of manual testing is typically neither repeatable nor precise, because the manual event can occur at an arbitrary point in the execution path of a program and/or operating system running on the highly available computer system.
What is needed is a method and an apparatus that facilitates testing a computer system by injecting faults at precise locations in the execution path of an operating system and/or program that is executing on a computer system.
One embodiment of the present invention provides a system for testing a computer system by using software to inject faults into the computer system while the computer system is operating. The system allows a programmer to insert a fault point into the source code of a program. This fault point causes a fault to occur if a trigger associated with the fault point is set and the execution path of the program passes through the fault point. The system allows this source code to be compiled into executable code. Next, the system allows the computer system to be tested. This testing involves setting the trigger for the fault point and then executing the executable code, so that the fault occurs if the execution path passes through the fault point. The testing also involves examining the result of the execution.
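The overall workflow can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `TRIGGERS`, `fault_point`, `write_record`, and `run_test` are hypothetical, and a Python exception stands in for whatever fault a real fault point would cause. The program under test contains one fault point; the test sets (or leaves clear) its trigger, runs the code, and examines the result.

```python
from __future__ import annotations

from typing import Optional, Tuple

# Hypothetical global set of fault-point names whose trigger is set.
TRIGGERS: set[str] = set()


class InjectedFault(Exception):
    """Raised when execution reaches a fault point whose trigger is set."""


def fault_point(name: str) -> None:
    # The fault occurs only if the trigger is set AND execution passes here.
    if name in TRIGGERS:
        raise InjectedFault(name)


def write_record(store: list, record: str) -> str:
    """Program under test, with a fault point placed in its source code."""
    fault_point("write_record.before_append")
    store.append(record)
    return "ok"


def run_test(trigger: Optional[str]) -> Tuple[str, list]:
    """Set (or clear) the trigger, execute the code, and examine the result."""
    TRIGGERS.clear()
    if trigger is not None:
        TRIGGERS.add(trigger)
    store: list = []
    try:
        result = write_record(store, "r1")
    except InjectedFault as fault:
        result = f"fault:{fault}"
    finally:
        TRIGGERS.clear()  # leave no trigger set after the test
    return result, store
```

Running `run_test(None)` exercises the normal path, while `run_test("write_record.before_append")` forces the fault at exactly the same point on every run, which is the repeatability that manual testing lacks.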
In one embodiment of the present invention, if the fault point is encountered while executing the executable code, the system executes the fault point by: looking up a trigger associated with the fault point; determining whether the trigger has been set; and executing code associated with the fault point if the trigger has been set.
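The three steps of executing a fault point (look up the trigger, check whether it is set, run the associated code) might look like the following sketch. The `Trigger` record, the `TRIGGER_TABLE` dictionary, and `execute_fault_point` are assumptions made for illustration; the source does not specify how triggers are keyed or stored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Trigger:
    armed: bool = False                            # whether the trigger is set
    action: Optional[Callable[[], object]] = None  # code associated with the fault point


# Hypothetical trigger table keyed by fault-point identifier.
TRIGGER_TABLE: Dict[str, Trigger] = {}


def execute_fault_point(point_id: str) -> Optional[object]:
    # Step 1: look up the trigger associated with this fault point.
    trigger = TRIGGER_TABLE.get(point_id)
    # Step 2: determine whether the trigger has been set.
    if trigger is None or not trigger.armed:
        return None  # not set: normal execution continues unchanged
    # Step 3: execute the code associated with the fault point.
    return trigger.action() if trigger.action is not None else None
```

A fault point with no entry in the table, or with an unarmed entry, costs only a lookup, so fault points can remain in production code with little overhead.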
In one embodiment of the present invention, the fault point calls a fault function that causes the fault to occur.
In one embodiment of the present invention, the fault point includes code that causes the fault to occur.
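The two embodiments above differ only in where the fault-causing code lives. In this hypothetical sketch, `open_resource` uses a fault point that calls a shared `fault_function`, while `read_resource` carries its own inline fault code (a simulated short read). The `ARMED` set and both functions are illustrative names, not part of the source.

```python
from __future__ import annotations

import errno

# Hypothetical set of fault points whose trigger is currently set.
ARMED = {"open.fail", "read.short"}


def fault_function(point_id: str) -> None:
    """Shared fault routine: the fault point merely calls it."""
    raise OSError(errno.EIO, f"injected fault at {point_id}")


def open_resource(name: str) -> str:
    # Form 1: the fault point delegates to a separate fault function.
    if "open.fail" in ARMED:
        fault_function("open.fail")
    return f"handle:{name}"


def read_resource(data: bytes) -> bytes:
    # Form 2: the fault point contains its own fault-causing code,
    # here returning only half the data to simulate a short read.
    if "read.short" in ARMED:
        return data[: len(data) // 2]
    return data
```

A shared fault function keeps common failure behavior in one place, whereas inline code suits faults that are specific to a single call site.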
In one embodiment of the present invention, the trigger has global scope and is stored in a kernel address space of an operating system within the computer system.
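A globally scoped trigger means that every execution context sees the same trigger state. The sketch below models this with a single process-wide table guarded by a lock and shows that a trigger set once is observed by several threads. It is only an analogy under stated assumptions: the names `set_trigger`, `clear_trigger`, `is_triggered`, and `demo` are hypothetical, and a real kernel would keep the table in kernel memory and might expose it through a system call.

```python
from __future__ import annotations

import threading

# Stand-in for a trigger table in kernel address space: one shared table
# that every thread consults.
_GLOBAL_TRIGGERS: set = set()
_LOCK = threading.Lock()


def set_trigger(point_id: str) -> None:
    with _LOCK:
        _GLOBAL_TRIGGERS.add(point_id)


def clear_trigger(point_id: str) -> None:
    with _LOCK:
        _GLOBAL_TRIGGERS.discard(point_id)


def is_triggered(point_id: str) -> bool:
    with _LOCK:
        return point_id in _GLOBAL_TRIGGERS


def worker(point_id: str, results: list) -> None:
    # Any thread reaching the fault point sees the same global trigger.
    results.append(is_triggered(point_id))


def demo() -> list:
    set_trigger("disk.write")
    results: list = []
    threads = [threading.Thread(target=worker, args=("disk.write", results))
               for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    clear_trigger("disk.write")
    return results
```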
In one embodiment of the present invention, the trigger is stored in an environment variable associated with a method invocation.
In one embodiment of the present invention, the trigger is stored within an object reference. In a variation on this embodiment, the trigger causes the fault to be generated when the referenced object is invoked.
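A trigger carried inside an object reference can be pictured as a proxy that wraps the target object and fails any method invocation made through an armed reference. In this sketch, `FaultyRef` and `Counter` are hypothetical, and an exception represents the generated fault. Other references to the same object remain unaffected.

```python
from __future__ import annotations

from typing import Any, Optional


class InjectedFault(Exception):
    """Fault generated when an armed reference is invoked."""


class FaultyRef:
    """Hypothetical object reference that carries its own trigger."""

    def __init__(self, target: Any, trigger: Optional[str] = None) -> None:
        self._target = target
        self._trigger = trigger  # None means the trigger is not set

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not defined on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def invoke(*args: Any, **kwargs: Any) -> Any:
            # Invoking the referenced object through an armed reference
            # generates the fault instead of performing the call.
            if self._trigger is not None:
                raise InjectedFault(f"{self._trigger} on {name}()")
            return attr(*args, **kwargs)
        return invoke


class Counter:
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> int:
        self.value += 1
        return self.value
```

Because the trigger travels with the reference, a test can hand an armed reference to one client and a clean reference to another, then observe how each copes.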
In one embodiment of the present invention, the fault can include: a computer system reboot operation, a computer system panic operation, a return of an error code, a forced change in control flow, a resource allocation failure, a response delay, and a deadlock.
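The listed fault types can be gathered behind one dispatcher that a fault point calls. The following is a hypothetical sketch: `FaultKind`, `inject`, and the exception classes are invented for illustration. Reboot and panic are only simulated with exceptions, and the deadlock uses a timeout so that the example terminates instead of hanging.

```python
from __future__ import annotations

import enum
import errno
import threading
import time


class FaultKind(enum.Enum):
    REBOOT = "reboot"
    PANIC = "panic"
    ERROR_CODE = "error_code"
    CONTROL_FLOW = "control_flow"
    ALLOC_FAILURE = "alloc_failure"
    DELAY = "delay"
    DEADLOCK = "deadlock"


class SimulatedReboot(Exception): ...
class SimulatedPanic(Exception): ...
class ForcedBranch(Exception): ...


def inject(kind: FaultKind, delay_s: float = 0.01) -> int:
    """Produce one of the listed fault types (simulated where noted)."""
    if kind is FaultKind.REBOOT:
        raise SimulatedReboot()                # stands in for a real reboot
    if kind is FaultKind.PANIC:
        raise SimulatedPanic()                 # stands in for a real panic
    if kind is FaultKind.ERROR_CODE:
        return -errno.EIO                      # caller sees a failing return value
    if kind is FaultKind.CONTROL_FLOW:
        raise ForcedBranch()                   # divert execution to another path
    if kind is FaultKind.ALLOC_FAILURE:
        raise MemoryError("injected allocation failure")
    if kind is FaultKind.DELAY:
        time.sleep(delay_s)                    # postpone the response
        return 0
    if kind is FaultKind.DEADLOCK:
        lock = threading.Lock()
        lock.acquire()
        # Re-acquiring a non-reentrant lock would block forever; the
        # timeout lets this demo detect the self-deadlock and stop.
        acquired = lock.acquire(timeout=delay_s)
        lock.release()
        if not acquired:
            raise TimeoutError("injected deadlock detected")
        return 0
    raise ValueError(f"unknown fault kind: {kind}")
```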