Test engineers endeavoring to provide Quality Assurance (QA) are expected to test and verify that a complex system of hardware and software components, which is provided to customers, can survive single or multiple component failures. However, there is currently no known way to do this across multiple products and with any degree of accuracy. One current methodology is for a test engineer to write test cases that fail individual components within the complex system and hope that this covers all the ways that a component can fail.
In today's computing environment, however, most systems do not consist of a single machine in isolation. Instead, they combine a complex network of resources. Below is an example of a complex system, which in this case uses JMS (JAVA™ Message Service) and HTTP (Hypertext Transfer Protocol):
JMS  => Cluster  => WebServices system A
HTTP => Server 1 => WebServices system B
     => Server 2 => WebServices system C
     => Server 3 => WebServices system D
Database => v7
         => v8
(Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.)
This exemplary complex system has a topology of components comprising two points of entry, three servers in a cluster, four backend services, and two databases. This complexity yields more than ten parts that potentially need to be crashed in a testing method, in addition to the existing testing requirements for crashing parts of the system.
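By way of illustration only, the count of parts in the exemplary topology, and the number of single and double failure scenarios it implies, can be sketched as follows. The grouping keys, the function name `failure_scenarios`, and the choice to count each named item as one part (without counting the cluster itself as a separate part) are assumptions made for this sketch and do not form part of any known testing method.

```python
from itertools import combinations

# Component names are taken from the exemplary topology; the grouping
# keys are illustrative labels chosen for this sketch.
TOPOLOGY = {
    "entry points": ["JMS", "HTTP"],
    "cluster servers": ["Server 1", "Server 2", "Server 3"],
    "backend services": [
        "WebServices system A",
        "WebServices system B",
        "WebServices system C",
        "WebServices system D",
    ],
    "databases": ["Database v7", "Database v8"],
}


def failure_scenarios(topology, max_simultaneous):
    """Enumerate every set of 1..max_simultaneous components failing together."""
    components = [name for group in topology.values() for name in group]
    return {
        k: list(combinations(components, k))
        for k in range(1, max_simultaneous + 1)
    }


scenarios = failure_scenarios(TOPOLOGY, max_simultaneous=2)
for k, cases in scenarios.items():
    print(f"{k} simultaneous failure(s): {len(cases)} scenarios")
# prints:
# 1 simultaneous failure(s): 11 scenarios
# 2 simultaneous failure(s): 55 scenarios
```

Even under these simplifying assumptions, eleven parts give eleven single-failure scenarios and fifty-five double-failure scenarios, which indicates how quickly the number of cases grows when multiple component failures are considered.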
A known testing method is AIS (Automation Infrastructure and Solutions), which provides a test framework for executing tests. The framework gathers results from each component at the end of a test run. A report of test success, failure, coverage, etc. is then produced. A similar scenario is the automatic installation and verification tests that some components provide, such as an IVT (Install Verification Test) or the power-on self checks used in, for example, a ThinkPad. These test routines provide validation that a component has been installed correctly. They are shipped with the component and executed when verifying that the component functions correctly.
The problem to solve is how to provide a suitable methodology that will allow a test engineer to assure the quality of a complex system and provide statistics on how well the engineer has tested the crash and recovery of the overall solution. It is known in the prior art to carry out testing on a complex system, and it is known to analyze the results of the testing, but it is not currently possible to judge whether the testing that has been carried out has been sufficiently rigorous to fully test the complex system at issue.