The present invention relates to improving the testing of systems, and more particularly, to using configuration information, performance measurements, and historical problem data to tune systems to improve test effectiveness.
Organizations that participate in system testing and software testing are interested in running their test systems in ways that uncover problems, especially problems that are disruptive to the system and/or software. Through experience, testers learn how to tune their systems and workloads to bring out problems. However, this is an intuitive, trial-and-error, subjective, and labor-intensive process. Further, the amount of information available for making tuning decisions exceeds what any individual or group of individuals is able to process. In addition, the hardware and software used in the test system change often, requiring testers to relearn how to stress and overload the test system.
Current methods of tuning systems rely exclusively on performance measurements, and use them to determine how to tune the system to avoid problems. This tends to make it difficult or impossible to replicate observed problems, and therefore testers are not able to fully determine what is causing the problems.
According to one embodiment, a test system includes: a data collection module adapted for collecting data from a test system; a storage module adapted for storing the collected data in an organized format, the data including problem data, associated configuration information, associated performance information, and activity data; an analysis module adapted for analyzing the collected data to define at least two activity zones by correlating the problem data, the associated configuration information, the associated performance information, and the activity data, the at least two activity zones including a safe zone where the test system operates normally and a danger zone where the test system is susceptible to operational problems; and an adjustment module adapted for adjusting available resources and/or workload of the test system to cause the test system to operate in the danger zone, thereby increasing a likelihood of fault occurrence for testing purposes, wherein the problem data includes symptoms and/or markers of the problem.
In another embodiment, a method for tuning a system includes: collecting data from a test system, the data including problem data, associated configuration information, associated performance information, and activity data; storing the collected data in an organized format; analyzing the collected data to define at least two activity zones by correlating the problem data, the associated configuration information, the associated performance information, and the activity data, the at least two activity zones including a safe zone where the test system operates normally and a danger zone where the test system is susceptible to operational problems; and adjusting available resources and/or workload of the test system to cause the test system to operate in the danger zone, thereby increasing a likelihood of fault occurrence for testing purposes.
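By way of illustration only, the analyzing and adjusting steps of the method may be sketched as follows. The sketch assumes a single performance metric (CPU utilization) and a simple boundary rule (the lowest utilization at which a problem was recorded); the names `Sample`, `danger_threshold`, `classify`, and `workload_adjustment` are hypothetical and do not appear in the embodiments, which may correlate many more data items in other ways.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One stored observation from the test system (hypothetical schema)."""
    cpu_utilization: float    # performance information, 0.0 to 1.0
    active_transactions: int  # activity data
    problem_observed: bool    # problem data: was a symptom or marker recorded?


def danger_threshold(samples: List[Sample]) -> Optional[float]:
    """Return the lowest CPU utilization at which a problem was recorded.

    Assumption: this stands in for the correlation step. Returns None
    when no problem has been observed yet, so no danger zone is defined.
    """
    problem_levels = [s.cpu_utilization for s in samples if s.problem_observed]
    return min(problem_levels) if problem_levels else None


def classify(cpu_utilization: float, threshold: Optional[float]) -> str:
    """Place an operating point in the safe zone or the danger zone."""
    if threshold is not None and cpu_utilization >= threshold:
        return "danger"
    return "safe"


def workload_adjustment(current: float, threshold: Optional[float],
                        step: float = 0.05) -> float:
    """Return how much utilization to add to move toward the danger zone.

    The increase is capped at `step` so the system approaches the
    boundary gradually; zero is returned once the danger zone is reached.
    """
    if threshold is None or current >= threshold:
        return 0.0
    return min(step, threshold - current)
```

In this sketch, a caller would repeatedly apply the returned increment (for example, by starting additional work) until `classify` reports the danger zone.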
In yet another embodiment, a computer program product for tuning a system includes a computer readable storage medium having computer readable program code embodied therewith. The computer readable program code includes computer readable program code configured to: collect data from a test system, the data including problem data, associated configuration information, associated performance information, and activity data; store the collected data in a database; analyze the collected data to define at least two activity zones by correlating the problem data, the associated configuration information, the associated performance information, and the activity data, the at least two activity zones including a safe zone where the test system operates normally and a danger zone where the test system is susceptible to operational problems; and adjust available resources and/or workload of the test system to cause the test system to operate in the danger zone, thereby increasing a likelihood of fault occurrence for testing purposes, wherein adjusting the available resources and/or the workload of the test system includes at least one of: starting or stopping one or more jobs including a thrasher that consumes resources and/or causes timing variations, starting or stopping one or more transactions, starting or stopping one or more tasks, varying an available number of central processing units (CPUs), varying an available amount of memory, and bringing online or taking offline one or more input/output (I/O) devices.
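For illustration, the kinds of adjustment enumerated above may be represented as a small set of named actions, with a planner that selects among them. This is a minimal sketch under stated assumptions: the `Adjustment` enumeration, the `plan_adjustments` function, and the policy of starting a thrasher and then removing a CPU are all hypothetical, and the sketch does not issue any actual system commands.

```python
from enum import Enum, auto
from typing import List


class Adjustment(Enum):
    """Hypothetical names for the adjustment types listed in the text."""
    START_JOB = auto()
    STOP_JOB = auto()
    START_THRASHER = auto()      # a job that consumes resources or perturbs timing
    STOP_THRASHER = auto()
    START_TRANSACTION = auto()
    STOP_TRANSACTION = auto()
    START_TASK = auto()
    STOP_TASK = auto()
    ADD_CPU = auto()
    REMOVE_CPU = auto()
    ADD_MEMORY = auto()
    REDUCE_MEMORY = auto()
    IO_DEVICE_ONLINE = auto()
    IO_DEVICE_OFFLINE = auto()


def plan_adjustments(zone: str, cpus_online: int,
                     min_cpus: int = 1) -> List[Adjustment]:
    """Choose adjustments that push a safe-zone system toward the danger zone.

    Assumed policy: add load with a thrasher, then reduce available CPUs
    as long as at least `min_cpus` would remain online.
    """
    if zone != "safe":
        return []  # already in the danger zone; nothing to add
    plan = [Adjustment.START_THRASHER]
    if cpus_online > min_cpus:
        plan.append(Adjustment.REMOVE_CPU)
    return plan
```

A real adjustment module would map each action to the corresponding operating system or workload manager operation, which is outside the scope of this sketch.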
The associated configuration information includes at least one of: a number of CPUs operating, an amount of available memory, a number of I/O devices connected, and connected I/O device types; the problem data includes symptoms and/or markers of the problem; the associated performance information includes at least one of: I/O queuing information, an I/O activity rate, an I/O response time, CPU utilization, memory utilization, direct access storage device (DASD) response time, transaction response time, and paging information; and the activity data includes at least one of: a number of active transactions for each component and/or application of interest, a number of queued transactions for each component and/or application of interest, a number of jobs for each component and/or application of interest, and a number of tasks for each component and/or application of interest.
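As a non-limiting illustration, the four categories of collected data described above may be stored together as one record per observed problem. The class and field names below, including the units, are assumptions chosen for the sketch and cover only a subset of the listed items.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConfigurationInfo:
    cpus_operating: int
    available_memory_mb: int
    io_device_types: List[str] = field(default_factory=list)

    @property
    def io_devices_connected(self) -> int:
        # Derived from the device list so the two values cannot disagree.
        return len(self.io_device_types)


@dataclass
class PerformanceInfo:
    cpu_utilization: float            # fraction, 0.0 to 1.0
    memory_utilization: float         # fraction, 0.0 to 1.0
    io_response_time_ms: float
    transaction_response_time_ms: float


@dataclass
class ActivityData:
    # Each mapping is keyed by the component or application of interest.
    active_transactions: Dict[str, int] = field(default_factory=dict)
    queued_transactions: Dict[str, int] = field(default_factory=dict)
    jobs: Dict[str, int] = field(default_factory=dict)
    tasks: Dict[str, int] = field(default_factory=dict)


@dataclass
class ProblemRecord:
    """Problem data plus the context in which the problem occurred."""
    symptoms: List[str]               # symptoms and/or markers of the problem
    configuration: ConfigurationInfo
    performance: PerformanceInfo
    activity: ActivityData
```

Keeping the configuration, performance, and activity data attached to each problem is what allows the analysis step to correlate them when defining the activity zones.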
Other aspects and embodiments of the present invention will become apparent from the following detailed description, which, when taken in conjunction with the drawings, illustrates by way of example the principles of the invention.