Performance and reliability testing is a fundamental part of enterprise system development. Measurements related to scalability, response times, availability, capacity, achievable transactions-per-second rates, etc. represent some of the key areas assessed in system test and performance engineering. In performance or reliability test runs, a key objective is to monitor the system under load, where the load is applied by a large number of virtual users who exploit an operational profile and subject the system under test to load at a constant transactions-per-second (TPS) rate.
Conventional performance and reliability tools (like Mercury LoadRunner™, Rational Robot™, Rational Performance Test™, etc.) provide the capability to load a system under test with a number of virtual users, where each of the virtual users exploits an operational profile. In enterprise load testing using the conventional performance and reliability tools, the number of virtual users can be as low as 500 or as high as several million.
During the execution of performance and reliability test runs, situations often occur where virtual users fail, drop, or terminate abruptly (abnormally). These situations can occur for a number of reasons, such as subsystem failures, functional issues resulting in complete destruction of the virtual user's session, and so forth.
System availability is another fundamental parameter assessed in performance and reliability testing. System availability is generally given as a measurement of active users over a sustained time. Generally, system stability is assessed by executing a 7-day test run as a solid baseline. However, it is not uncommon for systems to remain under a constant load test for several weeks. In measuring aspects of system availability, it is desirable that the number of users remains constant during the test run. In other words, it is desirable that the load (expressed as TPS) is sustained for the duration of the test run. In conventional test run systems, when virtual users fail (due to failures related to one of the enterprise system's components), the TPS drops and the constant TPS that was decided at the outset of the test run is no longer maintained. Because the overall TPS rate is derived from the aggregation of all users on the system, the TPS rate drops if the number of users in the run is reduced. This creates a number of disadvantages for the test runs.
For example, consider a scenario in which the test analyst sets up the test run on a Friday, monitors the initial phases of the test run during the day to ensure that all is well and the target TPS is sustained, and allows the test run to continue over the weekend before returning on Monday to take stock of the situation. In the event that some of the virtual users are abruptly terminated during the test run for one reason or another, the effective number of active virtual users is less than the targeted number and hence the TPS targets are not met. In such situations, conventional tools do not have the capability to compensate for the terminated virtual users in order to maintain the number of active virtual users and hence keep the TPS load constant throughout the test run.
If 50% of the virtual users have been terminated, then the TPS will have effectively been halved. While the failures associated with the terminated virtual users have great value to testers, a fundamental disadvantage is the absence of answers to questions such as “how would the system have behaved if the constant TPS rate decided at the outset was sustained” or “would the system have remained available at the 72nd hour”. Consequently, the methods performed by conventional tools force the test analyst to analyse the cause of the abrupt termination of virtual users, clean the system and bring it to the base state, and then re-run the test. This means that the test analyst has to administer multiple runs, incrementally resolving the causes of abrupt user terminations. A disadvantage is the delay the test analyst faces in understanding and assessing the system availability concerns; the multiple incremental runs also lead to additional effort and time consumption, and thereby to slippage in project schedules. A further disadvantage is that architectural changes may be necessary to achieve enterprise capacity numbers, resulting in additional cost. A further disadvantage of the existing tools is that several runs, along with incremental debugging of abnormal terminations, are required before one can get to flushing out the high availability issues. Yet a further disadvantage is that significant time and knowledge are lost during performance and reliability runs because of this shortfall.
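The relationship between active virtual users and aggregate TPS described above can be illustrated with a short calculation. The following Python sketch is illustrative only: it assumes, for simplicity, that every virtual user contributes the same transaction rate, and the function name and all figures (1000 users, 0.5 TPS per user) are hypothetical values chosen for the example, not taken from any particular tool or test run.

```python
def aggregate_tps(active_users: int, per_user_tps: float) -> float:
    """Aggregate load, assuming each active user contributes an equal rate."""
    return active_users * per_user_tps


# Hypothetical test-run parameters chosen for illustration.
target_users = 1000   # virtual users planned at the outset
per_user_tps = 0.5    # transactions per second issued by each user

# Load the run was designed to sustain.
target_tps = aggregate_tps(target_users, per_user_tps)

# Half of the virtual users terminate abruptly and are not replaced.
surviving_users = target_users // 2
degraded_tps = aggregate_tps(surviving_users, per_user_tps)

print(f"target TPS:   {target_tps}")
print(f"degraded TPS: {degraded_tps}")
print(f"fraction of target load sustained: {degraded_tps / target_tps}")
```

Under these assumptions, losing half of the virtual users leaves the system under half of the intended load for the remainder of the run, so any availability figure observed afterwards no longer reflects the TPS rate decided at the outset.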