Computer applications (“software applications” or “software programs”) have quality measures called service levels, which include response time, availability, and error rate; however, maintaining these service levels while the applications are subjected to increased usage, data volumes, and processing requirements is complex and often unachievable. One key to maintaining service levels is to collect data while the application is subjected to stress variations that approach its stress limits. This data can be analyzed to assist in various design, maintenance, and management tasks such as (a) automatically adding and removing resources available to the application, for example, additional computers or computing components, (b) identifying software and hardware bottlenecks to assist in performance and scalability tuning, (c) managing the application with the goal of reducing the amount of hardware or software resources required by the application, and (d) proactively identifying potential service level problems before users of the application encounter them.
Certain techniques analyze data collected under stress conditions that occur naturally in operational situations. However, these techniques are often unable to obtain data under conditions that approach stress limits.
Other techniques are based on generating artificial workloads via execution of load scripts. While these techniques can be used to collect data during a wide range of stress conditions in both operational and non-operational environments, the data collected is often significantly different from data collected during stress conditions of real user workloads running in a real application environment with dependencies on real external services or databases. Data collected under artificial workloads is therefore of little value for the aforementioned maintenance and management tasks.
Certain techniques add artificial workloads to real user workloads in volumes intended to cause application stress. These techniques are inefficient because they increase overhead on the application's resources, which can obtrusively affect user service levels, resulting in decreased user satisfaction and increased costs.
Other available techniques require human involvement in creating and updating load scripts. These scripts are often large, and their size is a barrier to revising them when the application or workload characteristics change. Stress data collected while running outdated load scripts does not reflect current operational conditions and thus prevents maintenance and management tasks from being performed effectively.
Some available techniques vary the stress on a horizontally scalable multi-node software application in an operational situation by using a load balancer either to reduce the number of nodes available to the application or otherwise to vary the load on each node. These techniques obtrusively affect user service levels, which decreases user satisfaction with the application.