Performance simulation of software systems running on one or more computers is a crucial consideration for developers deploying network-based services, such as services available over the Internet, an intranet, or an extranet. For example, developers often want to determine how their software design decisions will affect future system performance. Likewise, system users want to determine the optimal mix of hardware to purchase for an expected load level, and system administrators want to identify the bottlenecks in their systems and the load levels at which performance problems can be expected.
During the design of such software services, a developer may use performance modeling tools to model the system before release, in the hope of finding an optimal design and of identifying and troubleshooting potential problems. With such preparation in the design and implementation phases, the developer is more likely to maintain the system performance that users demand under a variety of conditions. However, many developers rely only on ad hoc or custom performance modeling techniques based on simple linear regression models. More sophisticated and more accurate approaches are desirable.
One problem with existing approaches for modeling system performance is that contention for system resources is not accurately modeled at high load levels. Under such load conditions, a resource may receive events generated by multiple other resources within the system. These events may exceed the performance capabilities of the resource, in which case the resource may reject some of them or delay them until it is able to handle them.
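To make the contention effect concrete, the following is a minimal sketch of a single contended resource, assuming a simple single-server FIFO model with a fixed per-event service time and a bounded wait queue. This is an illustrative stand-in, not the modeling method of this document; `simulate_resource`, `ResourceResult`, `service_time`, and `queue_capacity` are hypothetical names. Events arriving while the resource is busy are delayed, and once the queue is full, further events are rejected.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceResult:
    # (event_id, arrival, start, finish) for each accepted event
    completed: list = field(default_factory=list)
    # ids of events turned away because the wait queue was full
    rejected: list = field(default_factory=list)


def simulate_resource(arrivals, service_time, queue_capacity):
    """Hypothetical single-server FIFO resource with a bounded wait queue.

    Each accepted event occupies the resource for `service_time`.
    An event that arrives while the resource is busy waits (is delayed);
    if `queue_capacity` events are already waiting, it is rejected.
    Event ids are positions in the sorted arrival list.
    """
    result = ResourceResult()
    pending = []    # finish times of accepted events not yet completed
    free_at = 0.0   # time at which the resource next becomes idle
    for event_id, t in enumerate(sorted(arrivals)):
        # Forget events that have already finished by time t.
        pending = [f for f in pending if f > t]
        # If anything is pending, one event is in service and the rest wait;
        # the new event would have to wait, so check for a free queue slot.
        if pending and len(pending) - 1 >= queue_capacity:
            result.rejected.append(event_id)
            continue
        start = max(t, free_at)   # delayed until the resource is free
        finish = start + service_time
        free_at = finish
        pending.append(finish)
        result.completed.append((event_id, t, start, finish))
    return result
```

For example, four simultaneous events on a resource with a one-unit service time and room for two waiting events yield delays of 0, 1, and 2 for the first three and a rejection for the fourth, whereas the same events spaced two units apart see no delay at all. A model that ignores queueing would report the same per-event time in both cases.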
Another problem with existing approaches for modeling system performance is that feedback about such resource contention is not communicated to the user easily or accurately. Feedback is either too abstract (a single delay value) or too detailed (raw event logs). In some approaches, for example, the user merely receives an indication that the system ran the workload in X seconds. Alternatively, the user may receive an indication that an individual event in the workload required Y milliseconds to complete. Neither form of feedback, however, explains why the simulation produced these times. It would be more helpful to indicate, for example, that access to a given disk or communication channel was overloaded, resulting in slower-than-expected performance.
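One way to sit between a single delay value and a raw event log is to aggregate the log per resource. The sketch below assumes a hypothetical log format of `(resource, arrival, start, finish)` tuples and a hypothetical utilization `threshold` for flagging overload; `summarize_contention` and `explain` are illustrative names, not part of any existing tool.

```python
from collections import defaultdict


def summarize_contention(event_log, sim_duration, threshold=0.9):
    """Collapse a per-event log into a per-resource contention report.

    Hypothetical input: tuples (resource, arrival, start, finish), where
    start - arrival is time spent waiting for the resource and
    finish - start is time spent using it.
    """
    busy = defaultdict(float)   # total time each resource was in use
    wait = defaultdict(float)   # total time events waited for it
    for resource, arrival, start, finish in event_log:
        busy[resource] += finish - start
        wait[resource] += start - arrival
    report = {}
    for resource in busy:
        utilization = busy[resource] / sim_duration
        report[resource] = {
            "utilization": utilization,
            "total_wait": wait[resource],
            "overloaded": utilization >= threshold,
        }
    return report


def explain(report):
    """Turn the report into short findings for overloaded resources only."""
    return [
        f"{name}: {r['utilization']:.0%} busy, total wait {r['total_wait']:g}"
        for name, r in sorted(report.items())
        if r["overloaded"]
    ]
```

With a log in which two events queue on `disk0` for eight of ten time units while `nic0` is used briefly, a threshold of 0.75 reports only `disk0: 80% busy, total wait 4`, which points the user at the contended disk rather than at an unexplained total time.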