1. Field of the Invention
The invention relates to modeling system performance. Specifically, the invention relates to an apparatus and method for modeling traffic server system performance under conditions characterized by highly variable traffic arrival rates.
2. Description of the Related Art
A dilemma faced by most high-volume eBusiness websites, and by the web servers, application servers, and database servers that support them, is that it is difficult, though highly desirable, to find a cost-efficient way to meet key performance metrics or service levels (especially those relating to availability) under unanticipated, highly variable workloads without investing heavily in additional hardware resources that sit idle most of the time.
The task of planning an optimum configuration for large Web servers has become ever more challenging. This is because the hardware and software structure of large Web sites grows increasingly complex, and the characteristics of the associated traffic arrival patterns and workloads are at best poorly understood or, at worst, essentially unknown because the system has yet to be implemented.
Even with this growing complexity, typical IT infrastructures can be analyzed and related models (i.e. simulators) can be developed to assist in predicting and planning how to meet future requirements. Network loads can be characterized by identifying key traffic parameters that affect network sizing and performance, such as packet size distribution, packet throughput, and packet interarrival time distribution. However, the results are often not satisfactory. The predictions can become complex when, as is often the case, there are many different hardware and software configurations that must be tested, and there are numerous performance criteria that must all be simultaneously met, while at the same time maximizing system throughput for the number of concurrent users supported by the system.
Capacity planning and performance modeling of complex computer systems generally require detailed information about the traffic arrival patterns and workload assumed to be running on those systems. Studies have shown that network traffic tends to be “bursty”, rather than evenly distributed over time. Traffic burstiness may be defined as the tendency of data packets to arrive in bursts, with the inter-packet arrival time within a burst being much smaller than the average inter-packet arrival time outside of the burst.
Bursty traffic can have a significant effect on the queuing delays and response times of a network, since it can cause unpredicted capacity overloads from which the network must recover. Extended overloads contribute to network congestion and increase the probability of buffer overruns and dropped packets. Dropping packets to prevent extended overloads affects the quality of service and usually results in degraded performance.
The introduction of high speed networking technologies and high performance personal computers and workstations, which are capable of transmitting packets at a very high rate, has increased the potential variability of network traffic dramatically. In addition to the variability in network load and packet arrival rates, packets transmitted by these systems are generally closely related. The packets associated with the same application tend to arrive at the same destination over a short time period. This correlation is evident, for example, when a large file is transmitted from a file server to a diskless workstation.
Detailed performance studies of a complex server system typically involve queuing theory, a specialized branch of mathematics that studies the servicing of a succession of requests on a resource. For example, queuing theory has been widely applied in the study of highway traffic patterns, network servers, and even patrons of a bank. The basis for many of these performance studies is the analysis of (1) the arrival of requests and (2) the time to service the requests. If the average time to service a request is greater than the average time between arrivals, a large queue will form.
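The queue build-up described above can be illustrated with a minimal sketch. It assumes a single server that handles requests in arrival order, and it uses made-up timings; the function name and all numbers are illustrative assumptions and do not come from any particular system.

```python
def waiting_times(inter_arrival_times, service_times):
    """Waiting time of each request in a single-server, first-come-first-served queue.

    inter_arrival_times[k] is the gap between request k-1 and request k
    (entry 0 is unused); service_times[k] is the service time of request k.
    """
    waits = [0.0]  # the first request finds an empty queue
    for k in range(1, len(service_times)):
        # Work left over from request k-1, minus the gap before request k arrives.
        backlog = waits[-1] + service_times[k - 1] - inter_arrival_times[k]
        waits.append(max(0.0, backlog))
    return waits


# Hypothetical numbers: one request arrives every 1.0 second.
n = 5
gaps = [1.0] * n
overloaded = waiting_times(gaps, [1.5] * n)   # service slower than arrivals
underloaded = waiting_times(gaps, [0.5] * n)  # service faster than arrivals

print(overloaded)   # waits grow with every request
print(underloaded)  # no request ever waits
```

With the slower server each request waits 0.5 seconds longer than the one before it, so the queue grows without bound; with the faster server no queue forms.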
Service providers typically are interested in achieving metrics associated with a maximum time that a request waits for service (queuing delay), and a maximum total time until the request is satisfied (total delay). The total delay is typically the sum of the queuing delay and the service delay. It is desirable in system modeling to provide a system configuration that provides a minimum total delay in almost all cases.
Simple capacity planning can be done by calculating the number of users per second that can be processed without exceeding the maximum utilization requirements of any of the system resources (e.g., processors, disks, network). More detailed estimates that also project the overall response time per user (factoring in queuing effects on various resources) can also be made. Modeling queuing delay performance often requires a projection of an average arrival rate and an assumption of an arrival distribution pattern.
Models employing queuing theory generally predict the behavior of systems that service randomly arising demands. A Poisson pattern is usually assumed, wherein the probability of an arrival in a short time interval is proportional to the length of that interval. It follows that the inter-arrival times of a Poisson pattern are a sequence of independent and identically distributed random variables with an exponential density function.
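The simple capacity calculation can be sketched as follows. The sketch relies on the utilization law (utilization equals arrival rate multiplied by the service demand per user); the resource names, service demands, and utilization ceilings are hypothetical values chosen only for illustration.

```python
def max_arrival_rate(service_demand, max_utilization):
    """Return (rate, bottleneck): the highest sustainable users per second
    and the name of the resource that limits it.

    service_demand[name]  -- seconds of that resource consumed per user
    max_utilization[name] -- utilization ceiling for that resource (0..1)
    """
    # Utilization law: utilization = rate * demand, so rate <= ceiling / demand.
    limits = {
        name: max_utilization[name] / demand
        for name, demand in service_demand.items()
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical per-user service demands (seconds) and utilization ceilings.
demand = {"processor": 0.02, "disk": 0.05, "network": 0.01}
ceiling = {"processor": 0.70, "disk": 0.60, "network": 0.80}

rate, resource = max_arrival_rate(demand, ceiling)
print(f"about {rate:.1f} users/second, limited by the {resource}")
```

With these assumed figures the disk is the bottleneck, at roughly 12 users per second. The calculation ignores queuing effects entirely, which is why the more detailed estimates mentioned above are needed to project response time.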
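The proportionality property can be checked numerically. In the sketch below, the arrival rate of 2 requests per second is an assumed value, and the function names are illustrative; the formulas are the standard exponential density and distribution function.

```python
import math


def exponential_density(rate, t):
    """Density of the inter-arrival time under a Poisson arrival pattern."""
    return rate * math.exp(-rate * t)


def prob_arrival_within(rate, t):
    """Probability that the next arrival occurs within an interval of length t."""
    return 1.0 - math.exp(-rate * t)


rate = 2.0  # assumed average arrival rate, in requests per second
for t in (0.001, 0.002, 0.004):
    # For short intervals the probability is close to rate * t,
    # so doubling the interval roughly doubles the probability.
    print(t, prob_arrival_within(rate, t), rate * t)
```

The approximation holds only for intervals that are short relative to the mean inter-arrival time; over longer intervals the probability approaches one.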
However, web site traffic can be highly variable under certain conditions such as “Stock Market Storms,” “Holiday sales,” “breaking news stories”, and other unanticipated events. Under these conditions, inter-arrival times between web user visits often include periods of high activity followed by periods of low activity, resulting in an arrival distribution that deviates substantially from an exponential distribution. This highly variable inter-arrival pattern results in an average response time that can be much longer than a typical Poisson model would predict.
A metric known as a “coefficient of variation”, defined as the inter-arrival time standard deviation divided by the inter-arrival time mean, can be applied to systems as a measurement of variability. In some systems, the inter-arrival times cluster tightly about the mean, producing a relatively small standard deviation and hence a coefficient of variation much less than unity. For example, in deterministic arrival processes the inter-arrival times are all equal, so there is no variation and the coefficient of variation is zero. The tick of a clock illustrates a deterministic inter-arrival time distribution.
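The coefficient of variation can be computed directly from a list of observed inter-arrival times. The following is a minimal sketch using Python's standard `statistics` module; the three sample lists are invented for illustration and are not measured data.

```python
import statistics


def coefficient_of_variation(inter_arrival_times):
    """Population standard deviation of the inter-arrival times divided by their mean."""
    mean = statistics.mean(inter_arrival_times)
    return statistics.pstdev(inter_arrival_times) / mean


# Invented samples, in seconds between successive arrivals.
clock_ticks = [1.0] * 10                 # deterministic: every gap identical
clustered = [0.9, 1.0, 1.1, 1.0, 1.0]    # tightly grouped about the mean
bursty = [0.1, 0.1, 0.1, 0.1, 4.6]       # a burst followed by a long quiet gap

for name, sample in [("clock", clock_ticks), ("clustered", clustered), ("bursty", bursty)]:
    print(name, round(coefficient_of_variation(sample), 3))
```

The deterministic sample yields zero, the clustered sample yields a value well below one, and the bursty sample yields a value well above one, even though all three samples have the same mean of one second.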
An exponential inter-arrival time distribution produces a coefficient of variation with a value of one. An exponential arrival process assumes a random arrival pattern. Historically, queuing models have assumed a random arrival pattern and used the exponential inter-arrival time distribution. However, studies of many “real world” traffic arrival patterns, such as web site traffic, show highly variable, bursty arrival patterns with coefficient of variation values significantly greater than one. Similarly, studies of highway traffic patterns reveal the same type of bursty arrival patterns with clusters correlating to physical locations and times of the day. Consequently, modeling such systems assuming an exponential inter-arrival time distribution predicts shorter response times than are experienced in real life, and can lead the service provider to underestimate the server capability required to meet response time metrics.
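The size of the underestimate can be illustrated with Kingman's approximation for the mean queuing delay of a single-server queue. This formula is a well-known textbook approximation and is used here only as an illustration; it is not presented as the method of the invention. The utilization, service time, and coefficient of variation values below are assumptions.

```python
def approx_queuing_delay(utilization, mean_service_time, cv_arrival, cv_service):
    """Kingman's approximation for the mean wait in a single-server queue.

    It reduces to the exact M/M/1 result when both coefficients of variation
    equal one. It is a heavy-traffic approximation, so results at low
    utilization should be treated with caution.
    """
    if not 0.0 < utilization < 1.0:
        raise ValueError("utilization must be strictly between 0 and 1")
    load_factor = utilization / (1.0 - utilization)
    variability_factor = (cv_arrival ** 2 + cv_service ** 2) / 2.0
    return load_factor * variability_factor * mean_service_time


# Assumed operating point: 80% utilization, 1-second mean service time,
# exponentially distributed service times (coefficient of variation of one).
poisson_wait = approx_queuing_delay(0.8, 1.0, cv_arrival=1.0, cv_service=1.0)
bursty_wait = approx_queuing_delay(0.8, 1.0, cv_arrival=3.0, cv_service=1.0)
print(poisson_wait, bursty_wait, bursty_wait / poisson_wait)
```

Under these assumptions, raising the inter-arrival coefficient of variation from one to three increases the estimated queuing delay roughly fivefold at the same utilization, which is the kind of gap a model based on exponential inter-arrival times would miss.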
There exists an accepted basis in queuing theory to solve for cases wherein the coefficient of variation is greater than one. Unfortunately, the queuing theory equations are complex and often require inputs that are not readily available. For example, significant historical arrival rate information may be required. To be statistically valid, the historical arrival rate information may be required to span periods that encompass traffic patterns demonstrating both low and high arrival rates. Furthermore, the historical data may be required to be representative of the future arrival patterns. Companies with dynamic growth rates and changing business patterns may have great difficulty obtaining historical data that truly represents their future traffic arrival patterns. In addition, the established queuing theory equations applicable to high values of the inter-arrival coefficient of variation are not acceptably accurate at low levels of server utilization.
Accordingly, a need exists for an apparatus and method for simply and accurately modeling highly variable queue arrival rates. In particular, the apparatus and method should generate model results that facilitate high quality predictions for resources required to satisfy the highly variable queue arrival rates within predefined quality of service parameters. In addition, the apparatus and method should generate model results that substantially correspond to real world experience for comparable queuing systems and should accurately portray the effect of different levels of resource utilization.