1. Technical Field
The present invention relates to performance modeling of information technology (IT) systems. More specifically, the present invention relates to online performance modeling, using inference, of real production IT systems.
2. Description of the Related Art
Performance modeling has been of great theoretical and practical importance in the design, engineering and optimization of computer and communication systems and applications for several decades. A modeling approach is particularly efficient in providing architects and engineers with qualitative and quantitative insights about the system under consideration.
However, as Information Technology (IT) matures and expands in the scope of available applications, IT systems are growing at an accelerating rate in both size and complexity. For example, a typical Web service hosting center today may have hundreds of nodes and dozens of different applications running on it simultaneously. Each of the nodes, in turn, often has multiple processors and layered caches. These nodes make use of both local and shared storage systems. The size and complexity of such systems make performance modeling much more difficult, if tractable at all. Detailed modeling, fine tuning and accurate analysis can be carried out only on very small systems or very small components in a system.
In addition, due to the rapid evolution of hardware technology, components in these systems are upgraded at a much higher pace than in the recent past, in order to meet demand and to improve the Quality of Service (QoS) parameters of performance and availability. Hence, performance modeling should be done in a very short time frame in order for the model and analysis to be relevant.
These constraints make performance modeling work on modern IT systems very expensive, and often unaffordable. In order to obtain relatively accurate performance evaluation results with a short turnaround time, i.e., before the system under consideration becomes obsolete, heavy investments in human and computing resources are necessary.
On the other hand, IT systems have become critical in most businesses. Losses of millions of dollars per minute when a company's IT system goes down are well documented. Thus, it is natural that users impose more and more stringent QoS requirements on their systems. In the case of IT outsourcing, service-level agreements (SLAs) signed between the parties stipulate, among other things, the service quality guarantees, often with associated penalties in case of violations. As a consequence, predictive modeling is truly vital in the capacity planning and QoS management of such systems.
To build performance models in a short time frame, where there is typically no time to set up a testing environment, one should consider tuning the model online using performance data from the production IT system. There are some fundamental challenges in doing so, since the production system is a non-controlled environment. The workload is typically volatile and non-stationary, with peak and off-peak regimes as well as daily, weekly or seasonal patterns. There is no detailed knowledge of the transaction mix, as it is also transient.
Furthermore, only limited monitoring and performance measurements can be collected to aid model development, because such measurements must not be too intrusive to the production system. These measurements are typically collected through periodic probing from various geographic locations, which poses a further challenge: because the probes originate from different locations, the end-to-end delay measurements include different, and also transient, network delays.
Queuing network models have been, and continue to be, the most popular paradigm for the performance analysis of such systems (See, e.g., L. Kleinrock; Queueing Systems Volume II: Computer Applications; John Wiley and Sons, 1976; and D. Menasce and V. Almeida; Capacity Planning for Web Performance; Prentice Hall, 1998). Discrete event simulation is also used to model complex IT systems. This type of approach requires feeding detailed modeling parameters to the simulation engine, but direct measurement of these parameters is generally very costly, time consuming and intrusive to the production system.
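By way of illustration only, the following minimal sketch shows the kind of computation a queuing network model performs. It assumes, purely for the example, an open network of tiers visited in sequence, each behaving as an M/M/1 queue, so that the mean response time at a tier is S / (1 - λS) for mean service time S and arrival rate λ. The tier names, service times and function names are hypothetical and are not taken from any system described herein; the service times stand in for exactly those detailed parameters whose direct measurement is noted above to be costly.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time (queueing + service) of one M/M/1 station."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("station is saturated (utilization >= 1)")
    return service_time / (1.0 - utilization)


def end_to_end_response_time(arrival_rate, service_times):
    """Sum of per-station response times for stations visited in sequence."""
    return sum(mm1_response_time(arrival_rate, s)
               for s in service_times.values())


# Hypothetical mean service times per tier, in seconds (assumed values).
tiers = {"web": 0.005, "app": 0.020, "db": 0.010}

# Arrival rates in requests per second (assumed values).
for rate in (10.0, 30.0, 45.0):
    delay = end_to_end_response_time(rate, tiers)
    print(f"arrival rate {rate:5.1f}/s -> mean end-to-end delay {delay:.4f} s")
```

The sketch makes plain why accurate service-time parameters matter: as the arrival rate approaches the capacity of the slowest tier, the predicted delay grows sharply, so a small error in that tier's service time produces a large error in the predicted end-to-end delay.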