As is known, queuing network models are a powerful framework for studying and predicting the performance of computer systems, for example for capacity planning of the system. However, their parameterization is often a challenging task and cannot be performed entirely automatically. The problem of estimating the parameters of queuing network models has been addressed in a number of prior-art works, in connection with IT systems and communication networks.
One of the most critical parameters is the service time of the system, which is the mean time required to process one request when no other requests are being processed by the system. Indeed, service time estimation is a building block in queuing network modeling, as diagrammatically shown in FIG. 1A.
To parameterize a queuing network model, service time must be provided for each combination of service station and workload class. Unfortunately, service time measurements are rarely available in real systems and obtaining them might require invasive techniques such as benchmarking, load testing, profiling, application instrumentation or kernel instrumentation. On the other hand, aggregate measurements such as the workload and the utilization are usually available.
According to the utilization law, the service time can be estimated from the workload (i.e. the throughput of the system) and the utilization using simple statistical techniques such as least squares regression. However, anomalous or discontinuous behaviour can occur during the observation period. For instance, hardware and software may be upgraded or subject to failure, reducing or increasing the service time, and certain background tasks can affect the residual utilization. The system therefore has multiple working zones, each corresponding to a different regression model, which must be correctly detected and taken into consideration. According to the prior art, this task cannot be performed automatically in an efficient manner.
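By way of illustration only, the single-regime estimate described above can be sketched as an ordinary least squares fit of utilization against throughput, where the slope is read as the service time and the intercept as the residual utilization. The function name `estimate_service_time` and the sample values are hypothetical and are not taken from any measured system; the sketch assumes NumPy is available.

```python
import numpy as np

def estimate_service_time(throughput, utilization):
    """Fit utilization = service_time * throughput + residual_utilization
    by ordinary least squares (hypothetical helper, for illustration)."""
    x = np.asarray(throughput, dtype=float)
    u = np.asarray(utilization, dtype=float)
    # np.polyfit with degree 1 returns (slope, intercept)
    slope, intercept = np.polyfit(x, u, 1)
    return slope, intercept

# Synthetic samples (assumed values): 0.01 s per request, 5% residual utilization
throughput = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
utilization = 0.01 * throughput + 0.05

service_time, residual_utilization = estimate_service_time(throughput, utilization)
```

Such a fit is meaningful only while the system stays in one working zone; when the samples come from several configurations, a single line averages across them.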
Two examples of poor detection of regression models are shown in FIGS. 1B and 1C: here the single regression line does not correctly represent the behaviour of the data sampled from two IT systems.
The problem of simultaneously identifying the clustering of linearly related samples and the regression lines is known in the literature as clusterwise linear regression (CWLR) or regression-wise clustering, and is a particular case of model-based clustering. This problem has applications in areas such as control systems, neural networks and medicine.
This problem has already been addressed using different techniques, but it usually requires some degree of manual intervention: human intelligence is required at least to detect the number of clusters within the dataset and to supply the correct values of some parameters to the chosen algorithm.
An object of the present invention is hence to supply an enhanced method for estimating these regression models and correctly classifying observation samples according to the regression model that generated them, so as to correctly plan capacity and upgrading of the system.
In other words, given n observations of workload versus utilization of an IT system, it is required to identify the number k of significant clusters, the corresponding regression lines (service time and residual utilization), the cluster membership and the outliers. Based on this identification, the behaviour of the IT system over a wide range of workload and utilization can be estimated, so that automatic upgrading or allocation of hardware/software resources can be performed in the system.
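Purely as an illustrative sketch of the clusterwise regression problem stated above, and not as the method of the invention, the following alternates between fitting one line per cluster and reassigning each sample to the line with the smallest absolute residual. It makes simplifying assumptions that the present problem does not allow: the number k of clusters is supplied by the caller, outliers are not handled, and the initial partition is a heuristic split on the utilization-to-workload ratio. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def clusterwise_linear_regression(x, u, k, n_iter=50):
    """Alternate line fitting and reassignment for a fixed, given k
    (illustrative sketch; no outlier handling, no selection of k)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    # Heuristic initial partition (assumption): split samples into k
    # groups of similar utilization/workload ratio.
    order = np.argsort(u / np.maximum(x, 1e-12))
    labels = np.empty(len(x), dtype=int)
    for j, chunk in enumerate(np.array_split(order, k)):
        labels[chunk] = j
    coefs = np.zeros((k, 2))  # rows: (service time, residual utilization)
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= 2:  # a line needs at least two samples
                coefs[j] = np.polyfit(x[mask], u[mask], 1)
        # Absolute residual of every sample against every line, shape (k, n)
        residuals = np.abs(u[None, :] - (coefs[:, [0]] * x[None, :] + coefs[:, [1]]))
        new_labels = residuals.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, coefs

# Synthetic data (assumed values): two configurations of the same system
x = np.tile(np.arange(10.0, 101.0, 10.0), 2)
u = np.concatenate([0.01 * x[:10] + 0.05, 0.03 * x[10:] + 0.10])
labels, coefs = clusterwise_linear_regression(x, u, k=2)
```

An alternating scheme of this kind can stop at a local optimum and depends on its initial partition, which is one reason manual tuning is commonly needed.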
However, the clustering results do not carry any time-related information, which is crucial to understanding the past history of the system and predicting how it will be able to handle future workloads. The ability to detect when the system changes from one configuration to another also allows the detection of performance-related issues, such as performance degradations or utilization spikes due to non-modeled workloads. Therefore, starting from an accurate clustering of the points, a timestamp analysis has to be performed.
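As a minimal sketch of what a timestamp analysis could start from, assuming each sample already carries a cluster label, consecutive samples with the same label can be grouped into time segments; each boundary between segments then marks a candidate change of configuration. The function name `configuration_segments` and the sample values are hypothetical.

```python
from itertools import groupby

def configuration_segments(timestamps, labels):
    """Group time-ordered samples into runs sharing one cluster label.
    Returns a list of (label, first_timestamp, last_timestamp)."""
    ordered = sorted(zip(timestamps, labels))  # order samples by time
    segments = []
    for label, group in groupby(ordered, key=lambda pair: pair[1]):
        times = [t for t, _ in group]
        segments.append((label, times[0], times[-1]))
    return segments

# Assumed example: the system leaves configuration 0 and later returns to it
timestamps = [1, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 1, 1, 0, 0]
segments = configuration_segments(timestamps, labels)
# segments == [(0, 1, 3), (1, 4, 5), (0, 6, 7)]
```

A label that reappears in a later segment, as configuration 0 does here, is the kind of recurring pattern that clustering alone cannot reveal.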
The identification of multiple system configurations and their grounding in identifiable time-frames or recurring patterns can give the capacity planner control of complex and dynamic environments, reducing the need to rely on external information (for example, notice of the deployment of an updated application), which is hard and time-consuming to obtain.