The present invention relates to a computer platform, and in particular, to a system and method to forecast the performance of computing resources.
The computing resources of a large business represent a significant financial investment. When the business grows, resource managers must ensure that new resources are added as processing requirements increase. The fact that the growth and evolution of a computing platform is often rapid and irregular complicates management efforts. This is especially true for computing platforms common to banking institutions and telecommunications companies, for example, whose computing platforms typically include hundreds of geographically distributed computers.
To effectively manage the vast resources of a computing platform and to justify any requests for acquisition of new resources, managers need accurate forecasts of computing platform resource performance. However, conventional forecasting tools may not be adequate for use on computing platforms. For example, conventional sales performance forecasting tools, which use linear regression and multivariable regression to analyze data, commonly factor in such causal variables as the effect of holiday demand, advertising campaigns, price changes, etc. Similarly, pollution forecasting tools typically consider the causal effect of variations in traffic patterns. As such, using these tools to forecast computing platform resources may be problematic because causal parameters for a computing platform are generally difficult to establish and are unreliable.
Other conventional forecasting tools may be limited by the amount of data they can process. For example, some forecasting tools may not adequately purge older or non-essential data. Other forecasting tools may not appropriately incorporate new data as it becomes available. Still other forecasting tools may not have the computing power to perform calculations on large amounts of data.
The limitations of established forecasting tools are particularly troublesome when forecasting resources in computing platforms that are expanding or are already re-engineered. These computing platforms need a forecasting system and method that deal appropriately with new data as well as unneeded data. Moreover, these computing platforms need a forecasting system and method that augment causal-based forecasting tools to provide accurate and reliable forecasts.
Presented herein is a system and method to forecast computing platform resource performance that overcomes the limitations associated with conventional forecasting tools. An embodiment applies an autoregressive model to electronically generated empirical data to produce accurate and reliable computing platform resource performance forecasts. An embodiment of the present invention also statistically collapses large amounts of data, eliminates unneeded data, and recursively processes new data. The forecasts are compared to actual performance data, which may be graphically displayed or printed. A specific type of data is not important for the present invention, and those skilled in the art will understand that a wide variety of data may be used in the present invention. For example, the present invention contemplates any data that may be collected and verified over time. These data include, for example, Internet metering data, marketing data on the success or failure of product offerings, telephone usage patterns, cash flow analyses, financial data, customer survey data on product reliability, customer survey data on product preference, etc.
The system and method operate within a computing platform. In one embodiment, the computing platform may be a multiple virtual storage (MVS) computing platform. In another embodiment, the computing platform may be a UNIX computing platform. In other embodiments, the computing platforms may be disk operating system (DOS) computing platforms. Those skilled in the art will appreciate that a variety of computing platforms may be used to implement the present invention.
The computing platform includes at least one resource whose performance is forecast. In one embodiment, the computing platform resource may be a central processing unit (CPU). In another embodiment, the computing platform resource may be a memory storage unit. In other embodiments, the computing platform resource may be a printer, a disk, or a disk drive unit. A specific computing platform resource is not important for the present invention, and those skilled in the art will understand that a number of resources may be used in the present invention.
Each resource includes at least one aspect. The aspect may be a performance metric. The performance metric may be resource utilization. "Utilization" is defined generally herein as the percentage that a particular computing platform resource is kept busy. Utilization is often termed "consumption."
In another embodiment, the performance metric may be resource efficiency or resource redundancy. "Efficiency" is defined generally herein as the measure of the useful portion of the total work performed by the resource. "Redundancy" is defined generally herein as the measure of the increase in the workload of a particular resource. Of course, those skilled in the art will appreciate that a particular performance metric is not required by the present invention. Instead, a number of performance metrics may be used.
In one embodiment, the computing platform includes a resource manager. The resource manager collects performance data from its associated resource. The performance data is associated with a performance metric. In one embodiment, the resource manager collects performance data representing a CPU utilization performance metric.
The resource manager collects the performance data in regular intervals. In one embodiment, regular intervals include one-second intervals, for example. That is, in this embodiment, the resource manager collects performance data from its associated computer(s) every second. The interval size in which performance data is collected may be determined by the particular use for the performance metric, the particular resource, the particular computing platform, etc.
The computing platform also includes a plurality of statistical collapsers that statistically collapse the performance data into a series. In one embodiment, the series may be a time series representing a performance metric. A "time series" is defined generally herein as any ordered sequence of observations. Each observation represents a given point in time and is thus termed a "time point." Accordingly, a time series includes at least one time point.
A first statistical collapser generates a first time series representing a performance metric as though its associated performance data had been collected at a first interval. The first time series includes a first set of time points. In one embodiment, the first statistical collapser generates a time series representing a performance metric as though its associated performance data had been collected in fifteen-minute intervals. Accordingly, the time series includes four time points for each hour. In another embodiment, the first statistical collapser generates a time series representing a performance metric as though its associated performance data had been collected hourly. Accordingly, the time series includes one time point for each hour. It will be understood by persons skilled in the relevant art that the present invention encompasses statistical collapsers that generate time series representing performance metrics as though their associated performance data had been collected at any of a variety of suitable intervals. The interval size and corresponding number of time points generated by the first statistical collapser may be determined by the particular use for the performance metric, the particular resource, the particular computing platform, etc.
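By way of illustration only, the first collapse may be sketched as follows. The sketch assumes that the collapsing statistic is the arithmetic mean of each interval (the statistic is not named above) and that samples arrive as (epoch second, value) pairs. The function name `collapse` and the sample values are hypothetical.

```python
from collections import defaultdict


def collapse(samples, interval_seconds):
    """Collapse (epoch_second, value) samples into one mean per interval.

    Assumption: the mean is used as the collapsing statistic.
    Returns a list of (interval_start_second, mean_value), ordered by time.
    """
    buckets = defaultdict(list)
    for second, value in samples:
        # Align each sample to the start of the interval that contains it.
        start = (second // interval_seconds) * interval_seconds
        buckets[start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# One hour of synthetic one-second CPU utilization samples (percent busy).
one_second = [(s, 40.0 + (s // 900) * 10.0) for s in range(3600)]

# Collapse as though the data had been collected every fifteen minutes.
fifteen_minute = collapse(one_second, 15 * 60)
```

In this sketch, 3,600 one-second samples reduce to four time points for the hour, consistent with the fifteen-minute embodiment. Passing `60 * 60` as the interval would yield one time point for the hour instead.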
The computing platform also includes a database that stores data. In one embodiment, the database stores the time series representing the performance metric as though its associated performance data had been collected at fifteen-minute intervals.
The computing platform also includes a data extractor to extract data from the database. According to one embodiment, the data extractor extracts from the database the time series representing the performance metric as though its associated performance data had been collected at fifteen-minute intervals.
The computing platform also includes a second statistical collapser. The second statistical collapser statistically collapses the first time series, producing a second time series. The second time series includes a second set of time points. In one embodiment, the second statistical collapser statistically collapses the fifteen-minute time series into a one-week time series. That is, the second statistical collapser generates a time series representing a performance metric as though its associated performance data had been collected weekly. Accordingly, the time series includes approximately four time points for each month. In another embodiment, the second statistical collapser generates a time series representing a performance metric as though its associated performance data had been collected daily. The corresponding time series includes approximately thirty time points for each month. It will be understood by persons skilled in the relevant art that the second statistical collapser may generate time series representing a performance metric as though its performance data had been collected at any of a variety of suitable intervals. As described above with reference to the first statistical collapser, the interval size and corresponding number of time points generated by the second statistical collapser may be determined by the particular use for the performance metric, the particular resource, the particular computing platform, etc.
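The second collapse may be sketched in a similar manner. The sketch below assumes that time points carry calendar date/time stamps, that weeks are delimited as ISO calendar weeks, and that the mean is again the collapsing statistic. None of these choices is specified above, and the name `collapse_weekly` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def collapse_weekly(points):
    """Collapse (datetime, value) time points into one mean per ISO week.

    Assumption: ISO weeks (Monday through Sunday) and the mean statistic.
    Returns a list of ((iso_year, iso_week), mean_value), ordered by time.
    """
    weeks = defaultdict(list)
    for stamp, value in points:
        iso = stamp.isocalendar()
        weeks[(iso[0], iso[1])].append(value)
    return [(key, sum(vals) / len(vals)) for key, vals in sorted(weeks.items())]


# Two weeks of synthetic fifteen-minute time points: 96 per day, 672 per week.
start = datetime(2024, 1, 1)  # a Monday, chosen only for illustration
points = [
    (start + timedelta(minutes=15 * i), 50.0 if i < 672 else 60.0)
    for i in range(2 * 672)
]

weekly = collapse_weekly(points)
```

Here 1,344 fifteen-minute time points reduce to two weekly time points, which is the scale of reduction that yields approximately four time points for each month.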
The computing platform also includes a time series analyzer to determine whether the second time series is statistically stationary. The time series analyzer uses a plurality of χ² (chi-square) tests to make this determination. The time series analyzer also evaluates autocorrelation statistics and autocovariance statistics. If the time series analyzer determines that the time series is statistically nonstationary, which is likely the case, then the time series analyzer converts the statistically nonstationary time series to a statistically stationary time series by differencing each time point in the time series. The statistically stationary time series now represents the differenced values of performance data.
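The differencing step and one of the statistics the analyzer evaluates may be sketched as follows. The sketch assumes first-order differencing (each time point minus its predecessor) and the usual sample autocorrelation estimator. It does not reproduce the chi-square tests, and the function names and sample values are hypothetical.

```python
def difference(series):
    """First-order differencing: each time point minus its predecessor."""
    return [current - previous for previous, current in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        return 0.0  # a constant series has no variation to correlate
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Synthetic weekly utilization with a steady upward trend (nonstationary).
weekly = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

trend_r1 = autocorrelation(weekly, 1)  # strongly positive because of the trend
changes = difference(weekly)           # the trend is removed by differencing
```

A series with a persistent trend shows a large positive lag-one autocorrelation, whereas the differenced series represents week-to-week changes and no longer carries the trend.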
The computing platform also includes a time point converter. If the time series is already statistically stationary or after the time series analyzer converts the time series to statistical stationarity, the time point converter applies a statistical data set to the time series. Recall that the time series represents the performance metric as though its associated performance data had been collected from the computing platform at regular intervals. As such, the time series includes information indicating the time that the performance data was collected. In one embodiment, this information includes a date/time stamp. That is, each data point in the time series includes a date/time stamp. The statistical data set converts each date/time stamp in the time series into a value representing a decimal number equivalent to the date/time stamp.
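The conversion of a date/time stamp into a decimal number may be illustrated as follows. The encoding is not specified above, so the sketch assumes fractional days elapsed since a fixed reference date. The reference date, the stamp format, and the name `stamp_to_decimal` are all hypothetical.

```python
from datetime import datetime, timezone

# Assumed reference date; any fixed origin would serve the same purpose.
REFERENCE = datetime(1970, 1, 1, tzinfo=timezone.utc)


def stamp_to_decimal(stamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' stamp to fractional days since REFERENCE."""
    moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return (moment - REFERENCE).total_seconds() / 86400.0


# Noon on the day after the reference date is one and a half days later.
value = stamp_to_decimal("1970-01-02 12:00:00")
```

A numeric time axis of this kind keeps the time points ordered and evenly spaced, which is what a modeling tool requires of its input.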
One feature of the present invention is an autoregressive modeling tool, which is applied to the converted time series to forecast a particular aspect of the computing platform. The autoregressive modeling tool is chosen by calculating autocorrelation, inverse autocorrelation, and partial autocorrelation functions, and by comparing these functions to theoretical correlation functions of several autoregressive constructs. In particular, one embodiment applies a first order mixed autoregressive construct, such as an autoregressive moving average (ARMA) construct, to the differenced time series. Another embodiment applies an autoregressive integrated moving average (ARIMA) construct to the differenced time series. In the embodiment where the performance metric is resource utilization and the resource is a CPU, the resulting autoregressive modeling tool reliably forecasts CPU consumption with a ninety-five percent accuracy, provides an upper ninety-five percent confidence level, and provides a lower ninety-five percent confidence level. Conventional systems and methods that rely on linear regression or multivariable regression techniques may carry a lower confidence level.
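A simplified sketch of the forecasting step follows. It fits a pure first-order autoregressive model, AR(1), rather than the mixed ARMA or ARIMA constructs named above, and omits the moving-average term for brevity. The function names, the least-squares estimator, the normal-approximation multiplier of 1.96 for ninety-five percent limits, and the sample values are assumptions made for illustration.

```python
import math


def fit_ar1(series):
    """Estimate the mean, AR(1) coefficient, and residual variance."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    numerator = sum(centered[t] * centered[t - 1] for t in range(1, n))
    denominator = sum(c * c for c in centered[:-1])
    phi = numerator / denominator if denominator else 0.0
    residuals = [centered[t] - phi * centered[t - 1] for t in range(1, n)]
    sigma2 = sum(r * r for r in residuals) / max(len(residuals) - 1, 1)
    return mean, phi, sigma2


def forecast_ar1(series, steps, z=1.96):
    """Return (point, lower, upper) for each of the next `steps` time points."""
    mean, phi, sigma2 = fit_ar1(series)
    last = series[-1] - mean
    variance = 0.0
    results = []
    for h in range(1, steps + 1):
        point = mean + (phi ** h) * last
        # The h-step error variance accumulates sigma2 * phi^(2j), j = 0..h-1.
        variance += sigma2 * phi ** (2 * (h - 1))
        half_width = z * math.sqrt(variance)
        results.append((point, point - half_width, point + half_width))
    return results


# Synthetic differenced weekly CPU utilization (week-to-week changes, percent).
weekly_changes = [1.2, 0.4, 0.9, -0.3, 0.6, 1.1, 0.2, 0.7, -0.1, 0.8]

forecasts = forecast_ar1(weekly_changes, steps=4)
```

Each forecast carries a lower and an upper limit, and the limits widen as the forecast horizon grows. Because the input is a differenced series, the point forecasts would be summed cumulatively onto the last observed utilization value to recover forecasts on the original scale.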
Another feature of the present invention is that it uses empirical data as inputs to the autoregressive modeling tool. Using empirical data rather than causal variables provides more accurate forecasts. In the embodiment where the performance metric is resource utilization and the resource is a central processing unit, the empirical data is actual historical performance data, including logical CPU utilization information as well as physical CPU utilization information. Moreover, the system and method generate recursive forecasts whereby actual future performance data is fed back into the autoregressive modeling tool to calibrate the autoregressive modeling tool.
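The recursive behavior may be sketched as a forecaster that is re-estimated each time an actual observation becomes available. The sketch assumes a fixed-length window so that the oldest data is purged automatically, and it uses a first-order autoregressive estimate as a stand-in for the modeling tool. The class name, window length, and sample values are hypothetical.

```python
from collections import deque


class RecursiveForecaster:
    """Recalibrates a one-step forecast whenever new actual data arrives."""

    def __init__(self, window=52):
        # Assumption: only the most recent `window` time points are retained.
        self.history = deque(maxlen=window)

    def observe(self, value):
        """Feed an actual observation back into the forecaster."""
        self.history.append(float(value))

    def forecast(self):
        """One-step-ahead forecast from the data currently retained."""
        data = list(self.history)
        if len(data) < 3:
            return data[-1] if data else None
        mean = sum(data) / len(data)
        centered = [x - mean for x in data]
        denominator = sum(c * c for c in centered[:-1])
        numerator = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
        phi = numerator / denominator if denominator else 0.0
        return mean + phi * centered[-1]


forecaster = RecursiveForecaster(window=4)
for actual in [1, 2, 3, 4, 5, 6]:
    forecaster.observe(actual)   # actual performance data is fed back
    latest = forecaster.forecast()  # the forecast is recalibrated each time
```

With a window of four, only the four most recent observations influence the final forecast, which illustrates both the incorporation of new data and the elimination of older data.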
The computing platform includes a results processor, which generates graphical representations of a performance metric. The results processor also generates information for use in written reports that document the results of the forecasting process. The graphical and textual representations demonstrate the greater accuracy and reliability the present invention provides over conventional forecasting systems and methods.
In one embodiment, the results processor may be a graphical display unit, such as a computer display screen. In another embodiment, the results processor may be a textual display unit, such as a printer. In the embodiment where the performance metric is resource utilization and the resource is a central processing unit, the results processor produces reports and graphical representations of comparisons of actual CPU utilization with CPU utilization forecasts.
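The comparison of actual CPU utilization with CPU utilization forecasts may be sketched as a simple textual report. The sketch assumes that each forecast is supplied as a (point, lower limit, upper limit) triple. The function names, the report layout, and the sample values are hypothetical.

```python
def compare(actuals, forecasts):
    """Pair each actual value with its forecast and note whether it fell within limits."""
    rows = []
    for actual, (point, lower, upper) in zip(actuals, forecasts):
        rows.append({
            "actual": actual,
            "forecast": point,
            "error": actual - point,
            "within_limits": lower <= actual <= upper,
        })
    coverage = sum(r["within_limits"] for r in rows) / len(rows) if rows else 0.0
    return rows, coverage


def render(rows):
    """Format the comparison as plain text suitable for a printed report."""
    lines = ["  actual forecast    error  within limits"]
    for r in rows:
        flag = "yes" if r["within_limits"] else "no"
        lines.append(f"{r['actual']:8.1f} {r['forecast']:8.1f} {r['error']:8.1f}  {flag}")
    return "\n".join(lines)


# Synthetic CPU utilization figures (percent busy) for two forecast periods.
rows, coverage = compare(
    [50.0, 60.0],
    [(48.0, 40.0, 56.0), (70.0, 62.0, 78.0)],
)
report = render(rows)
```

The `coverage` value is the fraction of periods in which actual utilization fell between the lower and upper limits, which gives a direct check on forecast reliability over time.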
Further features and advantages of the present invention as well as the structure and operation of various embodiments are described in detail below.