1. Field of the Invention
This invention relates to a computer system and method for predicting a future value of a series of data and in particular, but in no way limited to, a computer system and method for predicting a future value of a series of communications data or product data.
2. Description of the Prior Art
Predicting future values of a series of data is a difficult problem faced by managers and operators of processes and systems such as communications networks or manufacturing processes. For example, a communications network has a limited capacity, and traffic levels within that network need to be managed effectively to make good use of the available resources whilst minimising congestion. However, previous methods of predicting future values of data series such as communications data or product data have not been accurate enough to be used effectively. Another problem is that such prediction methods are often computationally expensive and time-consuming, so that predicted values are not available far enough in advance to be useful.
Network and service providers typically enter into contracts with customers in which specified quality of service levels and other metrics are defined. Penalty payments are incurred if these agreed service levels and metrics are not met, and this is another reason why predicting future values of data series, such as communications data, is important. Such predicted values would allow communications network resources to be managed better, so that contractual agreements are met.
Previously, the approach of statistical process control (SPC) has been used to analyse data series. Data samples were obtained, such as traffic levels in a communications network at a particular time, and data from these samples were then used to make inferences about the whole population of traffic level data over time for the communications network. Typically, statistics such as the mean and standard deviation or range were calculated for the sample data for each parameter, and these statistics were compared for different samples. For example, if the mean was observed to move outside a certain threshold range, an “out of control” flag would be triggered to alert the network operators to a problem in the communications network. If trends were observed in the data, for example an increase in the mean, the operator could be alerted to this fact and an investigation then carried out.
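The prior-art flagging described above can be sketched in Python as follows. This is a minimal illustration only: the source speaks of “a certain threshold range”, and placing the limits at k standard deviations either side of the baseline mean (with k = 3) is an assumption borrowed from conventional Shewhart control charts. The function names control_limits and out_of_control, and the sample values, are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline_means: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Centre line +/- k standard deviations of sample means from an in-control period (k assumed)."""
    centre = mean(baseline_means)
    spread = stdev(baseline_means)
    return centre - k * spread, centre + k * spread


def out_of_control(samples: Sequence[Sequence[float]], limits: Tuple[float, float]) -> List[int]:
    """Indices of samples whose mean lies outside the control limits (the "out of control" flag)."""
    lower, upper = limits
    return [i for i, sample in enumerate(samples) if not lower <= mean(sample) <= upper]


# Hypothetical traffic-level samples: the second sample's mean (110) breaches the upper limit.
limits = control_limits([100, 102, 98, 101, 99])
print(out_of_control([[99, 101, 100], [110, 112, 108], [97, 103, 100]], limits))  # [1]
```

Limits of this kind implicitly assume roughly normally distributed sample means, which is precisely the kind of distributional assumption questioned for communications network data.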
Several problems with these statistical approaches are known. For example, an inference is made that the data sets fit a standard type of distribution, such as a normal or Poisson distribution. However, this is rarely the case for communications network data in which many outlying values are typically observed and which are often bimodal or show other irregular distributions. Also, data may be obtained from a small sample of the actual data series and used to make inferences about the whole population of data. This means that the statistics calculated are often not an accurate reflection of the process being analysed.
Another problem is that the available data are often not suitable for statistical analysis, because the data sets are often small, incomplete and discontinuous and contain outlying values. However, this type of data is typically all that is available for communications network management, process control or other purposes.
The problems mentioned above also apply to process control and to data series of product data. Another problem in process control is dealing with the fact that the inputs to the process vary. For example, if components are supplied to a manufacturer for assembly into a final product, those components may vary from batch to batch and from supplier to supplier. However, analysing how the components vary is very difficult, time-consuming and expensive. Also, it is difficult to determine what effect variations in the components may have on the manufacturing process that is being controlled. These problems increase for more complex products that involve many components, such as circuit boards. For this reason, many manufacturers aim to limit variability by attempting to control strictly all the initial build conditions, including the supply base. This is often not possible if the supplier must be varied for other reasons, for example to obtain a good price or to achieve continuity of supply. Many manufacturers of electronic systems rely heavily upon their suppliers to ensure that the materials and components used in the fabrication of products comply with specification. Often, electronic components are not examined before they enter factories. Investment programmes for test equipment at the component level have shown that it is not practical to distinguish between batches of components and also that instances of non-compliant components are negligible. For these reasons, many manufacturing companies have wound down their incoming component inspection processes. Instances do occur where manufactured products exhibit changes in performance that are attributed to changes in the components, but no effective way of dealing with this problem has been found.
A particular problem in process control involves the situation where a manufacturing process is set up in a particular location, such as the USA, and it is required to set up the same process in a new location, say Canada, in order to produce the same quality of product with the same efficiency. It is typically very difficult to set up the new process in such a way that the same quality of product is produced with the same efficiency because of the number of factors that influence the process.
Failure mode effect analysis presents another problem in the management of communications networks and communications equipment, and in process control. In this case, a failure occurs in the process and it is required to analyse why it has occurred and what corrective action should be taken. Current methods for failure mode effect analysis include schematic examination and fault injection techniques, but these are not satisfactory because of the problems with the data mentioned above.
JP8314530 describes a failure prediction apparatus which uses methods based on chaos theory. A physical quantity, such as an electrical signal, indicating the condition of a single installation is measured repeatedly at regular intervals in order to collect a time series of data. This time series of data is then used to reconstruct an attractor, which is used to predict future values of the time series. These predicted values are compared with observed values in order to predict failure of the installation. This system is disadvantageous in many respects. The input data must be repeated measurements from a single apparatus taken at regular intervals; however, in practice it is often not possible to obtain measurements at regular intervals. Also, JP8314530 does not address the problems of dealing with communications data, product data and non-time-series data, such as product data obtained from many products, which will vary. Also, JP8314530 is concerned with failure prediction only and not with other matters such as monitoring performance and detecting changes in the behaviour of a process. Moreover, JP8314530 does not describe the process of identifying nearest neighbour vectors and determining corresponding vectors for these.
It is accordingly an object of the present invention to provide a computer system and method for predicting a future value of a series of data which overcomes or at least mitigates one or more of the problems noted above.
According to a first aspect of the present invention there is provided a method of predicting a future value of a series of data comprising the steps of:
(i) forming a set of vectors wherein each vector comprises a number of successive values of the series of data;
(ii) identifying from said set of vectors, a current vector which comprises a most recent value of the series of data;
(iii) identifying at least one nearest neighbour vector from said set of vectors, wherein for each nearest neighbour vector a measure of similarity between that nearest neighbour vector and the current vector is less than a threshold value;
(iv) for each nearest neighbour vector, determining a corresponding vector, each corresponding vector comprising values of the series of data that are a specified number of data values ahead of the data values of the nearest neighbour vector in said series of data; and
(v) calculating the predicted future value on the basis of at least some of the corresponding vector(s); wherein said series of data comprises either a plurality of values each measured from a different product or a series of communications data.
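Steps (i) to (v) can be illustrated with the short Python sketch below. It is not the claimed implementation: the Euclidean distance as the measure of similarity, the plain average of the corresponding values in step (v), and the names delay_vectors and predict are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def delay_vectors(series: Sequence[float], dim: int) -> List[List[float]]:
    """Step (i): every run of `dim` successive values, oldest value first."""
    return [list(series[i:i + dim]) for i in range(len(series) - dim + 1)]


def predict(series: Sequence[float], dim: int = 3, threshold: float = 1.0, horizon: int = 1) -> float:
    """Predict the value `horizon` steps after the most recent value (hypothetical helper)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    vectors = delay_vectors(series, dim)
    current = vectors[-1]  # step (ii): the vector ending with the most recent value
    ahead = []
    # Only vectors whose corresponding vector still lies inside the series can serve
    # as neighbours; slicing off the last `horizon` vectors also excludes `current`.
    for i, candidate in enumerate(vectors[:-horizon]):
        if math.dist(candidate, current) < threshold:  # step (iii), Euclidean distance assumed
            corresponding = vectors[i + horizon]  # step (iv): `horizon` values further on
            ahead.append(corresponding[-1])
    if not ahead:
        raise ValueError("no nearest neighbour within the threshold")
    return sum(ahead) / len(ahead)  # step (v): plain average, an assumed choice


series = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
print(predict(series, dim=2, threshold=0.5))             # 3.0
print(predict(series, dim=2, threshold=0.5, horizon=2))  # 1.0
```

In the usage lines, the current vector [1, 2] has three earlier nearest neighbours, each followed one step later by 3 and two steps later by 1, so the two predictions are 3.0 and 1.0.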
A corresponding computer system for predicting a future value of a series of data comprises:
(i) a processor arranged to form a set of vectors wherein each vector comprises a number of successive values of the series of data;
(ii) an identifier arranged to identify from said set of vectors, a current vector which comprises a most recent value of the series of data;
(iii) a second identifier arranged to identify at least one nearest neighbour vector from said set of vectors, wherein for each nearest neighbour vector a measure of similarity between that nearest neighbour vector and the current vector is less than a threshold value;
(iv) a determiner arranged to determine, for each nearest neighbour vector, a corresponding vector, each corresponding vector comprising values of the series of data that are a specified number of data values ahead of the data values of the nearest neighbour vector in said series of data; and
(v) a calculator arranged to calculate the predicted future value on the basis of at least some of the corresponding vector(s); wherein said series of data either comprises a plurality of values each measured from a different product or a series of communications data.
This provides the advantage that product data from a manufacturing process, or communications data, can be analysed and used to provide a prediction about performance in the future. This removes any “time lag” in obtaining data about the manufacturing or communications process and allows immediate modification to reduce waste. This reduces costs and improves efficiency. The manufacturing or communications process can be effectively controlled using the data even though the data may not fit a recognised statistical distribution and may not be suitable for statistical analysis. The effects of inputs to the manufacturing or communications process, such as new suppliers or new communications equipment, are monitored or controlled without the need to carry out measurements or tests on the inputs. If the manufacturing or communications process fails, the failure situation can be analysed by comparing the predicted and actual data.
According to another aspect of the present invention there is provided a method of substantially determining an attractor structure from a series of data comprising the steps of:
(i) forming a set of vectors wherein each vector comprises a number of successive values of the series of data;
(ii) calculating a set of eigenvectors and a set of eigenvalues from said set of vectors using the method of principal components analysis; and
(iii) transforming the said set of vectors on the basis of said set of eigenvectors; wherein said series of data either comprises a plurality of values each measured from a different product or comprises a series of communications data.
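A stdlib-only Python sketch of steps (i) to (iii) follows. The eigen-decomposition uses cyclic Jacobi rotations, a standard method for symmetric matrices chosen here only to avoid external dependencies; centring the vectors before projection and the helper names (delay_vectors, jacobi_eigen, principal_components) are assumptions, not details taken from the source.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def delay_vectors(series: Sequence[float], dim: int) -> Matrix:
    """Step (i): each vector holds `dim` successive values of the series."""
    return [list(series[i:i + dim]) for i in range(len(series) - dim + 1)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def jacobi_eigen(sym: Matrix, max_sweeps: int = 50, tol: float = 1e-24) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi rotations."""
    d = len(sym)
    a = [row[:] for row in sym]
    v = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    total = sum(x * x for row in a for x in row) or 1.0  # invariant under the rotations
    for _ in range(max_sweeps):
        off = sum(a[p][q] ** 2 for p in range(d) for q in range(d) if p != q)
        if off <= tol * total:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that the rotated matrix has a zero in position (p, q).
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = matmul(transpose(rot), matmul(a, rot))
                v = matmul(v, rot)  # accumulate eigenvectors as columns
    return [a[i][i] for i in range(d)], v


def principal_components(vectors: Matrix):
    """Steps (ii) and (iii): PCA of the vectors, largest eigenvalue first, then projection."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(vec[j] for vec in vectors) / n for j in range(d)]
    centred = [[vec[j] - means[j] for j in range(d)] for vec in vectors]
    cov = [[sum(c[r] * c[k] for c in centred) / (n - 1) for k in range(d)] for r in range(d)]
    values, vecs = jacobi_eigen(cov)
    order = sorted(range(d), key=lambda k: values[k], reverse=True)
    eigenvalues = [values[k] for k in order]
    eigenvectors = [[vecs[j][k] for j in range(d)] for k in order]  # one eigenvector per row
    transformed = [[sum(c[j] * e[j] for j in range(d)) for e in eigenvectors] for c in centred]
    return eigenvalues, eigenvectors, transformed


values, axes, points = principal_components(delay_vectors(list(range(10)), 2))
print(values)
```

For the linear ramp 0 to 9 embedded with dim=2, the two coordinates of every vector move together, so the first eigenvalue comes out at 15 (up to rounding) and the second is essentially zero. How an eigenvalue spectrum of this kind is used to judge whether an effective attractor structure exists is not specified here.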
A corresponding computer system for substantially determining an attractor structure from a series of data comprises:
(i) a processor arranged to form a set of vectors wherein each vector comprises a number of successive values of the series of data;
(ii) a calculator arranged to calculate a set of eigenvectors and a set of eigenvalues from said set of vectors using the method of principal components analysis; and
(iii) a transformer arranged to transform the said set of vectors on the basis of said set of eigenvectors; wherein said series of data either comprises a plurality of values each measured from a different product or comprises a series of communications data.
This provides the advantage that a series of data can be analysed by determining an attractor structure. If no effective attractor structure is identified for a given parameter then this parameter is known not to be a good input for the prediction process. This enables the costs of obtaining data series to be reduced because ineffective data parameters can be eliminated. Another advantage is that two separate manufacturing or communications processes that are intended to produce the same result can be compared by comparing their attractor structures. Adjustments can then be made to the processes until the attractor structures are substantially identical and this helps to ensure that the same quality of product or service is produced.
An algorithm bank is compiled containing prediction algorithms suitable for different types of data series, including those exhibiting deterministic behaviour and those exhibiting stochastic behaviour. Recent past values of a data series are taken and assessed or audited in order to determine which of the algorithms in the bank would provide the optimal prediction. The selected algorithm is then used to predict future values of the data series. The assessment or auditing process is carried out in real time and a prediction algorithm selected using a “smart switch” such that different algorithms are used for different stages in a given series as required. This enables good prediction of data series which change in nature over time to be obtained. The assessment method allows a level of deterministic behaviour of the data series to be determined quickly and in a computationally inexpensive manner. The data series may contain outlying values, noise, and contain samples separated by irregular intervals. Any suitable type of data may be used such as communications data or product data. For example, traffic levels at a node in a communications network are successfully predicted using the method.
According to a first aspect of the present invention there is provided a method of predicting one or more future values of a series of data, said method comprising the steps of:
selecting a plurality of past values of said series of data;
assessing the level of deterministic behaviour of said series of data on the basis of said selected plurality of past values;
selecting a predictive algorithm from a store of predictive algorithms on the basis of said assessment of the level of deterministic behaviour of the series of data; and
using said selected predictive algorithm to predict said one or more future values of the series of data.
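The selection steps above, and the “smart switch” described earlier, might be sketched as follows. Everything here is hypothetical: the two-entry bank, the determinism_score heuristic (nearest-neighbour one-step error relative to predicting the running mean, a simple stand-in for the assessment method claimed later in this section), the threshold of 0.5 and the 20-value audit window are illustrative choices, not values from the source.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def moving_average(history: Sequence[float], window: int = 3) -> float:
    """Bank entry for stochastic-looking data: mean of the last `window` values."""
    return mean(history[-window:])


def nearest_neighbour(history: Sequence[float], dim: int = 2) -> float:
    """Bank entry for deterministic-looking data: successor of the past window closest to the latest one."""
    if len(history) <= dim:
        raise ValueError("history too short")
    current = history[-dim:]
    best = min(
        range(len(history) - dim),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(history[i:i + dim], current)),
    )
    return history[best + dim]


def determinism_score(past: Sequence[float], start: int = 4) -> float:
    """Hypothetical score: one-step nearest-neighbour error relative to predicting the
    running mean. Lower values suggest more deterministic behaviour."""
    nn_err = base_err = 0.0
    for t in range(start, len(past)):
        history = past[:t]
        nn_err += abs(nearest_neighbour(history) - past[t])
        base_err += abs(mean(history) - past[t])
    return nn_err / base_err if base_err else 0.0


ALGORITHM_BANK: Dict[str, Predictor] = {
    "deterministic": nearest_neighbour,
    "stochastic": moving_average,
}


def smart_switch(series: Sequence[float], recent: int = 20, threshold: float = 0.5) -> Tuple[str, float]:
    """Audit the most recent values, select an algorithm from the bank, then predict the next value."""
    past = list(series[-recent:])
    key = "deterministic" if determinism_score(past) < threshold else "stochastic"
    return key, ALGORITHM_BANK[key](list(series))


print(smart_switch([1, 2, 3] * 8))  # ('deterministic', 1)
```

With a strictly periodic series the nearest-neighbour entry predicts almost every audited value exactly, so the switch selects it. Calling smart_switch again on each new value would let the choice change as the character of the series changes.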
A corresponding computer system is provided for predicting one or more future values of a series of data, said computer system comprising:
an input arranged to accept a plurality of past values of said series of data;
a processor arranged to assess the level of deterministic behaviour of said series of data on the basis of said selected plurality of past values;
an input arranged to access a store of predictive algorithms and wherein said processor is further arranged to select one of said predictive algorithms on the basis of said assessment of the level of deterministic behaviour of the series of data; and
an output arranged to provide one or more future values of the series of data obtained by using said selected predictive algorithm.
A corresponding computer program is provided, stored on a computer readable medium, said computer program being arranged to control a computer system for predicting one or more future values of a series of data, said computer program being arranged to control said computer system such that:
a plurality of past values of said series of data is accepted;
an assessment of the level of deterministic behaviour of said series of data is made on the basis of said selected plurality of past values;
a store of predictive algorithms is accessed and one of said predictive algorithms selected on the basis of said assessment of the level of deterministic behaviour of the series of data; and
one or more future values of the series of data are obtained by using said selected predictive algorithm.
This provides the advantage that data, such as data from a communications process can be analysed and used to provide a prediction about performance of the process in the future. For example, the data may relate to traffic levels in a communications network. The selection of appropriate predictive algorithms in this manner may be carried out dynamically whilst a stream of data is being received and future values predicted. Advantageously, changes in the nature of the data are accommodated because different predictive algorithms are selected, in real time if required, and used to provide an optimal prediction at all times.
According to another aspect of the present invention there is provided a method of assessing a level of deterministic behaviour of a series of data comprising the steps of:
(i) using a predictive algorithm to predict a value of said data series which corresponds to a past value of said data series, said prediction being made on the basis of a subset of said past values;
(ii) repeating said step (i) a plurality of times using the same predictive algorithm, wherein said subset of said past values is larger for successive repetitions of said step (i); and
(iii) assessing the effect of the size of said subset of past values on the performance of said predictive algorithm.
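The three steps might be sketched as below, under stated assumptions: the single predictive algorithm is a one-nearest-neighbour analogue predictor, each subset is the run of values immediately before the value being back-predicted, and performance is summarised as the fractional fall in mean absolute error between the smallest and the largest subset. The names nn_predict, error_by_subset_size and determinism_level are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def nn_predict(history: Sequence[float], dim: int = 2) -> float:
    """The single predictive algorithm used throughout (hypothetical choice):
    return the successor of the past window closest to the latest window."""
    current = history[-dim:]
    best = min(
        range(len(history) - dim),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(history[i:i + dim], current)),
    )
    return history[best + dim]


def error_by_subset_size(series: Sequence[float], sizes: Sequence[int], n_targets: int = 5,
                         dim: int = 2) -> List[Tuple[int, float]]:
    """Steps (i) and (ii): back-predict each of the last `n_targets` known values from the
    `size` values immediately before it, repeating with ever larger subsets."""
    if min(sizes) <= dim or max(sizes) > len(series) - n_targets:
        raise ValueError("subset sizes out of range")
    results = []
    for size in sorted(sizes):
        errors = [abs(nn_predict(series[t - size:t], dim) - series[t])
                  for t in range(len(series) - n_targets, len(series))]
        results.append((size, mean(errors)))
    return results


def determinism_level(results: List[Tuple[int, float]]) -> float:
    """Step (iii), hypothetical measure: fractional fall in error from the smallest to the
    largest subset. Returns 0.0 if the smallest subset already predicts perfectly."""
    first, last = results[0][1], results[-1][1]
    return (first - last) / first if first else 0.0


results = error_by_subset_size([0, 1, 4, 2, 3] * 6, sizes=[4, 8, 12])
print(determinism_level(results))  # 1.0
```

For the period-5 series in the usage lines, a 4-value subset never contains an earlier copy of the latest pattern and mispredicts, whereas the larger subsets always do, so the error falls to zero. For purely random data one would expect little improvement as the subset grows.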
A corresponding computer system is provided for assessing a level of deterministic behaviour of a series of data said computer system comprising:
(i) a processor arranged to use a predictive algorithm to predict a value of said data series which corresponds to a past value of said data series, said prediction being made on the basis of a subset of said past values; and
(ii) wherein said processor is further arranged to repeat said step (i) a plurality of times using the same predictive algorithm and wherein said subset of said past values is larger for successive repetitions of said step (i); and
(iii) wherein said processor is further arranged to assess the effect of the size of said subset of past values on the performance of said predictive algorithm.
This provides the advantage that it is possible to assess a level of deterministic behaviour of a series of data quickly and easily. Once this level of deterministic behaviour is determined, it is possible to analyse or treat the data whilst taking this level of deterministic behaviour into account. For example, an appropriate algorithm for predicting future values of the data series can be chosen. It is also possible to assess the level of deterministic behaviour in a computationally inexpensive manner, so that it can be calculated in real time.