1. Field of the Invention
The present invention relates to a method of rationalising data stored in a physical form.
It has particular utility in relation to rationalising data that reflects a time-variant behaviour.
2. Related Art
In many cases, in order to model a time-variant behaviour it is necessary to gather training data over the course of time. Normally, the training data comprises a plurality of examples, each of which provides values of a plurality of parameters, which values characterise that example. The examples reflect the time-variant behaviour that is to be modelled.
Usually, a time-variant behaviour is modelled by running a computer program which controls the computer to output a predicted value of one of the parameters given an incomplete example that only provides values for the other parameters. Where the parameter whose value is being sought takes one of a few discrete values (or falls within one of a few discrete value ranges) then the model can be said to provide a classification of the incomplete example.
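By way of illustration only, the following minimal sketch shows one way in which a program might classify an incomplete example on the basis of training data. The function name, the parameter names and the voting rule are all hypothetical and are not taken from any particular known system; each example is assumed to be a mapping from parameter names to values.

```python
from collections import Counter


def classify(training_examples, incomplete_example, target):
    """Predict the value of `target` for an incomplete example.

    Hypothetical rule: take a majority vote among the training examples
    that agree with the incomplete example on the most known parameters.
    """
    def agreement(example):
        # Number of known parameters on which the two examples agree.
        return sum(example.get(k) == v for k, v in incomplete_example.items())

    best = max(agreement(e) for e in training_examples)
    votes = Counter(
        e[target] for e in training_examples if agreement(e) == best
    )
    return votes.most_common(1)[0][0]


# Illustrative training data (invented for this sketch).
training = [
    {"duration": "short", "destination": "local", "outcome": "completed"},
    {"duration": "short", "destination": "local", "outcome": "completed"},
    {"duration": "long", "destination": "international", "outcome": "failed"},
]

# The incomplete example lacks a value for the "outcome" parameter.
prediction = classify(
    training, {"duration": "long", "destination": "international"}, "outcome"
)
```

Because the sought parameter here takes one of only two discrete values, the predicted value amounts to a classification of the incomplete example.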
Conventionally, one of two approaches is used in gathering data over the course of time in order to model a time-variant behaviour.
Firstly, data can simply be accumulated over time. The disadvantage of this approach is that after any change in the time-variant behaviour the training data includes examples that reflect aspects of the time-variant behaviour that no longer subsist. The resulting increase in the proportion of the training data which is no longer applicable leads to the training data reflecting the time-variant behaviour less accurately. This results in any models that are produced on the basis of the training data also becoming less accurate.
Secondly, existing training data can be frequently replaced by training data relating to more recent events. However, if the behaviour that is being modelled includes rare events that are of interest (fraudulent calls, or failed calls in telephone networks, for example) then the paucity of data relating to such events results in the model being insufficiently accurate.
The skilled person has therefore, up until the advent of the present invention, been faced with a trade-off. On the one hand, if he or she accumulates training data over time then models based on the training data lack adaptability. On the other hand, if he or she frequently replaces the training data then the accuracy of the model, particularly in relation to rare events, is limited.
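The trade-off between the two conventional approaches may be illustrated by the following minimal sketch. The stream of examples, the point at which the behaviour changes, the frequency of rare events and the window size are all invented for the purposes of illustration, and the function name is hypothetical.

```python
from collections import deque


def gather(stream, window=None):
    """Gather training data from a stream of examples.

    With window=None every example is accumulated; otherwise only the
    most recent `window` examples are kept, older ones being replaced.
    """
    store = deque(maxlen=window)
    for example in stream:
        store.append(example)
    return list(store)


# Assumed stream: each example is (regime, is_rare_event). The behaviour
# changes from "old" to "new" halfway through, and every 50th example
# records a rare event of interest.
stream = [("old" if i < 500 else "new", i % 50 == 0) for i in range(1000)]

accumulated = gather(stream)          # first approach: accumulate
recent = gather(stream, window=100)   # second approach: replace

# Fraction of each training set reflecting behaviour that no longer subsists.
stale_accumulated = sum(r == "old" for r, _ in accumulated) / len(accumulated)
stale_recent = sum(r == "old" for r, _ in recent) / len(recent)

# Number of rare events available to train on in each case.
rare_accumulated = sum(rare for _, rare in accumulated)
rare_recent = sum(rare for _, rare in recent)
```

Under these assumptions, half of the accumulated training data reflects superseded behaviour, whereas the frequently replaced training data contains none but retains only two of the twenty rare events.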