1. Field of the Invention
The invention relates to the detection of changes in data streams. In particular, the present invention seeks to detect a change, brought about by a target event, in a data stream.
2. Description of the Related Technology
Change detection systems find a wide variety of applications, including, but not limited to, fraud detection (for example, looking for changes in patterns of credit card usage), security systems (for example, detecting attacks on computer networks), process, fault and condition monitoring (for example, looking for changes in the pattern of vibration in vehicle engines), environmental monitoring systems (for example, identifying chemical spillage and pollution), and health monitoring (for example, to alert medical workers to sudden changes in the condition of patients). To detect changes in practice, entities and processes are typically monitored by taking regular measurements of critical parameters, that is, those parameters most likely to contain information about the changes of interest. Changes are usually identified by comparing these parameters with thresholds designed to indicate a target event, and an alert is generated if a threshold is exceeded. Such thresholds are usually not fixed; instead, they are functions of statistics extracted from a data stream containing information about the system being monitored, such as the stream's mean, variance or percentiles.
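A threshold that is a function of statistics extracted from the stream, rather than a fixed value, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the nearest-rank percentile definition is an assumption (other percentile definitions exist), and the trailing-window recalculation is one of several possible ways of keeping the threshold up to date.

```python
import math
import statistics
from typing import List, Sequence


def percentile_threshold(reference: Sequence[float], pct: float) -> float:
    """Nearest-rank pct-th percentile of a reference window (assumed definition)."""
    if not reference or not 0 < pct < 100:
        raise ValueError("need a non-empty window and 0 < pct < 100")
    ordered = sorted(reference)
    # Smallest value with at least pct% of the window at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def mean_std_threshold(reference: Sequence[float], k: float = 3.0) -> float:
    """Threshold k sample standard deviations above the window mean."""
    return statistics.fmean(reference) + k * statistics.stdev(reference)


def rolling_alerts(stream: Sequence[float], window: int, pct: float) -> List[int]:
    """Indices whose value exceeds the pct-th percentile of the preceding
    `window` samples, so the threshold tracks the stream instead of being fixed."""
    flagged = []
    for i in range(window, len(stream)):
        threshold = percentile_threshold(stream[i - window:i], pct)
        if stream[i] > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    readings = [1, 2, 3, 4, 5, 100, 3]
    print(rolling_alerts(readings, window=5, pct=99))  # → [5]
```

Only the reading of 100 is flagged: once it enters the trailing window it raises the threshold, so the following reading of 3 does not trigger an alert.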
The change detection performance of such systems is limited by several factors. Firstly, unless the thresholds are effectively set to extreme percentiles (such as the 99th), the number of false alerts (those generated even when no significant change in the data stream has occurred) will be too high for many practical applications. For example, a classic application of the invention is to monitor calls in large telecommunications networks for unusual activity that may be indicative of fraud. Since it is not uncommon for such networks to support 100 million calls per day, a threshold based on the 99th percentile would generate around 1 million false alerts per day (assuming that the percentile estimate is accurate, and that the call statistics are ergodic), which is over 1,000 times the number that can be processed by the fraud investigation teams employed by most telecommunications companies. In practice, gradual changes in the way in which the telecommunications network is used will cause the false alert rate to be even higher than the 1 percent that the 99th percentile would prima facie imply. For example, the growth of internet usage has produced a gradual increase in the number of exceptionally long calls (those over two hours), which has been sustained over several years. The effects of such changes on the false alert rate of percentile-based algorithms cannot readily be alleviated by raising the percentile to, for example, the 99.9th, because doing so not only increases the risk of missing the more subtle frauds, but more extreme percentiles also take longer to estimate. This time factor is important because a reasonable estimate of the percentile must be formed before the algorithm can produce useful fraud alerts; the longer this takes, the greater the risk that major frauds are missed because they are committed before the algorithm is ready to detect them.
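The false-alert arithmetic above, and the cost of moving to a more extreme percentile, can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the sample-size estimate relies on an assumed rule of thumb (not taken from this document) that roughly k / (1 - p) observations are needed before k values are expected above the p-quantile.

```python
def tail_fraction(pct: float) -> float:
    """Share of observations expected above the pct-th percentile."""
    return (100.0 - pct) / 100.0


def expected_false_alerts(events_per_day: float, pct: float) -> float:
    """Expected daily false alerts from a pct-th percentile threshold, assuming
    the percentile estimate is exact and the event statistics are ergodic."""
    return events_per_day * tail_fraction(pct)


def samples_for_exceedances(pct: float, exceedances: int = 10) -> float:
    """Rule-of-thumb (assumed) number of observations needed before
    `exceedances` values are expected above the pct-th percentile."""
    return exceedances / tail_fraction(pct)


if __name__ == "__main__":
    calls_per_day = 100_000_000  # figure from the telecommunications example
    for pct in (99.0, 99.9):
        print(
            f"{pct:g}th percentile: "
            f"{expected_false_alerts(calls_per_day, pct):,.0f} false alerts/day, "
            f"about {samples_for_exceedances(pct):,.0f} samples to see 10 exceedances"
        )
```

At 100 million calls per day, the 99th percentile yields about 1,000,000 expected false alerts and the 99.9th about 100,000, still far beyond the fewer than roughly 1,000 alerts per day implied by the "over 1,000 times" comparison, while the number of samples needed to observe the same number of exceedances grows tenfold.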
A more fundamental limitation of the use of thresholds is that not all changes that can conceivably occur in a data stream can be detected by them. Assume, for example, that a change detection algorithm is used to monitor the condition of the suspension of a car through the periodic measurement of the extension of a spring attached to one of the car's wheels. Ten seconds' worth of simulated measurements are shown in the top graph of FIG. 1. Assuming that the suspension normally exhibits wide variation in spring length, as in the regions of the top graph of FIG. 1 that lie outside the dashed lines, it is possible that some modes of failure, such as periodic seizure, can cause the spring length to show less variation, as occurs between the dashed lines. This change in behaviour cannot be detected using thresholds, because there is no threshold that can be placed on the measured parameter (that is, spring length) that would cause substantially more alerts to be generated when the suspension behaves pathologically than when it behaves normally. In summary, threshold-based systems can only detect changes in which there is a substantial shift in the proportion of the probability mass of the contents of the data stream from below the threshold to above it (or vice versa), and the choice of threshold is necessarily restricted to extreme percentiles to minimise the rate at which false alerts are generated.
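The spring example can be illustrated with a small simulation. The sketch below uses hypothetical data (uniformly distributed spring lengths, not the measurements of FIG. 1) and calibrates lower and upper thresholds at the 1st and 99th percentiles of the normal behaviour. During the simulated seizure the threshold alert rate falls to zero rather than rising. A rolling standard deviation, chosen here purely to show that the change is present in the data, drops sharply over the same stretch; all function names and parameters are illustrative assumptions.

```python
import math
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def simulate_spring(rng: random.Random, n_normal: int = 400,
                    n_seized: int = 200) -> Tuple[List[float], slice]:
    """Hypothetical spring lengths: wide variation, a narrow 'seized' stretch,
    then wide variation again. Returns the samples and the seized slice."""
    before = [rng.uniform(0.5, 1.5) for _ in range(n_normal)]
    seized = [rng.uniform(0.9, 1.1) for _ in range(n_seized)]
    after = [rng.uniform(0.5, 1.5) for _ in range(n_normal)]
    return before + seized + after, slice(n_normal, n_normal + n_seized)


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile."""
    ordered = sorted(values)
    return ordered[max(1, math.ceil(pct * len(ordered) / 100)) - 1]


def alert_rate(values: Sequence[float], low: float, high: float) -> float:
    """Fraction of samples falling outside [low, high]."""
    return sum(1 for v in values if v < low or v > high) / len(values)


def rolling_std(values: Sequence[float], window: int) -> List[float]:
    """Population std of each block; out[k] covers values[k:k + window]."""
    return [statistics.pstdev(values[k:k + window])
            for k in range(len(values) - window + 1)]


def compare(seed: int = 1, window: int = 50) -> Dict[str, float]:
    rng = random.Random(seed)
    stream, seized = simulate_spring(rng)
    reference = stream[:seized.start]  # calibrate on normal behaviour only
    low, high = percentile(reference, 1), percentile(reference, 99)
    stds = rolling_std(stream, window)
    return {
        "threshold_alert_rate_normal": alert_rate(reference, low, high),
        "threshold_alert_rate_seized": alert_rate(stream[seized], low, high),
        # Average only over windows lying wholly inside each regime.
        "rolling_std_normal": statistics.fmean(stds[:seized.start - window + 1]),
        "rolling_std_seized": statistics.fmean(
            stds[seized.start:seized.stop - window + 1]),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

Tightening or loosening either threshold cannot reverse this outcome, since the seized values lie inside the normal range; only a statistic describing the spread of the data, rather than the raw value, responds to the change.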