There exist numerous applications in which real-time data analysis may be required. For example, data events may be collected in a financial setting to identify potentially fraudulent activity, in a network setting to track network usage, in a business setting to identify business opportunities or problems, etc. Often, it may be necessary to examine individual data events as they occur in order to immediately investigate any suspect behavior. Challenges arise, however, when analyzing data events in real time, since historical data values are typically necessary to identify trends and patterns. Namely, accessing historical data can be a relatively slow process, and thus limits real-time processing.
There exist various known techniques (e.g., running estimates, moving windows, etc.) for analyzing data events in real time (or near real time). In such techniques, the historical data is essentially “built in” to the currently calculated estimate, thus providing a running statistical summary in a single value. Such techniques require little or no stored historical data to provide a statistical analysis of detected event values. Instead, they maintain, e.g., a running value that is updated each time a new data event value is collected. New data event values can then be compared to the existing statistical summary for the associated stream of data events to identify irregularities.
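To make the idea of a running statistical summary concrete, the following is a minimal sketch of one such known technique, an exponentially weighted running mean and variance. The class name `RunningEstimate`, the smoothing factor `alpha`, and the three-standard-deviation threshold are illustrative assumptions, not details taken from this document; other running estimates or moving windows would serve equally well.

```python
import math


class RunningEstimate:
    """Exponentially weighted running mean and variance (illustrative only)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # assumed smoothing factor; larger values adapt faster
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def is_irregular(self, value, threshold=3.0):
        """Compare a new value to the current summary without reading history."""
        if self.count < 2:
            return False     # not enough data yet to judge
        std = math.sqrt(self.var)
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) > threshold * std

    def update(self, value):
        """Fold a new data event value into the running summary."""
        if self.count == 0:
            self.mean = value
        else:
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.count += 1
```

In use, each incoming value would first be checked with `is_irregular` and then folded in with `update`, so that only the two running numbers (mean and variance) need to be stored per data stream.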
In some cases, it is desired to track irregularities by comparing values in an individual data stream against a peer group of data streams. For example, all people working in a particular job in a call center should work at roughly the same rate. Thus, productivity and trends should be similar for all employees. Accordingly, one could track (and adapt) ordinary behaviors and irregularities of the peer group as a whole, and also compare the individual entities against the peer group profile to look for irregularities.
One of the challenges is to provide a system that allows for situations where a group of entities is expected to behave in a similar overall manner, but not necessarily to behave identically. For example, insurance agents or retail businesses may have very different types and sizes of offices or stores; e.g., some may have a regular, large turnover and others a less regular, lower turnover. However, there are overall industry trends that one would expect to apply to all similar entities. Thus, where there is a general downturn in a particular insurance segment or shopping pattern, one would not want the system to issue an “exception” (e.g., warning) for all agents or all shops. Rather, one would want the system to recognize the overall trend for the segment and compare trends of individual entities with a trend of the entire segment, e.g., identify agents whose activity has dropped even more than the industry trend, or agents who have bucked the trend.
While this type of group-based trend analysis is applied in various fields, no effective techniques exist for performing this type of analysis in real time. For example, it is common to analyze industry trends, such as in the oil segment, and then analyze individual companies within the segment. Similarly, in the stock market, it is often useful to identify a company whose stock price is on the rise when the overall industry is in a decline. However, given the need to track numerous data streams, there exist no effective real-time techniques that can: (1) ascertain the overall trend (or other statistical summary) for the group; (2) ascertain trends (or other statistical summaries) for individual entities within the group; and (3) compare the individual trends against the overall trend.
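The three capabilities enumerated above can be illustrated with a small sketch. It is offered only to clarify what is being asked for and is not a technique described in this document: the class `PeerGroupTracker`, its parameters `alpha` and `tolerance`, the use of a smoothed relative change as the "trend", and the use of a simple average as the group trend are all hypothetical choices made for illustration.

```python
class PeerGroupTracker:
    """Tracks a smoothed trend per entity and compares it to the group trend."""

    def __init__(self, alpha=0.2, tolerance=0.15):
        self.alpha = alpha          # assumed smoothing factor for each trend
        self.tolerance = tolerance  # assumed allowed deviation from the group
        self.last_value = {}        # most recent value seen per entity
        self.trend = {}             # smoothed relative change per entity

    def update(self, entity, value):
        """(2) Maintain a running trend for an individual entity."""
        prev = self.last_value.get(entity)
        self.last_value[entity] = value
        if prev is None or prev == 0:
            return  # a relative change needs a nonzero previous value
        change = (value - prev) / prev
        old = self.trend.get(entity)
        if old is None:
            self.trend[entity] = change
        else:
            self.trend[entity] = old + self.alpha * (change - old)

    def group_trend(self):
        """(1) Overall trend for the group, here the mean of entity trends."""
        if not self.trend:
            return None
        return sum(self.trend.values()) / len(self.trend)

    def exceptions(self):
        """(3) Entities whose trend departs from the group trend."""
        group = self.group_trend()
        if group is None:
            return {}
        return {
            entity: t - group
            for entity, t in self.trend.items()
            if abs(t - group) > self.tolerance
        }
```

Because relative changes are compared rather than raw values, a large office and a small office that both decline by the same percentage are treated alike, and a segment-wide downturn alone raises no exception. A simple average is sensitive to extreme entities, so a median or a weighted summary may be preferable in practice.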