1. Field of the Invention
This invention relates generally to the analysis of data streams, and in particular to a method for summarization of streams of time-ordered information.
2. Description of Background
Before our invention, in the standard stream model, each value is associated with a (timestamp, stream-id) pair. However, the stream-id itself may have some additional structure. For example, it may be decomposed as stream-id=(location-id, type). Each such component of the stream-id is referred to as an ‘aspect’. This additional structure should not be ignored in data exploration tasks, since it may provide additional insights. Thus the typical “flat-world view” is insufficient. In summary, even though the traditional data stream model is quite general, it cannot easily capture some important aspects of the data.
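By way of illustration only, the decomposition of a stream-id into aspects can be sketched as follows. The names StreamId, Reading and parse_stream_id, the "/" separator and the sample values are hypothetical and are not part of the stream model described above; the sketch merely assumes that a flat stream-id encodes a location-id and a type.

```python
from typing import NamedTuple


class StreamId(NamedTuple):
    """A stream-id decomposed into two aspects (hypothetical layout)."""
    location_id: str
    sensor_type: str


class Reading(NamedTuple):
    """One value in the stream model: a (timestamp, stream-id) pair plus the value."""
    timestamp: int
    stream_id: StreamId
    value: float


def parse_stream_id(flat_id: str, sep: str = "/") -> StreamId:
    """Split a flat id such as 'burrow-7/temperature' into its aspects.

    The separator is an assumption made for this sketch.
    """
    location_id, sensor_type = flat_id.split(sep, 1)
    return StreamId(location_id, sensor_type)


# Illustrative reading with made-up values.
reading = Reading(
    timestamp=1000,
    stream_id=parse_stream_id("burrow-7/temperature"),
    value=11.5,
)
```

In the flat-world view, "burrow-7/temperature" would be an opaque label; once decomposed, each aspect can be examined separately.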
As an example, consider the monitoring of natural habitats. Sensitive wildlife and habitats need constant monitoring in a non-intrusive and non-disruptive manner. In such cases, wireless sensors are carefully deployed throughout those habitats to monitor microclimate conditions such as temperature, humidity and light intensity in and around those areas, as well as the voltage level of the sensors, in real time. In this regard, measurements from a particular sensor and from a given location (say, temperature around a nesting burrow) are treated as a single stream. In general, there are large numbers of such streams that come from different sensor types and locations. Thus, sensing location (which is hard-coded through the hardware sensor id) and sensor type give us two different ‘aspects’ of those streams, which should be clearly differentiated. In particular, it is not sensible to blindly analyze all streams together (for example, it is unusual to compare the humidity at location ‘A’ to the sensor voltage at location ‘B’). Instead, the streams of the same type but different locations are often analyzed together for spatial variation patterns. Similarly, the streams at the same location but of different types can be studied for cross-type correlations. In general, a more challenging problem is how to analyze both location and type simultaneously and over time.
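The two single-aspect analyses described above (same type across locations, and same location across types) amount to grouping streams by one aspect at a time. The following is a minimal sketch under stated assumptions: the function group_streams, the tuple layout and the sample readings are all hypothetical, and the sketch does not address the harder problem of analyzing both aspects simultaneously over time.

```python
from collections import defaultdict

# Hypothetical readings: (timestamp, location_id, sensor_type, value).
READINGS = [
    (0, "A", "temperature", 11.0),
    (0, "B", "temperature", 12.5),
    (0, "A", "humidity", 80.0),
    (0, "B", "voltage", 2.9),
    (1, "A", "temperature", 11.2),
    (1, "B", "temperature", 12.4),
]


def group_streams(readings, by):
    """Group streams by one aspect, either 'location' or 'type'.

    Returns {aspect_value: {(location_id, sensor_type): [(timestamp, value), ...]}}.
    """
    if by not in ("location", "type"):
        raise ValueError("by must be 'location' or 'type'")
    groups = defaultdict(lambda: defaultdict(list))
    for timestamp, location_id, sensor_type, value in readings:
        key = location_id if by == "location" else sensor_type
        groups[key][(location_id, sensor_type)].append((timestamp, value))
    return groups


# Same type, different locations: candidates for spatial variation analysis.
by_type = group_streams(READINGS, by="type")

# Same location, different types: candidates for cross-type correlation.
by_location = group_streams(READINGS, by="location")
```

Here by_type["temperature"] holds the temperature streams from locations ‘A’ and ‘B’ side by side, while by_location["A"] holds the temperature and humidity streams observed at location ‘A’.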
Other applications include computer cluster monitoring (example ‘aspects’: host-id and metric type), intelligent building monitoring and management (example ‘aspects’: location and sensor type), and monitoring of network traffic volumes (example ‘aspects’: source node and destination node) for any type of network (e.g., road networks, computer networks) or graph (e.g., social networks with nodes corresponding to individuals and measurements of “level of interaction” at different time instants). Finally, any number of aspects, in addition to the timestamp aspect, is possible. For example, network traffic volume measurements may have an additional ‘aspect’ of traffic type (besides the source node and destination node aspects), which may include the dimensions of, e.g., voice, video, and data (for computer networks).
The need for improved methods of mining additional structure from streams of data in part gives rise to the present invention.