Techniques for processing data streams have gained importance in recent years because advances in hardware technology have made stream data much easier to collect. There is extensive literature on extending data mining techniques to data streams.
A known problem in the data stream environment is that of detecting patterns of interaction among a set of entities operating in such an environment. A convenient way to model entity interaction relationships is to view them as graphs in which the nodes correspond to entities and the edges correspond to the interactions among the nodes. The weights on these edges represent the level of interaction between the different participants.
For example, when the nodes represent interacting entities in a business environment, the weights on the edges between these entities could represent the volume of business transactions. A community of interaction may therefore be defined as a set of entities with a high degree of interaction among the participants.
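The graph model described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a method taken from the cited literature: the record layout `(entity_a, entity_b, amount)`, the function names `build_interaction_graph` and `average_internal_weight`, and the use of the mean pairwise edge weight as a stand-in for "degree of interaction" are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def build_interaction_graph(stream):
    """Accumulate edge weights from (entity_a, entity_b, amount) records.

    Hypothetical record layout: each record is one observed interaction.
    """
    weights = defaultdict(float)
    for a, b, amount in stream:
        # Treat interactions as undirected: store each pair under a
        # canonical (sorted) key so (A, B) and (B, A) share one edge.
        weights[tuple(sorted((a, b)))] += amount
    return weights


def average_internal_weight(weights, members):
    """Mean edge weight over all pairs of members.

    Pairs with no recorded interaction count as weight 0. This is an
    assumed, simplistic measure of how strongly a group interacts.
    """
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    return sum(weights.get(pair, 0.0) for pair in pairs) / len(pairs)


# Illustrative transaction volumes between four business entities.
stream = [
    ("A", "B", 5.0),
    ("B", "C", 4.0),
    ("A", "C", 6.0),
    ("C", "D", 1.0),
    ("B", "A", 3.0),
]
graph = build_interaction_graph(stream)
tight = average_internal_weight(graph, {"A", "B", "C"})
loose = average_internal_weight(graph, {"A", "B", "D"})
```

Under this assumed measure, the group {A, B, C} scores higher than {A, B, D}, which matches the intuition that a community is a set of entities with heavy mutual interaction.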
The problem of finding communities in graphs has been discussed in the literature; see, e.g., C. Cortes, D. Pregibon and C. Volinsky, “Communities of Interest,” Proceedings of Intelligent Data Analysis, (2001); C. Cortes, D. Pregibon and C. Volinsky, “Computational Methods for Dynamic Graphs,” Journal of Computational and Graphical Statistics, vol. 2, pp. 950-970, (2003); D. Gibson, J. Kleinberg and P. Raghavan, “Inferring Web Communities from Link Topology,” Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, (1998); D. Kempe, J. Kleinberg and E. Tardos, “Maximizing the Spread of Influence Through a Social Network,” ACM KDD Conference, (2003); J. Kleinberg, “Authoritative Sources in a Hyperlinked Environment,” ACM SODA Conference, (1998); R. Kumar, J. Novak, P. Raghavan and A. Tomkins, “On the Bursty Evolution of Blogspace,” Proceedings of the WWW Conference, (2003); S. Rajagopalan, R. Kumar, P. Raghavan and A. Tomkins, “Trawling the Web for emerging cyber-communities,” Proceedings of the 8th WWW conference, (1999); N. Imafuji and M. Kitsuregawa, “Finding a Web Community by Maximum Flow Algorithm with HITS Score Based Capacity,” DASFAA, pp. 101-106, (2003); and M. Toyoda and M. Kitsuregawa, “Extracting evolution of web communities from a series of web archives,” Hypertext, pp. 28-37, (2003).
Since most of the existing techniques are designed for applications such as the web (“web” commonly refers to the World Wide Web), they usually assume a gradually evolving model for the interaction. Such techniques are not very useful for a fast stream environment in which the entities and their underlying relationships may evolve quickly over time. Examples of environments where the interaction among different entities can evolve rapidly include environments where the entities are sets of businesses which interact with one another, sets of co-authors in a dynamic bibliography database, or hyperlinks from web pages.
Accordingly, there is a need for improved techniques for detecting patterns of interaction among a set of entities and analyzing community evolution in a stream environment.