(1) Field of Invention
The present invention relates to a system for detecting group behaviors in data, and more particularly, to a system for detecting group behaviors in data by monitoring individual behaviors for events of interest.
(2) Description of Related Art
Data mining is the process of extracting patterns from data. Presently, data is being collected at a rate that outpaces manual analysis. Therefore, there is a strong need to automate the knowledge extraction and processing tasks. Existing data mining and knowledge discovery from data (KDD) algorithms have begun to address these needs; however, existing techniques are not effective with data from dynamically evolving relational domains. With current techniques, it is very difficult to detect and monitor group behaviors from data, such as when a group of individuals collude, or work in unison. Two major limiting factors are the quantity of information required for the detection of group behaviors and the relationships involved.
In a survey paper of the interdisciplinary field of network science, Vespignani et al. describe different types of networks, such as (un)directed and (non)weighted networks, as well as issues such as node and edge properties in “Network Science” in the Annual Review of Information Science and Technology 41, 537-607, 2007. Vespignani et al. is hereby incorporated by reference as though fully set forth herein. The authors briefly discuss methods for sampling large datasets where the entire dataset cannot be observed at the same time. The paper also presents some methods for modeling dynamic networks and for network visualization.
Additionally, Chakrabarti et al. analyze properties of real-world graphs and discuss methods for generating graphs with similar properties (e.g., power laws, small diameters, community effects) in “Graph Mining: Laws, Generators, and Algorithms” in Association for Computing Machinery (ACM) Computing Surveys, Vol. 38, No. 1, March 2006. Chakrabarti et al. is hereby incorporated by reference as though fully set forth herein. A problem related to this paper is that of detecting abnormalities (e.g., outliers) in a graph.
Zhou et al. present methods for identifying communities within weblogs with common interests in “Discovering Web Communities in the Blogspace” in Proceedings of the 40th Hawaii International Conference on Systems Sciences, 2007. Zhou et al. is hereby incorporated by reference as though fully set forth herein. The authors define a community as a group of blogs that has a high density of edges within the group and a low density of edges to other groups. The problem of community identification is related to collusion; however, the two differ in that there may be communities whose members are not colluding, or there may be a large community in which only a small number of members are involved in the collusion. In their analysis, Zhou et al. also require a “full picture” of the network of interest and the relationships among its members.
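By way of illustration only, the density-based notion of a community described above can be sketched as follows. This is a minimal sketch and not the method of Zhou et al.: the function name `edge_densities`, the undirected edge-list representation, and the example graph are all assumptions introduced here. It compares the fraction of possible within-group links that are present against the fraction of possible between-group links that are present.

```python
def edge_densities(edges, groups):
    """Return (intra, inter) edge densities for an undirected graph.

    edges  -- iterable of (u, v) pairs, each undirected link listed once
    groups -- dict mapping a group label to the set of its member nodes
    Both names are hypothetical and used only for this illustration.
    """
    membership = {node: label
                  for label, members in groups.items()
                  for node in members}

    intra_edges = 0
    inter_edges = 0
    for u, v in edges:
        if membership[u] == membership[v]:
            intra_edges += 1
        else:
            inter_edges += 1

    # Number of possible node pairs inside groups and across groups.
    sizes = [len(members) for members in groups.values()]
    total = sum(sizes)
    intra_pairs = sum(n * (n - 1) // 2 for n in sizes)
    inter_pairs = total * (total - 1) // 2 - intra_pairs

    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter


# Assumed example: two fully linked groups of three blogs, one bridging link.
example_groups = {"A": {"a", "b", "c"}, "B": {"d", "e", "f"}}
example_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                 ("d", "e"), ("e", "f"), ("d", "f"),
                 ("c", "d")]
intra_density, inter_density = edge_densities(example_edges, example_groups)
```

In this assumed example every within-group pair is linked while only one of the nine possible between-group pairs is linked, so the grouping satisfies the density-based definition. Note that the computation needs the complete edge list, which reflects the “full picture” requirement noted above.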
Finally, Flake et al. analyze Internet data to model self-organization and community identification in “Self-Organization and Identification of Web Communities” in Computer, 2002. Flake et al. is hereby incorporated by reference as though fully set forth herein. Their work focuses on links between websites rather than weblogs. It detects communities by assuming one or more seed Web sites and then recasting the problem as a maximum flow problem, which makes the community identification problem tractable.
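For illustration, the general idea of recasting seed-based community identification as a maximum flow problem can be sketched as follows. This is a simplified sketch under stated assumptions and not the algorithm of Flake et al.: the function name `seed_community`, the virtual source and sink wiring, the capacity values, and the example graph are all assumptions introduced here. A virtual source is attached to the seed sites, a virtual sink is attached to every other site, and the sites that remain reachable from the source after a maximum flow is computed (the source side of a minimum cut) are taken as the community.

```python
from collections import defaultdict, deque


def seed_community(edges, seeds, edge_capacity=2):
    """Return the nodes on the source side of a minimum cut.

    edges         -- iterable of undirected (u, v) links between sites
    seeds         -- set of seed sites assumed to belong to the community
    edge_capacity -- assumed capacity of each link (hypothetical choice)
    """
    cap = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for u, v in edges:
        cap[u][v] += edge_capacity
        cap[v][u] += edge_capacity
        nodes.update((u, v))

    source, sink = "_source", "_sink"
    for s in seeds:
        cap[source][s] = float("inf")  # seeds can never be cut away
        cap[s][source] += 0            # create the residual entry
    for n in nodes - set(seeds):
        cap[n][sink] += 1              # unit link to the virtual sink
        cap[sink][n] += 0

    def reachable():
        """Breadth-first search over edges with remaining capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until none remain.
    while True:
        parent = reachable()
        if sink not in parent:
            break
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Nodes still reachable from the source form the community.
    return set(parent) - {source}


def clique(members):
    """All undirected links among the given members."""
    return [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]


# Assumed example: two fully linked groups of four sites joined by one link.
example_edges = clique("abcd") + clique("efgh") + [("d", "e")]
community = seed_community(example_edges, seeds={"a"})
```

In this assumed example the cheapest cut severs the single bridging link and the three unit sink links of the seed's own group, so the community recovered from seed `a` is the first group of four sites. The result depends on the chosen capacities, which is one reason the sketch should be read as illustrative only.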
The prior art described above deals with searching, community formation/classification, and epidemics (i.e., disease or information spreading). Thus, a continuing need exists for a system which makes it possible to identify group patterns in data through monitoring of individual events. The invention described herein provides a system that allows analysis of data from dynamically evolving relational domains.