It can be very useful to know about activities between individuals. For example, which individuals are associated with one another? Which individuals communicate with each other? When two or more individuals get together, is there an intended purpose? Who are the leaders or most important members of a group? What is the group's organizational structure? It is more useful still to be able to model these kinds of interactions and associations. To some extent, this type of social research has been addressed using data mining and community generation.
Examples of such problems include the following:

- mining movie data to find out how actors/actresses, directors, and producers are linked to movies, and how the movies are linked to awards;
- mining Web community or topic-related documents to find the hubs and authorities among the related documents and how they are linked together;
- mining a store franchise's nationwide merchandise sales data to determine the associations (or correlations) among groups of merchandise items;
- mining customer search topic data collected in a library over a period of time to identify groups of related common interests and their relationships;
- mining traffic data collected across a wide network of geographical locations, either nationwide or within a specific area (e.g., New York City), to find correlations in traffic accident patterns among groups of locations.

The government and civilian sectors also have a number of requirements for such a capability. Examples include identifying terrorist cells and crime rings (such as money laundering operations), supporting drug interdiction, and identifying tactical units on the battlefield.
In some of these problems, the data is given with existing links, such as the actor-movie links in the movie data and the hyperlinks in the Web data. In others, the data is given entirely in isolation, with no link information available; examples include sales data, customer search topic data collected in a library, and traffic records collected at different geographical locations. In this second case, the goal is to generate communities based on links between the data items that have yet to be determined. Current research in community generation focuses on the first case, which the literature addresses under relational data mining and learning. But what happens when explicit link or relationship information is not available? To our knowledge, no one has systematically addressed this class of problems. In fact, it has not even been identified as a distinct paradigm within community generation, let alone within the broader data mining community. To this end, we call this class of problems the Uni-party Data Community Generation (UDCG) problem. For comparison, we call the first class of problems (where the relationships are known or given) Bi-party Data Community Generation (BDCG) problems.