1. Technical Field
The disclosure relates generally to data mining and knowledge discovery.
2. Description of Related Art
A variety of person-to-person communication forms have been created throughout history. While many of these forms are still in use today, electronic mail, “e-mail,” has become a ubiquitous tool in both business and private life. The use of e-mail and the content of e-mail messages can be analyzed to derive information that is not necessarily inherent in the content itself. When applied to e-mail messaging and e-mail content, natural language processing and pattern recognition techniques can be used to derive such non-inherent information. For example, within an organization's computer network, a system administrator may analyze e-mail message header and attachment information to generate reports on whether e-mail is being used appropriately in the network, without reading the message content itself. As another example, monitoring a variety of e-mail usage statistics and displaying them to a user may provide information that influences the user's own e-mail practices and habits.
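A header-only analysis of the kind described above can be sketched with Python's standard `email` package. This is a minimal illustration, assuming raw RFC 5322 message text as input; the function name `header_report` and the particular statistics (sender-to-recipient counts taken from From/To/Cc, and a per-sender count of parts marked as attachments) are hypothetical choices, not a method stated in the text. Message bodies are never inspected for content.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def header_report(raw_messages):
    """Summarize e-mail traffic using only header information.

    Hypothetical helper: returns (pair_counts, attachment_counts), where
    pair_counts[(sender, recipient)] counts messages from sender to recipient
    (From -> To/Cc), and attachment_counts[sender] counts MIME parts whose
    Content-Disposition header marks them as attachments.
    """
    pair_counts = Counter()
    attachment_counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() splits header values like "Name <addr>, addr2" into (name, addr) pairs
        senders = [a.lower() for _, a in getaddresses(msg.get_all("From", []))]
        recipients = [
            a.lower()
            for _, a in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        ]
        for s in senders:
            for r in recipients:
                if r and r != s:
                    pair_counts[(s, r)] += 1
        # Identify attachments by their part headers only; payloads are not decoded
        n_attach = sum(
            1 for part in msg.walk() if part.get_content_disposition() == "attachment"
        )
        for s in senders:
            attachment_counts[s] += n_attach
    return pair_counts, attachment_counts
```

The pair counts can serve directly as weighted edges of a communication graph among the senders and recipients.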
Identifying organizational patterns of individuals or other entities has been a focus for data mining and knowledge discovery researchers. An entity, whether formal or informal, that provides opportunities for communication among its members may be interlaced with groups of individuals who have similar goals and a shared understanding of their activities. These “communities-of-practice” may generally be thought of as informal groups of individuals or entities associated by common interests or a common goal. For example, within a corporation, communities-of-practice may generally be thought of as informal networks of collaboration between individuals or departments that naturally grow and coalesce toward achieving corporate goals.
Knowledge of organizational hierarchy and of organizational communities-of-practice is useful for many types of studies. For example, an organization may have an interest in understanding its hierarchy and the communication flow within its communities-of-practice as a way of improving knowledge sharing, communication efficiency, and member interaction and collaboration. In businesses, hierarchical constructs, usually in the familiar form of “organization charts,” whether for the entire organization, for subunits thereof, or for informally related individuals or entities, are often created through extensive and expensive manual labor; the creator must be given access to precise data, namely, each employee's name, title, the rank of that title, and the like. There is a need for data mining and knowledge discovery techniques that reduce such manual labor and improve the derived results.
It is known that a graph can be said to have community structure if it consists of subsets of vertices, or nodes, with many edges, or nodal interconnects, connecting vertices of the same subset but few edges between subsets. See e.g., Girvan, M., & Newman, M. (2002), “Community structure in social and biological networks,” Proc. Natl. Acad. Sci. USA 99, 7821-7826, incorporated herein by reference, and referred to hereinafter as the “Girvan & Newman model.” Partitioning a graph into discrete groupings of nodes can be based on the idea of “betweenness centrality,” that is, the extent to which a node or edge lies on shortest paths between other pairs of nodes. See e.g., Freeman, L. (1977), “A Set of Measures of Centrality Based on Betweenness,” Sociometry 40, 35-41, referred to hereinafter as the “Freeman model”; Wilkinson, D. and Huberman, B. (2002), “A Method for Finding Communities of Related Genes,” http://www.hpl.hp.com/shl/papers/communities/index.html, referred to hereinafter as the “Wilkinson & Huberman method”; Brandes, U. (2001), “A Faster Algorithm for Betweenness Centrality,” Journal of Mathematical Sociology 25 (2): 163-177, referred to hereinafter as the “Brandes algorithm”; Newman, M. (2001), “Who is the Best Connected Scientist? A study of scientific co-authorship networks,” Phys. Rev. E 64; and Newman, M. (2003), “The structure and function of complex networks,” SIAM Review, June 2003. Each of these references is incorporated herein by reference.
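The divisive approach associated with the Girvan & Newman model can be sketched as follows: compute the betweenness of every edge (here with a Brandes-style accumulation over breadth-first searches), remove the edge with the highest score, and repeat until the graph splits into more components. This is a minimal illustration, assuming an unweighted, undirected graph given as an adjacency dictionary; the function names `edge_betweenness`, `components`, and `girvan_newman_split` are hypothetical, ties between equal scores are broken arbitrarily, and only the first split is returned rather than the full hierarchy of splits.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph
    (Brandes-style accumulation). Keys are frozenset({u, v})."""
    bet = defaultdict(float)
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma), record predecessors
        stack, pred = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every undirected pair was counted once from each endpoint
    return {e: b / 2.0 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge, one at a time, until the graph splits."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    n_start = len(components(adj))
    while True:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            return components(adj)
        u, v = tuple(max(bet, key=bet.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_start:
            return comps
```

For two triangles joined by a single bridge edge, the bridge carries every shortest path between the triangles, so it has the highest betweenness and is removed first, leaving the two triangles as separate communities.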