1. Field of the Invention
The present invention generally relates to the field of social network analysis. More particularly, the invention relates to a system for analyzing social networks that consist of one or a plurality of network dimensions and one or a plurality of content dimensions, and that evolve over time.
2. Description of Related Art
With the emergence and rapid proliferation of social media, such as instant messaging (e.g., IRC, AIM, MSN, Jabber, Skype), sharing sites (e.g., Flickr, Picasa, YouTube, Plaxo), blogs (e.g., Blogger, WordPress, LiveJournal), wikis (e.g., Wikipedia, PBWiki), microblogs (e.g., Twitter, Jaiku), and social networks (e.g., MySpace, Facebook, Ning), to mention a few, anyone can produce content which (i) exists in a highly connected web of contexts (e.g., social groups, geographic locations, time, etc.), and (ii) is attributable to its creator. For example, over 92% of blogs contain explicit personal information on their front pages, and over 31% include full names of the authors. Furthermore, an increasing number of people, especially among the younger generation, casually accept that the story of their lives could be found by anyone at any time and even tend to “think of themselves as having an audience.”
There is little doubt that networks which arise from all sorts of digital and social media, and which combine content with people and context, are becoming prevalent and are here to stay. If anything, such corpora will become even more abundant and easier to access. Therefore, there is a clear need for methods and techniques to analyze, navigate and search them. We focus specifically on networks where the context of each content item is a set of direct neighbors. Such social networks arise from pair-wise communications, such as email, instant messaging (IM) or mobile text messaging (SMS).
Typical approaches to analyzing such networks focus on either the content (e.g., list of words or terms) or on the pair-wise associations, in isolation. Furthermore, time is usually ignored in the analysis.
On one hand, content-based analysis, such as latent semantic indexing, analyzes the relationship between documents and terms by identifying hidden concepts related to documents and terms. On the other hand, social network analysis, such as graph partitioning, tries to identify communities among people based only on links among individuals. In the past, many techniques considered these two aspects in isolation. In this case, ad-hoc post-processing has to be done in order to glue together the results from each aspect (content-based and network-based), which creates overhead in terms of both performance and quality. The main reason is that in many traditional settings, content is associated with nodes in the graph (e.g., words in a web page) rather than with edges (e.g., the words used in all emails between two specific individuals).
Content analysis methods, such as latent semantic indexing, examine the relationship between documents and terms and identify hidden term concepts. Social network analysis methods, such as graph partitioning, on the other hand, try to identify communities among people. Content and network analysis have traditionally been treated separately, even though most applications generate content-based networks. Moreover, the network and content aspects are rarely independent (e.g., I may talk with co-workers about work-related topics, while I talk with friends about entertainment-related topics). As a result, ad-hoc post-processing is necessary to glue together the results from two different sources (content and network), which causes deterioration of both performance and quality.
In addition to the content and network dimensions, most applications also include a time dimension. The set of social contacts, the set of topics as well as the inter-relationship among them all evolve over time: new groups are formed, old groups or contacts may lose their strength, new topics may emerge whereas others may become less important, or the strength of the relationship among a social group and the topics it discusses may change over time. A limited number of methods exist which analyze either social network evolution or content evolution in isolation. However, methods for joint analysis and summarization over time do not exist.
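By way of illustration only, the kind of data discussed above (content attached to pair-wise links, observed over time) may be sketched as follows. This is a minimal sketch under stated assumptions and is not a description of any particular method: the message tuples, the names MESSAGES and build_edge_content, and the use of whole-number time periods are all hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy messages: (sender, recipient, time period, text).
MESSAGES = [
    ("ann", "bob", 1, "budget report due friday"),
    ("bob", "ann", 1, "report draft attached"),
    ("ann", "cat", 1, "movie tonight"),
    ("ann", "cat", 2, "movie tickets booked"),
    ("ann", "bob", 2, "lunch tomorrow"),
]


def build_edge_content(messages):
    """Aggregate term counts per (undirected pair of people, time period).

    Content is stored on edges (pairs of people) rather than on nodes,
    and each time period keeps its own counts.
    """
    edge_terms = defaultdict(Counter)
    for sender, recipient, period, text in messages:
        # Treat the link as undirected by sorting the two endpoints.
        pair = tuple(sorted((sender, recipient)))
        edge_terms[(pair, period)].update(text.lower().split())
    return edge_terms


edge_terms = build_edge_content(MESSAGES)
for (pair, period), terms in sorted(edge_terms.items()):
    print(pair, period, dict(terms))
```

In this sketch, each key combines a network dimension (the pair of people) with a time period, and each value holds the content dimension (term counts), so the three aspects are kept together rather than analyzed in isolation.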