Blog analysis refers to a set of techniques for organizing, analyzing and extracting useful information from blogs. Andreas Aschenbrenner et al., “Blog mining in a corporate environment”, September 2005, incorporated herein by reference, describes blog analysis techniques in detail.
Web mining refers to the application of data mining techniques to discover patterns from websites. Web mining can be divided into three types: 1) web usage mining; 2) web content mining; and 3) web structure mining. Web usage mining applies data mining to users' usage data on websites in order to analyze the data and discover interesting patterns. Web content mining is a process of discovering useful information from text, image, audio or video data on websites. Web structure mining is a process of using graph theory to analyze the connection structure of websites.
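As a small illustration of web structure mining, the following Python sketch counts incoming hyperlinks per page in a toy link graph. The page names and the `links` list are hypothetical and chosen only for illustration; counting in-links is just one simple graph measure and is not a technique attributed to the reference above.

```python
from collections import Counter

# Hypothetical hyperlinks between pages of one website: (source, target).
links = [
    ("home", "blog"),
    ("home", "about"),
    ("blog", "about"),
    ("blog", "home"),
    ("about", "home"),
    ("contact", "home"),
]

# In-degree: how many hyperlinks point at each page.
in_degree = Counter(target for _source, target in links)

# The page with the most incoming links is a simple structural "hub" signal.
most_linked = max(in_degree, key=in_degree.get)

print(dict(in_degree))
print(most_linked)
```

The sketch looks only at the connection structure; it never inspects page content or visitor logs, which is what separates web structure mining from web content mining and web usage mining.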
Social networks such as Facebook® and Myspace® include entity relation information, such as who is a friend of whom, as well as entity property information, such as posts, comments and messages posted by bloggers and/or owners of Facebook® pages. While using social networks, users may be interested in one or more of: which groups of people belong to a same community and which groups of companies have close partnerships. (An “entity” refers to a user or a company.)
Traditional solutions for obtaining this information (e.g., which groups of people belong to a same community and which groups of companies have close partnerships) are mostly based on graphical and graph theory techniques, i.e., the traditional solutions cast the task as a graph-partition problem and solve it with algorithms such as a minimal cut (a cut for which the number of edges crossing the cut is minimal). A major drawback of the traditional solutions is that they treat all edges as the same, which usually does not hold in real applications. For example, in blog analysis, a link between two posts sharing little or no content similarity usually occurs when a blogger A is a friend of a blogger B. This type of link (i.e., a link indicating friendship) should not be treated the same as a link between posts with content similarity, in which case the two bloggers may simply discuss the same topics without even knowing each other in person. Furthermore, a missing link between two posts with content similarity conveys more information (e.g., information indicating that two bloggers do not know each other) than a missing link between two posts with no content similarity. A missing link refers to a link or edge that represents a relationship (e.g., friendship or partnership) between entities but is unobserved due to privacy issues or data collecting processes (e.g., a data mining process).
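The minimal-cut formulation and its drawback can be sketched in a few lines of Python. The example below is a brute-force illustration under stated assumptions: the six bloggers, the edge list, and the edge labels `"friendship"` and `"content"` are all hypothetical, and exhaustive enumeration is used only because the graph is tiny; it is not presented as the algorithm used by any particular traditional solution.

```python
from itertools import combinations

def min_cut_bruteforce(nodes, edges):
    """Return (cut_size, side_a, side_b) minimizing the number of crossing edges.

    Each edge is (u, v, kind). The 'kind' label is deliberately ignored,
    mirroring how a plain minimal cut treats all edges as the same.
    """
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    # Pin the first node to side A so each bipartition is visited once;
    # k stops short of len(rest) so side B is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side_a = {first, *extra}
            crossing = sum(
                1 for u, v, _kind in edges if (u in side_a) != (v in side_a)
            )
            if best is None or crossing < best[0]:
                best = (crossing, side_a, set(nodes) - side_a)
    return best

# Hypothetical bloggers: two tightly linked groups joined by one link.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [
    ("A", "B", "friendship"),
    ("B", "C", "content"),
    ("A", "C", "friendship"),
    ("D", "E", "content"),
    ("E", "F", "friendship"),
    ("D", "F", "content"),
    ("C", "D", "friendship"),
]

cut_size, side_a, side_b = min_cut_bruteforce(nodes, edges)
print(cut_size, sorted(side_a), sorted(side_b))
```

Relabeling any edge from `"friendship"` to `"content"` leaves the result unchanged, because the cut only counts crossing edges. That insensitivity to the kind of relationship an edge represents is the drawback described above.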
Hence, it is desirable to provide a method and/or system for discovering communities or groups of entities using mathematical techniques that treat edges representing different relationships between entities (e.g., content similarity or community similarity (i.e., friendship)) differently.