The present invention relates to a computer apparatus, a computer program and a method for calculating the importance of an electronic document on a computer network, and particularly to a computer apparatus, a computer program and a method for calculating the importance of a first electronic document based on comments on the first electronic document that are included in a second electronic document associated with the first electronic document.
Various techniques have been developed for finding, quickly and with a high degree of accuracy, content of interest to a network user from among the large number of electronic documents included in discussion threads, web pages, blogs and the like, which are scattered across the network. An electronic document that matches the interests of the user is highly important to that user.
One well-known technique for automatically judging the importance of a web page on a computer network is Google's PageRank. The details are described in Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd, “The PageRank Citation Ranking: Bringing Order to the Web”, 1998.
In short, PageRank is a technique for judging the importance of a group of web pages on the network on the basis of a recursive relation in which a web page linked from a larger number of higher-quality web pages is itself a higher-quality web page. Specifically, the importance of a certain web page is computed from the number of links from other web pages to the certain web page, whether a highly ranked web page links to the certain web page, and the number of outgoing links on each page that links to the certain web page.
In other words, PageRank is a technique for calculating the relative importance of each web page by use of links between web pages.
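By way of illustration only, the recursive relation described above may be sketched as a simple iterative computation. The function name `pagerank`, the damping value of 0.85, the fixed iteration count, and the even redistribution of rank from pages without outgoing links are assumptions made for this sketch; it is not presented as the implementation used by any particular search engine.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative link-based ranking.

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values chosen for this sketch.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to,
                # so a link from a page with many links carries less weight.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    example = {"A": ["C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 3))
```

In this hypothetical three-page example, page C is linked from both A and B and therefore receives the highest score, while page B, which no page links to, receives the lowest. The computation uses only the links between pages and never inspects page contents.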
However, PageRank does not analyze the contents of a first web page, and therefore does not calculate the importance of a second web page based on comments on the second web page that are included in the contents of the first web page.
Additionally, in recent years, attempts have been made to analyze postings and discussions in consumer-generated media, such as social networking services and blogs, in order to identify the potential needs and values of consumers and to use them in new product development and marketing.
“Influence Diffusion Model in Text-Based Communication,” Journal of the Japanese Society for Artificial Intelligence (2002), vol. 17, no. 3, SP-B, pp. 259-267, discloses a method that uses text analysis to measure the degree to which a specific comment on an electronic bulletin board is quoted in subsequent replies to the comment, and thereby calculates the degree of influence of the specific comment over other comments.
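As a rough illustration of measuring how much of a comment is carried over into its replies, the following sketch scores a comment by the fraction of each reply's terms that also occur in the comment. The function names, the simple tokenization, and the summation over replies are assumptions made for this sketch; they are not taken from the cited document and should not be read as its exact formula.

```python
import re


def terms(text):
    """Split text into a set of lowercase alphanumeric terms (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def quoted_fraction(comment, reply):
    """Fraction of the reply's distinct terms that also appear in the comment."""
    reply_terms = terms(reply)
    if not reply_terms:
        return 0.0
    return len(terms(comment) & reply_terms) / len(reply_terms)


def influence(comment, replies):
    """Hypothetical influence score: sum of quoted fractions over all replies."""
    return sum(quoted_fraction(comment, reply) for reply in replies)


if __name__ == "__main__":
    comment = "battery life is short"
    replies = ["short battery", "screen size"]
    print(influence(comment, replies))
```

A measure of this kind reflects only how many terms are reused. It does not distinguish a reply that agrees with the comment from one that criticizes it.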
“Mining and Summarizing Conversation Data on Electrical Message Boards,” the 16th Annual Convention of the Japanese Society for Artificial Intelligence (2002), discloses a method for calculating the importance of each posting on an electronic bulletin board based on three indices: (1) how closely the contents of the current posting are related to a topic in the previous posting to which the current posting replies, (2) how many new words are used, and (3) how many postings exist between the posting in which a topic is introduced as new information and the later posting in which the topic is cited as old information.
However, these documents do not describe a method for analyzing the comments that each posting makes on the contents of another posting, for example, agreeing or disagreeing comments, and thereby determining the importance of the contents of each posting.
Hironori Tomobe and Katashi Nagao (2005), “Discussion Mining: gijiroku shuugou karano chishiki hakken (Discussion Mining: Knowledge Discovery from Sets of Minutes),” the 67th Annual Convention of the Information Processing Society of Japan, discloses a method of calculating the importance of a remark by use of active propagation, based on a notion that a remark linked from an important remark and a remark linked to an important remark in a collection of minutes are important.
In other words, the document discloses a method of analyzing minutes in terms of their network structure, and this method does not analyze the contents of each remark in order to calculate the importance of each remark.
A reputation analysis solution disclosed in IBM Japan Ltd., Jul. 26, 2004, “Homepagejyono hyouban wo shunjini bunseki (Instantaneous Analysis of ‘Reputation’ on Web site),” relates to a technique of instantaneously classifying customers' comments sent to a company as either “favorable” or “unfavorable” by applying IBM (registered trademark) TAKMI (Text Analysis and Knowledge Mining).
However, this technique does not include calculating the importance of each message included in a discussion thread on a network. Therefore, no method is disclosed or suggested for calculating the importance of a certain message by use of an analysis of the contents of another thread responding to the certain message.
As described above, the conventional techniques cannot analyze the contents of each message in a chain of messages responding to previously posted messages, such as a discussion thread on a computer network, and automatically determine the importance of each message on the basis of a comment on that message included in another message, for example, whether the comment is critical (negative) or agreeing (positive).