1. Field of the Invention
The present invention relates to a reputation information processing program, method, and apparatus for extracting reputation information written by users from text documents, such as web pages on the Internet, and analyzing it; more particularly, it relates to a reputation information processing program, method, and apparatus for extracting, as evaluation information, evaluation pairs that each combine an object with an evaluation expression from text documents written by users, and analyzing those pairs.
2. Description of the Related Art
Conventionally, several processing methods in the field of data mining have been proposed that extract reputation information about commercial products, manufacturers, and the like from text documents written by users on the web and use it for analysis or marketing. Examples include the following:
(1) Methods of searching for documents that contain both an object and an evaluation expression (JP2001-155021 and JP2005-063242).
(2) Methods of treating an evaluation expression as reputation information for a search word if the evaluation expression appears within a predetermined distance of that search word (JP2002-091981 and JP2002-175330).
(3) Methods of extracting reputation information using patterns of word sequences (JP2003-271609 and JP2004-1578416).
(4) Methods of extracting reputation information for search words supplied by users (JP2001-155021, JP2002-091981, JP2002-175330 and JP2005-063242).
However, such conventional methods of extracting reputation information written by users on the Internet and using it for analysis or marketing have the following problems. The methods of (1) and (2) extract reputation information with low accuracy, because they also extract objects and evaluation expressions that merely happen to appear in the same document or near each other. With the methods of (3), although extraction by pattern is possible when an object and its evaluation expression appear consecutively, as in
“Japan is a good place to live”,
in actual documents they often appear apart from each other, as in
“I live in Japan now, and it is a very good place to live”;
thus, extraction accuracy is low when only a pattern is used. The methods of (4) have the drawbacks that reputation information cannot be obtained for objects the user has not entered as search words, and that comparing a plurality of objects is difficult. Moreover, the visualization methods proposed for analysis merely plot the number of remarks, either as a simple distribution or as a time series, which is not sufficient for analysis that is useful in marketing. Furthermore, the attributes of objects are important information for analysis. For example, the object “movie” has attributes such as cast, music, and story, and the object “personal computer” has attributes such as CPU speed, memory capacity, and HDD capacity. However, such attribute information has so far been prepared only manually, which is costly.