In recent years, with the advent of high-performance computers, larger-capacity storage media, and the spread of computer networks, large quantities of digitized documents are circulated and used on computer systems every day. The term "documents" as used herein refers to documents and the like shared over a network, such as news articles, e-mail, and web pages. It also refers to documents used within a business enterprise, such as product discrepancy information and customer inquiry information.
In general, there is a need to find recent topics attracting attention in news articles or blogs among these documents. Likewise, business enterprises have a growing need to track down recently increasing problems in the product discrepancy information that accumulates every day so that countermeasures can be taken early, and to find new demands in customer inquiry information and exploit them in product planning.
To address these needs, a conventional topic extraction system, for example, scores each term included in a document set within a designated span on the basis of its appearance frequency, and extracts and hierarchizes topic words. This conventional system also holds history information on the score of each topic word and presents a status such as “new arrival” based on the difference from the score obtained at the previous extraction.
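The frequency scoring and history comparison performed by such a conventional system can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the conventional system's actual implementation: whitespace tokenization and raw counts stand in for its term extraction and scoring, hierarchization of topic words is omitted, and the functions `score_terms` and `assign_status` and the rule "a term absent from the previous extraction is a new arrival" are hypothetical.

```python
from collections import Counter


def score_terms(documents):
    """Score each term by its appearance frequency in the documents of one span.

    Assumption: lower-cased whitespace tokens and raw counts are used here
    in place of the real term extraction and scoring method.
    """
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return dict(counts)


def assign_status(current, previous):
    """Compare current scores with the scores held from the previous extraction.

    Hypothetical rule: a term with no previous score is labelled
    "new arrival"; otherwise its status is the signed score difference.
    """
    status = {}
    for term, score in current.items():
        if term not in previous:
            status[term] = "new arrival"
        else:
            status[term] = score - previous[term]
    return status


# Previous extraction (score history) and current extraction.
previous_scores = score_terms(["battery overheating report", "battery recall"])
current_scores = score_terms(["battery overheating", "screen flicker", "screen flicker again"])
print(assign_status(current_scores, previous_scores))
```

Because each status is derived only from the difference against the single previous extraction, the sketch also shows why this approach reports the situation "of the moment" rather than how topics evolve across a span.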
Although the conventional topic extraction system described above usually poses no problem, studies conducted by the present inventor have found that it leaves room for further improvement.
For example, the conventional topic extraction system presents a status such as “new arrival” based on the history information of each topic word's score. This method suits applications that track topics “of the moment” by fixed-point observation, but because each status reflects only the change since the previous extraction, it is insufficient for applications that need to grasp the transition of topics within a fixed span, such as one week or one month.
The problem to be solved by the present invention is to provide a topic extraction apparatus and a program capable of presenting the transition of topics within a designated target span.