(1) Field of the Invention
The present invention relates to a dictionary creation device which creates and updates a dictionary used for searching, classifying, or filtering information written as text.
(2) Description of the Related Art
In recent years, hard disks, Digital Versatile Discs (DVDs), and the like have become widespread due to price reduction, and therefore it has become possible to easily accumulate moving picture information such as TV programs. Furthermore, due to the increased capacity of these hard disks and DVDs, it is possible to accumulate large quantities of moving picture content.
On the other hand, with an electronic program guide of a TV program and the like, it is possible to acquire, as text information, information regarding each TV program. Accordingly, it has become possible to accumulate TV programs in accordance with the tastes of a viewer and classify the accumulated TV programs using the text information. In order to use the text information of the TV program for selection of the TV program and classification of the accumulated TV programs, it is necessary to judge, based on TV program guide information, which keyword expresses characteristics of the TV program. Accordingly, in order to extract an important keyword from the text information of the TV program and exclude unnecessary keywords beforehand, an approach is taken in which a dictionary is built in advance.
Such dictionaries include, among others, an extraction dictionary, in which the keywords used for classifying, searching, and extracting the text information are written, and an unnecessary word dictionary, which collects keywords that are of no use in classifying, searching, or extracting the text information so that those keywords can be excluded.
In order to create an extraction dictionary, keywords that actually appear are picked up from a large body of sample data for the text information groups that are to be classified, searched, and so on; of these, only keywords that are characteristic for classifying and searching the target text information groups are adopted into the extraction dictionary. For example, assuming an Electronic Program Guide (EPG) as the target text information, in the case where names of actors and general nouns are considered to be useful in classification and searching, the names of actors and the general nouns appearing in the actual EPG data are extracted, and the extraction dictionary is created from these names and nouns.
In the same manner, when creating an unnecessary word dictionary, keywords that are useless for classification and searching or, more importantly, that interfere with classification and searching are extracted from among the keywords that appear in the sample data of the target text information group, and those keywords are adopted into the unnecessary word dictionary. For example, a keyword that appears in most of the EPG data cannot characterize individual EPG data, and thus can be considered an unnecessary keyword.
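By way of illustration only, the selection of unnecessary keywords described above can be sketched as a document-frequency test: a keyword that appears in at least a given fraction of the sample EPG entries is treated as unnecessary. The function name, the threshold value of 0.8, and the sample data below are assumptions introduced for this sketch and do not come from the related art.

```python
from collections import Counter


def build_unnecessary_word_dictionary(documents, threshold=0.8):
    """Return keywords that appear in at least `threshold` of the entries.

    `documents` is a list of keyword lists, one list per EPG entry.
    The threshold of 0.8 is an assumed value chosen for illustration.
    """
    if not documents:
        return set()
    doc_freq = Counter()
    for keywords in documents:
        # Count each keyword at most once per EPG entry.
        doc_freq.update(set(keywords))
    total = len(documents)
    return {kw for kw, df in doc_freq.items() if df / total >= threshold}


# Hypothetical sample data: each inner list holds the keywords of one EPG entry.
sample_epg = [
    ["program", "travel", "inn"],
    ["program", "news", "weather"],
    ["program", "drama", "actor"],
    ["program", "travel", "cruise"],
]
unnecessary = build_unnecessary_word_dictionary(sample_epg, threshold=0.8)
print(unnecessary)
```

In this sample, only "program" appears in every entry and is therefore collected, whereas "travel" appears in half of the entries and is retained as a potentially characteristic keyword.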
As an example of the abovementioned type of approach, a TV program recommendation system has been provided, which sets, in advance, a dictionary for each of a plurality of themes, and uses these theme dictionaries to classify and search TV programs (for example, see Patent Reference 1: Japanese Laid-Open Patent Application No. 2002-320159). With such a TV program recommendation system, when the theme is, for example, “travel,” it is possible to search/classify TV programs about “travel” by using a dictionary in which such characteristic keywords as “inn,” “lodgings,” “wheel window,” “cruise,” and so on are set. Furthermore, by using operation information from a user to build a profile in accordance with that user's tastes, it is possible to provide TV programs catering to that user.
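As a rough sketch of classification using theme dictionaries of this kind, the following example counts how many keywords of each theme appear in a program description and returns the matching themes in order of the number of hits. It is not the method of Patent Reference 1; the function name, the "cooking" theme and its keywords, and the simple tokenization of English text are all assumptions made for illustration.

```python
import re


def classify_by_theme(text, theme_dictionaries):
    """Return themes whose keywords appear in `text`, best match first.

    Tokenization by runs of ASCII letters is an assumption for this sketch;
    actual EPG text (for example, Japanese) would need a proper tokenizer.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {}
    for theme, keywords in theme_dictionaries.items():
        hits = len(set(keywords) & tokens)
        if hits:
            scores[theme] = hits
    # Sort by descending hit count, then by theme name for a stable order.
    return sorted(scores, key=lambda theme: (-scores[theme], theme))


# Hypothetical theme dictionaries; the "travel" keywords follow the example above.
theme_dictionaries = {
    "travel": {"inn", "lodgings", "cruise"},
    "cooking": {"recipe", "chef"},
}
result = classify_by_theme(
    "A river cruise with a stay at a hot-spring inn", theme_dictionaries
)
print(result)
```

Matching on whole tokens rather than substrings avoids false hits such as "inn" inside "dinner"; a program description containing no dictionary keyword yields an empty list.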
In addition, a digital broadcast reception device is provided which makes the TV program display easier for the user to understand by changing the color of a TV program in a TV program chart depending on a theme (genre) set in advance (for example, see Patent Reference 2: Japanese Laid-Open Patent Application No. 2003-134412).