The present invention relates to an apparatus, a method, a recording medium and a program which extract a keyword automatically from title character string information and detailed character string information of contents such as EPG (Electronic Program Guide) information.
In digital television broadcasting, which has come into full swing in recent years, EPG information is transmitted from the broadcast station together with the video and audio data of a program. This EPG information includes information designating the program title (title character string information), information explaining details of the program (detailed character string information), information designating the genre of the program, and the like. A television receiver compatible with digital broadcasting can display an electronic program guide on its screen according to the EPG information.
Further, there is also an analog television broadcast in which such EPG information is transmitted.
When searching for a program to watch, a user utilizes this electronic program guide to search by title or, after selecting a rough genre (for example, sports or drama), to search by reading the detailed character string information or the like.
However, program titles are worded in an almost unlimited variety of ways, so that it is not always easy for a user to search by title. Also, the detailed character string information of a program is written as sentences and not infrequently runs to a number of pages, so that it is troublesome for a user to search through it.
On the other hand, a program search would be very easy for a user if it could be performed by using a keyword such as the name of a professional entertainer. However, at present the EPG information transmitted from the broadcast station does not include keywords as independent items. Therefore, it is necessary to extract keywords from the EPG information in order to make a keyword search possible.
Heretofore, there has been a keyword extraction method in which a user designates, by means of a cursor or the like, the head and tail ends of a character string to be used as a keyword within a sentence of the detailed character string information in an electronic program guide displayed on a television receiver.
However, with this conventional extraction method the user must designate each keyword manually, so that the operation is complicated and it is difficult to extract a large number of keywords in a short period of time.
On the other hand, a method called Japanese language morphological analysis is known as a general automatic keyword extraction method. However, this method requires a very large program and/or dictionary and places a heavy load on the CPU. Consequently, it is extremely inefficient to use this method in home electric appliances such as a television receiver, in which the processing capacity of the CPU or the memory capacity is not so large.
Further, a method called the character type separation method is also known as a general automatic keyword extraction method. In this method, a keyword is extracted by detecting changes of character type among Chinese characters, Katakana, Hiragana, alphabetic letters, numerical characters and the like. However, a keyword for searching a program cannot be extracted accurately by the character type separation method alone. More specifically, for the name of a professional entertainer whose family name is written in Chinese characters and whose first name is written in Hiragana or Katakana (for example, such a name as “akari ISHIDA”), the whole name cannot be extracted, because the family name and the first name are separated from each other. Likewise, it is not possible to extract a foreigner's name whose first name is written in alphabetic letters while the family name is written in Katakana, or a foreigner's name in which “•” (midpoint) is inserted between the first and family names (for example, such a name as “B•Dooley”), because the family name and the first name are again separated.
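The limitation described above can be illustrated with a minimal sketch of character type separation. This is an illustration under stated assumptions, not the method of any particular receiver: the function names `char_type` and `split_by_char_type` are hypothetical, the Unicode ranges are deliberately coarse, and the Japanese spellings used in the example (石田あかり and B・ドゥーリー) are assumed renderings of the romanized names given in the text.

```python
from typing import List


def char_type(ch: str) -> str:
    """Classify one character by script (coarse, illustrative ranges)."""
    code = ord(ch)
    if ch == "\u30fb":               # katakana middle dot (midpoint), treated as a separator
        return "other"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def split_by_char_type(text: str) -> List[str]:
    """Cut the text wherever the character type changes; drop 'other' runs."""
    segments: List[str] = []
    current = ""
    current_type = None
    for ch in text:
        t = char_type(ch)
        if t != current_type and current:
            # Character type changed: close the current run.
            if current_type != "other":
                segments.append(current)
            current = ""
        current += ch
        current_type = t
    if current and current_type != "other":
        segments.append(current)
    return segments


# Assumed Japanese renderings of the names mentioned in the text.
print(split_by_char_type("石田あかり"))      # family name and first name come out separately
print(split_by_char_type("B・ドゥーリー"))   # initial and family name come out separately
```

In both cases the name is returned as two separate candidates rather than as one keyword, which is the inaccuracy the text attributes to using character type separation alone.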
In view of the foregoing, an object of the present invention is to make it possible to extract a keyword for searching contents automatically, efficiently and accurately from the title character string information and detailed character string information of contents such as EPG information, even in home electric appliances in which the processing capacity of the CPU or the memory capacity is not so large.