The recent popularization of personal computers among the general public has been accompanied by widespread use of full-text search systems over the Internet, such as Yahoo! (registered trademark) and Google (trademark). A typical full-text search system prompts a user to designate a keyword that is used by the system as a basis for searching the entire Internet or a designated range thereof for anything applicable to the designated keyword.
The full-text search scheme is also applied to broadcast content search systems that utilize EPG (Electronic Program Guide) data. EPG data is made up of content information such as broadcast content titles and introductions to content details. A full-text search by a typical broadcast content search system is illustratively aimed at searching the EPG data for content information about the broadcast content provided by the broadcast stations located in a user-designated area over a predetermined period of time starting from the day of search.
However, the total number of content titles included in the EPG data applicable to the keyword search is typically as small as 4,000. Only 40 to 80 percent of the available broadcast content is covered by the content information. In addition, how much content information exists about the broadcast content varies according to genre. For example, large percentages of content information exist with respect to dramas, sports and movies, whereas only limited percentages of content information are available about documentaries.
The content information included in the EPG data is most often expressed in text form. Furthermore, each of the broadcast content titles in the EPG data is about 20 characters long, and each of the introductions to these content titles is less than 100 characters long. That is, the number of characters making up the content information (i.e., text length) is appreciably limited.
Because the EPG data is primarily constituted by articles and stories derived from newspapers and magazines and turned into electronic form, large portions of the data have been semantically compressed for space reasons. (Illustratively, a “five-minute cooking session” is abbreviated to “cooking.”) The semantic compression and abbreviation lead to numerous homonyms and acronyms being produced in the EPG data. When the user enters a keyword for a search, a large number of homonyms can thus be encountered in the content information in the EPG data.
Conventionally, as shown in FIG. 1, a typical broadcast content search system presupposes that a search keyword space 3 designated by the user with a search keyword 1 is identical to the search keyword 1 (i.e., search keyword 1=search keyword space 3). Since an EPG data space 4 (an aggregate of EPG data subject to searches) is not very large, the range of content titles actually retrieved by a search with the keyword 1 can be smaller than the user's expected space 2 which is an aggregate of search results hoped for by the user. That is, the number of broadcast content titles retrieved using the search keyword 1 can be significantly smaller than the number of broadcast content titles expected by the user. This state of affairs has obviously been disappointing to the user.
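Illustratively, the conventional behavior described above, in which the search keyword space is taken to be the search keyword itself, can be sketched as a literal substring match over short EPG entries. The following is a minimal sketch under stated assumptions and not an implementation of any particular system: the `EpgEntry` type, the `literal_search` function and the sample entries are all hypothetical names and data introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EpgEntry:
    """Hypothetical EPG record: a short title and a short introduction."""
    title: str
    intro: str


def literal_search(keyword, entries):
    """Return the entries whose title or introduction contains the keyword verbatim.

    This models the assumption "search keyword space = search keyword":
    no related terms, synonyms or expansions are considered.
    """
    needle = keyword.lower()
    return [
        e for e in entries
        if needle in e.title.lower() or needle in e.intro.lower()
    ]


# Hypothetical sample data; real EPG content is not reproduced here.
entries = [
    EpgEntry("Cooking", "Quick pasta dishes."),
    EpgEntry("Evening Kitchen", "A chef prepares seasonal recipes."),
    EpgEntry("World Soccer", "Highlights of this week's matches."),
]

hits = literal_search("cooking", entries)
print([e.title for e in hits])
```

In this sketch, the entry titled "Evening Kitchen" is not retrieved even though a user searching for "cooking" would likely expect it, because neither its title nor its introduction contains the keyword verbatim. This corresponds to the gap between the retrieved range and the user's expected space 2.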