In recent years, a vast amount of media data of various kinds, such as video, audio, image, and text data, has come to circulate via various media, and techniques for searching such data efficiently have become increasingly important. In a media data search in general, either a search query and the media data are compared directly, or the search query and meta data showing the contents of the media data are compared, to calculate the similarity between the search query and the media data, and the media data similar to the search query is returned as the search result.
Examples of searches in which a search query and media data are compared directly include a document search using a text query and a similar image search using an image query. When it is difficult to compare the search query with the media data directly, meta data added to the media data is used instead. “Meta data” is data that shows the contents of the media data. Although meta data may be created manually, it is desirable to create it automatically from the media data when a vast amount of data is the search target. For example, meta data can be given to video/audio/image data by converting the spoken contents of video/audio data into text through speech recognition, or by converting images and character information in video/image data into text through image recognition and character recognition. This makes it possible to search the video/audio/image data with a text query as well.
However, with the searches described above, the search accuracy deteriorates if there is an error in the media data itself or in the meta data. For example, a typing error in a document or noise contained in image data can be regarded as an error in the media data itself. Because of such errors, the similarity between the search query and the document or the image cannot be calculated correctly, and the accuracy of the document search and the similar image search deteriorates. Further, when meta data is created by speech recognition or image recognition, recognition errors are inevitably included in the meta data. The similarity between the search query and the meta data therefore cannot be calculated correctly, and the search accuracy deteriorates.
Now, an information search device depicted in Patent Document 1 will be described as an example of a technique related to coping with errors in meta data. This related information search device mitigates the deterioration in search accuracy caused by errors that misrecognition introduces into the meta data when the meta data is created from video/audio data by speech recognition. As shown in FIG. 15, this related information search device 600 is configured with an input device 601, a speech recognition device 602, an expansion key extracting device 603, an expansion word extracting device 604, a related information search device 605, an external database 606, a speech document description creating device 607, and an output device 608.
The related information search device 600 operates as follows. The spoken contents of audio data inputted from the input device 601 are converted into text by the speech recognition device 602. The expansion key extracting device 603 extracts, from the recognition result text, words that belong to predetermined parts of speech and satisfy a reliability condition, as expansion keys. The related information search device 605 searches the related documents stored in the external database 606 by using the extracted expansion keys. The expansion word extracting device 604 extracts important words from the retrieved related documents as expansion words. The speech document description creating device 607 embeds the extracted expansion words in the recognition result text, and the output device 608 outputs the result.
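The operation described above can be illustrated with a minimal Python sketch. It is not the implementation of Patent Document 1: the function names, the `RecognizedWord` type, the confidence scores, the part-of-speech tags, the 0.8 threshold, the substring-based document ranking, the `candidate_terms` list, and the contents of the toy database are all assumptions made for illustration. The sample words are taken from the Hokkaido example in this section.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    text: str
    pos: str           # part of speech (hypothetical tag set)
    confidence: float  # hypothetical recognizer reliability in [0, 1]


def extract_expansion_keys(words, allowed_pos=("noun", "verb"), min_confidence=0.8):
    """Keep words of the allowed parts of speech that meet the reliability condition."""
    return [w.text for w in words
            if w.pos in allowed_pos and w.confidence >= min_confidence]


def search_related_documents(keys, database, top_n=1):
    """Rank documents by how many expansion keys they contain (assumed scoring)."""
    scored = [(sum(1 for k in keys if k in doc), doc) for doc in database]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def extract_expansion_words(documents, candidate_terms):
    """Treat candidate terms found in the retrieved documents as important words.
    The source does not say how importance is decided; this is a stand-in."""
    return [t for t in candidate_terms if any(t in doc for doc in documents)]


def create_meta_text(recognized_text, expansion_words):
    """Embed the expansion words by appending them to the recognition result text."""
    return recognized_text + " " + " ".join(expansion_words)


recognized_text = (
    "Hokkaido has been in a heavy snowfall because of a cold snap hitting there "
    "since the last sight, and schedules of public transportations such as flight "
    "services leaving from Shinchi-tosei airport were greedy disturbed"
)

# Hypothetical recognizer output: misrecognized words get low confidence.
words = [
    RecognizedWord("Hokkaido", "noun", 0.95),
    RecognizedWord("cold snap", "noun", 0.90),
    RecognizedWord("hit", "verb", 0.85),
    RecognizedWord("last sight", "noun", 0.40),
    RecognizedWord("public transportations", "noun", 0.90),
    RecognizedWord("Shinchi-tosei airport", "noun", 0.30),
    RecognizedWord("greedy", "adverb", 0.50),
    RecognizedWord("disturb", "verb", 0.85),
]

# Hypothetical external database of related documents.
database = [
    "A cold snap hit Hokkaido, and public transportations were disturbed: "
    "cancelled flights at Shin-chitose airport.",
    "Cherry blossoms are in full bloom in Kyoto.",
]

candidate_terms = ["Shin-chitose airport", "cancelled flights", "Hokkaido",
                   "cold snap", "public transportations", "typhoon"]

expansion_keys = extract_expansion_keys(words)
related_documents = search_related_documents(expansion_keys, database)
expansion_words = extract_expansion_words(related_documents, candidate_terms)
meta_text = create_meta_text(recognized_text, expansion_words)
print(meta_text)
```

In this sketch the correct term “Shin-chitose airport” reaches the meta text only through the retrieved document, while the misrecognized words remain in the recognition result text unchanged.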
The flow of this operation will be described with reference to an example.
Consider a case where the actual speech inputted from the input device 601 is “Hokkaido has been in a heavy snowfall because of a cold snap hitting there since last night, and schedules of public transportations such as flight services leaving from Shin-chitose airport were greatly disturbed”, but the recognition result produced by the speech recognition device 602 is “Hokkaido has been in a heavy snowfall because of a cold snap hitting there since the last sight, and schedules of public transportations such as flight services leaving from Shinchi-tosei airport were greedy disturbed”.
In this case, the expansion key extracting device 603 extracts, for example, “cold snap, hit, Hokkaido, public transportations, disturb” from the recognition result as highly reliable words that are nouns or verbs. The related information search device 605 searches the external database 606 by using those expansion keys. It is assumed that the expansion word extracting device 604 has extracted “Shin-chitose airport, cancelled flights, Hokkaido, cold snap, public transportations” as important words from the retrieved related document. The speech document description creating device 607 embeds those expansion words in the recognition result text, and the output device 608 outputs the recognition result text with the embedded expansion words as the meta text of the inputted speech data. As a result, even when a search is conducted with a text query such as “Shin-chitose airport”, which is missing from the recognition result because of misrecognition, this speech data can be retrieved correctly, because adding “Shin-chitose airport” to the meta text increases the similarity between the text query and the meta text.

Patent Document 1: Japanese Unexamined Patent Publication 2004-246824
An issue with the related information search device is that given speech data may be retrieved by a query that is irrelevant to that speech data.
The reason is that, as a result of speech recognition errors, the meta text of the given speech data contains text that is irrelevant to that speech data, so that the similarity between a query and the meta text is sometimes judged to be large even when the query is irrelevant to the speech data.
In the above-described case, the meta text contains the wrong texts “tosei” and “greedy” as a result of the speech recognition errors. The similarity between the query and the meta text therefore becomes large even when the search is conducted with a query such as “tosei” or “greedy”, which is irrelevant to the speech data. Consequently, this speech data is retrieved even though it is irrelevant to those queries.
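This issue can be reproduced with a minimal Python sketch. The `similarity` function below is a hypothetical stand-in (the fraction of query terms that appear in the meta text) and is not the measure used by the related device, which the source does not specify. The tokenizer is likewise an assumption. The meta text is the recognition result of the example followed by the expansion words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(query, meta_text):
    """Hypothetical similarity: fraction of query terms found in the meta text."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    meta_terms = set(tokenize(meta_text))
    return sum(1 for t in query_terms if t in meta_terms) / len(query_terms)


# Recognition result containing the errors, followed by the embedded expansion words.
meta_text = (
    "Hokkaido has been in a heavy snowfall because of a cold snap hitting there "
    "since the last sight, and schedules of public transportations such as flight "
    "services leaving from Shinchi-tosei airport were greedy disturbed "
    "Shin-chitose airport cancelled flights Hokkaido cold snap public transportations"
)

for query in ["Shin-chitose airport", "tosei", "greedy", "earthquake"]:
    print(query, similarity(query, meta_text))
```

Under these assumptions, the irrelevant queries “tosei” and “greedy” score as high as the relevant query “Shin-chitose airport”, because the misrecognized terms remain in the meta text after the expansion words are embedded.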
An object of the present invention is to provide a similarity calculation device and the like that can yield a small similarity between media data and a query when the media data and the query are irrelevant to each other, even if some kind of error is contained in the media data or the meta data.