The present disclosure relates to a search apparatus, a search method, and a program, and particularly to a search apparatus, a search method, and a program capable of obtaining flexible search results in sound search.
One example of a sound search method, in which sound input by a user is used to search for a word string (such as a set of text) corresponding to that sound, is a method that uses only a sound recognition apparatus (see Japanese Unexamined Patent Application Publication No. 2001-242884, for example).
In sound search using only a sound recognition apparatus, the sound recognition apparatus performs sound recognition on the input sound, treating sequences of words (vocabulary) registered in advance in a dictionary as the candidates for the sound recognition result. The sound recognition result is then output as a search result word string, that is, as the result of searching for the word string corresponding to the input sound.
Accordingly, in sound search using only a sound recognition apparatus, the word strings that can be returned as the search result for the input sound (hereinafter also referred to as search result target word strings) are limited to word strings formed from sequences of words registered in the dictionary used for sound recognition (in this specification, a word string may consist of a single word). As a result, a user's utterances are restricted to sequences of words registered in that dictionary.
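The restriction described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the input sound has already been reduced to a text string and that a character-level similarity score from the standard difflib module stands in for the acoustic and language model scores of a real sound recognition apparatus; the dictionary contents and the function name recognize_with_dictionary are hypothetical. Whatever the user says, the output is always one of the registered word sequences.

```python
from difflib import SequenceMatcher

# Hypothetical dictionary of word sequences registered in advance.
DICTIONARY = ["play music", "stop music", "weather today", "call home"]


def recognize_with_dictionary(utterance: str, dictionary: list[str]) -> str:
    """Return the registered word sequence that best matches the utterance.

    Assumption: the utterance is given as text (a stand-in for acoustic input)
    and SequenceMatcher.ratio() stands in for a recognition score. The result
    is always drawn from the dictionary, so unregistered phrasings cannot be
    returned as such.
    """
    return max(
        dictionary,
        key=lambda entry: SequenceMatcher(None, utterance, entry).ratio(),
    )


print(recognize_with_dictionary("play some music", DICTIONARY))  # -> play music
```

Even an utterance such as "order pizza", which corresponds to no registered sequence, is forced onto some dictionary entry, which is the limitation that motivates voice search.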
To address this restriction, a sound search method called voice search has been proposed in recent years.
In voice search, sequential sound recognition is performed using a language model such as an N-gram model, and the sound recognition result is matched against sets of text registered in a DB (database) that is prepared separately from the dictionary used for sound recognition; that is, the sets of text registered in the DB are searched for those corresponding to the sound recognition result.
Then, based on the matching result, the highest-ranking set of text, or the N highest-ranking sets of text, matching the sound recognition result are output as search result word strings.
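The voice search flow can likewise be sketched under stated assumptions. The sketch assumes that the sequential sound recognition step has already produced a text string (the N-gram recognizer itself is not modeled) and uses bag-of-words cosine similarity as a stand-in matching measure, since the method described here does not specify how matching is scored; TEXT_DB and voice_search are hypothetical names.

```python
import math
from collections import Counter

# Hypothetical DB of sets of text, prepared separately from the
# dictionary used for sound recognition.
TEXT_DB = [
    "world heritage site in kyoto",
    "weather forecast for tokyo",
    "history of the world cup",
    "kyoto travel guide",
]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors (stand-in matcher)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def voice_search(recognition_result: str, db: list[str], n: int = 1) -> list[str]:
    """Match a sound recognition result against the DB and return the top-N texts."""
    query = Counter(recognition_result.split())
    ranked = sorted(db, key=lambda text: cosine(query, Counter(text.split())), reverse=True)
    return ranked[:n]


print(voice_search("kyoto world heritage", TEXT_DB, n=2))
# -> ['world heritage site in kyoto', 'kyoto travel guide']
```

Because the candidates come from the DB rather than from the recognition dictionary, adding entries to TEXT_DB enlarges the set of search result target word strings without changing the recognizer.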
In voice search, the sets of text registered in the DB, which is prepared separately from the dictionary used for sound recognition, serve as the search result target word strings. Sound search can therefore be performed over many search result target word strings simply by registering many sets of text in the DB.
That is, voice search makes it possible to perform sound search with a certain degree of precision, within the range of the sets of text registered in the DB as search result target word strings, even when the user's utterance includes words not registered in the dictionary used for sound recognition.
In addition, a method has been proposed in which sound recognition is performed on a multimedia file storing sound and images to generate sets of text for indexing from the sound in the multimedia file, and the sound in the multimedia file is searched based on speaker identification (see Japanese Unexamined Patent Application Publication No. 2000-348064).