1. Field of the Invention
The present invention relates to a voice chat system, an information processing apparatus, a speech recognition method, a keyword detection method, and a program.
2. Description of the Related Art
Speech recognition has been studied for many years and now achieves quite high accuracy on read speech. However, it remains difficult to achieve high performance in recognizing natural conversation between humans.
In recent years, active research has been conducted on technology for extracting the subject of a conversation from speech, also known as topic detection technology. In topic detection, the speech recognition unit that extracts text information from the speech plays an important role.
Two methods are known for extracting keywords from speech: a method that attends only to the keywords, and a method that recognizes the entire speech using large-vocabulary speech recognition and then extracts keywords from the recognition result. Specifically, the former method extracts a word sequence from, for example, a phoneme lattice, that is, a lattice of the phoneme sequences that were recognizable. The latter method uses LVCSR (large-vocabulary continuous speech recognition). When the number of keywords is large, the latter method is advantageous because of its computational efficiency. Either method requires linguistic knowledge of the vocabulary to be recognized, which can be supplied by using information on the frequency of occurrence of the vocabulary to be detected.
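As an illustration of the latter method, the following minimal sketch extracts keywords from text that is assumed to have already been produced by a large-vocabulary recognizer. The function name extract_keywords, the transcript, and the keyword list are hypothetical and are not taken from the related art. Because the keywords are held in a hash set, the cost of each lookup does not grow with the number of keywords.

```python
def extract_keywords(transcript, keywords):
    """Return the keywords found in a recognized transcript, in order of appearance.

    Assumes the transcript is plain text already output by a recognizer.
    Multi-word keywords are supported; the longest match at each position wins.
    """
    words = transcript.lower().split()
    keyword_set = {tuple(k.lower().split()) for k in keywords}
    max_len = max((len(k) for k in keyword_set), default=0)

    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so "new york" is preferred over "york".
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + n])
            if candidate in keyword_set:
                found.append(" ".join(candidate))
                i += n
                break
        else:
            # No keyword starts at this position; move to the next word.
            i += 1
    return found


# Hypothetical usage with an invented transcript and keyword list.
result = extract_keywords(
    "I watched a soccer game in New York yesterday",
    ["soccer", "New York", "York"],
)
```

In this example, result holds the keywords "soccer" and "new york" in that order.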
Speech recognition can be classified into isolated word recognition, which recognizes an isolated word, and continuous word recognition, which recognizes a word sequence composed of plural words. Continuous word recognition uses a language model, “a database storing the likelihood of linkage between words,” to prevent “a word sequence having similar sound but totally different meaning” from being output as a recognition result.
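The role of the language model can be sketched as follows. The sketch assumes a bigram model with add-one smoothing, which is only one common form of language model, and a toy training corpus invented for illustration; the names train_bigrams and log_prob are hypothetical. Two candidate word sequences with similar sound are scored, and the one with the more likely word linkage is selected.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); a real language model uses far more text.
CORPUS = [
    "it is easy to recognize speech",
    "we recognize speech with a language model",
    "they walk on a nice beach",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking the start of a sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)

# Two candidate word sequences with similar sound but different meaning.
candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=lambda s: log_prob(s, UNIGRAMS, BIGRAMS))
```

With this toy corpus, best is "recognize speech". Note that the sketch compares total log-probabilities, so longer candidates are penalized; a practical recognizer would combine the language model score with an acoustic score.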
However, the language model describes only information on the words that are originally recognizable (hereinafter referred to as known words); therefore, it is difficult to properly recognize words that are registered later (hereinafter referred to as registered words). In isolated word recognition, once words are registered in a recognition word dictionary, they can be recognized immediately after registration. In continuous word recognition, however, registering the words alone is not sufficient; the registered words must also be reflected in the language model, and doing so is generally difficult.
In this respect, as an example of the related art, JP-A NO. 2004-252121 discloses a method that classifies registered words into categories such as “personal name” and “place name,” provides a language model corresponding to the categories, and correlates the registered words with the categories using the language model, whereby new vocabulary can be recognized by continuous speech recognition.
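The general idea of a category-based language model can be sketched as follows. This is an illustration under stated assumptions and is not the method of the cited publication: the class-level bigram probabilities, the uniform probability of a word within its category, the fallback value UNSEEN, and the names CategoryLexicon and score are all hypothetical. The language model is defined over category tokens, so a newly registered word becomes usable in continuous recognition without retraining the model.

```python
import math

# Hypothetical class-level bigram probabilities over words and category tokens.
CLASS_BIGRAMS = {
    ("<s>", "i"): 0.5,
    ("i", "met"): 0.4,
    ("met", "<PERSON>"): 0.3,
    ("<PERSON>", "in"): 0.3,
    ("in", "<PLACE>"): 0.4,
}
UNSEEN = 1e-4  # Assumed fallback probability for bigrams not in the model.


class CategoryLexicon:
    """Maps registered words to categories such as <PERSON> or <PLACE>."""

    def __init__(self):
        self._category_of = {}
        self._members = {}

    def register(self, word, category):
        self._category_of[word] = category
        self._members.setdefault(category, set()).add(word)

    def token_and_logprob(self, word):
        """Return the model token for a word and log P(word | category)."""
        category = self._category_of.get(word)
        if category is None:
            return word, 0.0  # Known word: it is its own token.
        # Assumption: words within a category are equally likely.
        return category, -math.log(len(self._members[category]))


def score(sentence, lexicon):
    """Log-probability of a sentence under the class-based bigram model."""
    prev, total = "<s>", 0.0
    for word in sentence.split():
        token, word_logprob = lexicon.token_and_logprob(word)
        total += math.log(CLASS_BIGRAMS.get((prev, token), UNSEEN)) + word_logprob
        prev = token
    return total


lexicon = CategoryLexicon()
sentence = "i met alice in kyoto"
before = score(sentence, lexicon)      # "alice" and "kyoto" are not registered
lexicon.register("alice", "<PERSON>")
lexicon.register("kyoto", "<PLACE>")
after = score(sentence, lexicon)       # registered words map to category tokens
```

After registration, the score of the sentence rises because the registered words are scored through their category tokens rather than as unseen words.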
Meanwhile, the selection of the words to be registered poses significant issues. In particular, proper nouns are often important keywords, because recognizing proper nouns makes it possible to provide users with useful information.
In this respect, as an example of the related art, JP-A NO. 2002-216026 discloses a method that acquires keywords from the information on the Internet and extracts a keyword from the acquired keywords.
There are, however, numerous proper nouns; it may therefore be practically difficult to register in advance, for speech recognition, all the words that users will speak.