1. Field of the Invention
The present invention relates to an apparatus, method, and medium for speech recognition, and more particularly to an apparatus, method, and medium for dialogue speech recognition using topic domain detection that can detect a dialogue topic of a speaker during the dialogue and employ a topic-based language model, thereby improving performance of dialogue speech recognition.
2. Description of the Related Art
Speech recognition technology is used to recognize or understand what people are saying by analyzing speech via a computer. Speech is converted into an electrical signal, and frequency characteristics of the voice signal are extracted from the electrical signal, based on the fact that speech has specific frequency characteristics that depend on the shape of the mouth and the position of the tongue, thereby recognizing the pronunciation. Recently, speech recognition technology has been extensively used in various applications, such as phone dialing, language studies, and control of toys and household electrical appliances.
In general, a continuous speech recognition apparatus has the structure shown in FIG. 1, which is a schematic view illustrating a structure of a conventional continuous speech recognition apparatus. Referring to FIG. 1, a feature-extraction module 10 converts a voice input to the speech recognition apparatus into a feature vector by extracting information useful for speech recognition from the voice. A search module 20 searches for the word sequence having the highest probability given the feature vector by using a Viterbi algorithm with reference to an acoustic module database (DB) 40, a pronunciation dictionary DB 50, and a language module DB 60, which have already been obtained through a learning process. For the purpose of large vocabulary recognition, the vocabularies subject to recognition are provided in the form of a tree. Thus, the search module 20 searches the vocabulary tree. A post-processing module 30 removes phonetic signs and tags from the search result and assembles the result in syllabic units, thereby providing text as a final recognition result.
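The search step described above can be illustrated with a minimal sketch of the Viterbi algorithm over a toy hidden Markov model. This is not the search module 20 itself: the state names (`s1`, `s2`), the observation symbols (`a`, `b`), and all probability values are hypothetical and chosen only to keep the example small. An actual recognizer would search a vocabulary tree and combine acoustic, pronunciation, and language scores.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    # The overall best path is the best of the paths ending in any state.
    return max(best.values(), key=lambda item: item[0])


# Hypothetical toy model (all values are assumptions for illustration).
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.2, "b": 0.8}}

prob, path = viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For this toy input the most likely path is `s1`, `s2`, `s2`. Keeping only the best path into each state at every step is what lets the search avoid enumerating every possible sequence.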
The above conventional continuous speech recognition apparatus employs the acoustic module DB 40, the pronunciation dictionary DB 50, and the language module DB 60 for the purpose of speech recognition. The language module DB 60 consists of frequency data of words compiled from a training text DB and probability data, namely Bigram or Trigram probabilities calculated from the frequency data. A Bigram expresses a word sequence consisting of two words, and a Trigram expresses a word sequence consisting of three words.
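As an illustration of how Bigram probabilities can be derived from frequency data, the following minimal sketch uses a plain relative-frequency estimate, P(w2 | w1) = count(w1, w2) / count(w1). The function name `bigram_probabilities` and the three-sentence corpus are hypothetical, and the estimate is unsmoothed; it is one common way to compute such probabilities, not necessarily the one used to build the language module DB 60.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) = count(w1, w2) / count(w1) from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        # Count each adjacent word pair and the first word of each pair.
        for w1, w2 in zip(words, words[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {
        (w1, w2): count / context_counts[w1]
        for (w1, w2), count in bigram_counts.items()
    }


# Hypothetical training text (an assumption for illustration).
corpus = [
    ["rain", "is", "expected", "tomorrow"],
    ["snow", "is", "expected", "tonight"],
    ["rain", "is", "likely", "tomorrow"],
]
probs = bigram_probabilities(corpus)
print(probs[("is", "expected")])  # "is" is followed by "expected" in 2 of 3 cases
```

A Trigram estimate follows the same pattern with three-word windows and two-word contexts.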
When the topic domain of a speaker changes, the previous language model may no longer perform adequately. Thus, a new language model must be established corresponding to the change of the topic domain of the speaker. For instance, words used in the topic domain for a weather forecast have rules and features different from those of words used in the topic domain for travel. Accordingly, if a read speech language model suitable for weather forecast speech recognition is used for travel-related speech recognition, which requires a conversational speech language model, the performance of the travel-related speech recognition may be degraded. That is, a language model dedicated to a specific topic domain may degrade the performance of speech recognition if the topic domain is changed.
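The effect of a topic mismatch can be illustrated with a small sketch that trains a Bigram model on weather text and compares its perplexity on a weather utterance and a travel utterance. Everything here is an assumption made for illustration: the tiny corpus, the function names `train_bigram` and `perplexity`, and the use of add-one smoothing so that unseen word pairs receive a nonzero probability. Lower perplexity means the model predicts the utterance better.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts from tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        for w1, w2 in zip(words, words[1:]):
            context_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return context_counts, bigram_counts


def perplexity(model, words, vocab_size):
    """Perplexity of a word sequence under an add-one smoothed bigram model."""
    context_counts, bigram_counts = model
    pairs = list(zip(words, words[1:]))
    log_prob = 0.0
    for w1, w2 in pairs:
        # Add-one smoothing: unseen pairs get probability 1 / (count(w1) + V).
        p = (bigram_counts[(w1, w2)] + 1) / (context_counts[w1] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


# Hypothetical data (assumptions for illustration).
weather_text = [
    ["rain", "is", "expected", "tomorrow"],
    ["snow", "is", "expected", "tonight"],
]
weather_query = ["rain", "is", "expected", "tonight"]
travel_query = ["book", "a", "flight", "to", "paris"]

vocab = {w for s in weather_text for w in s} | set(weather_query) | set(travel_query)
weather_model = train_bigram(weather_text)

in_domain = perplexity(weather_model, weather_query, len(vocab))
out_of_domain = perplexity(weather_model, travel_query, len(vocab))
print(in_domain, out_of_domain)
```

In this toy setting the weather model assigns a noticeably lower perplexity to the weather utterance than to the travel utterance, which is the kind of mismatch the paragraph above describes.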
In order to solve the above problem, language models intended for various topic domains, rather than one topic domain, have been suggested. Such language models include a global language model, a parallel language model, and a topic dependency language model. The global language model can reduce consumed resources because only one language model is established. However, the complexity of the language model is increased, so the accuracy of speech recognition may be degraded. The parallel language model can reduce complexity and search time, but it uses many resources, and an optimum result must be selected from among the results of the individual models.
For this reason, a topic dependency language model is preferably used because it can reduce the complexity of the language model, the search time, and the amount of consumed resources. In addition, it is necessary to provide an apparatus, a method, and a medium that are capable of enhancing the efficiency of speech recognition by improving the performance of topic domain detection and language model conversion.