1. Field of the Invention
The present invention relates to a speech recognition device and a method thereof, and more particularly, to a device using a Chinese word description to recognize a Chinese word and a method thereof.
2. Description of Related Art
Along with the progress of electronic technology and the prevalence of wireless communication and the Internet, portable devices with a “light, thin, short, and small” design have gradually become a new-generation platform for accessing information. However, not all such devices are provided with the input/output devices that people are familiar with, such as screens, keyboards, or mice. Therefore, in the future, the human-machine interface between people and smart equipment may be controlled by voice, which is the most natural and convenient means of communication. Furthermore, more and more multimedia audio and visual information is available in daily life, and if the voice information can be converted into text while the multimedia content is played, users can quickly grasp the theme and ideas conveyed therein. In either case, whether for voice control or for conversion of voice into text, the accuracy of speech recognition is critical.
A conventional Chinese speech recognizer substantially includes a front-end processor, a lexicon database, an acoustic model, and a language model. When a voice signal is received, the front-end processor retrieves a voice frame of the voice signal and obtains from the voice frame a feature helpful for speech recognition, e.g., Mel-frequency cepstral coefficients (MFCC). The acoustic model is generally a hidden Markov model (HMM) that takes a phoneme, a syllable, or a word as its unit; the above feature is compared against the established acoustic model to determine the sound of the voice frame of the voice signal. Then, the Chinese words that may correspond to this sound are searched for in the lexicon database, in a way similar to looking up a dictionary. Meanwhile, the language model determines, through probability and statistics, which of the retrieved Chinese words is the most appropriate one in the sentence. In this manner, the Chinese words corresponding to the voice signal are recognized.
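The last two stages described above, the lexicon lookup and the language-model selection, can be illustrated with a minimal sketch. It assumes the acoustic model has already produced a syllable in pinyin with a tone digit. The tables `LEXICON` and `BIGRAM`, their entries and scores, and the function `best_character` are hypothetical and serve illustration only; they are not taken from any conventional recognizer.

```python
# Hypothetical toy lexicon: recognized syllable -> candidate Chinese characters.
LEXICON = {
    "shan1": ["山", "衫"],
    "gao1": ["高", "糕"],
}

# Hypothetical bigram scores standing in for the language model:
# (previous character, candidate character) -> relative likelihood.
BIGRAM = {
    ("高", "山"): 0.60,
    ("高", "衫"): 0.01,
}


def best_character(prev_char, syllable):
    """Look up the candidates for a syllable and keep the most likely one in context."""
    candidates = LEXICON.get(syllable, [])
    if not candidates:
        return None  # the syllable is not in the lexicon
    # Unseen pairs score 0.0, so a candidate with a known bigram wins.
    return max(candidates, key=lambda c: BIGRAM.get((prev_char, c), 0.0))


choice = best_character("高", "shan1")
```

A real recognizer would score whole word sequences over many competing acoustic hypotheses rather than a single preceding character, but the division of labor between the lexicon and the language model is the same.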
U.S. Pat. No. 6,163,767 discloses a speech recognition method and system for recognizing an isolated or uncorrelated Chinese character. FIG. 1 is a schematic view of a conventional speech recognition system. Referring to FIG. 1, the speech recognition system includes a speech recognizer 110 based on the Chinese character description, a grammar analyzer 120 based on the Chinese character description, and a Chinese character generator 130. The speech recognizer 110 differs from the conventional speech recognizer in that the speech recognizer 110 is further provided with a language model based on the Chinese character description.
As disclosed in this patent, the syntax rules of the Chinese character description are established in the language model. When the speech recognizer 110 receives a Chinese character description, e.g., “tai2 tou2 de5 tai2 ()”, it recognizes the Chinese characters included in the description one by one, and the language model based on the Chinese character description then determines which syntax rule the recognized description matches. For example, “tai2 tou2 de5 tai2 ()” matches the syntax rule “a Chinese word + de5 () + a Chinese character”, so the input Chinese character is recognized as “tai2 ()”.
In Chinese, a word is composed of at least one Chinese character, a sentence is composed of at least one word, and a paragraph is composed of at least one sentence. If the Chinese words or sentences input by the user are recognized character by character according to the above patent, recognition takes a rather long time. For example, to input the Chinese word “yang2 ming2 shan1 ()”, three separate Chinese character descriptions, “tai4 yang2 de5 yang2 ()”, “ming2 tian1 de5 ming2 ()”, and “gao1 shan1 de5 shan1 ()”, must be spoken in order to recognize the correct Chinese characters.
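The character-by-character procedure described above can be sketched as follows. This is a simplified illustration, not the implementation of the cited patent: it assumes the spoken description has already been transcribed into pinyin syllables with tone digits, and it handles only the single syntax rule “a Chinese word + de5 + a Chinese character”. The table `WORD_LEXICON`, its entries, and the function `resolve_description` are hypothetical.

```python
# Hypothetical word lexicon: pinyin syllables of a word -> its characters.
WORD_LEXICON = {
    ("tai4", "yang2"): "太陽",
    ("ming2", "tian1"): "明天",
    ("gao1", "shan1"): "高山",
}


def resolve_description(description):
    """Apply the rule "word + de5 + character" to one transcribed description."""
    syllables = description.split()
    # The rule needs at least one word syllable, then "de5", then the target syllable.
    if len(syllables) < 3 or syllables[-2] != "de5":
        return None
    word, target = tuple(syllables[:-2]), syllables[-1]
    characters = WORD_LEXICON.get(word)
    if characters is None or target not in word:
        return None
    # The target character sits at the same position as its syllable in the word.
    return characters[word.index(target)]


# One description per character: a three-character word needs three utterances.
descriptions = [
    "tai4 yang2 de5 yang2",
    "ming2 tian1 de5 ming2",
    "gao1 shan1 de5 shan1",
]
result = "".join(resolve_description(d) for d in descriptions)
```

The sketch makes the cost visible: the number of descriptions, and hence of recognition passes, grows with the number of characters in the word.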
In addition, in the field of Chinese speech recognition, the recognition of an isolated word is quite important. Generally, all the words are collected to build a lexicon for recognition, but the larger the lexicon is, the more ambiguity it may cause. Since an isolated word is recognized without reference to its context, isolated words with similar pronunciations, such as “da4 dao4 ()”, “da4 dao4 ()”, and “da3 dao3 ()”, or short isolated words may easily result in recognition errors.
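The kinds of ambiguity mentioned above can be made concrete with a small sketch that compares the pinyin of two isolated words. The three labels and the function names `split_tone` and `confusability` are hypothetical and chosen only for illustration; an actual recognizer measures acoustic similarity, not string equality.

```python
def split_tone(syllable):
    """Split a pinyin syllable such as "dao4" into its base ("dao") and tone ("4")."""
    return syllable[:-1], syllable[-1]


def confusability(word_a, word_b):
    """Classify two words given as space-separated pinyin with tone digits."""
    a, b = word_a.split(), word_b.split()
    if a == b:
        # Same syllables and same tones: indistinguishable without context.
        return "identical"
    bases_a = [split_tone(s)[0] for s in a]
    bases_b = [split_tone(s)[0] for s in b]
    if bases_a == bases_b:
        # Same syllables, different tones: easily confused.
        return "tone-only"
    return "distinct"


same = confusability("da4 dao4", "da4 dao4")
near = confusability("da4 dao4", "da3 dao3")
```

Under this classification the first two example words are homophones that no acoustic evidence can separate, while the third differs from them only in tone.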