1. Field of the Invention
The present invention relates to speech recognition technology and, in particular, to a system and method for recognizing multiple languages in a single speech signal.
2. Description of the Related Art
Currently, the main methods of recognizing a multi-lingual speech signal are as follows. In a first method, a recognition system is constructed from several independent uni-lingual speech recognition subsystems. A language must be selected in advance, either by the user or by a computer, and the corresponding uni-lingual speech recognition subsystem is designated to recognize the speech signal. The first method can therefore deal with only one language at a time and cannot handle several languages simultaneously. Strictly speaking, although the first method includes different speech recognition subsystems, it does not provide multi-lingual speech recognition functionality.
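By way of illustration only, the pre-selection constraint of the first method may be sketched as follows. The subsystem stubs, the language codes, and the function names are hypothetical and are assumed solely for this sketch; they do not describe any particular recognizer.

```python
from typing import Callable, Dict

# A recognizer takes a raw speech signal and returns a transcript.
Recognizer = Callable[[bytes], str]


def make_stub(language: str) -> Recognizer:
    """Build a hypothetical stand-in for one uni-lingual subsystem."""
    def recognize(signal: bytes) -> str:
        return f"<{language} transcript of {len(signal)} bytes>"
    return recognize


# Assumed set of independent uni-lingual subsystems, keyed by language code.
SUBSYSTEMS: Dict[str, Recognizer] = {
    lang: make_stub(lang) for lang in ("zh", "en", "ja")
}


def recognize_with_preselected_language(signal: bytes, language: str) -> str:
    """Route the whole signal to the single subsystem chosen in advance."""
    if language not in SUBSYSTEMS:
        raise ValueError(f"no subsystem for language {language!r}")
    # Exactly one subsystem processes the entire signal, so speech in any
    # other language within the same signal is not handled.
    return SUBSYSTEMS[language](signal)
```

In this sketch the language is an argument supplied before recognition begins, which is why a signal containing several languages cannot be handled in a single pass.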
A second method utilizes one language to simulate other languages. That is, the phonetic transcriptions of one main language are utilized to simulate the pronunciation of other languages. For example, if Chinese is selected as the main language, then phonetic transcriptions of Chinese will be used to simulate other languages, such as English or Japanese. As an example, “DVD” in English might be simulated by “dil bil dil” in Chinese. The second method can partially resolve multi-lingual speech recognition problems. However, one difficulty of the second method is that many speech sounds of the other languages cannot be simulated by the main language, and an incomplete simulation may affect the whole recognition result. For example, the “V” in English cannot be simulated properly by Chinese phonetic transcriptions, so that the improper simulation degrades recognition of the entire utterance.
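By way of illustration only, the simulation of the second method may be sketched as a lookup from units of the other language to transcriptions of the main language. The table contents beyond the “DVD” example, the fidelity flags, and the function name are assumptions made solely for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical table: English letter name -> (main-language transcription,
# whether the approximation is assumed to be adequate).
APPROXIMATIONS: Dict[str, Tuple[str, bool]] = {
    "D": ("dil", True),   # assumed adequate for this sketch
    "V": ("bil", False),  # "V" cannot be simulated properly
}


def simulate(word: str) -> Tuple[str, List[str]]:
    """Return the simulated transcription and the poorly simulated units."""
    units: List[str] = []
    poorly_simulated: List[str] = []
    for letter in word:
        if letter not in APPROXIMATIONS:
            raise KeyError(f"no approximation for {letter!r}")
        transcription, adequate = APPROXIMATIONS[letter]
        units.append(transcription)
        if not adequate:
            poorly_simulated.append(letter)
    return " ".join(units), poorly_simulated


transcription, weak_units = simulate("DVD")
```

Here `transcription` is `"dil bil dil"` and `weak_units` is `["V"]`: a single poorly simulated unit remains embedded in the transcription used for matching, which is how an incomplete simulation can affect the result for the whole word.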
A third method utilizes global phonemes to label the speech of all languages and then refers to a decision tree to classify and recognize the labeled speech. The third method avoids the incomplete simulation problem described above. However, if the vocabulary is large, interference among the different languages becomes significant and degrades the recognition result.