1. Field of the Invention
This invention relates to a speech recognition system, and more particularly, to a multilingual speech recognition system, which can recognize speech in various languages.
2. Description of Related Art
In recent years, speech recognition systems have been developed that offer several advantages, such as convenience of use and reduced fabrication cost. Because of these advantages, speech recognition systems are widely applied by businesses in many fields of trade. For example, a customer service center usually receives a large number of phone calls from its clients, and in most of these calls the clients ask the same questions. Answering the same questions repeatedly consumes a great deal of manpower. If, however, the answering service is provided through a speech recognition system with prerecorded answers, the recorded speech can be used to answer these standard questions. As a result, the staff of the customer service center can be used more efficiently, and the company's personnel costs can be further reduced.
With the trend toward internationalization and the increasingly diverse use of languages in daily life, a speech recognition system that can recognize only one language no longer satisfies the needs of the market. By contrast, a multilingual speech recognition system, which can recognize speech in various languages, has become more valuable in the business market. Currently, multilingual speech recognition systems usually adopt one of the following designs:
1. Several monolingual speech recognition system units, each dedicated to its own specific language, are assembled into one multilingual speech recognition system. In this design, each independent monolingual unit must have a fully functional recognition unit for its language. Such a system usually operates in one of two ways. In the first, language identification is performed on the input speech signal, and the monolingual unit corresponding to the identified language is then selected to recognize the signal. In the second, the speech signal is input simultaneously to all of the monolingual units; each unit recognizes the signal and produces an estimated score, and the result with the highest estimated score is taken as the output of the multilingual speech recognition system.
The foregoing conventional manner of recognizing speech has several disadvantages, including the following:
(a) Because each monolingual unit must itself be a complete speech recognition unit, building the multilingual speech recognition system requires a great deal of manpower and resources.
(b) Performing language identification in advance reduces recognition performance, because any error in language identification inevitably leads to a recognition error. If, however, language identification is not performed in advance, the computation load of the speech recognition system becomes heavy, because the speech must be recognized in every supported language.
2. A language-independent acoustic model suitable for various languages is built. A speech recognition system based on such an acoustic model does not need multiple speech recognition apparatuses, nor does it need a large collection of language-related data for each language; the whole system needs only a single speech recognition apparatus with language-independent capability. However, this approach is very difficult to implement because different languages have different properties, and an acoustic model that is simultaneously suitable for many different languages is very difficult to build.
3. A speech recognition system is designed to allow users to build up new vocabularies themselves. When users need new vocabulary or vocabulary from other languages, they can add it to the vocabulary acoustic model themselves. The newly added vocabulary and the original vocabulary are used together in the speech recognition operation, and the output with the highest score, as determined by the recognition apparatus, is selected. The disadvantage of this design is that the acoustic models of the newly added vocabulary are speaker dependent: each user is required to build his or her own acoustic model. As a result, the recognition system is less convenient to use.
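The two operating modes of the first design, and the trade-off described in disadvantage (b), can be illustrated with a minimal sketch. This is not the method of the invention, and every name in it is hypothetical: the `Recognizer` interface, the functions `recognize_with_lid` and `recognize_in_parallel`, and the toy string-matching recognizers are assumed stand-ins for real acoustic decoders, and utterances are represented as plain strings rather than audio.

```python
from typing import Callable, Dict, Tuple

# Hypothetical interface (assumption): a monolingual recognizer maps an
# utterance to (transcript, score), where a higher score means a better match.
Recognizer = Callable[[str], Tuple[str, float]]


def make_toy_recognizer(vocabulary: set) -> Recognizer:
    """Toy stand-in for a monolingual recognizer (not a real decoder):
    scores 1.0 for an exact vocabulary hit and 0.1 otherwise."""
    def recognize(utterance: str) -> Tuple[str, float]:
        if utterance in vocabulary:
            return utterance, 1.0
        return "", 0.1
    return recognize


def recognize_with_lid(utterance: str,
                       identify_language: Callable[[str], str],
                       recognizers: Dict[str, Recognizer]) -> Tuple[str, str, float]:
    """First mode: identify the language, then run only that recognizer.
    Only one recognition pass, but an identification error cannot be undone."""
    lang = identify_language(utterance)
    text, score = recognizers[lang](utterance)
    return lang, text, score


def recognize_in_parallel(utterance: str,
                          recognizers: Dict[str, Recognizer]) -> Tuple[str, str, float]:
    """Second mode: run every recognizer and keep the highest-scoring result.
    No identification step, but one full recognition pass per language.
    On a tie, max() keeps the first recognizer in dictionary order."""
    results = [(lang, *recognize(utterance)) for lang, recognize in recognizers.items()]
    return max(results, key=lambda result: result[2])


# Hypothetical two-language setup for illustration only.
recognizers = {
    "mandarin": make_toy_recognizer({"ni hao", "xie xie"}),
    "english": make_toy_recognizer({"hello", "thank you"}),
}

if __name__ == "__main__":
    # Parallel scoring selects the English recognizer for an English utterance.
    print(recognize_with_lid("hello", lambda u: "english", recognizers))
    print(recognize_in_parallel("hello", recognizers))
    # A deliberately wrong identifier that always answers "mandarin"
    # propagates its error into the final result.
    print(recognize_with_lid("hello", lambda u: "mandarin", recognizers))
```

With the mis-identifying front end, the first mode returns an empty transcript even though the English recognizer would have matched, while the second mode finds the match at the cost of running every recognizer, which is the accuracy-versus-computation trade-off noted in (b).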
In many applications of multilingual speech recognition, users very frequently use a large amount of vocabulary in the primary language but only rarely use vocabulary in languages other than the primary language. For example, in an automatic telephone inquiry system in an area whose primary language is Mandarin, such as Taiwan, names are spoken in Mandarin most of the time; only occasionally is a person's name stated in English or in the local dialect. For applications that separate a primary language from the other languages in this way, using any one of the above three designs produces a very complicated multilingual speech recognition system with equal recognition capability for every language. Operating such a system wastes resources, because the need for recognition capability in languages other than the primary language is much smaller than the need for the primary language.