1. Field of the Invention
The present invention relates generally to the field of Natural Language Processing. More specifically, the present invention relates to a device and method for language model switching and adaptation.
2. Description of the Prior Art
Language model technology is one of the key components of Natural Language Processing. It is widely used in many fields, such as Speech Recognition (SR), Optical Character Recognition (OCR), and Predictive Text Entry (PTE; PTE for non-English languages and full-sentence text entry is often called a Sentence-Level Input Method). Generally, a language model is used to estimate the probability of a sentence. For example, in speech recognition, the acoustic recognizer produces an acoustic hypothesis sequence, which can generate different candidate sentences. Each candidate sentence is then scored by the language model, and the one with the highest score is taken as the best candidate. Similarly, text entry for non-English languages such as Chinese, or on 10-button devices such as mobile phones, is difficult because the user needs to input a code sequence and then choose the desired candidate from a long list. A language model can help choose the desired candidate automatically. For example, the digit sequence “4663” on a mobile phone corresponds to three English word candidates, “good”, “home” and “gone”; if the previous word is “go”, the language model can automatically predict “home” as the first candidate. In short, a language model can be used to choose among candidates whenever language-model-related ambiguity occurs.
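The mobile-phone example above can be sketched as follows. This is a minimal illustration, assuming a toy bigram model with add-one smoothing trained on a small hypothetical corpus. The corpus and the names `to_digits`, `bigram_prob` and `rank_candidates` are illustrative only and are not part of the invention.

```python
from collections import Counter

# Standard 10-key phone layout: each digit key covers several letters.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}

# Hypothetical toy training text; a real model is trained on a large corpus.
CORPUS = "i go home . we go home now . good news . it is gone . go home ."
TOKENS = CORPUS.split()
UNIGRAMS = Counter(TOKENS)
BIGRAMS = Counter(zip(TOKENS, TOKENS[1:]))
VOCAB = set(TOKENS)


def to_digits(word):
    """Map a lowercase word to the digit sequence typed on a 10-key phone."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word)


def bigram_prob(prev, word, alpha=1.0):
    """Add-alpha smoothed bigram probability P(word | prev)."""
    return (BIGRAMS[(prev, word)] + alpha) / (UNIGRAMS[prev] + alpha * len(VOCAB))


def rank_candidates(digits, prev):
    """Return vocabulary words matching `digits`, highest probability first."""
    candidates = [w for w in VOCAB if w.isalpha() and to_digits(w) == digits]
    # Sort by descending probability; break ties alphabetically for stable output.
    return sorted(candidates, key=lambda w: (-bigram_prob(prev, w), w))


print(rank_candidates("4663", "go"))  # ['home', 'gone', 'good']
```

Because “go home” occurs in the toy corpus while “go good” and “go gone” do not, the model ranks “home” first. The two unseen continuations receive equal smoothed probability.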
However, the performance of a language model is highly domain-dependent. If a language-model-based application works in a domain different from the training domain, its performance degrades dramatically. To solve this problem, the language model should be modified when the domain changes. However, if the application must switch frequently among many distinct domains, model modification brings no benefit and may even make the model unusable. This phenomenon is explained further in the following sections.
As mentioned above, the general problem in language modeling is domain dependence. If the target application works in a fixed domain, this problem may not be noticeable. However, if the application is used across many domains that differ greatly from one another, this problem limits language model performance.
Generally speaking, there are two popular methods for solving the domain-dependence problem: language model adaptation (LMA) and language model switching (LMS). Both try to enhance the model using information provided by recent input data, such as the text produced by the input method.
Traditional language model adaptation assumes that the current topic is locally stationary, that is, the domain remains unchanged throughout the use of the language model. Therefore, the recent output text can be used to modify the model so that it works better in subsequent use. The most popular approach is to build a cache model from the recent text and combine the general model with the cache model by interpolation. In some cases, such as speech recognition of a long document or OCR of a long printed document, this method works well.
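The cache-based interpolation described above can be sketched as follows. This minimal Python illustration assumes unigram models for both the general and the cache component, and a fixed interpolation weight `lam`. The class name, `cache_size` and `lam` are hypothetical choices made for illustration; practical systems typically interpolate n-gram models.

```python
from collections import Counter, deque


class CacheInterpolatedLM:
    """Unigram general model linearly interpolated with a unigram cache model.

    P(w) = lam * P_general(w) + (1 - lam) * P_cache(w), where P_cache(w) is the
    relative frequency of w among the most recent `cache_size` output words.
    """

    def __init__(self, general_probs, cache_size=100, lam=0.8):
        if cache_size < 1:
            raise ValueError("cache_size must be at least 1")
        self.general = general_probs           # word -> probability from the general model
        self.cache = deque(maxlen=cache_size)  # most recent words, oldest on the left
        self.counts = Counter()                # word counts inside the cache
        self.lam = lam

    def observe(self, word):
        """Add a newly output word to the cache, evicting the oldest word if full."""
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1    # deque.append will drop this word
        self.cache.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """Interpolated probability of `word`."""
        p_general = self.general.get(word, 0.0)
        p_cache = self.counts[word] / len(self.cache) if self.cache else 0.0
        return self.lam * p_general + (1 - self.lam) * p_cache


lm = CacheInterpolatedLM({"a": 0.5, "b": 0.5}, cache_size=2, lam=0.5)
lm.observe("b")
lm.observe("b")
print(lm.prob("b"))  # 0.75
```

The cache boosts words that appeared recently, which helps only while the topic stays the same. If the domain changes, the cache keeps favouring words from the previous domain until they are evicted.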
Traditional language model switching also assumes that the current topic is locally stationary. In this case, however, the recent text stream is far from sufficient to enhance the language model directly. Instead, it is used to determine the current topic and to select an appropriate pre-built model for that topic.
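Text-stream-based switching can be sketched in the same spirit. The recent text is scored under each pre-built domain model, and the model under which that text is most likely is selected. This sketch assumes unigram domain models stored as dictionaries and a fixed probability floor for unseen words. `select_model`, `floor` and the example models are hypothetical.

```python
import math


def select_model(recent_words, domain_models, floor=1e-6):
    """Pick the domain model under which the recent text is most likely.

    domain_models maps a domain name to a unigram model (word -> probability).
    Words unseen by a model receive the small `floor` probability.
    """
    def log_likelihood(model):
        return sum(math.log(max(model.get(w, 0.0), floor)) for w in recent_words)

    # max() returns the first name with the highest score (dict order on ties).
    return max(domain_models, key=lambda name: log_likelihood(domain_models[name]))


MODELS = {
    "sports": {"goal": 0.1, "match": 0.1, "team": 0.1},
    "finance": {"stock": 0.1, "market": 0.1, "bank": 0.1},
}
print(select_model(["stock", "market", "goal"], MODELS))  # finance
print(select_model([], MODELS))  # sports: no evidence, so the first model wins the tie
```

When only one or two recent words are available, or none at all, the scores barely differ or tie exactly, so the selection is essentially arbitrary.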
Because the traditional methods use only the recent text stream for language model adaptation and switching, we call them text-stream-based language model adaptation/switching methods.
As mentioned above, the text-stream-based LMA/LMS methods both assume that the current topic is locally stationary, so that the recent text stream can be used to enhance the model. In practice, this assumption does not always hold. In some cases, the recent text stream is too short to be of much use for language model adaptation. In other cases, the application may switch frequently from one context to another without providing any text stream, so the local stationarity property is broken. In these cases, neither language model adaptation nor language model switching works well.
In particular, the only information the text-stream-based methods can use is the recent text stream. Because the topic is non-stationary, language model adaptation or switching can be misled. Moreover, while the application is running, its domain may switch among many fields. Existing solutions deal with this problem by using the recent text stream to modify the model or to select a model. Clearly, if domain switching is very frequent, the model will be modified dramatically, or the domain will change again as soon as a new model has been selected. As a serious consequence, the adjustment made for the previous input does not fit the subsequent input requests, and it impairs model performance rather than improving it.
Take the currently widely used Chinese input methods as an example. They know only that the current edit field must be filled with a text string, and they do not consider what preferences the current application or the current field has. In practice, when a user fills in an entry in a contact manager, fields such as name, address, position, hobbies and telephone number are required. These fields differ greatly from one another. Information adapted from the name input cannot improve the address input and may even mislead it. In this case, the text-stream-based methods do not work at all.
Take the sentence-level input method for 10-button mobile phones as another example. When the user writes a short message, the domain is short-message conversation. When the user fills in the name field of the address book, the domain is names. When the user surfs the Internet on a smartphone, he or she needs to type an Internet URL into the browser's address bar. When the user dials a friend, the input domain is telephone numbers. Again, the text-stream-based methods do not help in this case either.
If a speech recognition system replaces the input method in these two examples, the situation is similar.
In short, in the context-sensitive cases above, purely text-stream-based methods offer no effective mechanism for identifying which domain the language model is currently serving, or for recognizing that no fixed domain exists at all. They also have no effective way to handle the domain-dependence problem when an LMB engine application switches frequently among many domains. Furthermore, because domain detection is inaccurate, model adaptation becomes hit-or-miss.