The present invention relates to an intelligent Mandarin speech input method and an intelligent Mandarin dictation machine. The present invention is an improvement on R.O.C. patent application 82106686, filed by the applicant of the present invention. In accordance with the present invention, Chinese characters can be entered more accurately and conveniently by means of Mandarin speech input.
Today, there are numerous methods for entering Chinese characters into computers, for instance, those based on phonetic symbols, character radicals or strokes. However, none of these methods has been recognized as the best, since no input method is really convenient for users. Some input methods are relatively slow, while others require special training or the recall of numerous rules for character radicals, which may be forgotten as a result of infrequent use. For instance, the "phonetic symbol input method" can readily be practiced by anyone without substantial training, but it is still not popular due to its relatively slow speed. Currently, the fastest input methods for Chinese characters are probably the so-called Tsang-jiieh method, the Ta-yi method and other similar character radical methods, but these methods can only be used by professionals who have received long-term training. The inconvenience of these methods basically results from the fact that each Chinese character is decomposed, in an irregular manner, into several radicals represented by keystrokes on a typical English keyboard. The English keyboard was initially designed for alphabetic languages such as English and is therefore inconvenient for entering non-alphabetic Chinese characters.
One possible approach, which has long been proposed, is to enter Chinese characters by means of speech. However, because speech input of Chinese characters encounters several critical technical problems that are nearly unsolvable, almost no such method has been commercialized. The major technical problems are:
(1) The vocabulary required for the Chinese language is too large for speech recognition technology. At least 5,000 Chinese characters and at least 100,000 Chinese words (including all mono-character and poly-character words) are in common use, which is beyond the capability of available technology;
(2) The Chinese language contains too many homonym characters and words, which cannot easily be distinguished even when the pronunciation has been correctly recognized; and
(3) It is difficult to translate Mandarin speech into Chinese characters in real time using low-cost devices, because the computation required by problems (1) and (2) cannot be carried out in a very short period of time.
U.S. patent application Ser. No. 08/352,587, filed by the applicant of the present invention, can substantially mitigate the problems described above. The cited patent is incorporated herein by reference. The main contents of the patent are as follows:
(1) Mandarin mono-syllables are chosen as the acoustic units for recognition. Although the number of Chinese characters and words is huge, the number of different mono-syllables is limited to about 1,300, which is within the capability of present speech recognition technology. The recognized syllables, together with their preceding and following syllables and some linguistic information, can be used to decode the corresponding words and the sentences constructed therefrom.
(2) Chinese language models can be established by means of Markov models based on a Chinese text corpus. A large number of training texts are used to estimate the probability that each available character is preceded or followed by one or more other characters. When a particular syllable is preceded or followed by one or more other syllables, these probabilities can be used to determine which character the syllable in question most likely represents. This method can solve most homonym problems, and the remaining erroneous homonyms can then be manually corrected on the screen.
Based on the structure of the cited patent, the present invention further develops two improved techniques:
(1) Use sub-syllable units as the acoustic units to generate "Hidden Markov Models" through special training algorithms such as an "Interpolation Training Algorithm", where a sub-syllable unit is an acoustic unit smaller than the syllable. Examples of sub-syllable units are the "initial" of a Mandarin syllable (the initial consonant), the "final" of a Mandarin syllable (the vowel or diphthong part, including any medial or nasal ending), and phonemes such as consonants and vowels. These "Hidden Markov Models", along with "Tone Models", which deal with the characteristics of tone variation in Mandarin speech, and "Search Algorithms and Pattern Matching Algorithms for Continuous Speech", are utilized to carry out improved recognition of Mandarin mono-syllables. In this way, the recognition technique can accurately recognize not only "isolated mono-syllables" but also "mono-syllables in continuous speech". The input speech of the user is therefore not limited to a sequence of "isolated mono-characters" (mono-syllables). The input speech can also be "isolated words" (where the syllables within a poly-character word are continuous), "isolated prosodic segments" (a prosodic segment, comprising one or more words, is a segment that the speaker naturally sets off with pauses during speech, and the syllables within the prosodic segment are continuous), or even a "whole sentence of continuous Mandarin speech".
(2) Based on a large amount of Chinese text, calculate the probability that a character (or word) is adjacent to another character (or word) and the probability that a character (or word) appears in the same sentence as another character (or word). An improved "Chinese Language Model" can be constructed from this probability information together with linguistic information or rules derived from analysis of wording and grammar in the Chinese language. The improved "Chinese Language Model", augmented with an efficient search algorithm, can be used to quickly distinguish the correct homonym character from among all possible Mandarin mono-syllable candidates.
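The two kinds of probability named in item (2) can be sketched as follows. This is a minimal illustration, not the model of the invention: it assumes simple relative-frequency estimates and an equal-weight linear combination of the adjacency and same-sentence scores, neither of which is specified in the text. The toy corpus `CORPUS` and the names `train`, `score` and `pick` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: 他 and 她 are homonyms and both follow 說 equally often.
CORPUS = ["媽媽說她很忙", "爸爸說他很忙", "姐姐說她來了", "哥哥說他來了"]

def train(corpus):
    """Collect adjacency counts and same-sentence co-occurrence counts."""
    occurrences = Counter()   # how often each character occurs
    in_sentences = Counter()  # number of sentences containing each character
    adjacent = Counter()      # counts of (left, right) neighbouring pairs
    together = Counter()      # sentences containing both characters of a pair
    for sentence in corpus:
        chars = list(sentence)
        occurrences.update(chars)
        adjacent.update(zip(chars, chars[1:]))
        distinct = sorted(set(chars))
        in_sentences.update(distinct)
        together.update(combinations(distinct, 2))  # keys are sorted pairs
    return occurrences, in_sentences, adjacent, together

def score(candidate, prev, context, model, weight=0.5):
    """Combine P(candidate follows prev) with the average probability that
    the candidate shares a sentence with each known context character."""
    occurrences, in_sentences, adjacent, together = model
    p_adj = 0.0
    if occurrences[prev]:
        p_adj = adjacent[(prev, candidate)] / occurrences[prev]
    known = [c for c in set(context) if in_sentences[c]]
    p_same = 0.0
    if known:
        p_same = sum(together[tuple(sorted((c, candidate)))] / in_sentences[c]
                     for c in known) / len(known)
    return weight * p_adj + (1 - weight) * p_same

def pick(candidates, prev, context, model):
    """Return the homonym candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(c, prev, context, model))

if __name__ == "__main__":
    model = train(CORPUS)
    print(pick(["他", "她"], "說", "媽媽說來了", model))
```

In this toy corpus the adjacency counts alone cannot separate the two homonyms after 說, while the same-sentence counts can, because 她 has appeared in a sentence with 媽 and 他 has not. The equal weighting is an arbitrary choice made for the illustration.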
Both techniques are developed in view of the characteristics of Mandarin Chinese. When used together, these two techniques can accurately recognize the "Chinese characters represented by continuous speech", so that users can conveniently and naturally enter speech in various formats. The required amount of computation is not substantially increased, while the correct recognition rate may remain unchanged or even be improved. All techniques can be implemented in software, which is easily incorporated into a computer or a DSP (Digital Signal Processing) board provided with a DSP chip (since such computers, chips and boards are available on the market, various products can easily be developed using different computers, boards or chips). If the computation speed of the computer or the chip is fast enough and the memory space of the computer or the board is large enough, real-time input can be ensured. Such a board can be plugged into a slot of any AT (or above) personal computer. It is therefore very convenient for users, and the cost can be dramatically reduced. Based on these fundamental techniques and features, the present invention further develops several "Intelligent Learning Techniques" to provide the dictation machine of the present invention with intelligence, so that it can "learn" if taught. These techniques include: automatic learning of a user's voice, so that new users can use the machine quickly; automatic learning of, and adaptation to, the user's environmental noise; and continuous on-line learning of the user's voice, special words, wording and sentence styles to increase the correct recognition rate. All these features will be explained in the detailed description of the preferred embodiment hereinafter.