Language specific word processing software has existed for many years. More sophisticated word processors offer users advanced tools, such as spelling and grammar correction, to assist in drafting documents. Many word processors, for example, can identify words that are misspelled or sentence structures that are grammatically incorrect and, in some cases, automatically correct the identified errors.
Generally speaking, there are two causes for errors being introduced into a text. One cause is that the user simply does not know the correct spelling or sentence structure. Word processors can offer suggestions to aid the user in choosing a correct spelling or phraseology. The second and more typical cause of errors is that the user incorrectly enters the words or sentences into the computer, even though he/she knew the correct spelling or grammatical construction. In such situations, word processors are often quite useful at identifying the improperly entered character strings and correcting them to the intended word or phrase.
Entry errors are often more prevalent in word processors designed for languages that do not employ Roman characters. Language specific keyboards, such as the English version QWERTY keyboards, do not exist for many languages because such languages have many more characters than can be conveniently arranged as keys on a keyboard. For example, many Asian languages contain thousands of characters. It is practically impossible to build a keyboard with separate keys for so many different characters.
Rather than designing expensive language and dialect specific keyboards, language specific word processing systems allow the user to enter phonetic text from a small character-set keyboard (e.g., a QWERTY keyboard) and convert that phonetic text to language text. “Phonetic text” represents the sounds made when speaking a given language, whereas the “language text” represents the actual written characters as they appear in the text. In the Chinese language, for example, Pinyin is an example of phonetic text and Hanzi is an example of the language text. By converting the phonetic text to language text, many different languages can be processed by the language specific word processor using conventional computers and standard QWERTY keyboards.
Word processors that require phonetic entry thus experience two types of potential entry errors. One type of error is common typing mistakes. However, even if the text is free of typographical errors, another type of error is that the word processing engine might incorrectly convert the phonetic text to an unintended character text. When both of these problems are at work on the same phonetic text input string, a cascade of multiple errors may result. In some situations, the typing induced errors may not be readily traced without a lengthy investigation of the entire context of the phrase or sentence.
The invention described herein is directed primarily to the former type of entry errors made by the user when typing in the phonetic text, but also provides tolerance for conversion errors made by the word processing engine. To better demonstrate the problems associated with such typing errors, consider a Chinese-based word processor that converts the phonetic text, Pinyin, to a language text, Hanzi.
There are several reasons why entry of phonetic text often yields increased typing errors. One reason is that the average typing accuracy on an English keyboard is lower in China than in English-speaking countries. A second reason is that phonetic text is not used all that frequently. During their early school years, users do not study and learn phonetic spelling to the same degree that, for example, English-speaking users are taught to spell words in English.
A third reason for increased typing errors during phonetic text input is that many people speak natively in a regional dialect, as opposed to a standard dialect. The standard dialect, which is the origin of phonetic text, is a second language. In certain dialects and accents, spoken words may not match corresponding proper phonetic text, thus making it more difficult for a user to type phonetic text. For instance, many Chinese speak various Chinese dialects as their first language and are taught Mandarin Chinese, which is the origin of Pinyin, as a second language. In some Chinese dialects, for example, there is no differentiation in pronouncing “h” and “w” in certain contexts; in other dialects, the same can be said for “ng” and “n”; and yet in others, “r” is not articulated. As a result, a Chinese user who speaks Mandarin as a second language may be prone to typing errors when attempting to enter Pinyin.
Another possible reason for increased typing errors is that it is difficult to check for errors while typing phonetic text. This is due in part to the fact that phonetic text tends to form long strings of characters that are difficult to read. In contrast to English-based text input, where what you see is what you typed, entered phonetic text is often not “what you see is what you get.” Rather, the word processor converts the phonetic text to language text. As a result, users generally do not examine the phonetic text for errors, but rather wait until the phonetic text is converted to the language text.
For this last reason, a typing error can be exceptionally annoying in the context of Pinyin entry. Pinyin character strings are very difficult to review and correct because there is no spacing between characters. Instead, the Pinyin characters run together regardless of the number of words being formed by the Pinyin characters. In addition, Pinyin-to-Hanzi conversion often does not occur immediately, but continues to formulate correct interpretations as additional Pinyin text is entered. Thus, if a user types in the wrong Pinyin symbols, the single error may be compounded by the conversion process and propagated downstream to cause several additional errors. As a result, error correction takes longer because by the time the system converts decisively to Hanzi characters and then the user realizes there has been an error, the user is forced to backspace several times just to make one correction. In some systems, the original error cannot even be revealed.
Since mistakes are expected to be made frequently during phonetic input, there is a need for a system that can tolerate errors in the phonetic input. It is desirable that the system would return the correct answer even though the phonetic string contains slightly erroneous characters.
Language specific word processors face another problem, separate from the entry problem, which concerns switching modes between two languages in order to input words from the different language into the same text. It is common, for example, to draft a document in Chinese that includes English words, such as technical terms (e.g., Internet) and terms that are difficult to translate (e.g., acronyms, symbols, surnames, company names, etc.). Conventional word processors require a user to switch modes from one language to the other language when entering the different words. Thus, when a user wants to enter a word from a different language, the user must stop thinking about text input, switch the mode from one language to another, enter the word, and then switch the mode back to the first language. This significantly reduces the user's typing speed and requires the user to shift his/her attention between the text input task and an extraneous control task of changing language modes.
Accordingly, there is a need for a “modeless” system that does not require mode switching. To avoid modes, the system should be able to detect the language that is being typed, and then convert the letter sequence to one language or the other, dynamically, on a word-by-word basis.
This is not as easy as it may seem, however, because many character strings may be appropriate in both contexts. For example, many valid English words are also valid Pinyin strings. Furthermore, more ambiguities may arise since there are no spaces between Chinese characters, and between Chinese and English words, during Pinyin input.
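The ambiguity can be illustrated with a minimal sketch. It assumes a tiny, hand-picked subset of Pinyin syllables (the `SYLLABLES` set and the `can_segment` helper are hypothetical names introduced only for this illustration, not part of the described system) and checks whether an unspaced letter string can be split entirely into those syllables. English words such as “woman” and “change” pass the check, which is why the letters alone do not reveal the intended language.

```python
# Illustrative subset of Pinyin syllables (an assumption; a real table has
# roughly four hundred entries).
SYLLABLES = {
    "wo", "shi", "yi", "ge", "zhong", "guo", "ren",
    "man", "chang", "e", "can", "he", "an",
}

MAX_SYLLABLE_LEN = 6  # longest Pinyin syllables, e.g. "zhuang", have 6 letters


def can_segment(text: str, syllables: set = SYLLABLES) -> bool:
    """Return True if text splits completely into syllables from the set."""
    n = len(text)
    # reachable[k] is True when text[:k] can be fully segmented.
    reachable = [False] * (n + 1)
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_SYLLABLE_LEN), end):
            if reachable[start] and text[start:end] in syllables:
                reachable[end] = True
                break
    return reachable[n]


if __name__ == "__main__":
    for word in ["woman", "change", "internet", "woshiyigezhongguoren"]:
        print(word, can_segment(word))
```

Under these assumptions, a string that segments cleanly may still be English, so a modeless system would need additional context beyond this check to decide between the two readings.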
As an example, when a user types the Pinyin input string “woshiyigezhongguoren”, the system converts this string into the Chinese characters “我是一个中国人” (generally translated as “I am a Chinese”).
Sometimes, instead of typing “woshiyigezhongguoren”, a user types the following:
wosiyigezhongguoren (the error is the “sh” and “s” confusion);
woshiyigezongguoren (the error is the “zh” and “z” confusion);
woshiygezhongguoren (the error is the “i” omission after “y”);
woshiyigezhonggouren (the error is the transposition of “uo” to “ou”);
woshiyigezhongguiren (the error is the “i” and “o” confusion).
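To make the notion of a “slightly erroneous” string concrete, the following sketch measures how far each of the five mistyped strings above lies from the intended string. It uses the optimal string alignment variant of edit distance (insertions, deletions, substitutions, and adjacent transpositions), which is a standard technique chosen here purely for illustration; the source does not state that the described system uses it. The names `osa_distance`, `INTENDED`, and `TYPOS` are assumptions of this sketch.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts single-character insertions, deletions, substitutions, and
    transpositions of two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "uo" typed as "ou".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


INTENDED = "woshiyigezhongguoren"
TYPOS = [
    "wosiyigezhongguoren",   # "sh" typed as "s"
    "woshiyigezongguoren",   # "zh" typed as "z"
    "woshiygezhongguoren",   # "i" omitted after "y"
    "woshiyigezhonggouren",  # "uo" typed as "ou"
    "woshiyigezhongguiren",  # "o" typed as "i"
]

if __name__ == "__main__":
    for typo in TYPOS:
        print(typo, osa_distance(typo, INTENDED))
```

Each of the five strings differs from the intended string by a single edit under this measure, which suggests why a tolerant system could plausibly recover the intended input from any of them.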
The inventors have developed a word processing system and method that makes spell correction feasible for difficult foreign languages, such as Chinese, and allows modeless entry of multiple languages through automatic language recognition.