The present invention relates to data processing systems. More particularly, the present invention relates to the input of a written language having ideograms such as Chinese and Japanese into a computer system.
The input of non-phonetic or non-alphabetic languages having ideograms into a computer system can be time-consuming and cumbersome. (As is known and as used herein, “ideograms”, which are also known as “logograms” or “logographs”, are symbols that each represent a word in a written language, as opposed to phonemes or syllables, which are used to construct words from their component sounds.) One commonly used system is often referred to as IME (Input Method Editor), which is sold by Microsoft Corporation of Redmond, Wash. In this system, phonetic symbols are provided to a computer using a standard keyboard. The computer includes a converter module that converts the phonetic symbols to the selected language. For example, it is common to form Japanese text in a computer system by entering phonetic characters from an English or Latin keyboard. The representation of Japanese phonetic characters using the letters of the Latin alphabet is called “Romaji”. The computer system compares each of the Romaji characters with a stored dictionary and produces a “Kana” sequence (Kanas). Kanas are Japanese syllabic symbols that represent the sounds of Japanese. The IME converter then converts the Kana sequence into “Kanji” form, which is the formal written form of Japanese, through sophisticated linguistic analysis. (The formal Japanese writing system actually consists of a mixture of Kanjis and Kanas, where the Kanjis represent most of the content information and bear no direct information about pronunciation.)
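By way of illustration only, the Romaji-to-Kana step described above can be sketched as a greedy longest-match lookup against a stored table. The table `ROMAJI_TO_KANA`, the function name `romaji_to_kana`, and the longest-match strategy are assumptions made for this sketch; they are not taken from any particular IME product, whose dictionary and matching rules are far more extensive.

```python
# Hypothetical, deliberately tiny Romaji-to-hiragana table (illustration only).
ROMAJI_TO_KANA = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す", "se": "せ", "so": "そ",
    "ta": "た", "chi": "ち", "tsu": "つ", "te": "て", "to": "と",
    "na": "な", "ni": "に", "nu": "ぬ", "ne": "ね", "no": "の",
    "ha": "は", "hi": "ひ", "fu": "ふ", "he": "へ", "ho": "ほ",
    "n": "ん",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_KANA)


def romaji_to_kana(romaji):
    """Convert a Romaji string to Kana by greedy longest-match lookup."""
    text = romaji.lower()
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(MAX_KEY_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[chunk])
                i += length
                break
        else:
            # No table entry matched: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(romaji_to_kana("hashi"))
```

In this sketch, characters with no table entry are passed through unchanged, so a user can see at a glance which part of the input was not recognized.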
However, in a conventional text processing system used in a Japanese word processor such as the IME system discussed above, the appropriate Kanji equivalent for the Kana sequence often must be selected or corrected using a so-called candidate display-and-choice method. Specifically, a number of Kanji candidates are displayed for a sequence of Kana so that the user can choose the appropriate one. This display-and-choice method is necessary since the Japanese language includes a number of homonyms and has no explicit word boundaries, both of which inevitably cause Kana-to-Kanji conversion errors. By displaying the Kanji candidates, the user can view the possible candidates and select the appropriate Kanji representation.
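The candidate display-and-choice method can be pictured with the following minimal sketch. The candidate table `KANA_TO_KANJI` and the functions `kanji_candidates` and `choose` are hypothetical names introduced here for illustration; the two entries shown are ordinary Japanese homonyms, and an actual converter would derive and order its candidates through linguistic analysis rather than a fixed table.

```python
# Hypothetical candidate table: one Kana reading maps to several Kanji.
KANA_TO_KANJI = {
    "はし": ["橋", "箸", "端"],   # bridge, chopsticks, edge
    "かみ": ["紙", "神", "髪"],   # paper, god, hair
}


def kanji_candidates(kana):
    """Return the Kanji candidates for a Kana sequence.

    The Kana sequence itself is appended as a last resort so that the
    user can always leave the text unconverted.
    """
    return KANA_TO_KANJI.get(kana, []) + [kana]


def choose(kana, index):
    """Simulate the user picking the candidate displayed at `index`."""
    return kanji_candidates(kana)[index]


# Display step: show every candidate with its position in the list.
for position, candidate in enumerate(kanji_candidates("はし")):
    print(position, candidate)

# Choice step: the user selects the second entry.
print(choose("はし", 1))
```

Because the reading alone does not determine which Kanji is intended, the final decision in this sketch is left to the index supplied by the user, which mirrors the manual selection step described above.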
Similarly, the text editing module used in Chinese word processors or other Chinese language processing systems also requires IME conversions, which convert from phonetic symbols (Pinyin) to the written Hanzi representations. Pinyin IME is the most popular phonetic Chinese IME and operates similarly to the Japanese Kana IME discussed above. Generally, phonetic Pinyin string information is converted to Hanzi through the use of a Pinyin dictionary and language models. The lack of tone marks in Pinyin IME can cause far more homonyms to occur than with Japanese Kana IME. Often the list of homonyms for some Pinyin sequences can be too long to fit on a single screen of the visual display.
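The effect of missing tone marks on the number of homonyms can be illustrated with the following sketch. The dictionary `TONED_DICT` lists a few common Hanzi for the syllable "shi" under their tones (written here as trailing digits), and the `WEIGHT` values are invented numbers standing in for a language model score; both are assumptions made for this example and do not reflect the data of any actual Pinyin IME.

```python
# A few Hanzi for the syllable "shi", keyed by tone number.
TONED_DICT = {
    "shi1": ["师", "诗"],
    "shi2": ["十", "时"],
    "shi3": ["使", "史"],
    "shi4": ["是", "事", "市"],
}

# Hypothetical relative weights standing in for a language model score.
WEIGHT = {
    "是": 9, "时": 8, "事": 7, "十": 6, "使": 5,
    "市": 4, "师": 3, "史": 2, "诗": 1,
}


def candidates(pinyin):
    """Return Hanzi candidates for a Pinyin syllable, best score first.

    If the syllable ends in a tone digit, only that tone is consulted;
    otherwise the entries for every tone of the syllable are merged.
    """
    if pinyin[-1:].isdigit():
        found = list(TONED_DICT.get(pinyin, []))
    else:
        found = [
            hanzi
            for key, chars in TONED_DICT.items()
            if key[:-1] == pinyin
            for hanzi in chars
        ]
    return sorted(found, key=lambda hanzi: -WEIGHT.get(hanzi, 0))


print(len(candidates("shi4")), candidates("shi4"))
print(len(candidates("shi")), candidates("shi"))
```

Even with this small table, dropping the tone digit triples the number of candidates for the same syllable, and the ordering supplied by the weights only changes which candidate is shown first, not how many must be considered.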
Recently, speech recognition has been used in these systems, which naturally provides the phonetic information previously input through the keyboard. However, the homonym problem discussed above still exists. In addition, speech recognition errors can be made during conversion, which may require even more use of the candidate display-and-choice method in order to obtain the correct ideogram.
Accordingly, there is an on-going need to more effectively and efficiently implement a system to obtain the written symbols for languages such as Chinese and Japanese having ideograms.