The Japanese written language contains three separate character sets. Simple Japanese characters representing phonetic syllables belong to the hiragana and katakana character sets (together referred to as “kana”). Hiragana characters, which are characterized by a cursive style, are typically used for words native to Japan. Katakana characters, which are characterized by a more angular style, are typically used for words borrowed from other cultures, or for emphasis and sound effects. The third character set in Japanese is kanji. Kanji are the complex Japanese characters borrowed from the Chinese language. There are over 9000 kanji characters in the Japanese language. Approximately 4000 kanji are used on a semi-regular basis, while knowledge of 2000 kanji is generally required to read a newspaper or get around in Japan. The complexity of the Japanese written language poses several challenges for efficient text entry in computers, word processors, and other electronic devices.
FIG. 1A shows an example of Japanese hiragana and katakana characters. The hiragana 151 and katakana 152 character sets each contain 46 base characters. Both sets of kana have identical pronunciations and rules of construction; only the shapes of the characters differ, to emphasize the different usage of the words. Some base kana characters are used in certain combinations and in conjunction with special symbols (called “nigori” and “maru”) to produce voiced and aspirated variations of the basic syllables, thus resulting in a full character set for representing the approximately 120 different Japanese phonetic sounds. If a Japanese keyboard included separate keys for all of the voiced and aspirated variants of the basic syllables, the keyboard would need to contain at least 80 character keys. Such a large number of keys creates a crowded keyboard whose keys are often not easily discernible. If the nigori and maru symbols are instead provided as separate keys, the number of character keys can be reduced to 57. However, generating a voiced or aspirated version of a base character then requires the user to enter two or more keystrokes for a single character.
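The two-keystroke entry described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the nigori and maru keys correspond to the Unicode combining voiced and semi-voiced sound marks (U+3099 and U+309A); the function name `compose` is hypothetical, and real keyboards need not work this way.

```python
import unicodedata

# Assumption for illustration: the nigori and maru keys are modeled as the
# Unicode combining voiced (U+3099) and semi-voiced (U+309A) sound marks.
NIGORI = "\u3099"
MARU = "\u309A"

def compose(base: str, mark: str) -> str:
    """Combine a base kana and a mark into one precomposed character (NFC)."""
    return unicodedata.normalize("NFC", base + mark)

# Two keystrokes (base kana, then mark) yield a single voiced or
# aspirated character.
voiced = compose("か", NIGORI)    # ka + nigori -> ga
aspirated = compose("は", MARU)   # ha + maru -> pa
print(voiced, aspirated)
```

The sketch only shows why a separate mark key trades keyboard size for extra keystrokes; it says nothing about how any particular input device encodes its keys.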
Common methods of Japanese text entry for computers and like devices typically require the use of a standard Japanese character keyboard or a roman character keyboard that has been adapted for Japanese use. A typical kana keyboard has keys that represent only one kana set (usually hiragana), which may be input directly from the keyboard. A conventional method is to take the hiragana text entered from such a keyboard as input and convert it into Japanese text using a process called Kana-Kanji conversion. A typical Japanese text is represented by hiragana, katakana and kanji characters, such as sentence 150, which has the English meaning “Watch a movie in San Jose”. The text 150 includes katakana characters 154, which correspond to the foreign word “San Jose”, a hiragana character 155 that is normally used as a particle, and a kanji character set 153.
FIG. 1B shows a conventional method of converting a hiragana text to a Japanese text. Referring to FIG. 1B, the Japanese hiragana characters are entered 101 through a keyboard. The hiragana characters are converted 102 to Japanese text by looking up characters in a database (e.g., a dictionary). The user then has to inspect 103 and check 104 whether the conversion is correct. If the conversion is incorrect (e.g., the dictionary does not contain such a conversion), the user has to manually force the system to convert the hiragana text. A typical user interaction involves selecting 105 the portions of the hiragana text that were converted incorrectly and explicitly instructing 106 the system to convert such portions. The system then presents 107 a candidate list including all possible choices. The user normally checks 109 whether the candidate list contains a correct conversion. If it does, the user selects 108 that choice as the best output and inserts the correct result to form the final output text. If it does not, the user reselects a different portion of the input and tries to manually convert the reselected portion again.
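The dictionary lookup step of this conventional flow, and the way it fails for a word missing from the dictionary, can be sketched as follows. This is a toy illustration only: the dictionary contents, the pre-segmented input, and the names `TOY_DICTIONARY` and `convert_segments` are all assumptions and are not taken from FIG. 1B.

```python
# Hypothetical toy dictionary; a real Kana-Kanji dictionary is far larger.
TOY_DICTIONARY = {"えいが": "映画", "を": "を", "みる": "見る"}

def convert_segments(segments, dictionary=TOY_DICTIONARY):
    """Look up each hiragana segment; report the indices that were not found.

    Assumes the input has already been split into segments.
    """
    converted, failed = [], []
    for index, segment in enumerate(segments):
        if segment in dictionary:
            converted.append(dictionary[segment])
        else:
            # No entry: leave the hiragana as-is and flag it, which is the
            # point where the user would have to intervene manually.
            converted.append(segment)
            failed.append(index)
    return "".join(converted), failed

text, failed = convert_segments(["さんのぜで", "えいが", "を", "みる"])
print(text, failed)
```

Here the segment containing the foreign word is absent from the toy dictionary, so it stays in hiragana and is flagged for manual correction.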
One conventional method, transliteration (direct conversion from hiragana to katakana), does not provide a correct result in most cases. This is because users typically select (e.g., in the method shown in FIG. 1B) not the katakana word alone, but a segment containing the word and one or more trailing post particles that are written in hiragana in the final form. Normal transliteration also converts all of the trailing post particles to katakana, which is incorrect.
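The problem with direct transliteration can be seen in a minimal Python sketch. It relies on the fact that, in Unicode, the hiragana block (U+3041 to U+3096) maps onto the katakana block by a fixed offset of 0x60; the function name `to_katakana` and the sample segment (the hiragana for “San Jose” followed by one particle) are chosen here for illustration.

```python
HIRAGANA_START = 0x3041
HIRAGANA_END = 0x3096
KATAKANA_OFFSET = 0x60  # katakana code point = hiragana code point + 0x60

def to_katakana(text: str) -> str:
    """Transliterate every hiragana character to katakana; keep the rest."""
    result = []
    for ch in text:
        code_point = ord(ch)
        if HIRAGANA_START <= code_point <= HIRAGANA_END:
            result.append(chr(code_point + KATAKANA_OFFSET))
        else:
            result.append(ch)
    return "".join(result)

# The selected segment holds the foreign word plus a trailing particle.
segment = "さんのぜで"
print(to_katakana(segment))
```

The whole segment is converted, so the trailing particle also ends up in katakana, whereas the desired form keeps the particle in hiragana (サンノゼで).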
Another conventional method generates alternative candidates by transliterating the leading sub-string of the string. This method takes advantage of the fact that the particles always trail the word and are all in hiragana. This method creates many candidates, which may include the correct one among them. The following is an illustration of this conventional method (in English):
input: inthehouse
output 1: INTHEHOUSE
output 2: i NTHEHOUSE
output 3: in THEHOUSE
output 4: int HEHOUSE
output 5: inth EHOUSE
output 6: inthe HOUSE (correct one)
output 7: intheh OUSE
output 8: intheho USE
output 9: inthehou SE
output 10: inthehous E
output 11: inthehouse

As described above, the conventional method generates many candidates after the user selects a portion of the input text to be corrected. Even though such candidates may include a correct choice, their number may lead to confusion in the final selection. Another conventional method involves an analyzer that can recognize post particles. It analyzes the text from the end backward until the analyzer can no longer find post particles. However, these conventional methods require user interaction to achieve accurate results, which potentially lowers efficiency.
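The candidate generation illustrated above can be sketched in Python for kana rather than English letters. This is a minimal sketch under stated assumptions: the function names `to_katakana` and `leading_candidates` and the sample segment are hypothetical, and the hiragana-to-katakana mapping uses the fixed 0x60 offset between the two Unicode blocks.

```python
def to_katakana(text: str) -> str:
    """Shift hiragana (U+3041 to U+3096) into the katakana block."""
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )

def leading_candidates(text: str) -> list[str]:
    """Transliterate every leading sub-string, longest first.

    A string of n characters yields n + 1 candidates, from fully
    converted down to fully unconverted.
    """
    return [
        to_katakana(text[:split]) + text[split:]
        for split in range(len(text), -1, -1)
    ]

for number, candidate in enumerate(leading_candidates("さんのぜで"), start=1):
    print(number, candidate)
```

For this five-character segment the sketch produces six candidates, only one of which keeps exactly the trailing particle in hiragana, which mirrors why the user is left to pick the correct entry from a long list.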
One of the disadvantages of the conventional method is that if a katakana word is not in the dictionary, the conversion containing the katakana word usually fails. Another disadvantage of this method is that it requires user-specific interaction to convert and select the best candidate. It consumes more time and effort if the user does not know the possible outputs of the conversion. Hence, a better method to automatically and efficiently convert a Japanese hiragana character string to a katakana character string is highly desirable.