The Japanese written language uses over 10,000 characters, called Kanji, which are not phonetically based. This large number of characters poses challenges for efficient text entry in computers. A common method of entering Japanese text is for the user to type text in phonetic characters, called Kana, and for the computer to convert the Kana characters into Kanji text using a process called Kana-Kanji conversion.
Kana-Kanji conversion is a complex process. Recent Kana-Kanji conversion engines employ grammatical analysis (for example, adjectives can come before nouns) as well as semantic analysis (for example, “summer” can be associated with “hot” in the sense of temperature, but is not likely to be associated with “spicy hot”). Kana-Kanji conversion is similar in nature to character or voice recognition in that, for a given input, there are multiple possible results, and the conversion process needs to rank the possible results in order to present the most probable output to the user. The output can be a wrong result, which is referred to as a conversion error. Conversion accuracy is measured by dividing the number of correctly converted words by the total number of words converted; the frequency of conversion errors is the complement of that figure. Conversion accuracy is often the most important factor when a user chooses between Kana-Kanji conversion engines, and recent conversion engines have a conversion accuracy of 96-97%.
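The accuracy figure discussed above is the fraction of words that were converted correctly. A minimal sketch of that calculation follows, assuming the engine output and a hand-checked reference are available as word lists of equal length. The function name and the list-based input are illustrative assumptions and are not part of any particular conversion engine.

```python
def conversion_accuracy(converted_words, reference_words):
    """Return correctly converted words divided by total words converted.

    Both arguments are lists of words aligned position by position
    (an assumption made for this sketch).
    """
    if len(converted_words) != len(reference_words):
        raise ValueError("word lists must have the same length")
    if not reference_words:
        raise ValueError("at least one word is required")

    # Count positions where the engine output matches the reference.
    correct = sum(
        1 for got, want in zip(converted_words, reference_words) if got == want
    )
    return correct / len(reference_words)
```

Under this definition, an engine that converts 97 of 100 words correctly has an accuracy of 0.97 and a conversion error rate of 0.03.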
One class of conversion errors is referred to as the context-dependent class. A context-dependent error occurs with different words that have the same pronunciation, and thus the same Kana phonetic characters, but are written with different Kanji characters depending on the context. For example, the Japanese phonetic sound “sousha” can mean “player” of musical instruments or “runner.” Both are pronounced “sousha,” but are written differently. If a user types in Kana: <The “sousha” of the piano was Mary>, the conversion engine's semantic analysis is able to determine that the Kana “sousha” should be converted to the Kanji characters meaning “player” and not “runner” because “sousha” appears in the context of “piano.” In contrast, if the user types in Kana: <The “sousha” was Mary>, the conversion engine's semantic analysis does not have proper context in which to interpret “sousha” and must make an arbitrary guess for the Kanji characters, which may be incorrect.
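The “sousha” example can be illustrated with a small sketch of context-based candidate ranking. This is only a toy model under stated assumptions: the candidate table, the cue-word sets, the default ordering, and the function name are all hypothetical, and real conversion engines use far richer grammatical and semantic analysis.

```python
# Hypothetical candidate table: romanized reading -> Kanji candidates,
# listed in an assumed default order.
CANDIDATES = {
    "sousha": ["走者", "奏者"],  # 走者 = runner, 奏者 = player (of an instrument)
}

# Hypothetical cue words that make each candidate more plausible.
CONTEXT_CUES = {
    "奏者": {"piano", "violin"},
    "走者": {"race", "marathon"},
}


def convert(reading, context_words):
    """Pick a Kanji candidate for `reading` using surrounding words.

    Returns (candidate, had_context). When no cue word matches, the
    first candidate is returned as an arbitrary guess.
    """
    candidates = CANDIDATES[reading]
    context = set(context_words)

    # Score each candidate by how many of its cue words appear in context.
    scores = [len(CONTEXT_CUES.get(c, set()) & context) for c in candidates]
    best = max(scores)
    if best == 0:
        return candidates[0], False  # no usable context: arbitrary guess
    return candidates[scores.index(best)], True
```

With the context words `["piano", "mary"]` the sketch selects 奏者 (“player”). With only `["mary"]` no cue matches, so it falls back to the first candidate, which may be the wrong word.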
Another common conversion error occurs with names, which may be pronounced the same but written differently. For example, a common Japanese name “Keiko” can be written more than ten different ways. If the user knows two people named Keiko, one the user's friend and the other the user's boss, the user might want to compose emails to both and type in Kana: <Hi, “Keiko”, let's go skiing this weekend> and <“Keiko”, let's talk about the project schedule>. The user would like the conversion engine to convert the first “Keiko” to the Kanji character associated with a friend named Keiko and the second “Keiko” to the Kanji character associated with a boss named Keiko. Unfortunately, the grammatical and semantic analysis used by existing conversion engines is unable to choose the correct Kanji character because the existing conversion engines do not know that one “Keiko” goes skiing while another “Keiko” talks about project schedules.
Although the above problem has been described in terms of Kana and Kanji, it applies equally to any language where different written words have identical pronunciations or identical phonetic representations. For example, the English written words “main” and “mane” have identical pronunciations. Semantic information is unhelpful in analyzing the spoken sentences: “The main was cut” versus “The mane was cut” where “main” refers to a pipe and “mane” refers to an animal's hair.
Since purchasers of conversion engines make buying decisions based on conversion accuracy, providing a solution that performs more accurate conversion is critically important.