Spelling checkers, style checkers, and grammar checkers are common in modern word processing programs. The Japanese language presents interesting problems in this area because of several characteristics of the written language. First, the Japanese language employs several different alphabets, which may be used in combination. Second, Japanese text is typically written without any spaces between words. Third, the Japanese language is highly inflectional, which means Japanese words can undergo significant spelling changes to indicate case, gender, number, tense, person, mood, or voice.
The most commonly used Japanese alphabets (or writing systems) are Kanji, Hiragana, and Katakana. The Kanji alphabet includes pictographs or ideographic characters that were adopted from the Chinese alphabet. Hiragana and Katakana are phonetic alphabets that do not include any characters common to each other or to Kanji. Hiragana is used to spell words of Japanese origin. Katakana is used to spell words of foreign (primarily western) origin. Kanji pictographs are analogous to shorthand variants of Hiragana words in that any Kanji word can be written in Hiragana, though the converse is not true. A single Japanese word can include characters from more than one alphabet.
One function of a style checker is to detect malformed phrases and suggest replacement text strings. One approach is to use error-rewrite pairs, in which each known malformed phrase is paired with a replacement string. It is theoretically possible to compile a list of virtually all possible errors together with a corresponding list of rewrite strings. Although this approach is straightforward, it would require a very large number of rules and a large amount of memory.
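The error-rewrite pair approach can be illustrated with a minimal sketch. The table contents, the name ERROR_REWRITE_PAIRS, and the function suggest_rewrites are hypothetical and chosen only for illustration; the two entries are the familiar colloquial "ra-dropped" potential forms paired with their standard forms. A practical table would need an entry for every error it is expected to catch, which is the source of the memory cost noted above.

```python
# Hypothetical error-rewrite table (illustration only): each malformed
# string is stored verbatim together with its single replacement string.
ERROR_REWRITE_PAIRS = {
    "食べれる": "食べられる",
    "見れる": "見られる",
}


def suggest_rewrites(text, pairs=ERROR_REWRITE_PAIRS):
    """Return a sorted list of (start, error, rewrite) for each listed error in text."""
    suggestions = []
    for error, rewrite in pairs.items():
        start = text.find(error)
        while start != -1:
            suggestions.append((start, error, rewrite))
            # Continue scanning after this hit to catch repeated errors.
            start = text.find(error, start + 1)
    return sorted(suggestions)


if __name__ == "__main__":
    for start, error, rewrite in suggest_rewrites("魚が食べれる"):
        print(start, error, "->", rewrite)
```

Because matching is by exact string, any error that is not listed character for character goes undetected, so coverage grows only by adding rows to the table.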
An alternative to using specific error-rewrite pairs is to define error classes, which would allow the number of rules to be drastically reduced. However, this approach requires the style checker to "generate" a replacement string based on the error that is detected and the correction that needs to be made. Phrase reconstruction can be accomplished by employing graph searching logic to arrive at a suitable replacement string. This type of approach would typically employ a breadth first search or a depth first search.
Depth first searching turns out to be impractical in this context because the search algorithm always descends to the specified maximum level in the hierarchy before backtracking. This results in one of two problems: (1) no search results are obtained because the maximum specified level is insufficient; or (2) excessive searching time is required to search sufficient levels.
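Both problems can be seen in a small depth-limited search over a toy transition graph. This is a sketch under stated assumptions, not the search used by any particular style checker: the three-symbol inventory MORPHEMES, the tuple representation of a candidate string, and the function depth_limited_search are all hypothetical, and every morpheme is assumed to be allowed to follow every other.

```python
# Hypothetical toy inventory; a real system would have far more transitions.
MORPHEMES = ["a", "b", "c"]


def depth_limited_search(state, goal, morphemes, max_depth, stats):
    """Depth first search for goal, never extending a state beyond max_depth.

    state and goal are tuples of morphemes. stats["visited"] counts every
    node examined. Returns the goal tuple if found, otherwise None.
    """
    stats["visited"] += 1
    if state == goal:
        return state
    if len(state) >= max_depth:
        return None  # depth limit reached: back up and try a sibling
    for morpheme in morphemes:
        found = depth_limited_search(
            state + (morpheme,), goal, morphemes, max_depth, stats
        )
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Problem (1): the limit is too small, so the whole tree is searched in vain.
    stats = {"visited": 0}
    print(depth_limited_search((), ("c", "a", "b"), MORPHEMES, 2, stats), stats)

    # Problem (2): the goal is one transition away, but the first branch is
    # explored down to the depth limit before the goal is examined.
    stats = {"visited": 0}
    print(depth_limited_search((), ("b",), MORPHEMES, 3, stats), stats)
```

In the second call the match lies at depth one, yet the entire subtree under the first morpheme is exhausted down to the limit before the match is reached.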
Breadth first searching is capable of finding the shortest match before all other matches, but is also impractical because of the amount of storage and computational power required. In a typical scenario, each level of morpheme transitions adds approximately 100 new child nodes for every node at the previous level. Thus, a mere three levels of transitions would require that more than 1 million nodes be searched (100 × 100 × 100 at the third level alone). This amount of searching is unacceptable in microcomputer environments in terms of both the amount of memory and time required.
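The node count above can be checked with a short calculation, shown here alongside a toy breadth first search that also records how large its queue grows. The function names, the tuple representation of a candidate string, and the three-symbol inventory in the usage lines are assumptions made for illustration; the branching factor of 100 is the approximate figure given above.

```python
from collections import deque


def nodes_generated(branching, levels):
    """Total child nodes created across the given number of levels."""
    return sum(branching ** level for level in range(1, levels + 1))


def breadth_first_search(goal, morphemes, max_depth):
    """Return (match, peak_queue_size); match is None if goal is out of reach.

    Nodes are tuples of morphemes, expanded level by level, so the first
    match found is also a shortest one.
    """
    queue = deque([()])
    peak = 1
    while queue:
        state = queue.popleft()
        if state == goal:
            return state, peak
        if len(state) < max_depth:
            for morpheme in morphemes:
                queue.append(state + (morpheme,))
            peak = max(peak, len(queue))  # pending nodes held in memory
    return None, peak


if __name__ == "__main__":
    print(nodes_generated(100, 3))  # 100 + 10,000 + 1,000,000
    print(breadth_first_search(("b",), ["a", "b", "c"], 3))
```

With a branching factor of 100, the three-level total is 1,010,100 nodes, and the queue must hold a full level of pending nodes at a time, which is where the storage cost arises.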
Therefore, there is a need in the art for an improved method for identifying malformed Japanese text and generating replacement strings for the malformed text. An acceptable solution should be small enough (in terms of memory requirements) and fast enough to perform satisfactorily in a desktop computer environment.