In recent years, use of computers has increased dramatically worldwide. Computer users rely on programs for a variety of purposes including word processing, database management, desktop publishing, and the like. Computer users are accustomed to using “checking” program modules (e.g., spell checkers and grammar checkers) that alert the user to words or sequences of characters in a document that are questionable based on some predefined set of rules.
The written form of some languages includes sequences of complex characters and/or symbols. For example, South and Southeast Asian languages like Thai, Vietnamese, and Hindi combine various characters (also called simple characters herein) such as vowels, consonants, diacritics, tone marks, and accents to form complex characters. Those languages follow stringent syntactical rules that dictate which simple character is allowed next to, above, or below another simple character in the composition of the more complex characters used to form words. In this context, a word can be composed of (a) one or more simple characters (e.g., a consonant); (b) one or more complex characters (e.g., a complex character formed by more than one simple character, such as a consonant and a tone mark); or (c) a combination of simple and complex characters. That is, each simple character must occupy the correct position within a complex character, both syntactically and orthographically, according to the syntactical rules of each language. For example, in the Thai language, a leading vowel must be followed by a consonant or a trailing vowel (also called a following vowel) to form a valid Thai character. If any character other than a consonant or trailing vowel is input after a leading vowel, then the sequence of the leading vowel and the subsequent character is incorrect and does not form a correct Thai language complex character or word. Similar character sequence and syntactical rules apply to the other languages mentioned above, such as Vietnamese and Hindi. A problem arises for a person typing in one of those languages: if the sequence of characters that should form a complex character is invalid, the complex character will not be rendered on the screen correctly and will therefore be meaningless.
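The Thai leading-vowel rule described above can be illustrated with a minimal sketch. The character classes below are simplifying assumptions made for illustration only (leading vowels taken as U+0E40 through U+0E44, consonants as U+0E01 through U+0E2E, and trailing vowels as U+0E30, U+0E32, U+0E33, and U+0E45), and the function names are hypothetical; a complete checker would cover many more character classes and rules.

```python
# Illustrative sketch only: the character classes are simplified assumptions,
# not a complete description of Thai orthography.
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}   # U+0E40..U+0E44
CONSONANTS = {chr(cp) for cp in range(0x0E01, 0x0E2F)}       # U+0E01..U+0E2E
TRAILING_VOWELS = {"\u0E30", "\u0E32", "\u0E33", "\u0E45"}   # assumed set


def is_valid_after_leading_vowel(prev: str, nxt: str) -> bool:
    """Apply the single rule from the text: a leading vowel must be
    followed by a consonant or a trailing vowel."""
    if prev not in LEADING_VOWELS:
        return True  # the rule does not constrain other characters
    return nxt in CONSONANTS or nxt in TRAILING_VOWELS


def first_invalid_index(text: str):
    """Return the index of the first character that violates the rule,
    or None if no adjacent pair violates it."""
    for i in range(1, len(text)):
        if not is_valid_after_leading_vowel(text[i - 1], text[i]):
            return i
    return None


if __name__ == "__main__":
    # Leading vowel followed by a consonant: accepted by this rule.
    print(first_invalid_index("\u0E40\u0E01"))
    # Leading vowel followed by a tone mark (U+0E48): rejected.
    print(first_invalid_index("\u0E40\u0E48"))
```

Checking adjacent pairs against the rule itself, rather than against a list of displayable sequences, is what lets such a check reject an invalid character at input time.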
Techniques to verify the validity of a sequence of input characters have been implemented, but those techniques are mainly oriented towards obtaining a proper display of the complex character and do not address the issue of enforcing a correct input sequence of characters according to the syntactical rules defined by the selected language. In some prior art systems, the validity of the sequence of characters is determined by comparing the typed sequence with a known valid displayable sequence (i.e., a sequence of characters that is valid in accordance with the rules of a selected language). Some prior art techniques allow the display of sequences of characters that are orthographically incorrect, or simply display a symbol, such as a black box, whenever the sequence is not displayable due to errors in the sequence of input characters. Also, prior art sequence checking techniques do not allow for accurately determining the sequence context of a previously input sequence of characters once the user moves the cursor to a new location in the text.
Accordingly, there is a need in the art for an efficient method and system for checking the validity of a sequence of input characters according to the syntactical rules of a selected language. There is further a need for a method and system for determining the sequence validity context of a sequence of previously typed simple characters.