This application is related to the following application, which is filed on the same day as the present application and is assigned to the same assignee as the present application:
"Method And System For Character Sequence Checking According To A Selected Language," Ser. No. 09/345,195.
This invention relates to validation and correction of sequences of input characters according to the syntactical rules of a selected language. More particularly, this invention relates to determining whether a typed sequence of characters is a valid sequence according to the character sequence and syntactical rules of the language being typed and, if necessary, correcting the sequence by replacing a previously validated character with a newly typed character to provide a valid sequence that includes the newly typed character.
In recent years, use of computers has increased dramatically worldwide. Users of computers utilize computer programs for a variety of purposes including word processing, database management, desktop publishing, and the like. Computer users are accustomed to using "checking" program modules (e.g., spell checkers and grammar checkers) that alert the user to words or sequences of characters found in the document that are questionable based on some predefined set of rules.
The written form of some languages includes sequences of complex characters and/or symbols. For example, South and Southeast Asian languages like Thai, Vietnamese, and Hindi use combinations of various characters (also called simple characters herein) such as vowels, consonants, diacritics, tone marks, and accents to form complex characters. Those languages follow stringent syntactical rules that dictate which simple character is allowed next to, above, or below another simple character in the composition of the more complex characters used in the formation of words. In this context, a word can be composed of (a) one or more simple characters (e.g., a consonant); (b) one or more complex characters (e.g., a complex character formed from more than one simple character, such as a consonant and a tone mark); or (c) a combination of simple and complex characters. That is, the syntactical rules of each language require each simple character to occupy the correct position within a complex character, both syntactically and orthographically. For example, in the Thai language, a leading vowel must be followed by a consonant or a trailing vowel (also called a following vowel) to form a valid Thai character. If a character other than a consonant or trailing vowel is input after a leading vowel, the resulting sequence is incorrect and does not form a correct Thai language complex character or word. Similar character sequence and syntactical rules apply for the other languages mentioned above, such as Vietnamese and Hindi. A problem arises for the person typing one of those languages: if the sequence of characters that should form a complex character is invalid, the complex character will not be rendered on the screen correctly and will therefore be meaningless.
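The Thai leading-vowel rule described above can be illustrated with a minimal sketch. The character classes below use Unicode Thai code points; the exact membership of each class, the function names `classify` and `may_follow`, and the decision to leave all other rules unmodeled are assumptions made for illustration and are not taken from this specification.

```python
# Hypothetical character classes (illustrative subset of the Thai Unicode block).
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")      # SARA E .. SARA AI MAIMALAI
CONSONANTS = {chr(c) for c in range(0x0E01, 0x0E2F)}        # KO KAI .. HO NOKHUK
FOLLOWING_VOWELS = set("\u0e30\u0e32\u0e33")                # SARA A, SARA AA, SARA AM
TONE_MARKS = {chr(c) for c in range(0x0E48, 0x0E4C)}        # MAI EK .. MAI CHATTAWA


def classify(ch: str) -> str:
    """Return the (assumed) class name of a single simple character."""
    if ch in LEADING_VOWELS:
        return "leading_vowel"
    if ch in CONSONANTS:
        return "consonant"
    if ch in FOLLOWING_VOWELS:
        return "following_vowel"
    if ch in TONE_MARKS:
        return "tone_mark"
    return "other"


def may_follow(prev: str, new: str) -> bool:
    """Check only the leading-vowel rule: after a leading vowel, the next
    character must be a consonant or a following vowel."""
    if classify(prev) == "leading_vowel":
        return classify(new) in ("consonant", "following_vowel")
    return True  # every other rule is deliberately left out of this sketch
```

With these assumptions, `may_follow("\u0e40", "\u0e01")` (leading vowel, then consonant) is accepted, while `may_follow("\u0e40", "\u0e48")` (leading vowel, then tone mark) is rejected.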
Techniques to verify the validity of a sequence of input characters have been implemented, but those techniques are mainly oriented towards getting a proper display of the complex character and do not address the issue of enforcing a correct input sequence of characters according to the syntactical rules defined by the selected language. In some prior art systems, the validity of the sequence of characters is determined by comparing the typed sequence with a known valid displayable sequence (i.e., a sequence of characters that is valid in accordance with the rules of a selected language). Some prior art techniques allow the display of sequences of characters that are orthographically incorrect or simply display a symbol, such as a black box, whenever the sequence is not displayable due to errors in the sequence of input characters. Also, prior art sequence checking techniques do not allow for determining accurately the sequence context of a previously input sequence of characters once the user moves the cursor to a new location in the text.
Another problem with prior art techniques is that they do not allow for correcting an input sequence without re-typing the entire sequence. In many languages using complex characters, a complex character is not easily divisible into its constituent simple characters once it has been composed. For example, in some languages, like the Thai language, simple characters are stacked vertically to form the complex character. Using conventional language input systems such as those found in common word processors, database programs, etc., the user may not place the cursor in the interior of a complex character to change or delete a simple character placed on top of another simple character. Accordingly, the entire sequence typically must be re-typed to edit an individual simple character included in the sequence.
Accordingly, there is a need in the art for an efficient method and system for checking the validity of a sequence of input characters according to the syntactical rules of a selected language. There is also a need for a method and system for automatically replacing a previously validated simple character (part of a previously validated sequence of simple characters) with a newly typed character in order to validate a sequence including the newly typed simple character without re-typing the entire sequence. There is further a need for a method and system for determining the sequence validity context of a sequence of previously typed simple characters.
The present invention satisfies the above-described needs by providing a method and system for checking the validity of a sequence of input characters according to the rules of a selected language. Each simple character is checked to determine whether that simple character may form a valid sequence of simple characters according to the rules for the selected language to which the simple characters belong. If an input character may not be appended to the previously input sequence according to the rules of the selected language, the newly input character may be prohibited from being appended to the sequence and displayed on the user's computer. The present invention also allows for the replacement of previously input simple characters with the newly input simple character for formation and display of a valid sequence of characters containing the newly input character. Newly input simple characters may also be inserted within a sequence of previously input simple characters. The present invention provides for editing of previously input character sequences by determining the validity context of sequences of characters.
Generally described, when a user types simple characters in the formation of complex characters in selected languages, such as Thai, Hindi and Vietnamese, a determination is made as to whether each newly typed simple character may be added to the sequence of characters already typed by the user. If adding the new character to the sequence violates the rules of the selected language, an attempt is made to replace an existing character in the previously typed sequence with the new character or to insert the new character at a position within the previously typed sequence in a manner that does not violate the rules.
More particularly described, one aspect of the present invention provides a method of combining a new character with an existing sequence of characters where adding the new character sequentially to the existing sequence of characters violates the rules associated with a selected language. The method includes the steps of receiving the new character for appending sequentially to the sequence of characters and determining whether the new character may be appended sequentially to the sequence of characters according to the rules of the selected language. If the new character may not be appended sequentially to the sequence of characters according to the rules of the selected language, a determination is made as to whether the new character may be inserted between two characters of the sequence of characters to form a valid sequence according to the rules of the selected language. If the new character may be inserted between two characters of the sequence of characters, the new character is inserted between the two characters.
If the new character may not be inserted between two characters of the sequence of characters, a determination is made as to whether an existing character in the sequence of characters may be replaced by the new character so that the combination of the new character and the other characters in the sequence of characters forms a valid sequence according to the rules of the selected language. If an existing character in the sequence of characters may be replaced by the new character, the existing character is replaced with the new character. If no existing character in the sequence of characters may be replaced by the new character, the new character is discarded.
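The order of attempts described above (append, then insert, then replace, then discard) can be sketched as follows. This is a brute-force illustration under stated assumptions, not the table-driven method of the invention: the function `combine`, the toy character classes in `CLASS`, and the toy rule "one consonant, then an optional vowel, then an optional tone mark" are all hypothetical.

```python
import re

# Hypothetical toy alphabet: "k" is a consonant (C), "i" a vowel (V),
# and "1" and "2" are tone marks (T). Anything else is class X.
CLASS = {"k": "C", "i": "V", "1": "T", "2": "T"}


def is_valid(seq):
    """Toy rule: a consonant, then an optional vowel, then an optional tone mark."""
    classes = "".join(CLASS.get(ch, "X") for ch in seq)
    return re.fullmatch(r"CV?T?", classes) is not None


def combine(seq, new_char, valid=is_valid):
    """Try append, then insertion, then replacement; otherwise discard."""
    candidate = seq + [new_char]
    if valid(candidate):
        return candidate, "append"
    # Insertion: only positions strictly between two existing characters.
    for i in range(1, len(seq)):
        candidate = seq[:i] + [new_char] + seq[i:]
        if valid(candidate):
            return candidate, "insert"
    # Replacement: try the most recently typed characters first (an assumption).
    for i in range(len(seq) - 1, -1, -1):
        candidate = seq[:i] + [new_char] + seq[i + 1:]
        if valid(candidate):
            return candidate, "replace"
    return seq, "discard"
```

Under these toy rules, typing the vowel `"i"` after `["k", "1"]` inserts it before the tone mark, typing `"2"` after `["k", "i", "1"]` replaces the existing tone mark, and an unknown character is discarded.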
The step of determining whether the new character may be inserted between two characters of the sequence of characters to form a valid sequence according to the rules of the selected language includes utilizing a state transition table and assigning a first state to the existing sequence of characters according to the rules associated with the selected language. A determination is made as to whether the new character is associated with a character insertion transition action that dictates where within the existing sequence of characters the new character may be inserted.
The step of determining whether an existing character in the sequence of characters may be replaced by the new character so that the combination of the new character and the other characters in the sequence of characters forms a valid sequence according to the rules of the selected language includes utilizing a state transition table and assigning a first state to the existing sequence of characters according to the rules associated with the selected language. A determination is made as to whether the new character is associated with a character replacement transition action that dictates which character within the existing sequence of characters may be replaced by the new character.
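A state transition table carrying insertion and replacement actions, as described in the two preceding paragraphs, might look like the following sketch. The state names, the character classes (C for consonant, V for vowel, T for tone mark), the table entries, and the function `apply_transition` are hypothetical and chosen only to show how a lookup on the current state and the new character's class can yield an action and a position.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    action: str       # "append", "insert", or "replace"
    position: int     # offset from the end of the sequence (insert/replace only)
    next_state: str   # state assigned to the sequence after the action


# Hypothetical table: (state of existing sequence, class of new character).
TABLE = {
    ("START", "C"): Transition("append", 0, "CONS"),
    ("CONS", "V"): Transition("append", 0, "CONS_VOWEL"),
    ("CONS", "T"): Transition("append", 0, "CONS_TONE"),
    ("CONS_VOWEL", "T"): Transition("append", 0, "CONS_VOWEL_TONE"),
    # Insertion action: a vowel typed after a tone mark goes before the tone mark.
    ("CONS_TONE", "V"): Transition("insert", -1, "CONS_VOWEL_TONE"),
    # Replacement actions: a second tone mark or vowel replaces the last character.
    ("CONS_TONE", "T"): Transition("replace", -1, "CONS_TONE"),
    ("CONS_VOWEL_TONE", "T"): Transition("replace", -1, "CONS_VOWEL_TONE"),
    ("CONS_VOWEL", "V"): Transition("replace", -1, "CONS_VOWEL"),
}


def apply_transition(seq, state, new_char, char_class):
    """Look up the transition and apply its action; a missing entry means reject."""
    t = TABLE.get((state, char_class))
    if t is None:
        return seq, state, "reject"
    result = list(seq)
    if t.action == "append":
        result.append(new_char)
    elif t.action == "insert":
        result.insert(len(result) + t.position, new_char)
    else:  # "replace"
        result[t.position] = new_char
    return result, t.next_state, t.action
```

In this sketch the table entry itself dictates where the new character is inserted or which existing character it replaces, so no trial-and-error validation of candidate sequences is needed at typing time.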
In another aspect of the present invention, a system is provided for combining a new character with an existing sequence of characters where adding the new character sequentially to the existing sequence of characters violates the rules associated with a selected language. A computer program module is operative to receive the new character for appending sequentially to the sequence of characters and to determine whether the new character may be appended sequentially to the sequence of characters according to the rules of the selected language. If the new character may not be appended sequentially to the sequence of characters according to the rules of the selected language, the program module is operative to determine whether the new character may be inserted between two characters of the sequence of characters to form a valid sequence according to the rules of the selected language. If the new character may be inserted between two characters of the sequence of characters, the program module is operative to insert the new character between the two characters. If the new character may not be inserted between two characters of the sequence of characters, the program module is operative to determine whether an existing character in the sequence of characters may be replaced by the new character so that the combination of the new character and the other characters in the sequence of characters forms a valid sequence according to the rules of the selected language. Additionally, the program module is operative to replace the existing character with the new character or, if no existing character in the sequence of characters may be replaced by the new character, to discard the new character.