The invention relates to encoding characters.
Many ways exist to encode characters. For example, the American Standard Code for Information Interchange (ASCII) and the Multinational Character Set (MCS) assign a binary code to each character where the value of the code is the position of the character in an arbitrarily ordered character set. ASCII, for instance, includes alphabet letters ("A-Z" and "a-z"), numerals ("0-9"), and other characters (e.g., "!", "#", "$", "%", or "&"). Each character has a position in the set the value of which is the character's code. The characters "A", "B", and "C", for example, are in positions 65, 66, and 67, and are assigned codes 1000001, 1000010, and 1000011, respectively.
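The relationship between a character's position and its binary code can be illustrated with a short sketch. The helper name `code_of` is hypothetical, and Python's built-in `ord` is assumed here to stand in for a lookup in the ASCII table, since it returns the same positions for ASCII characters.

```python
def code_of(ch: str) -> str:
    """Return the 7-bit binary code for an ASCII character.

    ord() gives the character's position in the set; format() renders
    that position as a zero-padded 7-digit binary string.
    """
    return format(ord(ch), "07b")


for ch in "ABC":
    # Prints the character, its position, and its binary code.
    print(ch, ord(ch), code_of(ch))
# A 65 1000001
# B 66 1000010
# C 67 1000011
```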
MCS, on the other hand, subsumes the ASCII character set and further includes so-called "multinational" characters. These multinational characters include phonetic characters, such as ligatures (e.g., " ") and characters having diacritical markings (e.g., accented forms of "A", "E", and "O"), as well as other characters such as " " and " ". Again, each character has a position in the set, the value of which is the character's code. The characters "Á", "Â", and "Ã", for example, are in positions 193, 194, and 195, and are assigned codes 11000001, 11000010, and 11000011, respectively.
The codes in ASCII and MCS are often used to compare two characters from the same character set. A first character is greater than, less than, or equal to a second character if the value of its code is greater than, less than, or equal to the value of the code of the second character. For example, in MCS, "A" is less than "Á" because 1000001 is less than 11000001.
The codes in ASCII and MCS are also used to compare strings of two or more characters from the same character set. To compare a first string and a second string, the character comparison described above is applied to a character in the first string and its corresponding character in the second string. The comparisons are repeated on successive corresponding characters until a character from the first string is greater than or less than its corresponding character in the second string, an operation referred to as a "character by character" comparison.
For example, a character by character comparison of the strings "canoes" and "canons" indicates that "canoes" is less than "canons" because, although the codes for "c", "a", "n", and "o" are equal, the value of the code for "e" (01100101) is less than the value of the code for "n" (01101110). Note, however, that a character by character comparison ends once unequal characters are found. In the present example, the character "s" is never compared. This aspect of the character by character comparison can produce undesired results when strings contain a mixture of uppercase characters, lowercase characters, and phonetic characters. For example, in MCS, a character by character comparison indicates that "McDougal" is less than "Mcdonald" and that "Muttle" is less than "Müller". One method used to compare strings that contain a mixture of uppercase, lowercase, and phonetic characters is the "three pass comparison" described below.
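The character by character comparison described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `compare_chars` and `compare_strings` are hypothetical, `ord` is assumed to stand in for the code lookup, and the treatment of strings of unequal length (the shorter string is taken to be less when all shared positions are equal) is an assumption that the description above does not address.

```python
def compare_chars(a: str, b: str) -> int:
    """Return -1, 0, or 1 as the code of a is less than, equal to,
    or greater than the code of b."""
    return (ord(a) > ord(b)) - (ord(a) < ord(b))


def compare_strings(first: str, second: str) -> int:
    """Compare corresponding characters until an unequal pair is found."""
    for a, b in zip(first, second):
        result = compare_chars(a, b)
        if result != 0:
            # The comparison ends here; later characters are never examined.
            return result
    # Assumption: with all shared positions equal, the shorter string is less.
    return (len(first) > len(second)) - (len(first) < len(second))


print(compare_strings("canoes", "canons"))      # -1: "e" (101) < "n" (110)
print(compare_strings("McDougal", "Mcdonald"))  # -1: "D" (68) < "d" (100)
```

The second call shows the undesired result noted above: the uppercase "D" decides the outcome before the remaining letters are considered.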
In the three pass comparison method, the steps of the first pass are to 1) convert the characters of two strings to all uppercase characters, 2) reduce any phonetic characters to their base character, and 3) perform a character by character comparison on the remaining characters. For example, "Muller" and "Müller" become "MULLER" and "MULLER", "MacDonald" and "Macdonald" become "MACDONALD" and "MACDONALD", "MacDougal" and "MacDougal" become "MACDOUGAL" and "MACDOUGAL", and "Muttle" and "Müller" become "MUTTLE" and "MULLER". If the character by character comparison returns a value of equal, then the method proceeds to the second pass. For example, "MULLER"="MULLER", "MACDONALD"="MACDONALD", and "MACDOUGAL"="MACDOUGAL". Otherwise, the comparison returns either a result of greater than or less than and the method ends. For example, "MUTTLE">"MULLER".
The steps of the second pass are to 1) convert the characters of the two strings to all uppercase characters with phonetic characters left in, and 2) compare the strings character by character. For example, "Muller" and "Müller" become "MULLER" and "MÜLLER", "MacDonald" and "Macdonald" become "MACDONALD" and "MACDONALD", and "MacDougal" and "MacDougal" become "MACDOUGAL" and "MACDOUGAL". If the comparison returns that the strings are equal, then the method proceeds to the third pass. For example, "MACDONALD"="MACDONALD" and "MACDOUGAL"="MACDOUGAL". Otherwise, the comparison returns a result of greater than or less than and the method ends. For example, "MULLER"<"MÜLLER".
The steps of the third pass are to 1) restore the strings to their original mixed uppercase and lowercase characters with phonetic characters left in, and 2) compare the strings character by character. For example, "MacDonald" and "Macdonald" remain "MacDonald" and "Macdonald", and "MacDougal" and "MacDougal" remain "MacDougal" and "MacDougal". The method ends after this pass with the result of the comparison, whether equal, greater than, or less than. For example, "MacDougal"="MacDougal", whereas "MacDonald"<"Macdonald" because the code for "D" is less than the code for "d".
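The three passes can be sketched in a few lines. This is an illustration under stated assumptions rather than the method as actually implemented: the names `strip_diacritics`, `compare_strings`, and `three_pass_compare` are hypothetical, Unicode code points (via `ord`) are assumed to stand in for MCS codes, and Unicode decomposition (`unicodedata.normalize` with "NFD") is assumed as the means of reducing a character with a diacritical marking to its base character. Ligatures are not reduced by this approach and would need a separate mapping.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Reduce accented characters to their base character (assumed approach:
    decompose, then drop the combining marks)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compare_strings(first: str, second: str) -> int:
    """Character by character comparison returning -1, 0, or 1."""
    for a, b in zip(first, second):
        if a != b:
            return -1 if ord(a) < ord(b) else 1
    # Assumption: when one string is a prefix of the other, the shorter is less.
    return (len(first) > len(second)) - (len(first) < len(second))


def three_pass_compare(first: str, second: str) -> int:
    # Assumption: work on precomposed characters so each accented letter
    # is a single code, as it would be in a single-byte character set.
    first = unicodedata.normalize("NFC", first)
    second = unicodedata.normalize("NFC", second)

    # Pass 1: uppercase, phonetic characters reduced to base characters.
    result = compare_strings(strip_diacritics(first.upper()),
                             strip_diacritics(second.upper()))
    if result != 0:
        return result

    # Pass 2: uppercase, phonetic characters left in.
    result = compare_strings(first.upper(), second.upper())
    if result != 0:
        return result

    # Pass 3: original mixed case, phonetic characters left in.
    return compare_strings(first, second)


print(three_pass_compare("Muttle", "Muller"))        # 1: decided in pass 1
print(three_pass_compare("MacDougal", "MacDougal"))  # 0: equal in all passes
```

Each pass runs only when the previous pass reports equality, so differences in base letters take precedence over diacritical markings, which in turn take precedence over case.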