As a business or organization expands, there is a greater need for effective management of data and records. A vast number of businesses and organizations use computer databases to collect and organize their data and records. A database is a collection of information organized in such a way that a computer program can access desired pieces of data. One of the ways in which databases are particularly useful is that records can be sorted in any number of ways. One common sort order is alphabetic, for example by last name of a purchaser or by a business name.
Sort orders vary from language to language, and many sort specifications require language-specific variations. For example, in traditional German (e.g.,  and ) and Spanish (e.g., ch or ll), one character may compare as if it were two, or two characters may compare as if they were one. This behavior is commonly referred to as multiple mapping. In most languages, a multiple mapping character is ordered alphabetically as if it were two characters (e.g.,  is ordered as if it were the letters A and E).
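The two directions of multiple mapping can be sketched as follows. This is a minimal illustration, not any particular database's implementation: the weights and table names are hypothetical, and the examples (German ß compared as "ss", and traditional Spanish "ch" treated as a single letter between "c" and "d") were chosen for illustration.

```python
# Hypothetical primary weights, spaced out so a contraction can sit between letters.
BASE_WEIGHTS = {"a": 10, "c": 20, "d": 30, "h": 40, "s": 50, "z": 60}

# Expansion (hypothetical table): one character compares as if it were two.
EXPANSIONS = {"ß": "ss"}

# Contraction (hypothetical table): two characters compare as one letter.
# In traditional Spanish ordering, "ch" is its own letter between "c" and "d".
CONTRACTIONS = {"ch": 25}


def multi_map_key(word: str) -> list[int]:
    """Build a sort key, applying contractions first, then expansions."""
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in CONTRACTIONS:  # two characters -> one weight
            key.append(CONTRACTIONS[pair])
            i += 2
            continue
        # One character -> one weight, or several weights if it expands.
        for unit in EXPANSIONS.get(word[i], word[i]):
            key.append(BASE_WEIGHTS[unit])
        i += 1
    return key


print(sorted(["d", "ch", "cz", "ca"], key=multi_map_key))  # ['ca', 'cz', 'ch', 'd']
```

Because "ch" contracts to a single weight of 25, every word beginning with a plain "c" sorts before it, even "cz".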
However, in some languages, how a character sorts depends on the character that precedes it in the word or character string. For example, in Japanese (in both the Hiragana and Katakana character sets), a length mark lengthens the vowel of the preceding character. The length mark can be depicted in three ways: as a full-width length mark (FIG. 1A), as a half-width length mark (FIG. 1B), and as a dash (FIG. 1C). Depending on the vowel of the preceding character, the length mark sorts in a different alphabetic position. For example, after the character "ka" (FIG. 1D), the length mark indicates a long "a" and comes alphabetically after the character "a" (FIG. 1E) rather than before. In another example, after the character "ki" (FIG. 1F), the length mark indicates a long "i" and comes alphabetically after the character "i" (FIG. 1G) rather than before.
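A minimal sketch of the dependency described above, assuming a tiny hypothetical weight table for a few Katakana characters only. The full-width (U+30FC) and half-width (U+FF70) prolonged sound marks are handled; the dash form of FIG. 1C is omitted because its code point is not stated here. The resolved mark is given a weight just after the vowel it lengthens, which is an illustrative choice rather than a documented rule.

```python
# Hypothetical primary weights for a few Katakana characters, spaced by 10
# so that a resolved length mark can be placed just after its vowel.
KANA_WEIGHT = {
    "ア": 10, "イ": 20, "ウ": 30, "エ": 40, "オ": 50,
    "カ": 60, "キ": 70, "ク": 80, "ケ": 90, "コ": 100,
}

# Vowel that each kana ends in; the plain vowels map to themselves.
VOWEL_OF = {
    "ア": "ア", "イ": "イ", "ウ": "ウ", "エ": "エ", "オ": "オ",
    "カ": "ア", "キ": "イ", "ク": "ウ", "ケ": "エ", "コ": "オ",
}

# Full-width (U+30FC) and half-width (U+FF70) prolonged sound marks.
LENGTH_MARKS = {"\u30fc", "\uff70"}


def kana_sort_key(text: str) -> list[int]:
    """Give each length mark a weight just after the vowel of the preceding kana."""
    key = []
    prev = None  # last character that was not a length mark
    for ch in text:
        if ch in LENGTH_MARKS:
            if prev is None:
                key.append(0)  # assumption: a leading length mark sorts first
            else:
                key.append(KANA_WEIGHT[VOWEL_OF[prev]] + 1)
            # prev is left unchanged, so repeated marks lengthen the same vowel
        else:
            key.append(KANA_WEIGHT[ch])
            prev = ch
    return key


print(sorted(["カイ", "カ\u30fc", "カア"], key=kana_sort_key))  # ['カア', 'カー', 'カイ']
```

The same length mark receives weight 11 after "ka" but 21 after "ki", so its position cannot be read from a fixed per-character table.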
In current multilingual database architectures, Unicode is often used to represent characters. In its original 16-bit form, Unicode is a superset of the ASCII character set that uses two bytes for each character rather than one. Because two bytes can represent 65,536 distinct values rather than the 256 of a single-byte character set, Unicode can accommodate the alphabets of most of the world's languages. Unicode is a desirable character set because it lets a database user enter records in many different languages.
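The two-byte representation can be checked directly with Python's built-in UTF-16 codec. This is a small illustration using the standard "utf-16-be" encoding (big-endian, no byte-order mark); all characters shown lie in the Basic Multilingual Plane. Characters above U+FFFF, which later versions of Unicode added, take a surrogate pair (four bytes) in UTF-16.

```python
def utf16_bytes(ch: str) -> bytes:
    """Encode one character as big-endian UTF-16 without a byte-order mark."""
    return ch.encode("utf-16-be")


for ch in ["A", "ß", "カ", "\u30fc"]:
    encoded = utf16_bytes(ch)
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")

# Two bytes give 2**16 distinct values; one byte gives 2**8.
print(2 ** 16, 2 ** 8)  # 65536 256
```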
In one current database architecture, an alphabetic sort is performed by looking up each character's sort weight in a collation weight table. A collation weight table assigns each character a numerical value used for sorting. For example, in a collation weight table the letter "A" may have a sort weight of 10 and the letter "B" a sort weight of 15. When the sort is performed, "A" is ordered before "B" because it has the smaller value. The collation weight table can contain a sort weight for every Unicode character, which allows records in multiple languages to be sorted.
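A minimal sketch of sorting through a collation weight table. The weights for "A" and "B" follow the example in the text; the lowercase weights and the function name are hypothetical, added to show that a weight table can produce an order different from raw code-point order.

```python
# Hypothetical collation weight table: A = 10 and B = 15 as in the text;
# the lowercase weights are invented for illustration.
COLLATION_WEIGHTS = {"A": 10, "a": 11, "B": 15, "b": 16}


def collation_key(s: str) -> list[int]:
    """Map each character to its sort weight; lists compare element by element."""
    return [COLLATION_WEIGHTS[ch] for ch in s]


records = ["b", "B", "a", "A", "Ab"]
print(sorted(records))                     # ['A', 'Ab', 'B', 'a', 'b'] (code point order)
print(sorted(records, key=collation_key))  # ['A', 'Ab', 'a', 'B', 'b'] (weight order)
```

Each character costs one table lookup, which is why the table approach is fast when no character needs special handling.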
Occasionally, the collation weight table alone cannot determine the sort weight of a character. As described above, one character may compare as if it were two characters, or two characters may compare as if they were one. To account for this behavior, the collation weight table refers to a multiple mapping table to determine how these characters are ordered. Accessing the multiple mapping table requires extra processing time, but because only a small percentage of characters in most languages reside in the multiple mapping table, the extra processing time is typically minor.
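One way the referral could be modeled, assuming a hypothetical sentinel value in the weight table that means "consult the multiple mapping table". The table layout (candidate sequences per starting character, longest first) and all weights are illustrative assumptions, not the layout of any specific product.

```python
MULTI = None  # hypothetical sentinel: "see the multiple mapping table"

# Collation weight table; flagged entries defer to the multiple mapping table.
WEIGHT_TABLE = {"a": 10, "c": MULTI, "d": 30, "h": 40, "s": 50, "ß": MULTI}

# Multiple mapping table: for each flagged character, candidate sequences
# (longest first) and the weights they produce. The final single-character
# entry always matches, so the search below always terminates.
MULTIPLE_MAPPING_TABLE = {
    "c": [("ch", [25]), ("c", [20])],  # contraction: "ch" sorts as one letter
    "ß": [("ß", [50, 50])],            # expansion: "ß" sorts as "ss"
}


def lookup_sort_key(word: str) -> list[int]:
    key, i = [], 0
    while i < len(word):
        weight = WEIGHT_TABLE[word[i]]
        if weight is not MULTI:  # common case: one lookup per character
            key.append(weight)
            i += 1
            continue
        for seq, weights in MULTIPLE_MAPPING_TABLE[word[i]]:  # second lookup
            if word.startswith(seq, i):
                key.extend(weights)
                i += len(seq)
                break
        else:
            raise KeyError(f"no multiple mapping entry matches at {word[i:]!r}")
    return key


print(sorted(["d", "ch", "ca"], key=lookup_sort_key))  # ['ca', 'ch', 'd']
```

Only the flagged characters pay for the second lookup, which matches the observation that the overhead is small when few characters are flagged.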
However, for Japanese alphabetic sorts the extra processing time is substantial. To account for the effect a length mark (FIGS. 1A–C) has on the preceding character, the sort must look ahead to the next character. Current database management systems accomplish this by consulting the multiple mapping table for every Japanese character. As a result, the processing time is effectively doubled for each character, which reduces the efficiency of the sort. Because thousands or even millions of records may be sorted using a Japanese alphabetic sort order, users face substantial delays when performing such sorts.
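The doubling can be made concrete by counting table accesses. This hypothetical model charges one collation-weight lookup per character plus one multiple-mapping lookup for every kana (the look-ahead for a length mark) or for any other flagged character; the kana ranges used are the Hiragana and Katakana blocks (U+3040 to U+30FF) and the half-width Katakana range U+FF66 to U+FF9F.

```python
def is_kana(ch: str) -> bool:
    """True for Hiragana, Katakana, or half-width Katakana code points."""
    return ("\u3040" <= ch <= "\u30ff") or ("\uff66" <= ch <= "\uff9f")


def count_table_accesses(text: str, flagged: set[str]) -> int:
    """Count lookups under the scheme described above (hypothetical model)."""
    accesses = 0
    for ch in text:
        accesses += 1  # collation weight table
        if is_kana(ch) or ch in flagged:
            accesses += 1  # multiple mapping table (look-ahead for kana)
    return accesses


print(count_table_accesses("abcde", {"c"}))   # 6: only "c" needs a second lookup
print(count_table_accesses("カキクケコ", set()))  # 10: every kana needs a second lookup
```

Latin text pays the second lookup only for its few flagged characters, while Japanese text pays it for every character, so the lookup count per Japanese string is twice its length.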