Strings are generally considered fundamental data types, and many computer applications have the ability to compare strings. Although string comparison functionality can be found in a number of software applications, one specific application is the comparison of textual data. For example, comparisons between textual strings are utilized by software applications and operating systems to sort characters and words in various languages.
It is well known to one of ordinary skill in the art that there are many computer-implemented algorithms designed for comparing textual strings. For instance, string comparison algorithms exist in the core of many operating systems and are an integral part of most database programs. These existing systems are sufficient for conducting string comparisons, which are ultimately used for sorting and ordering text that represents various languages. As known to one of ordinary skill in the art, numerical codes are used in string comparison algorithms to represent characters in a string, and each character may represent a letter from an alphabet of any language. More specifically, the numerical codes that represent the characters are utilized by computing devices to order, sort, and prioritize the character strings according to a desired format, such as, for example, a database that orders strings in alphabetical order.
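As an illustration of how numerical codes can drive ordering, the following minimal sketch maps each character of a string to its code value and compares the resulting sequences. The function names `code_points` and `compare_strings` are hypothetical and are not taken from any particular operating system or database; the sketch assumes that the code value of a character alone determines its position in the ordering.

```python
def code_points(s):
    """Return the list of numerical codes for the characters of s."""
    return [ord(ch) for ch in s]


def compare_strings(a, b):
    """Compare two strings by their code sequences.

    Returns -1 if a sorts before b, 1 if a sorts after b, and 0 if equal.
    """
    ka, kb = code_points(a), code_points(b)
    return (ka > kb) - (ka < kb)


# Sorting a small word list by code sequence yields alphabetical order
# for these lowercase ASCII words.
words = ["pear", "apple", "fig"]
sorted_words = sorted(words, key=code_points)
```

Ordering by raw code value works here only because the codes for lowercase Latin letters happen to follow alphabetical order; it is a simplification of what a full sorting system would do.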
One known universal coding and indexing system, generally referred to as Unicode, is commonly utilized in computing applications for sorting and ordering textual strings. General background information of the Unicode Standard can be found in the published document entitled “Unicode Standard Version 3.0, Addison Wesley, Unicode Consortium, ISBN 0-201-61633-5,” the subject matter of which is specifically incorporated herein by reference. The Unicode Standard is generally functional for allowing software applications to sort and order textual strings that represent various letters and words from a common language. More specifically, the Unicode Standard generally groups symbols from a common language as a series of successive 16-bit values. As can be appreciated by one of ordinary skill in the art, most commonly known languages are indexed in the Unicode system. However, there still exist many languages that comprise a plurality of alphabets and/or character sets for which the Unicode Standard does not provide a way to map, sort, and compare every word or character. Alphabets and/or characters that are not part of a standard indexing system are referred to as non-indexed characters.
One illustrative example of a textual string comparison application involves the Korean language, which incorporates Hangul. As will be generally understood by one skilled in the relevant art, modern Hangul has the desirable property that there is exactly one modern Hangul character per syllable. To facilitate comparison between modern Hangul characters, each modern Hangul character/syllable has consequently been assigned a unique numeric weight value. One skilled in the relevant art will appreciate that Unicode is a 16-bit encoding standard in which each character in a variety of languages is given a unique numerical representation. Accordingly, by assigning each modern Hangul character a numeric weight in an ascending manner, a comparison of Hangul characters is accomplished by mathematically comparing the characters' numeric weights.
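The weight-based comparison described above can be sketched as follows. This is a minimal illustration only: it assumes that the position of a precomposed modern Hangul syllable within the Unicode Hangul Syllables block (U+AC00 through U+D7A3) can serve as its ascending numeric weight. The names `hangul_weight` and `compare_hangul` are hypothetical, and an actual system may assign weights from a separate table.

```python
HANGUL_BASE = 0xAC00  # first precomposed modern Hangul syllable
HANGUL_LAST = 0xD7A3  # last precomposed modern Hangul syllable


def is_modern_hangul_syllable(ch):
    """True if ch is a single precomposed modern Hangul syllable."""
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def hangul_weight(ch):
    """Assumed weight: 1-based position of the syllable within the block."""
    if not is_modern_hangul_syllable(ch):
        raise ValueError("not a precomposed modern Hangul syllable")
    return ord(ch) - HANGUL_BASE + 1


def compare_hangul(a, b):
    """Compare two syllables by weight: -1, 0, or 1."""
    wa, wb = hangul_weight(a), hangul_weight(b)
    return (wa > wb) - (wa < wb)


# Sorting three syllables by their assumed weights.
ordered = sorted(["\uD55C", "\uAC00", "\uB098"], key=hangul_weight)
```

Because every modern syllable occupies exactly one code position in this scheme, a single integer comparison per character is sufficient.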
While the above-described approach provides a way to compare most modern characters, some languages, such as the Korean language, present a unique situation in which certain characters, such as old Hangul characters, are not fully incorporated in existing coding or indexing systems. For instance, old Hangul characters are not entirely incorporated in the Unicode system. Thus, old Hangul characters cannot be readily compared to modern Hangul characters by the use of generally known character comparison and sorting methods.
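The difficulty can be illustrated with a short sketch. It relies on the Unicode normalization data shipped with Python's standard `unicodedata` module rather than on the Unicode 3.0 tables referenced above, so it should be read as an illustration of the general problem and not as a statement about any specific system. The helper name `is_single_precomposed_syllable` is hypothetical. The sketch assumes that an old Hangul syllable is written as a sequence of conjoining jamo, here using the archaic vowel araea (U+119E).

```python
import unicodedata

# A modern syllable written as conjoining jamo: hieuh + a + nieun.
modern_jamo = "\u1112\u1161\u11AB"
# An old Hangul syllable: hieuh + araea (archaic vowel) + nieun.
old_jamo = "\u1112\u119E\u11AB"

# Canonical composition (NFC) folds the modern sequence into one
# precomposed syllable, but leaves the old sequence as separate jamo.
modern_nfc = unicodedata.normalize("NFC", modern_jamo)
old_nfc = unicodedata.normalize("NFC", old_jamo)


def is_single_precomposed_syllable(s):
    """True if s is exactly one precomposed modern Hangul syllable."""
    return len(s) == 1 and 0xAC00 <= ord(s) <= 0xD7A3


# A naive code-value comparison places the old syllable before every
# precomposed modern syllable, because jamo codes are numerically lower.
old_sorts_before_first_modern = old_nfc < "\uAC00"
```

Since the old syllable has no single code position, a comparison that expects one numeric weight per syllable has nothing to look up, and a raw code-value comparison orders it by encoding accident rather than by linguistic position.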
Accordingly, in view of the above problems, there exists a need for a system and method that allows computing devices to execute string comparison functions that involve complex languages not fully indexed in a coding system. In addition, there exists a need for a system and method for sorting and processing old Hangul characters with modern Hangul characters.