The present invention is generally directed to computer text and, more specifically, to a technique for storing and displaying computer text.
As is well known, computer systems deal with numbers and store letters and other characters by assigning a number to each character. Historically, hundreds of different encoding systems have been utilized within computer systems to assign such numbers to computer text, e.g., the characters of a given language. Unfortunately, many encoding systems have conflicted with one another, in that different encoding systems have used the same number for different characters or different numbers for the same character. As such, computer systems that have implemented different encoding systems have needed to support multiple encodings, so that data passed between different encodings or platforms could be utilized without risk of corruption.
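The two kinds of conflict described above can be illustrated with a brief sketch. The choice of Python and of the particular legacy encodings (Latin-1, Windows-1251 and code page 437) is an assumption made for illustration only; the text above does not name any specific encoding system.

```python
# Conflict 1: the same number stands for different characters.
# The byte value 0xE9 is decoded under two legacy single-byte encodings.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")   # Western European encoding
as_cp1251 = raw.decode("cp1251")    # Cyrillic encoding

# Conflict 2: the same character is stored as different numbers.
# The character U+00E9 (e with acute accent) is encoded two ways.
e_acute = "\u00e9"
in_latin1 = e_acute.encode("latin-1")
in_cp437 = e_acute.encode("cp437")

print("0xE9 decodes to:", repr(as_latin1), "and", repr(as_cp1251))
print("e-acute encodes to:", in_latin1.hex(), "and", in_cp437.hex())
```

Data written under one of these encodings and read under another would therefore display the wrong character, which is the corruption risk noted above.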
To address this problem, a number of computer-related manufacturers have adopted the Unicode standard, which has enabled a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. Unicode provides a unique number for many language characters, irrespective of the platform, program, or language that is implemented. In this manner, a computer system incorporating Unicode allows data to be transported across different systems without corruption.
The Unicode Standard allocates 16 bits, each of which is either on or off, to represent the numeric value of a character in the Unicode ordering. This allows for characters numbered from zero through 2^16-1, or 65,535. Not all of these 65,536 possible numeric values currently have a character associated with them. When a new character is added to the Unicode Standard, the numeric value that will represent the new character is chosen by the Unicode Technical Committee. The numeric value chosen for some Unicode characters is based on the character's ordering in some language. For example, the English language character ‘A’ has the numeric value of 65 in Unicode, followed by the character ‘B’ with the numeric value of 66, followed by the character ‘C’ with the numeric value of 67, and so on through ‘Z’ with the numeric value of 90. However, there is no logical relationship between the shape of a character and its numeric value in Unicode. For this reason, most users of Unicode cannot enter the correct numeric value of a character shape that they wish to display without viewing a table that shows each Unicode character and its corresponding numeric value. Searching a table of 65,536 entries can be very time consuming, and even after searching all 65,536 entries for a desired character shape, it is possible that the desired character shape has no associated numeric value in Unicode.
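The arithmetic and the ordering described above can be checked with a short sketch. It assumes the 16-bit value range given in the text; the constant and variable names are hypothetical, and the three look-alike letters (Latin, Greek and Cyrillic) are chosen here only to illustrate that similar shapes do not receive related numeric values.

```python
# Value range implied by a 16-bit representation (as described in the text).
BITS = 16
MAX_VALUE = 2**BITS - 1   # highest numeric value
NUM_VALUES = 2**BITS      # count of values, zero included

# 'A' through 'Z' occupy consecutive numeric values 65 through 90.
letters = {chr(v): v for v in range(ord("A"), ord("Z") + 1)}

# Three capital letters with nearly identical shapes: Latin A,
# Greek Alpha and Cyrillic A. Their numeric values are unrelated.
look_alikes = {
    "LATIN A": ord("\u0041"),
    "GREEK ALPHA": ord("\u0391"),
    "CYRILLIC A": ord("\u0410"),
}

print(MAX_VALUE, NUM_VALUES)
print(letters["A"], letters["B"], letters["C"], letters["Z"])
print(look_alikes)
```

Nothing in the shape of the three look-alike letters indicates their numeric values, which is why a user ordinarily has to consult a table to find the value for a desired shape.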
What is needed is a technique that eliminates the need for time-consuming searches of numeric values to find a desired character shape. It would also be desirable for the technique to address the problem of some character shapes having no associated numeric value, by using a process that generates a numeric value for every possible character based on its shape.