1. Field of the Invention
This invention generally relates to coded character sets and more particularly to the problem faced by computer system users who have data encoded in a first coded character set and are required to migrate the data to a second coded character set.
2. Description of the Prior Art
As more of the non-English speaking world comes to rely on the automation and information processing power of computer technology, computer system providers can no longer assume that the end-user of a system will be fluent in English or in a language which can be transliterated into the English alphabet.
European governments are increasingly requiring that information in databases be represented with the character symbols of the language of that country. For example, it is becoming unacceptable to use the English characters "ue" in a database or file to represent the German character "ü" or to force the use of "ss" to represent the character "ß".
As a result of these government requirements and changes in user expectations, computer manufacturers are now required to support coded character sets that contain more than the standard English language characters. The most common extended coded character set is the "Latin-1" coded character set, which is defined by the International Organization for Standardization (ISO) as coded character set ISO 8859.1.
The ASCII coded character set has been in use in many American-made computers for several decades. It is a character encoding which uses 7 bits to represent 95 graphical characters:
<space> ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
The Latin-1 coded character set uses 8 bits to represent 191 graphical characters:
<space> ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ <no-break space> ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ <soft hyphen> ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
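The character counts quoted for ASCII and Latin-1 can be checked with a few lines of Python. This is only an illustrative sketch: it assumes that "graphical characters" means every code position outside the control ranges, counting the space, the no-break space, and the soft hyphen. The names `ASCII_GRAPHIC` and `LATIN1_UPPER_GRAPHIC` are hypothetical.

```python
# Code positions that hold graphic characters (assumption: space,
# no-break space and soft hyphen are counted as graphic characters).
ASCII_GRAPHIC = range(0x20, 0x7F)          # space through tilde
LATIN1_UPPER_GRAPHIC = range(0xA0, 0x100)  # no-break space through y-diaeresis

ascii_count = len(ASCII_GRAPHIC)
latin1_count = ascii_count + len(LATIN1_UPPER_GRAPHIC)

# Decode the upper half with Python's built-in Latin-1 codec to list
# the characters that Latin-1 adds to ASCII.
upper_half = bytes(LATIN1_UPPER_GRAPHIC).decode("latin-1")

print(ascii_count, latin1_count)
print(upper_half)
```

Under these assumptions the two counts come out as 95 and 191, matching the figures given in the text.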
The International Organization for Standardization (ISO) has standardized several coded character sets. The most widely used 7-bit sets are the ISO 646 family of coded character sets listed below:
ISO 646 US
ISO 646 UK
ISO 646 France
ISO 646 Germany
ISO 646 Italy
ISO 646 Spain
ISO 646 Sweden
ISO 646 Denmark
ISO 646 Norway
The ISO 646 family of coded character sets has "National Replacement Characters". For example, in ISO 646 Germany, the code which, in ASCII (ISO 646 US), represents "]" instead represents "Ü" and the code which, in ASCII, represents "}" instead represents "ü". The characters "]" and "}" cannot be represented at all using the ISO 646 Germany coded character set.
The most widely used 8-bit sets are the ISO 8859.n family of character sets:
ISO 8859.1 Latin Alphabet No. 1
ISO 8859.2 Latin Alphabet No. 2
ISO 8859.3 Latin Alphabet No. 3
ISO 8859.4 Latin Alphabet No. 4
ISO 8859.5 Latin/Cyrillic
ISO 8859.6 Latin/Arabic
ISO 8859.7 Latin/Greek
ISO 8859.8 Latin/Hebrew
ISO 8859.9 Latin Alphabet No. 5
ISO 8859.10 Box drawing set
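The effect of the National Replacement Characters in ISO 646 Germany can be illustrated by decoding the same bytes two ways. The sketch below is not part of the described system; the table `ISO646_DE_REPLACEMENTS` and the function `decode_iso646_de` are hypothetical names, and the replacement positions are those of the German national variant as commonly published (DIN 66003).

```python
# Code positions where ISO 646 Germany differs from ASCII (ISO 646 US).
ISO646_DE_REPLACEMENTS = {
    0x40: "§",  # ASCII "@"
    0x5B: "Ä",  # ASCII "["
    0x5C: "Ö",  # ASCII "\"
    0x5D: "Ü",  # ASCII "]"
    0x7B: "ä",  # ASCII "{"
    0x7C: "ö",  # ASCII "|"
    0x7D: "ü",  # ASCII "}"
    0x7E: "ß",  # ASCII "~"
}


def decode_iso646_de(data: bytes) -> str:
    """Interpret 7-bit data as ISO 646 Germany rather than ASCII."""
    return "".join(ISO646_DE_REPLACEMENTS.get(b, chr(b)) for b in data)


raw = b"M}nchen ]bung"
as_ascii = raw.decode("ascii")      # the same codes read as ISO 646 US
as_german = decode_iso646_de(raw)   # the same codes read as ISO 646 Germany

print(as_ascii)
print(as_german)
```

The stored codes are identical in both cases; only the coded character set used to interpret them changes, which is why "]" and "}" have no representation of their own in the German variant.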
The general problem faced by a computer system user having data encoded in a first coded character set and facing the requirement to migrate to a second coded character set may be exemplified as follows. Suppose a user has a large database file containing data encoded using an ISO 646 variant. If that user wants to begin using an ISO 8859.n character set for the encoding, the only choice available today is to make the database file unavailable; unload, transliterate (i.e., convert character codes from ISO 646 to ISO 8859.n), and reload the data; and then make the database file available again.
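The transliteration step of this offline procedure amounts to a byte-for-byte code conversion. The following is a minimal sketch, assuming the source data is in ISO 646 Germany and the target is ISO 8859.1; the record contents and the names `ISO646_DE_TO_LATIN1` and `transliterate_records` are invented for illustration, and a real unload and reload would depend on the database system in use.

```python
# Translation table from ISO 646 Germany to ISO 8859.1 (Latin-1).
# Only the national replacement positions change; every other 7-bit
# code keeps its value because Latin-1 contains ASCII unchanged.
ISO646_DE_TO_LATIN1 = bytes.maketrans(
    b"@[\\]{|}~",
    bytes([0xA7, 0xC4, 0xD6, 0xDC, 0xE4, 0xF6, 0xFC, 0xDF]),
)


def transliterate_records(records):
    """Convert each unloaded record from ISO 646 Germany to Latin-1."""
    return [record.translate(ISO646_DE_TO_LATIN1) for record in records]


# Hypothetical unloaded records, encoded in ISO 646 Germany.
unloaded = [b"M}ller", b"Stra~e", b"[rger"]
converted = transliterate_records(unloaded)

for record in converted:
    print(record.decode("latin-1"))
```

The conversion itself is cheap; the cost described in the text comes from having to take the database file out of service while every record is unloaded, converted, and reloaded.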
Many computer system users cannot allow critical database files to be unavailable for even a brief period of time. The database may be the heart of their business, for example in an on-line transaction processing environment such as an airline reservation system. Thus, these users require a way to convert their computer systems and software to use a new coded character set that does not severely impact their day-to-day business operations.