To date, the most widely used code standard for alphanumeric characters has been ASCII (American Standard Code for Information Interchange), a 7-bit binary code standardized by ANSI (American National Standards Institute). Because the only letters ASCII supports are the unaccented English letters, its use in information processing and interchange environments has been limited to English. As a result, a large number of computer systems today communicate in the English language only.
In recent years, the computer industry has recognized the need to support the non-English Latin-based languages in order to facilitate communication with non-technical users, who are often familiar only with their native language. Hence, a new 8-bit multilingual character set was defined by ISO (International Organization for Standardization) in 1986. That set has already gained broad support from the industry and from various national standards organizations. The name of the character set is Latin Alphabet #1, and it is documented in the ISO standard ISO 8859/1. It supports 14 Western European and Western Hemisphere languages that are used in 45 countries around the world.
The set of languages and characters supported by ISO 8859/1, "Information Processing: 8-bit single-byte coded graphic character sets, Part 1: Latin Alphabet #1", is believed to include most of those used in Western Europe and throughout the Western Hemisphere, including North America. They are listed below:
Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish and Swedish.
These languages are believed to be used in at least the following countries:
_________________________________________________
Argentina        Finland          Panama
Australia        France           Paraguay
Austria          Germany          Peru
Belgium          Guatemala        Portugal
Belize           Guyana           El Salvador
Bolivia          Honduras         Spain
Brazil           Iceland          Surinam
Canada           Ireland          Sweden
Chile            Italy            Switzerland
Colombia         Liechtenstein    The Netherlands
Costa Rica       Luxembourg       UK
Cuba             Mexico           USA
Denmark          New Zealand      Uruguay
Ecuador          Nicaragua        Venezuela
Faroe Islands    Norway
_________________________________________________
Returning now to the ASCII character set, the main advantage the English language offers with regard to sorting is that the alphabetical order of the letters in the English alphabet corresponds to their numerical collating sequence in the ASCII set. This property makes sorting English-language strings relatively simple and, in most cases, efficient.
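To make this property concrete, here is a minimal Python sketch. It relies on the fact that Python compares strings by code point and that the first 256 Unicode code points coincide with ISO 8859/1, so the comparisons below behave like comparisons of the underlying single-byte codes. The word lists are illustrative only.

```python
import string

# A..Z occupy the consecutive ASCII codes 65..90 (a..z occupy 97..122).
upper_codes = [ord(c) for c in string.ascii_uppercase]
print(upper_codes[0], upper_codes[-1])  # 65 90
print(all(b - a == 1 for a, b in zip(upper_codes, upper_codes[1:])))  # True

# Hence a plain code-value sort of uppercase English words is alphabetical.
english = sorted(["PEAR", "APPLE", "BANANA"])
print(english)  # ['APPLE', 'BANANA', 'PEAR']

# Contrast: in ISO 8859/1 the letter 'É' has code 0xC9 (201), above 'Z' (90),
# so the same plain code comparison places "ÉCOLE" after "ZEBRA".
print("É".encode("latin-1"))  # b'\xc9'
accented = sorted(["ZEBRA", "ÉCOLE", "EAGLE"])
print(accented)  # ['EAGLE', 'ZEBRA', 'ÉCOLE']
```

The last line shows why the ASCII property does not carry over automatically to the accented letters of Latin Alphabet #1.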
For example, to determine the sort order of two characters, the following operations are performed:
1) Convert both characters to the same case (i.e. the characters become caseless).
2) Compare the codes (ordinal values) of the two characters directly to determine their relative sort order. The character with the smaller ordinal value is collated first (in an ascending sort).
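The two steps can be sketched in Python as follows. This is a minimal illustration assuming single ASCII characters as input; the helper names `ascii_upper` and `compare_chars` are hypothetical, and folding to uppercase (rather than lowercase) is an arbitrary choice for the sketch.

```python
def ascii_upper(c: str) -> str:
    """Fold one ASCII character to uppercase; a..z are exactly 32 above A..Z."""
    return chr(ord(c) - 32) if "a" <= c <= "z" else c


def compare_chars(a: str, b: str) -> int:
    """Return -1 if a collates first, 1 if b collates first, 0 if equal."""
    # Step 1: convert both characters to the same case (uppercase here).
    ua, ub = ascii_upper(a), ascii_upper(b)
    # Step 2: compare the ordinal values; the smaller code collates first.
    if ord(ua) < ord(ub):
        return -1
    if ord(ua) > ord(ub):
        return 1
    return 0


print(compare_chars("b", "A"))  # 1   ('B' = 66 follows 'A' = 65)
print(compare_chars("a", "A"))  # 0   (same letter once case is ignored)
print(compare_chars("c", "D"))  # -1  ('C' = 67 precedes 'D' = 68)
```

For letters, the choice of target case does not affect the result. It does matter for punctuation lying between the two letter ranges: '_' (95) sorts after all letters when folding to uppercase (65..90) but before them when folding to lowercase (97..122).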
The main advantage of the English alphabet (i.e. A to Z, with no accented characters) with regard to data retrieval is that the matching process is essentially unambiguous (i.e. a one-to-one mapping for all characters). In addition, as mentioned above, the ASCII codes of the letters follow their alphabetical order, so retrieving data in alphabetical order is relatively easy.
Furthermore, users' terminals normally support the full repertoire of the ASCII character set, so problems with retrieving characters outside the keyboard repertoire do not normally arise.
In general, to insert a text string into an ordered ASCII database, the following operations are performed:
1) Case conversion is applied to the text string. This step ensures that the uppercase and lowercase versions of the same character sort and match identically.
2) The codes (ordinal values) of the case-converted text string are compared directly against those of the entries already in the database to locate the correct insertion point.
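A minimal Python sketch of these two insertion steps follows. It assumes a hypothetical in-memory "database" (a Python list of `(folded_key, original)` tuples kept in ascending order) standing in for a real ordered store, and uses the standard `bisect` module to find the insertion point by binary search over plain code-value comparisons. The names `ascii_fold` and `insert_sorted` are illustrative.

```python
import bisect


def ascii_fold(s: str) -> str:
    """Step 1: case-convert an ASCII string (here to uppercase)."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def insert_sorted(db: list, text: str) -> None:
    """Insert text into db, a list of (folded_key, original) tuples in ascending order.

    Step 2: bisect.insort finds the insertion point by binary search, comparing
    tuples; the folded key is compared first, code value by code value, and
    entries with equal keys are ordered by the original text as a tie-breaker.
    """
    bisect.insort(db, (ascii_fold(text), text))


db: list = []
for word in ["banana", "Apple", "cherry", "apple", "BANANA"]:
    insert_sorted(db, word)

print([original for _, original in db])
# ['Apple', 'apple', 'BANANA', 'banana', 'cherry']
```

Because the stored keys are already case-converted, "Apple" and "apple" land next to each other, which is exactly what step 1 is meant to guarantee.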
The retrieval operation usually goes through the following steps:
1) Find the matches based on the case-converted search key. There may be multiple matches, depending on whether the retrieval is by a unique key or uses wildcard characters (e.g. find all entries beginning with "A").
2) Extract the matched entries and display them to the user. Because the data is stored in sorted order, the matches are already in sorted order and need no further sorting.
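The retrieval steps can be sketched in Python as below. As before, the sorted list of entries is a hypothetical stand-in for an ordered database, and the function names are illustrative. Exact-key retrieval and a prefix wildcard ("all entries beginning with A") are both answered by binary search with `bisect`, and the matching entries come back as one contiguous, already-sorted slice.

```python
import bisect


def ascii_fold(s: str) -> str:
    """Case-convert an ASCII string to uppercase, as was done on insertion."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


# Hypothetical database: entries kept in ascending order of their folded keys.
entries = sorted(["Apple", "apricot", "Banana", "avocado", "cherry"], key=ascii_fold)
keys = [ascii_fold(e) for e in entries]


def find_exact(search: str) -> list:
    """Retrieval by a unique key: all entries whose folded key equals the search key."""
    k = ascii_fold(search)
    return entries[bisect.bisect_left(keys, k):bisect.bisect_right(keys, k)]


def find_prefix(prefix: str) -> list:
    """Wildcard retrieval such as 'all entries beginning with A'."""
    p = ascii_fold(prefix)
    lo = bisect.bisect_left(keys, p)  # first key >= the prefix
    hi = lo
    # Keys sharing the prefix form one contiguous run starting at lo.
    while hi < len(keys) and keys[hi].startswith(p):
        hi += 1
    return entries[lo:hi]  # already in sorted order; no extra sort step


print(find_prefix("a"))      # ['Apple', 'apricot', 'avocado']
print(find_exact("BANANA"))  # ['Banana']
print(find_exact("kiwi"))    # []
```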