As computer software and hardware come into use in an increasing number of countries, there is a growing demand for the information presented at the user interface, and in files and documents, to be made available in the user's language and character set, and with the user's conventions for date, time, territory and currency. In the past this has either not been done at all, or has been customised so that any given document or screen page was prepared with the required conventions buried in the document or file. As a result, such a national language document has been convenient for readers and users only in the particular location, and only with the particular representations of date, time, territory and currency that are appropriate for that one document. If it was desired to use the same document in another language or territory, the new users had either to take it in its original form, or to perform language and character conversion and other culturally based changes at the cost of considerable time and labour.
In an effort to codify the representation of date, time, currency, territory and character set so that they can be reproduced in the required combinations, those needing "national language" representations have developed various means. For example, selection of character sets has been done in the personal computer arena for a number of years by using the notion of code pages. Various international groups, notably the International Standards Organization, have developed and standardized two-character identifiers for countries, for example "CA" for Canada and "US" for the United States of America. Identifiers for languages have also been defined, for example "fr" for French. Since many languages are spoken in more than one territory and many territories include more than one language, these identifiers have been combined to yield, for example, "fr_CA" for French in Canada or "es_CL" for Spanish in Chile.
Some combinations, meanwhile, are unique; for example, Afghanistan uses only one official language and is identified as "ps_AF". Other information has also been standardized, for example currency. Character sets vary immensely as well, and some accents used in some countries do not appear in the same language as written in other countries. For example, Canadian French uses accents on capital letters, whereas French in France does not. Normally these differences are accommodated by the Coded Character Set Identifier (CCSID). In the personal computer world, the CCSID is represented by a decimal code of up to five digits; for example, code 437 is the coded character set most regularly used in the USA, and 850 is frequently used internationally.
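The combination of identifiers described above can be illustrated with a minimal sketch. It assumes a name written as a two-letter language, an underscore, a two-letter territory and, optionally, a period followed by a decimal CCSID of up to five digits (as in "es_CL.850"). The period-separated CCSID suffix, the class `LocaleName` and the function `parse_locale_name` are hypothetical and are introduced here only for illustration; they are not taken from any standard or from the scheme discussed in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocaleName:
    language: str                # two-letter language identifier, e.g. "fr"
    territory: str               # two-letter country identifier, e.g. "CA"
    ccsid: Optional[int] = None  # coded character set identifier, e.g. 850


def parse_locale_name(name: str) -> LocaleName:
    """Split a name such as "fr_CA" or "es_CL.850" into its parts.

    The layout language_TERRITORY[.ccsid] is an assumption made for
    this sketch.
    """
    base, dot, codeset = name.partition(".")
    language, sep, territory = base.partition("_")
    if not sep or len(language) != 2 or len(territory) != 2:
        raise ValueError(f"expected language_TERRITORY, got {name!r}")
    if not (language.isascii() and language.isalpha() and language.islower()):
        raise ValueError(f"language must be two lowercase letters: {name!r}")
    if not (territory.isascii() and territory.isalpha() and territory.isupper()):
        raise ValueError(f"territory must be two uppercase letters: {name!r}")
    ccsid = None
    if dot:
        # The text describes the CCSID as a decimal code of up to five digits.
        if not (codeset.isascii() and codeset.isdigit()) or len(codeset) > 5:
            raise ValueError(f"CCSID must be 1 to 5 decimal digits: {name!r}")
        ccsid = int(codeset)
    return LocaleName(language, territory, ccsid)
```

Under these assumptions, `parse_locale_name("fr_CA")` yields a language of "fr" and a territory of "CA" with no character set, which shows that a language and territory pair alone does not say which encoding is meant.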
With increasing internationalization of computer applications, there is a need to represent all of the unique instances in the files being transferred to the computers that are processing the information for presentation to users and for printing, in different territories and with different languages and character sets.
Operating systems currently in use in the computer industry utilize many diverse file naming systems of varying degrees of restriction, for example, Unix®, X/Open™, OS/2® and DOS. The most restrictive of these, DOS, uses a file naming convention having eight primary characters and three extension characters, which are not case-sensitive. Locale names cannot be readily shared across these file systems because each has different naming capabilities. Such locale names have been composed of the language, the territory or country, and the character encoding identifier, resulting in a text string of varying length, frequently eight or more characters.
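The file naming restriction described above can be sketched as a simple check. The sketch tests only the two properties mentioned in the text, namely the eight-plus-three length limit and the lack of case sensitivity; it does not check which characters a file system permits. The function names `fits_dos_8_3` and `same_dos_name` and the sample locale names are hypothetical and serve only as illustration.

```python
def fits_dos_8_3(name: str) -> bool:
    """Return True if name fits in eight primary and three extension characters."""
    if name.count(".") > 1:
        # Only one separator between primary name and extension is assumed.
        return False
    primary, _, extension = name.partition(".")
    return 1 <= len(primary) <= 8 and len(extension) <= 3


def same_dos_name(a: str, b: str) -> bool:
    """Return True if two names collide when case is ignored."""
    return a.upper() == b.upper()


# Hypothetical locale names of the kind described in the text.
candidates = ["fr_CA.850", "fr_CA.ISO8859-1", "es_CL"]
portable = [name for name in candidates if fits_dos_8_3(name)]
```

With these sample names, the one carrying a long encoding identifier is rejected, and `same_dos_name("fr_CA", "FR_ca")` is true, so a convention that relies on mixed case to distinguish names cannot survive on the most restrictive file system.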
There is no system known to date that automatically converts input text and data into output that recognizes not only the national language preference of a user, but also the territory and the encoding to be used for the graphic character data. Thus there remains a need to define unambiguous names for the various locales that will be implemented across different platforms. In order to provide these definitions, a scheme is required that can accommodate the needs of the users, the systems and the file systems. The current industry-accepted manner of specifying the national language preferences of the user is the announcement and definition mechanism provided by the "locale". To date, standards put forward by the International Standards Organization (ISO) have been used for language and territory, but no appropriate scheme has been standardized for the graphic character data. ISO has also standardized coded character set identifiers for a number of years; for example, ISO 10646 denotes Unicode. The practice of using mixed-case alphabetic letters to identify the language and country, without any precise encoding identifier to differentiate the encoding that is supported, has led to confusion as to the content of the locale and has also hindered the understanding of the specific properties of the locale. It would also be desirable to incorporate the complete set of identities (language, territory or country, and graphic character identifier) into a single token, thus unambiguously identifying all the variables that are required for any particular implementation of language, territory and character encoding.