A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
For software publishers, overseas markets comprise an ever-growing percentage of revenues for all major PC applications. Traditionally, however, software products have been designed with little or no thought toward portability, let alone translating software products for overseas markets. As non-English speaking countries are buying more and more software from U.S. publishers, there is keen interest in improving the process of enabling or “internationalization”, that is, designing and coding a software product so that it can be made to function for international use.
In the past, the process of providing National Language Support (i.e., accommodating a specific country's language, conventions, and culture) was done on a more or less ad hoc basis, essentially retrofitting software to accommodate a particular locale. Merely separating the text in a user interface from one's program is not an acceptable solution, however. Even after translating software prompts, help messages, and other textual information to the target languages, one still has to address basic issues of displaying and printing characters in the target language.
For instance, a target language will often include characters which are not defined by the default character set provided by the computer's operating system. IBM-compatible PCs running MS-DOS, for example, can display and print up to 256 different characters, the first 128 characters of which include the well-known 7-bit ASCII character set. This, of course, is not enough characters to support all languages. Some languages will obviously require a different character set; thus, sufficient means must be provided for switching character sets.
Other issues to consider when developing a system for foreign users include keyboard layout and various format conventions applicable for a particular country. Any use of currency, date, time, and the like within one's software must take into account these factors. For example, keyboards sold for European languages must include additional characters, such as letters with diacritics, and symbols, such as the British pound (£) sign.
Another potentially serious problem for localizing a program is the set of assumptions with which the underlying source code for the program was written. Assumptions made by English-speaking programmers, which were quite valid for the once-ubiquitous ASCII character set, often break down when dealing with a foreign language. For instance, the common programming technique of converting a character to uppercase by simply subtracting the number 32 from the character's numeric code is often inappropriate for non-ASCII characters. Similarly, one cannot rely on standard C functions either. For instance, one cannot use simple string comparison functions like the C programming language's strcmp( ) function. Does an “ã” (i.e., an “a” with a diacritic) sort before or after a normal “a”?
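The two pitfalls just described can be sketched briefly. The following is a minimal illustration only, assuming Python's built-in `cp437` codec as a stand-in for an MS-DOS code page; the helper name `naive_upper` is hypothetical and does not come from any library.

```python
def naive_upper(byte_value: int) -> int:
    """ASCII-only trick: 'a'..'z' and 'A'..'Z' differ by an offset of 32.

    Applied blindly here, with no check that the input is an ASCII letter.
    """
    return byte_value - 32


# The trick works for a plain ASCII letter.
ascii_result = chr(naive_upper(ord("a")))

# In code page 437 the letter 'é' is stored as a single byte above 127.
e_acute = "é".encode("cp437")[0]
broken_result = bytes([naive_upper(e_acute)]).decode("cp437")

# A plain code-value comparison (what strcmp-style functions do) places
# the accented letter after every unaccented one.
by_code_value = sorted(["b", "ã", "a"])

print(ascii_result, broken_result, by_code_value)
```

Here the offset maps the accented letter onto an unrelated ASCII character rather than onto its uppercase form, and the code-value sort places “ã” after “b”, which is unlikely to match the collation a native reader expects.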
One of the first serious attempts at providing National Language Support (NLS) for PCs was Microsoft's MS-DOS version 3.3. Since MS-DOS accommodates different sets of 256 characters for displaying and printing text, one may employ different characters by swapping in new character sets. Each such character set is referred to as a “code page”; the code page in use at any given time is called the “active code page.” When installing operating system software, typically, a user may select a code page appropriate for his or her national language.
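To illustrate why the active code page matters, the following minimal sketch decodes one stored byte under two different code pages. It assumes Python's built-in `cp437` and `cp850` codecs as stand-ins for the corresponding MS-DOS code pages; the particular byte value is chosen only for illustration.

```python
# One stored byte, interpreted under two different code pages.
raw = bytes([0x9B])

as_cp437 = raw.decode("cp437")  # code page 437 (United States)
as_cp850 = raw.decode("cp850")  # code page 850 (Multilingual, Latin 1)

print(repr(as_cp437), repr(as_cp850))
```

The byte is a cent sign under code page 437 and the letter “ø” under code page 850, so text written under one active code page can display incorrectly under another.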
MS-DOS also includes an API (Application Programming Interface) having a variety of functions related to internationalization. Included are functions for inspecting code pages for determining and controlling how the keyboard, display, and printer handle characters. The API includes functions, for instance, for inspecting and changing the current country code and obtaining information about the conventions associated with a current country code (e.g., how to display dates, currency, and the like).
Newer versions of MS-DOS also include support for character comparisons, through use of language-independent tables for sorting strings. Still, this is by no means a complete solution to the problem. Arabic languages, for instance, remain problematic. For one, Arabic is read and written right-to-left, not left-to-right. Also, Arabic characters require contextual analysis in order to determine which of four different shapes the Arabic characters should have (depending upon location in a word or phrase). Thus, a language may have its own special set of problems which must be addressed before international use.
To date, efforts at localization have been largely limited to ensuring that a particular program, such as an operating system or application software, is itself enabled for a particular country. When installing Microsoft® Windows, for instance, a user is asked to select a country from a list of supported countries. Windows, in turn, installs various keyboard, display, and print drivers appropriate for the selected country. This “program centric” approach is only a partial solution, however.
Consider the scenario of a corporation based in the U.S. receiving sales information from several foreign subsidiaries. Typically, such information would be transmitted as data files, such as spreadsheet or database files. In this instance, the information management system in the U.S. may be required to process data files created from a variety of foreign data processing systems, ones having character sets and conventions peculiar to a particular country. Converting such data files from one language to another inevitably results in the loss of language-specific information. Once converted, the information cannot be processed (e.g., adding and deleting information records, generating reports, and the like) and then simply reconverted back to its original form. Moreover, should that information be inappropriately processed (e.g., sorting German information according to an English sort order), valuable data may be corrupted.
One approach to averting this problem is to agree, in advance, on a single data format (e.g., code page 437, the variant used in the United States and many European countries) to be used by all foreign offices of the corporation. However, this solution invites another problem: the foreign offices must forgo their own National Language Support, thus compromising their own data processing needs all for the convenience of the U.S. office. And even with such an approach, the risk remains that an office may inadvertently mix data from its locale with the agreed-upon format, leading to corruption or loss of data. Needless to say, the approach is undesirable at best.
Systems and methods are needed which allow users of computer systems to create and freely exchange data files, irrespective of National Language Support requirements. In particular, such a system would permit a user to create an information file in his or her own locale without regard to the requirements of other systems which may need access to the very same data from that file. The present invention fulfills this and other needs.
The present invention comprises a National Language Support system including a language configurator, for processing data objects in a manner which is appropriate for the language configuration of each object. The language configurator provides necessary support for a data object (which typically stores information in a particular language) so that the data object may be appropriately processed by the system.
The system of the present invention continually checks and maintains correct language configuration. A descriptor or Language Driver Identifier (LDID) (e.g., in the form of a system-comparable unit) is employed for storing in desired location(s) of a data object information specifying the language driver that was in use when the data object was created or modified. The LDID, which may be in the form of an ID byte, references a set of language driver values (e.g., lookup table of locales). This allows the system of the present invention to intelligently process data objects created or modified under one language driver with those created or modified by a different language driver. In the event of incompatibilities, the system provides error handling routines, including facilities for warning users of incompatible or otherwise illegal operations.
A data object is preferably constructed so that it embeds or stores the Language Driver Identifier directly within the object itself, so that the object is self-contained. In an exemplary construction of the data file, for instance, the file may include a header region for storing a Local Language Driver ID (“Local LDID”). This is followed by the actual information or data for the object.
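A self-contained data object of this kind might be sketched as follows. This is only an illustration under stated assumptions: the header layout (a 4-byte tag followed by a 1-byte Local LDID), the tag value `DOBJ`, and the function names are all hypothetical and are not taken from the disclosure.

```python
import struct

# Hypothetical layout: a 4-byte tag, a 1-byte Local LDID, then the data.
HEADER = struct.Struct("<4sB")
MAGIC = b"DOBJ"  # illustrative tag, an assumption for this sketch


def pack_object(local_ldid: int, data: bytes) -> bytes:
    """Prefix the data with a header that records the Local LDID."""
    return HEADER.pack(MAGIC, local_ldid) + data


def unpack_object(blob: bytes) -> tuple:
    """Return (local_ldid, data) read back from a packed object."""
    magic, local_ldid = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognized data object")
    return local_ldid, blob[HEADER.size:]


blob = pack_object(0x1B, b"payload")
ldid, data = unpack_object(blob)
print(hex(ldid), data)
```

Because the identifier travels inside the object, a receiving system can learn which language driver produced the data without any out-of-band agreement.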
The language configuration which the system currently operates under (i.e., during the current session) is also identified by the language configurator, which maintains an Active Language Driver ID (Active LDID) for referencing a Language Driver currently employed by the system (i.e., for the current session). In this manner, the Local LDID may be compared against the Active LDID, thus enabling the system to determine instances where the system is inappropriately configured for a data object about to be processed.
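The comparison of a Local LDID against the Active LDID can be sketched as below. The class and method names are hypothetical, and treating "compatible" as simple equality of the two identifiers is an assumption made for this sketch; the disclosure does not limit the comparison to equality.

```python
class LanguageMismatchError(Exception):
    """Raised when a data object's language driver differs from the session's."""


class LanguageConfigurator:
    """Holds the Active LDID for the current session (hypothetical sketch)."""

    def __init__(self, active_ldid: int) -> None:
        self.active_ldid = active_ldid

    def is_compatible(self, local_ldid: int) -> bool:
        # Assumption: compatibility means the two identifiers are equal.
        return local_ldid == self.active_ldid

    def require_compatible(self, local_ldid: int) -> None:
        """Stop processing (and let the caller warn the user) on a mismatch."""
        if not self.is_compatible(local_ldid):
            raise LanguageMismatchError(
                f"object uses LDID {local_ldid:#04x}, "
                f"session uses LDID {self.active_ldid:#04x}"
            )


configurator = LanguageConfigurator(active_ldid=0x01)
print(configurator.is_compatible(0x01), configurator.is_compatible(0x02))
```

A caller would typically invoke `require_compatible` before sorting or modifying an object, so that a mismatch is reported before any data is touched.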
Actual language configuration is effected through one or more Language Drivers which, in turn, select the most appropriate language configuration table(s). Each driver is of a particular type (identified with an LDID value) and references an appropriate resource file and an appropriate character set or code page for achieving National Language Support.
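One way to picture the relationship between an LDID value, its Language Driver, and the referenced code page is a small lookup table. The sketch below is illustrative only: the LDID values, resource file names, and function names are hypothetical, and Python codec names stand in for code pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageDriver:
    ldid: int
    code_page: str      # Python codec name standing in for a code page
    resource_file: str  # hypothetical resource file name


# Hypothetical table of installed drivers, keyed by LDID value.
DRIVERS = {
    0x01: LanguageDriver(0x01, "cp437", "us437.res"),
    0x02: LanguageDriver(0x02, "cp850", "intl850.res"),
}


def driver_for(ldid: int) -> LanguageDriver:
    """Look up the driver identified by an LDID value."""
    try:
        return DRIVERS[ldid]
    except KeyError:
        raise LookupError(f"no language driver for LDID {ldid:#04x}") from None


def decode_text(ldid: int, raw: bytes) -> str:
    """Decode stored bytes using the code page of the object's own driver."""
    return raw.decode(driver_for(ldid).code_page)


print(decode_text(0x01, b"\x9b"), decode_text(0x02, b"\x9b"))
```

Selecting the code page through the object's own identifier, rather than through a system-wide default, is what lets the same stored byte be rendered correctly for each object.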