1. Field of the Invention
The invention relates to the development and testing of software to be deployed internationally and more particularly to the development and testing of software for languages requiring a multibyte representation for characters.
2. Description of Related Art
As computer hardware and software vendors expand their markets to Europe and the Far East, they are required to modify the related operating system and applications software to accommodate the language, customs and culture of the individual target markets. Many of these companies now achieve half of their total revenues from such markets.
The development and deployment of international software is discussed in a book entitled Developing and Localizing International Software, by Tom Madell et al., published in 1994 by Prentice-Hall, Inc., of Englewood Cliffs, N.J.
Much software is designed and programmed taking into consideration only the needs of users in the particular country where it is developed. As a result, either other international users of the software are forced to struggle with the language of development and its corresponding cultural representations in order to use the software, or software designers or engineers must redesign and recompile the software to create a new version for each unique language and local environment that uses it.
An approach that enhances software for worldwide distribution uses internationalization and localization. Internationalization is sometimes referred to as I18N, for the eighteen letters between the I and the N in "internationalization," and is a process of configuring a program to make localization easy. Similarly, localization is sometimes referred to as L10N, for the ten letters between the L and the N in "localization," and generally involves more than merely converting languages of messages and displays.
Developers for a worldwide audience must enable computer systems to read and write in the user's native language, that is, to understand and display characters and symbols that may be far different from the character set of a single byte language such as American English. Further, the computer systems must be able to process the characters and text according to the rules of the user's language. Many languages have characters in excess of the twenty-six characters of the English language set. Software to be used internationally must provide flexibility to modify output conventions to comply with customary local requirements for representations of currency, numeric data or time. Such software should also provide the ability to allow for the translation of interfaces, messages and prompts without necessitating many different language versions of the underlying software.
I18N then, is the process of building in the capabilities which facilitate adaptation to different countries and locales during the development or modification process.
While I18N is usually a process performed during development of the code, localization, or L10N, is most often carried out subsequent to development, often in the foreign location where the software will be utilized. L10N is a process of actually adapting the potentially useful internationalized software to meet the needs of one or more users in a particular geographical area. It includes not only the translation of messages but also the selection or creation of appropriate language tables containing the relevant local data for use on a given system. Localization activities are usually performed by the software manufacturer or its representative in a particular locale.
Much software has been developed for use with terminals that generate 128 ASCII characters which can be represented by 7 bits of a single byte. Some other languages require 256 characters which require all 8 bits of a single byte. Such languages can be described as single byte languages. Some code sets for other languages, such as some Asian languages, contain thousands of characters and require more than a single byte. These can be described as multibyte languages.
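The distinction between 7-bit, 8-bit and multibyte code sets can be illustrated with a minimal sketch. The sample characters and the code sets chosen here (ASCII, ISO 8859-1 and EUC-JP) are assumptions made for illustration and are not named in the text above.

```python
# Illustrative only: the sample characters and codec names are assumptions.
SAMPLES = [
    ("A", "ascii"),     # fits in 7 bits of a single byte
    ("é", "latin-1"),   # needs all 8 bits of a single byte
    ("日", "euc_jp"),   # needs more than one byte
]


def describe(char, codec):
    """Return (byte_count, uses_high_bit) for one character in a code set."""
    encoded = char.encode(codec)
    uses_high_bit = any(b > 0x7F for b in encoded)
    return len(encoded), uses_high_bit


results = {char: describe(char, codec) for char, codec in SAMPLES}

for char, (size, high) in results.items():
    print(f"{char!r}: {size} byte(s), uses eighth bit: {high}")
```

In this sketch only the third sample occupies more than one byte, which is the property that separates a multibyte language from a single byte language.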
Thus, localization must be possible for multibyte languages. This resulted in the development of worldwide portability interfaces (WPI) as defined by X/OPEN in standard XPG4. With these interfaces, developers who internationalize a program no longer deal with character data in a language-sensitive way, nor do they need knowledge of any foreign language. They do not even need to be aware of the ways different languages and code sets can vary. This makes the programming effort easier and provides for consistent treatment across languages.
A central concept of internationalization and localization is that a developer should be able to create a single internationalized application, which is capable of being extended into any number of localized programs without the need for redesign or recompilation.
To enable this, an internationalization model includes three parts, namely, a language independent program, message catalogs and language tables. FIG. 1 illustrates a model of internationalized software. A language independent program 100 achieves language independence by programmatic calls to a message catalog 110 and to a language table 120. Rather than hard-coding messages such as prompts and error messages within the program itself, such messages are stored in external message catalogs, with a different version of those catalogs for each supported language. Language tables contain all language-specific processing information and conventions unique to a particular locale, such as how characters are sorted and how output (such as numbers, times and dates) is formatted. At run time, generally in a development environment, the program selects or "binds" a specific language table according to settings controlled by the user, the application developer, or the system administrator. Thus, the same basic program 100 can be executed in different language "locales" by simply binding the appropriate message catalog and language table to the program at run time. The term "locale" will be utilized to refer to the language table component of an internationalized application.
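The three-part model of FIG. 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the XPG4 interfaces themselves: the names MESSAGE_CATALOGS, LANGUAGE_TABLES, LanguageTable and run_program, the locale identifiers and the sample message are all hypothetical.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageTable:
    """Hypothetical language table: formatting conventions for one locale."""
    decimal_point: str
    date_format: str  # strftime-style pattern


# Hypothetical external message catalogs, one per supported language.
MESSAGE_CATALOGS = {
    "en_US": {"total_due": "Total due on {date}: {amount}"},
    "de_DE": {"total_due": "Gesamtbetrag fällig am {date}: {amount}"},
}

# Hypothetical language tables, one per locale.
LANGUAGE_TABLES = {
    "en_US": LanguageTable(decimal_point=".", date_format="%m/%d/%Y"),
    "de_DE": LanguageTable(decimal_point=",", date_format="%d.%m.%Y"),
}


def run_program(locale_name, amount, when):
    """Language independent program: no message text or format is hard-coded."""
    catalog = MESSAGE_CATALOGS[locale_name]  # bind the message catalog
    table = LANGUAGE_TABLES[locale_name]     # bind the language table
    whole, fraction = f"{amount:.2f}".split(".")
    formatted_amount = whole + table.decimal_point + fraction
    formatted_date = when.strftime(table.date_format)
    return catalog["total_due"].format(date=formatted_date, amount=formatted_amount)


due = datetime.date(1995, 3, 7)
print(run_program("en_US", 1234.5, due))
print(run_program("de_DE", 1234.5, due))
```

The body of run_program is identical for both locales; only the catalog and table bound at run time differ, which is the property the model relies on to avoid redesign or recompilation.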
Use of the internationalization model of FIG. 1 provides several advantages. First, software does not need to be recoded in different versions for different languages in order to localize it. As a result, only one version needs to be updated and maintained as well as manufactured, stocked and shipped.
Because all language-dependent information is kept external to the program source, neither programmers nor translators need to modify the program source code in order to localize program language behavior. As a result, the possibility of introducing bugs into the main or core program during localization of the software is eliminated. Instead of having to work with a huge amount of source code, translators can work from a discrete message file containing only the text to translate. This frees them from the need to have programming skills.
Since software can be localized more easily and inexpensively, such software should become more readily available and, as a result, fewer end users will be forced to deal with difficult foreign language representations.
By using external language tables, the structural and processing rules of each language are consolidated into one physical location which can be modified to meet even more specific local requirements. Once installed, such language tables can support other internationalized programs on the system.
Different cultures and countries have different rules for punctuation, word order, sorting, the order of items and addresses, currency measures and conversions, number formats and other local idiosyncrasies. Many native languages and customs have different meanings for certain symbols used as computer icons as well as colors which may be used to indicate some special meaning.
Localization of a computer product from one locale to another to accommodate such differences more specifically involves:
1. Translation of software documentation into the new language;
2. Translation of the textual messages embedded in the software into the new language;
3. Incorporation of additional software facilities to make input and output of the new language and perhaps new characters possible;
4. Adapting the software to accommodate the customs and conventions of the new locale; and
5. Testing and assurance that the modified product works as intended in the new locale.
This process of localization is very labor intensive and requires people who know the native language of the new country as well as the basics of computer program architecture and construction.
FIG. 2 illustrates an internationalized program which has been localized into a plurality of languages. Program 200 has been internationalized, that is, written with the appropriate hooks so that a particular message catalog 210 and a corresponding language table 220 can either be bound to the program 200 at run time or selected by virtue of a software switch. Note that languages such as Korean (220C) and Japanese (220I) require a character set which cannot be represented with a single byte of data. A language table which requires that each character be represented by more than one byte is called a multibyte locale. Typically, a two-byte representation or a wide character is utilized for each character in a multibyte locale. Languages such as French (220A) and German (220B) are single byte languages which have a character set which can be represented in a single 8-bit byte. American English is also a single byte language which can be represented in 7 bits of an 8-bit byte, and such a representation is referred to as USASCII.
FIG. 3 illustrates development of an internationalized computer program in the U.S.A. As the internationalized computer program 300 is developed, a catalog of English messages is concurrently developed (310). If a language table or locale 320 which reflects the customs of the locale of development, namely, the United States, has been developed, there is no need to repeat the development. Only USASCII need be supported and the time representations, dates, currency formats, sort order and the like are those in use in the U.S.A. A set of software development tools 340 permits the development, debugging and compiling of the I18N software 300 and the creation of the message catalog 310 and the USASCII English locale 320. In their simplest form, the development tools would include a text editor for creating source code, message catalogs and locales, and a compiler.
FIG. 4 illustrates life cycle development and testing of an internationalized computer program developed in the United States. The development of an I18N program, English message catalog and USASCII English locale (400) proceeds concurrently as shown in FIG. 3. At various stages during development, the program is tested in its native locale (410). If bugs are found, they are corrected in the development phase 400. The local testing step 410 encompasses both informal testing done by the developer and formal testing as the result of a release. There may be several iterations of development, testing and changes (400, 410) until the development is considered sufficiently stable to be passed to a localization team for localization to a non-U.S. locale (420). Development of the localized version 420 and testing of the localized version 430 may result in the discovery of software bugs unique to the localization process. These will be corrected by the localization team (420). However, such testing may also reveal problems with the development of the I18N core program itself. Such problems must, therefore, be referred back to the developers for correction (400).
The Problems
The development process outlined above has several deficiencies. First, many software bugs result when attempting to localize an internationalized software program to a multibyte locale. These are not identified until testing of the localized version and, as a result, feedback to developers occurs long after the introduction of the error and after the time and effort have been expended to release the software to a localization team. Such late identification of errors greatly increases the cost of correction of the software.
Further, enhancements of functionality and incorporation of engineering change orders into the software cannot be tested in the multibyte version until release of the software to the localization team. This, too, increases the cost of development and maintenance of the software. There is thus a need for improving the development and testing process of internationalized software.
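One illustrative class of bug that passes testing in a single byte locale and fails only in a multibyte locale is code that treats a byte count as a character count. The sketch below is an assumed example of such a bug, not one identified in the text; the function names and the EUC-JP code set are hypothetical choices.

```python
def truncate_bytes(text, codec, max_bytes):
    """Naive truncation that assumes one byte per character."""
    return text.encode(codec)[:max_bytes].decode(codec)


def survives_truncation(text, codec, max_bytes):
    """Report whether the naive truncation yields decodable text."""
    try:
        truncate_bytes(text, codec, max_bytes)
        return True
    except UnicodeDecodeError:
        # The cut landed in the middle of a multibyte character.
        return False


# Passes in the native single byte locale of development.
ascii_ok = survives_truncation("hello", "ascii", 3)

# Fails only once multibyte data is used: 3 bytes split the second character.
multibyte_ok = survives_truncation("日本語", "euc_jp", 3)

print(ascii_ok, multibyte_ok)
```

A developer testing only with USASCII data would never observe the failure, which is why such errors tend to surface late, during testing of the localized version.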