1. Statement of the Technical Field
The present invention relates to the internationalization of computer software, and more particularly, to testing multi-byte character handling in an application under test.
2. Description of the Related Art
Internationalizing computer software can be difficult and expensive. Yet, the internationalization of computer software can be critical to ensure the global success of computer software. In this regard, it has been estimated that worldwide business-to-business e-commerce will have grown to $30 billion by the early 21st century, while at the same time non-English speakers will constitute more than 50 percent of the world's online population. With more than half of the world's Internet users predicted to be non-native English speakers in the near future, going global is not merely a business advantage in the 21st century; it is a business imperative.
In the past, the process of accommodating a specific country's language, conventions, and culture was done on a more or less ad hoc basis, essentially by retrofitting software to accommodate a particular locale. Merely separating the text in a user interface from one's program is not an acceptable solution, however. Even after translating software prompts, help messages, and other textual information to the target languages, one still has to address basic issues of displaying and printing characters in the target language.
Information interchange codes define character sets for national languages. The necessary symbols or characters are relatively few in number in most languages. English, for example, uses only 26 Roman letters, each of which has an upper case and a lower case representation, for 52 symbols. German requires the addition of only 7 symbols, allowing for three vowels receiving diacritics (both in upper case and lower case) and the lower case sharp s (ß), which resembles the Greek lower case beta symbol. In all, a single byte of 8 bits, which provides 256 distinct code values, has been found sufficient to express all of these characters.
Unlike most languages, Chinese, Japanese, and Korean contain more than 256 characters. Traditional written Chinese utilizes in excess of 13,000 ideographs. Japanese utilizes between 3,000 and 8,000 ideographs (kanji characters) and several hundred other symbols for the numerals and the hiragana and katakana characters. Conversion between interchange codes is further complicated by the fact that the ideograph sets for Japanese, Korean, Traditional Chinese and Simplified Chinese differ in content and size. To handle such large character bases, the interchange code sets for these languages use a double-byte representation of 16 bits for each character. This allows the expression of up to 65,536 characters.
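The difference between single-byte and double-byte code sets can be illustrated with a short sketch. The example below is illustrative only and is not drawn from any particular product: it assumes Python's standard `latin-1` and `shift_jis` codecs as stand-ins for a single-byte and a double-byte interchange code, and the helper name `encoded_length` is hypothetical.

```python
# Illustrative sketch: compare how many bytes a string occupies under a
# single-byte code set (Latin-1) and a double-byte code set (Shift_JIS).
# The codec choices and the helper name are assumptions for illustration.

def encoded_length(text: str, codec: str) -> int:
    """Return the number of bytes needed to store text in the given codec."""
    return len(text.encode(codec))

# A German word: six characters, one byte each in Latin-1.
latin_len = encoded_length("Straße", "latin-1")

# Three kanji characters: two bytes each in Shift_JIS.
sjis_len = encoded_length("日本語", "shift_jis")

# Number of distinct values available in 8 bits and in 16 bits.
code_points_8 = 2 ** 8
code_points_16 = 2 ** 16

print(latin_len, sjis_len, code_points_8, code_points_16)
```

Both strings occupy six bytes, although one holds six characters and the other only three, which is why software that assumes one byte per character can miscount or split double-byte text.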
As more companies deploy software products world-wide, software testing must change to verify software products developed for deployment in non-English operating environments. To that end, the Global Verification Test (GVT) addresses the testing of software for international compatibility. GVT is a portion of the product functional verification test that addresses internationalization issues. GVT assures that software can run in non-US environments and after translation. The goal of GVT is to certify that a product is ready for world-wide distribution.
Some of the techniques utilized in GVT include verification through execution, source scanning and pseudo translation environments. Verification through execution involves running the un-translated application to verify specific functional support such as bi-directional language support, Unicode character set support or multi-byte character set support for platforms that do not yet support Unicode. Third-party source scanning tools search source code for potential internationalization problems. Finally, pseudo translation tools incorporate several pseudo-languages and pseudo-locales which disclose problems that code scanning tools cannot detect, such as data formatting errors, field expansion, column misalignment, and line truncation.
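By way of illustration, the manner in which a pseudo translation can expose field expansion and line truncation may be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any existing pseudo translation tool: the function names `pseudo_translate` and `is_truncated`, the bracket markers, the tilde padding and the 30 percent expansion factor are all hypothetical choices made for the example.

```python
# Minimal sketch of a pseudo translation that simulates text growth.
# All names, markers and the expansion factor are hypothetical.

def pseudo_translate(message: str, expansion: float = 0.3) -> str:
    """Wrap a UI string in brackets and pad it to mimic translated length."""
    pad = "~" * max(1, round(len(message) * expansion))
    return f"[{message}{pad}]"

def is_truncated(rendered: str) -> bool:
    """A rendered string that lost either bracket was cut off on display."""
    return not (rendered.startswith("[") and rendered.endswith("]"))

label = pseudo_translate("Save file")

# Simulate a UI field that can show only ten characters.
clipped = label[:10]

print(label, is_truncated(label), is_truncated(clipped))
```

Because the closing bracket disappears when the field is too narrow, a tester who cannot read the target language can still recognize the truncation.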
Notably, GVT ensures that text data having multi-byte characters can be input, handled and displayed without corruption. One of the most important international markets is the Far East, in which many countries use languages, such as Japanese and Chinese, whose text requires multi-byte characters. The testing of the ability of software to handle multi-byte character data currently requires that the testing personnel be able to read the language. This often can lead to expensive assignments, as the typical functional tester is English speaking and cannot read foreign text. Pseudo translation tools have been developed that either use the full width ASCII equivalents of a multi-byte character, or that simply repeat a single Asian character and retain the English text as it was. These solutions address only the text that appears as part of a user interface; they do not test the ability of the software under test to handle user data correctly.
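The use of full width ASCII equivalents can be sketched as shown below. The sketch is offered only as an illustration under stated assumptions: it relies on the Unicode convention that the printable ASCII characters U+0021 through U+007E have full width forms at U+FF01 through U+FF5E and that the space maps to the ideographic space U+3000, it assumes Python's `shift_jis` codec as the double-byte code set, and the names `to_fullwidth` and `survives_round_trip` are hypothetical. The round-trip check is added here merely to show what a corruption test on data might look like; it is not a feature attributed to the existing tools.

```python
# Sketch: map English text to full width forms that a tester can still read,
# then check that the result survives a double-byte encode/decode cycle.
# Function names and the choice of Shift_JIS are assumptions.

FULLWIDTH_OFFSET = 0xFEE0  # distance from U+0021..U+007E to U+FF01..U+FF5E

def to_fullwidth(text: str) -> str:
    """Replace printable ASCII with its full width equivalent."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")  # ideographic space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        else:
            out.append(ch)  # leave non-ASCII characters unchanged
    return "".join(out)

def survives_round_trip(text: str, codec: str = "shift_jis") -> bool:
    """True when encoding and decoding returns the original text."""
    return text.encode(codec).decode(codec) == text

user_data = to_fullwidth("Order 42")
byte_length = len(user_data.encode("shift_jis"))

print(user_data, byte_length, survives_round_trip(user_data))
```

The eight characters of the input occupy sixteen bytes once converted, so the string exercises double-byte handling while remaining legible to an English-speaking tester.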