Generally described, localizing resources for computer systems during software development involves transforming source data corresponding to one market into target data corresponding to a different market. For example, localization can involve translating source data in one language into target data in another language. Localization can also involve transforming data between markets in the same language, such as transforming source data corresponding to Japanese for children into target data corresponding to Japanese for adults. A resource is generally defined as an item of data or code that can be used by more than one program or in more than one place in a program, such as a dialog box. One example of a resource is an error message string used to alert a computer user of an error condition. Additionally, the error message can contain one or more placeholders to be replaced with the value of the placeholder before the message is displayed.
Various assumptions can be associated with a resource. For example, the author of an error message such as “File <PH> not found”, where “<PH>” is an example of a placeholder to be replaced with the name of a file, may assume that the file name will be provided at a later time and that the reader of the message understands the meaning of the term “file.” To use the error message in various markets, it may need to be translated into several languages. In a typical development environment, a word-for-word translation may be used to localize a resource. However, the resulting translation may not capture contextual data associated with the resource. For example, a word in a resource, such as the word “file”, can have more than one meaning and thus the context in which the word is used is needed to generate a correct translation. Additionally, functional items, such as placeholders, need to provide functionality in target data that corresponds to the functionality provided in source data. For example, the “<PH>” in the example error message needs to function such that it is replaced with the name of a file in any transformation of the error message.
One approach to capturing contextual and functional information during localization involves comparing each individual assumption associated with the source resource against the target resource to ensure that the target resource complies with every assumption. For example, one assumption associated with a source resource can be that invalid characters are “*” and “\”. An additional assumption associated with the same resource can be that invalid characters are ‘%’ and ‘\’. To validate the target resource using these assumptions, a validation engine could first check that the target string does not contain either ‘*’ or ‘\’. Next, the validation engine could check that the target string does not contain ‘%’ and ‘\’. However, checking each individual assumption is not efficient. Further, individual assumptions may be incompatible with other individual assumptions or may be redundant.
Pseudo-localization of a resource can be used to ensure that assumptions are correctly captured so that they can be preserved in a target. The process of pseudo-localization typically involves generating a random pseudo-translation of a source string. The pseudo-translation can then be tested, in a process generally known as validation, to ensure that assumptions from the source string are preserved in the pseudo-translation. However, typical tools that perform pseudo-localization of a source string for testing purposes do not use the same validation techniques as tools used to validate target translations. Thus, localized software is not tested as thoroughly as would be possible if pseudo-localized resources were able to be validated in the same manner.