The use of international character sets in data or text translation needs to be considered carefully to ensure that the results of the translation appear to the translator or recipient as intended. In an ecommerce application, for example, an English-language website might include English text in the body of a product description. The text and accompanying labels, commentary, and so forth may call for translation if the website is internationalized for use in other global markets. It is very likely that the translated product description will call for names, words, or characters that contain letters from other languages. To complicate things further, international names are sometimes encoded using character sets different from that of the text in the original home language. As a result, inconsistencies or inaccuracies in the translation and appearance of an internationalized website may frequently occur. Such errors can detract seriously from the general appeal and marketability of the product displayed on the internationalized website. A poor online appearance can easily translate into a strong perception of poor product quality and can significantly reduce potential product sales. Accuracy in product description and overall visual appeal are thus very important in ecommerce applications, especially when global markets are targeted. It will be appreciated that leading global ecommerce websites have many millions of content documents associated with them. Detecting inconsistencies and inaccuracies in these documents when translated can be enormously time-consuming, error-prone, and difficult to control. The present subject matter seeks to address these and other problems.