The present invention relates generally to network communication systems, and more particularly, to localizing information such as Web pages obtained over the Internet.
As its name implies, the World Wide Web is accessed by people from all over the world. Not long ago, almost all of the Web was in English because the United States was far ahead of the rest of the world in on-line communications. Now, the rest of the world is quickly catching up. Japan, Germany, and China all have large numbers of users, and the Spanish-speaking population on the Web is increasing rapidly. The Web is becoming multicultural and multilingual, and Web sites are becoming available in the native languages of many different regions of the world. As a result, the Web pages on a company's Web site must be localized so that people who speak different languages can read and understand them. Localization is the process of altering a Web page or other information or program so that it is appropriate for the area in which it is used. Localization may include, for example, the translation of strings and content. Text must be translated into the local language, and elements such as addresses, currency formats, number formats, time formats, and date formats should be modified to conform to regional conventions.
Web pages are stored on Web servers on the Internet. Users request Web pages using HTTP (HyperText Transfer Protocol). HTTP gives users access to files, which may contain text, graphics, and images, written in a language known as HTML (HyperText Markup Language). Web pages are typically accessed using an HTML-compatible browser, such as Netscape Navigator or Internet Explorer, which identifies the Web server and the specific Web page by means of a URL (Uniform Resource Locator).
Static HTML pages are relatively simple to understand and translate. These pages include HTML tags that may be used to identify the plain text or attribute values that are to be translated. ASP (Active Server Pages) files, however, contain a large amount of scripting (e.g., VBScript or JavaScript), and localizable strings are often embedded in the scripting. It is often difficult to distinguish between localizable strings and functional strings in ASP pages or in HTML pages with scripting.

One option is to create a separate version of an HTML or ASP file for each language. However, if the scripts change over time, multiple files will need to be changed, which is time-consuming and error-prone. Another option is to conditionally include text for different languages. However, this significantly increases the size of the HTML or ASP file and makes the file harder to maintain. Yet another option is to pull the strings from the file and place them in a resource dynamic link library (DLL). A translator tool is then used to localize the contents of the resource DLL into a desired language. One drawback of this method is that the resource DLLs must be recompiled and are harder to deploy than HTML pages. Furthermore, the compiled translation may affect the functioning of the Web page. Thus, the recompiled code must be completely retested after even minor changes to ensure that it functions properly. This adds considerable time and cost to the localization of Web pages.
Globally competitive companies need to provide versions of their Web pages that meet the requirements of each country in which they wish to compete. Companies often update their Web pages to provide information on new products or services. Any delay in providing a localized Web page can reduce a company's market share in that country. It is thus important to localize Web pages quickly, economically, and efficiently.
There is, therefore, a need for a method and system for converting HTML and ASP files into a form that can be easily localized.