Software products, such as applications and operating systems, are often provided in many different language versions. The process of converting a software product from the language in which it was originally written into other languages is known as ‘localisation’. Typically, localisation is done by translating all the string elements within the user interface (UI) of the product, together with any other language-specific parts (e.g. hotkeys, coordinates, sizes), and then re-building the product to produce the language-specific version. This localised product then requires extensive testing before it can be shipped to a customer. As a result, this approach is expensive and delays the delivery of localised versions of software.
A different localisation method has been developed in which localisation occurs at run-time. In this method, the base product (e.g. the original English version) is loaded and the translated resources are inserted by a resource interceptor, which obtains them from a language-specific glossary file. This is done transparently, so that the application is unaware of the substitution. Although this method may not translate the entire application, it provides a less labour-intensive and less expensive way of localising software products because it removes the need to build and test a separate language version of the product. It also enables third parties to create new language glossaries for use with a software product, which can result in the product being localised into many additional languages.
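The interception step can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the method of any particular product: the glossary format (one `resource_id=translation` pair per line), the names `load_glossary` and `ResourceInterceptor`, and the resource IDs are all hypothetical. The interceptor wraps the base product's loader, so the application keeps calling the same `load` entry point and never sees the substitution. Resources missing from the glossary fall back to the base-language text, which is why the application need not be fully translated.

```python
from typing import Callable, Dict


def load_glossary(text: str) -> Dict[str, str]:
    """Parse a hypothetical glossary format: one 'resource_id=translation' per line.

    Blank lines and lines starting with '#' are ignored; lines without '=' are skipped.
    """
    glossary: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            glossary[key.strip()] = value.strip()
    return glossary


class ResourceInterceptor:
    """Hypothetical wrapper around the base product's resource loader."""

    def __init__(self, base_loader: Callable[[str], str], glossary: Dict[str, str]):
        self._base_loader = base_loader
        self._glossary = glossary

    def load(self, resource_id: str) -> str:
        # Load the base-language (e.g. English) resource as the product normally would.
        original = self._base_loader(resource_id)
        # Substitute the translation if the glossary has one; otherwise fall back
        # to the base string, so untranslated resources still display.
        return self._glossary.get(resource_id, original)


# Usage: the base product only knows English strings; the glossary adds French.
BASE = {"IDS_HELLO": "Hello", "IDS_EXIT": "Exit"}
glossary = load_glossary("IDS_HELLO=Bonjour\n")
loader = ResourceInterceptor(BASE.__getitem__, glossary)
print(loader.load("IDS_HELLO"))  # Bonjour
print(loader.load("IDS_EXIT"))   # Exit
```

Because the glossary is plain data kept separate from the product, a third party can supply a new one without re-building the base product.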
In order for the resource interceptor to translate the resources within an application, it must be able to identify the translatable strings within those resources. This is trivial for structured resources, such as a WIN32 dialog box or a WIN32 string table, because their structure is defined (e.g. in a standard), so it is easy to locate the individual elements and to modify or replace them. However, not all resources are structured. Unstructured (or stream) resources are textual resources which have no predefined structure and which are stored inside files or streams. Such resources can typically be viewed and edited using a text editor program, such as Microsoft (trade mark) Notepad. Examples of unstructured resources include HTML files (which comprise strings, tags and other formatting characters), JavaScript, INI files, Registry files, Cascading Style Sheets (CSS) and XML files. There is a very large number of schemas used to write these unstructured resources, and knowledge of the correct schema is required to identify the translatable strings within a given unstructured resource. Even within a particular resource type, there may be many different ways in which strings and other localisation data (such as hotkeys, sizes and coordinates) are identified, for example:
1st HTML sample: <P ID=Hello>Hello</P>
2nd HTML sample: <P><!--ID=ID_Hello-->Hello<!--end--></P>
1st JavaScript sample: document.write("Hello")
2nd JavaScript sample:
L_Hello_Message = ("Hello")
document.write(L_Hello_Message)
In order for the resource loader to identify the translatable string (“Hello”) in each of these examples, it must know exactly how the translatable strings have been identified in each case. As there is no defined structure, it is not feasible for the resource loader to know every possible schema, format, item of external configuration data and rule, because the number of these is effectively unbounded. Even if the resource loader did know the particular schema, format and rules used in a particular situation, parsing would still be slow, which in many applications would be unacceptable (e.g. if parsing were performed during resource loads in a running application). Furthermore, the identifier information (“ID=Hello” and “<!--ID=ID_Hello--> . . . <!--end-->” in the two HTML examples above) may be removed when the file (e.g. the HTML file) is built, in order to reduce the file size and enable applications to load the file more efficiently. Alternatively (or in addition), the identifier information may be removed for confidentiality reasons (e.g. the comments in a file may be confidential) or to create a valid file structure (e.g. the HTML may be invalid until the markers are removed). In these cases the unstructured resource may contain no information which identifies the translatable resources, for example:
3rd HTML sample: <P>Hello</P>
This means that it is impossible for the resource loader to identify the strings within the unstructured resource.
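To illustrate why no single parser suffices, the following sketch applies one hypothetical extraction rule per identification convention used in the samples above. The rule names and regular expressions are assumptions made for this illustration only (regular expressions are not a robust way to parse HTML or JavaScript in general). Each sample is recognised only by the rule written for its own convention, and once the identifiers have been stripped, as in the 3rd HTML sample, none of the rules finds anything.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical extraction rules, one per identification convention in the samples.
# Each rule only understands its own convention.
RULES = {
    # 1st HTML sample: identifier in an ID attribute.
    "html_id_attr": re.compile(r"<P ID=(?P<id>\w+)>(?P<text>[^<]*)</P>"),
    # 2nd HTML sample: identifier in a start comment, end marked by a second comment.
    "html_comment": re.compile(r"<!--ID=(?P<id>\w+)-->(?P<text>.*?)<!--end-->"),
    # 1st JavaScript sample: literal argument of document.write, no identifier at all.
    "js_write_literal": re.compile(r'document\.write\("(?P<text>[^"]*)"\)'),
    # 2nd JavaScript sample: identifier is a variable name with an L_ prefix.
    "js_l_prefix": re.compile(r'(?P<id>L_\w+)\s*=\s*\("(?P<text>[^"]*)"\)'),
}


def extract(resource: str, rule: str) -> List[Tuple[Optional[str], str]]:
    """Return (identifier, text) pairs found by one rule.

    The identifier is None when the convention carries no identifier.
    """
    pattern = RULES[rule]
    return [(m.groupdict().get("id"), m.group("text")) for m in pattern.finditer(resource)]


SAMPLES = {
    "1st HTML": "<P ID=Hello>Hello</P>",
    "2nd HTML": "<P><!--ID=ID_Hello-->Hello<!--end--></P>",
    "1st JavaScript": 'document.write("Hello")',
    "2nd JavaScript": 'L_Hello_Message = ("Hello")\ndocument.write(L_Hello_Message)',
    "3rd HTML": "<P>Hello</P>",
}

for name, text in SAMPLES.items():
    hits = {}
    for rule in RULES:
        found = extract(text, rule)
        if found:
            hits[rule] = found
    print(f"{name}: {hits or 'nothing found'}")

# Output:
# 1st HTML: {'html_id_attr': [('Hello', 'Hello')]}
# 2nd HTML: {'html_comment': [('ID_Hello', 'Hello')]}
# 1st JavaScript: {'js_write_literal': [(None, 'Hello')]}
# 2nd JavaScript: {'js_l_prefix': [('L_Hello_Message', 'Hello')]}
# 3rd HTML: nothing found
```

Every new convention would need another rule, and every rule adds parsing work at load time.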
In addition to identifying the location of strings within an unstructured resource, it may also be necessary to determine the unique identifier for each string for use in cross-referencing against other data (e.g. against translations in a glossary).
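The two pieces of information play different roles: the location says where in the stream to substitute, and the identifier selects the matching glossary entry. This can be sketched as follows, with the locations and identifiers simply supplied by hand. The `LocatedString` record, the `apply_glossary` function and the identifier `ID_Hello` for the 3rd HTML sample are hypothetical; obtaining this information automatically is exactly the problem described above. Located strings are assumed not to overlap.

```python
from typing import Dict, List, NamedTuple


class LocatedString(NamedTuple):
    """A translatable string found in a stream resource (hypothetical record)."""
    string_id: str  # unique identifier used to look the string up in a glossary
    start: int      # offset of the first character of the string in the stream
    end: int        # offset one past the last character of the string


def apply_glossary(stream: str, located: List[LocatedString], glossary: Dict[str, str]) -> str:
    """Replace each located string with its glossary translation, if one exists.

    Assumes the located strings do not overlap.
    """
    result = stream
    # Work from the end of the stream backwards so that replacing one string with
    # a longer or shorter translation does not shift the offsets still to be used.
    for item in sorted(located, key=lambda s: s.start, reverse=True):
        translation = glossary.get(item.string_id)
        if translation is not None:
            result = result[:item.start] + translation + result[item.end:]
    return result


# Usage: the 3rd HTML sample, with the location and identifier supplied by hand.
stream = "<P>Hello</P>"
located = [LocatedString("ID_Hello", 3, 8)]
print(apply_glossary(stream, located, {"ID_Hello": "Bonjour"}))  # <P>Bonjour</P>
```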
The invention seeks to provide methods of parsing unstructured resources that mitigate the problems of known parsing methods, and also to provide improved parsing tools.