The present invention relates to a document processing apparatus and method that analyze a tagged document, e.g., a HyperText Markup Language (HTML) document, and form another tagged document containing original sentences and translated sentences. The present invention also relates to a recording medium for recording such a tagged document.
With the recent proliferation of personal computers and communication apparatuses, people have become able to use communication networks represented by the Internet, i.e., Internet protocol (IP) communication networks, and to easily obtain various sorts of information through the networks. The World Wide Web on the Internet generally uses HTML as a language for describing information. Dynamic HTML (DHTML) and Extensible Markup Language (XML) are other languages presently used to form tagged documents.
Conventionally, to form home pages (also called Web pages) containing the same information in different languages, e.g., English and Japanese, each home page must be formed separately. That is, the steps of writing sentences separately in each language, pasting common images including graphs and figures, separately setting links in each of the English home page and the Japanese home page, etc., are required.
Home pages are open to the public on the Internet and can be read by people everywhere in the world. Therefore, people who want their home pages to reach a wide range of readers make home pages that have the same format and contents but whose sentences are written in different languages.
The English and Japanese home pages formed as described above require scrupulous attention in maintenance, because after any correction their English and Japanese sentences must remain equivalent in meaning and format.
To form such English and Japanese home pages, it is necessary to make and manage two kinds of HTML document files, one for the English home page and one for the Japanese home page. That is, the number of files to be managed increases in proportion to the number of languages, and management and maintenance become difficult.
A user who wishes to read the document information of an English home page in Japanese may translate it by using Internet translation software. However, if the user wishes to edit the results of the translation displayed as an HTML document, he or she must either give up the idea or have the document translated again by different translation software, because the translated HTML document cannot be edited directly.
If the user nevertheless attempts to edit the translation-result HTML document, he or she must perform the steps of storing the translation-result HTML document on a local disk, opening the stored HTML document file by using HTML document editing software, displaying the HTML document source, directly editing the HTML document source, and storing the results of the editing on the local disk. This process enables editing of the translation results to some extent. However, it is difficult to edit a document in which HTML tags, original sentences, and translated sentences are mixed.
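The difficulty of editing a source in which tags and sentences are mixed can be illustrated with a minimal sketch. The sketch below is not part of the apparatus described here; it assumes Python's standard `html.parser` module, and the names `TextSegmentCollector`, `collect_segments`, and `SAMPLE` are hypothetical. It lists the sentences an editor would have to locate by hand among the tags of a stored HTML document source.

```python
from html.parser import HTMLParser


class TextSegmentCollector(HTMLParser):
    """Collects non-empty text runs found between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.segments = []  # list of (line number, text) pairs

    def handle_data(self, data):
        text = data.strip()
        if text:
            line, _column = self.getpos()
            self.segments.append((line, text))


def collect_segments(html_source):
    """Return the text segments of an HTML source, in document order."""
    parser = TextSegmentCollector()
    parser.feed(html_source)
    parser.close()
    return parser.segments


# Illustrative source in which an original and a translated sentence are mixed.
SAMPLE = """<html><body>
<p>Hello, world.</p>
<p>こんにちは、世界。</p>
</body></html>"""

segments = collect_segments(SAMPLE)
for line, text in segments:
    print(line, text)
```

Even in this small sample, nothing in the source itself indicates which segment is the original sentence and which is its translation; the editor has to infer that from the text.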
Further, an author may prepare an HTML document as an object of translation in advance and form from it another HTML document in a different language by translation processing using Internet translation software. If the author is not satisfied with the results of the translation, a need may arise to edit the HTML document in the second language and, if necessary, the translation-object HTML document as well.
In this editing, it is difficult to determine document portions to be edited and to confirm the correspondence between original and translated sentences, since the translation-object HTML document and the translation-result HTML document exist in separate files. It is also possible that, through editing, the page configuration (format) of one document will become different from that of the other.
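The risk that the page configuration of the two files diverges can be made concrete with a short sketch. It is only an illustration under stated assumptions: it uses Python's standard `html.parser` module, treats the "format" of a page as nothing more than its sequence of start and end tags, and the names `TagSkeleton`, `skeleton`, `same_format`, and the sample strings are hypothetical.

```python
from html.parser import HTMLParser


class TagSkeleton(HTMLParser):
    """Records the sequence of start and end tags, ignoring all text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def skeleton(html_source):
    """Return the tag sequence of an HTML source as a list of strings."""
    parser = TagSkeleton()
    parser.feed(html_source)
    parser.close()
    return parser.tags


def same_format(original_html, translated_html):
    """True when both documents have an identical tag sequence."""
    return skeleton(original_html) == skeleton(translated_html)


# Illustrative documents held in separate files in the conventional approach.
ENGLISH = "<body><h1>News</h1><p>Welcome.</p></body>"
JAPANESE = "<body><h1>ニュース</h1><p>ようこそ。</p></body>"
JAPANESE_EDITED = "<body><h1>ニュース</h1><p>ようこそ。</p><p>追加</p></body>"

print(same_format(ENGLISH, JAPANESE))         # formats still agree
print(same_format(ENGLISH, JAPANESE_EDITED))  # an added paragraph breaks the match
```

A check of this kind only detects that the formats have diverged; it does not show which original sentence corresponds to which translated sentence, which is the harder problem when the two documents exist in separate files.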
As described above, the conventional HTML document processing apparatus can be designed to enable translation of an original home page on the Internet using Internet translation software and visual display of original and translated sentences in a juxtaposed form. However, when translation results are to be edited, the HTML document itself cannot be edited. It is possible to edit the HTML document source directly, but editing in such a way is extremely troublesome and not sufficiently effective.