(1) Field of the Invention
This invention pertains to a data processing apparatus which translates text data written in one language into another language, or converts text data written in one dialect into another dialect of the same language. More precisely, it pertains to an apparatus which performs translation/conversion while maintaining the display properties attached to the text data before translation/conversion.
(2) Related Art
It is becoming more and more common these days to send and receive text data to and from abroad via international communication networks like the Internet. Such text data is normally made up of tag symbols and the text body, which is the text without tag symbols. Tag symbols are composed of start tags and end tags. A start tag is formed by entering the tag name between a "<" and a ">", while an end tag has a "/" before the tag name. For example, HTML uses B for bold, I for italic, and U for underline. Also, in text data received from the Internet, anchor tags can be used as start tags to show a pointer to another file. Anchor tags are written in the format <A HREF="link destination text">.
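The tag structure described above can be illustrated with a short Python sketch that separates text data into tag symbols and text body. The function name split_tags and the regular expression are assumptions made for illustration only; they are not taken from any apparatus described here, and the pattern handles only simple, well-formed tags.

```python
import re

# Assumed pattern: "<", an optional "/", a tag name, optional attributes, ">".
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def split_tags(text_data):
    """Split text data into tokens and return (tokens, body).

    tokens is a list of (kind, value) pairs, where kind is "start",
    "end", or "text". body is the text with all tag symbols removed.
    """
    tokens = []
    pos = 0
    for match in TAG_PATTERN.finditer(text_data):
        if match.start() > pos:
            # Plain text between the previous tag and this one.
            tokens.append(("text", text_data[pos:match.start()]))
        kind = "end" if match.group(1) else "start"
        tokens.append((kind, match.group(2).upper()))
        pos = match.end()
    if pos < len(text_data):
        # Trailing text after the last tag.
        tokens.append(("text", text_data[pos:]))
    body = "".join(value for kind, value in tokens if kind == "text")
    return tokens, body


if __name__ == "__main__":
    tokens, body = split_tags("This is <B>bold</B> text.")
    print(tokens)
    print(body)
```

For the input "This is <B>bold</B> text.", the text body is "This is bold text." and the tokens are a text run, a start tag B, the tagged text, an end tag B, and a final text run. An anchor tag such as <A HREF="next.html"> is recognized as a start tag named A; this sketch discards its attributes.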
When text data is received from abroad through the Internet, the text body written in a foreign language must be translated into one's native language. Machine translation software is used for this.
Japanese Laid-open Patent Application #6-44296 discloses a well-known conventional machine translation apparatus. This conventional apparatus is made up of a separation unit which separates the text data received from the Internet into the text body and the tag symbols; a memory unit which stores each tag symbol in association with the word it accompanies; a dictionary lookup/morphological analysis unit which conducts dictionary lookups and morphological analyses on the text body; a syntactic analysis unit which conducts syntactic analyses on the text body after morphological analysis; a conversion unit which converts the result of syntactic analysis and generates a parsing tree of the target language; and a translation text generation unit which refers to the contents of the memory unit in order to generate, based on the parsing tree of the target language, a translated text in the target language with the tags inserted.
However, there is a drawback with the conventional machine translation apparatus. The apparatus attaches tag symbols to the target language word corresponding to the source language word that carries the tag symbols. As long as a tag symbol is attached to a whole word, the display properties carry over to the target language and there is no incongruity. But there are often times when the tag symbol is attached to only some of the letters in a word. When this happens, the display properties attached to the text data in the source language are ignored in the text data of the target language, and therefore fail to be displayed. For example, when the text data of the source language is "I <B>h</B>ave a pen.", the tag symbols are dropped from the text data of the target language, so that only the text body remains, without the tag symbols. This results in an unnatural translation.
Also, text data received from the Internet may contain anchor tags, which act as pointers that display links to other files. If an anchor tag is attached to only some of the letters in a word, then the tag symbols are dropped from the translated text data, so one cannot move to the link destination file using the translated text.
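The drawback described above can be reproduced with a toy Python sketch. It is not the implementation of the cited apparatus: the word-for-word lookup, the function names tagged_spans and word_level_translate, and the small English-to-French dictionary are all hypothetical stand-ins for the dictionary lookup, analysis, and generation units. The only behavior modeled is that a tag is carried over when it covers a whole word and dropped when it covers only part of a word.

```python
import re

# Assumed pattern for simple, well-formed start and end tags.
TAG = re.compile(r"<(/?)([A-Za-z]+)[^<>]*>")


def tagged_spans(text_data):
    """Strip tags and return (body, spans).

    Each span is (raw_start_tag, tag_name, start, end), where start and
    end are character offsets into the tag-free body.
    """
    body, spans, open_tags, pos = "", [], [], 0
    for match in TAG.finditer(text_data):
        body += text_data[pos:match.start()]
        pos = match.end()
        name = match.group(2).upper()
        if match.group(1):
            # End tag: close the most recent start tag of the same name.
            for i in range(len(open_tags) - 1, -1, -1):
                if open_tags[i][0] == name:
                    _, raw, start = open_tags.pop(i)
                    spans.append((raw, name, start, len(body)))
                    break
        else:
            open_tags.append((name, match.group(), len(body)))
    body += text_data[pos:]
    return body, spans


def word_level_translate(text_data, dictionary):
    """Toy word-for-word translation that re-attaches tags per whole word."""
    body, spans = tagged_spans(text_data)
    output = []
    for word in re.finditer(r"\S+", body):
        target = dictionary.get(word.group().lower(), word.group())
        for raw, name, start, end in spans:
            # A tag survives only if it covers the entire source word.
            if start <= word.start() and word.end() <= end:
                target = f"{raw}{target}</{name}>"
        output.append(target)
    return " ".join(output)


# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY = {"i": "je", "have": "ai", "a": "un", "pen.": "stylo."}

if __name__ == "__main__":
    print(word_level_translate("I <B>have</B> a pen.", TOY_DICTIONARY))
    print(word_level_translate("I <B>h</B>ave a pen.", TOY_DICTIONARY))
```

With the tag on the whole word, the sketch yields "je <B>ai</B> un stylo."; with the tag on only the letter "h", it yields "je ai un stylo." and the bold property is lost. The same loss occurs for an anchor tag that wraps only part of a word, in which case the link destination disappears from the output.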
Although the above problem appears when translating from one language to another, a similar problem may appear when converting one dialect into another dialect of the same language.