1. Field of the Invention
The present invention relates generally to translating machines, and more specifically, to a translating machine capable of translating a document including markup signs for computer typesetting from one language into another language.
2. Description of the Related Art
Conventional translating machines in practical use include the following. A conventional translating machine first inputs a source language document into a translation module using, for example, a keyboard under the control of a CPU (Central Processing Unit). The translation module analyzes the input source language text utilizing a group of dictionaries (such as a basic dictionary stored in memory and a user dictionary prepared by user registration) and then produces a parsing tree from the analysis. The parsing tree of the source language text is then transformed into a parsing tree in the target language utilizing rules, prestored in memory, for transforming a source language tree structure into a target language tree structure. An appropriate translation is given to each word, and then necessary additional parts are supplied to produce a final text in the target language.
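The analysis, tree transformation, and generation steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-word dictionary, the romanized Japanese target, the single reordering rule (subject-verb-object to subject-object-verb), and the function names. None of it is taken from an actual translating machine.

```python
# Toy transfer-based translation: analyze -> transfer -> generate.
# Dictionary entries, the SVO->SOV rule, and the particles are hypothetical.

DICTIONARY = {"i": "watashi", "read": "yomu", "books": "hon"}
PARTICLES = {"S": "wa", "O": "o"}  # "additional parts" supplied at generation


def analyze(sentence):
    """Build a very shallow parsing tree from a three-word SVO sentence."""
    subject, verb, obj = sentence.lower().split()
    return {"S": subject, "V": verb, "O": obj}


def transfer(source_tree):
    """Transform the source tree into target order (subject, object, verb)."""
    return [(role, source_tree[role]) for role in ("S", "O", "V")]


def generate(target_tree):
    """Translate each word, then supply the particle for its role, if any."""
    words = []
    for role, word in target_tree:
        words.append(DICTIONARY.get(word, word))
        if role in PARTICLES:
            words.append(PARTICLES[role])
    return " ".join(words)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("I read books"))
```

A markup sign mixed into the input would reach `analyze` as if it were an ordinary word, which is the kind of failure the following paragraphs describe.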
In recent years, systems for producing block copies for printing on small computers have been widely developed. As a result, additional information for printing (such as specifications for typesetting) is sometimes included in a document text. Such information includes information for designating a title, the font to be used, the size of the font, and the words to be employed as index entries.
These pieces of information are conventionally mixed into the text of the document to be processed in the form of markup signs. By including such markup signs in the document, the document can be automatically printed utilizing a format, a font, and a font size according to the markup information. When index entries are designated, the index can be readily produced by listing those words or groups of words attached to the text with such markup signs.
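As an illustration of how an index can be produced from such markup signs, the following sketch collects every word or group of words wrapped in an index sign. The sign syntax `\index{...}` and the function name `build_index` are assumptions chosen for the example, not a syntax prescribed by this document.

```python
import re

# Hypothetical markup sign: \index{entry} marks "entry" as an index entry.
INDEX_SIGN = re.compile(r"\\index\{([^{}]*)\}")


def build_index(text):
    """Return the distinct index entries in case-insensitive alphabetical order."""
    return sorted(set(INDEX_SIGN.findall(text)), key=str.lower)


sample = r"A \index{parser} applies \index{Grammar} rules; the \index{parser} is fast."
print(build_index(sample))
```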
Markup languages have been developed as systems of markup signs. One example of such a language is SGML (Standard Generalized Markup Language), established by the ISO (International Organization for Standardization). SGML is used for designating the logical structure of a document, such as chapters, paragraphs, and itemization. When a document produced in accordance with SGML is actually printed, a further markup language is often used to specify the format in more detail. One example of such a markup language is called TeX.
As the number of documents having designations for printing utilizing markup languages has increased, the demand for a technique for translating these documents into another language has also increased.
A document including markup signs as described above cannot be properly translated by a conventional translating machine. In some cases, the document cannot be translated at all. In other cases, a mistranslation occurs because the markup signs do not belong to the source language of the document. Conventionally, it was therefore necessary to manually check, utilizing an editor or the like, whether or not markup signs were included in an input text before inputting the text into a translating machine. Only after all the markup signs had been deleted one by one could the text be input into the translating machine. Accordingly, efficiency in translating a document including markup signs utilizing a conventional translating machine was very low.
To overcome such disadvantages, a system for processing documents without consideration of non-language data (such as format information included in the document) is disclosed in Japanese Patent Laid-Open No. 4-259057. According to the system disclosed in this document, only language data is extracted from document data in which language and non-language data are mixed, and a prescribed editing processing is performed on the extracted language data. The language data edited by this editing processing is compared to the language data in the originally input document data to determine the correspondence between their positions. The language data of the input document data is then replaced with the corresponding language data after the editing. This permits editing of document data in which the language data is mixed with format information while ignoring the presence of the non-language data.
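The extract, edit, and replace flow can be sketched as follows. This is a deliberately simplified illustration and not the method of the cited publication: it assumes a hypothetical angle-bracket sign syntax and edits each language segment separately, so the positional correspondence is trivial. The cited system instead has to infer the correspondence after the extracted language data has been edited as a whole.

```python
import re

# Hypothetical non-language data: anything of the form <...>.
MARKUP = re.compile(r"(<[^<>]+>)")


def edit_language_data(document, edit):
    """Apply `edit` to language segments only and leave markup signs in place."""
    parts = MARKUP.split(document)
    # With one capturing group, re.split puts the markup signs at odd indices.
    return "".join(
        part if index % 2 else edit(part) for index, part in enumerate(parts)
    )


# str.upper stands in for the "prescribed editing processing".
print(edit_language_data("<b>hello</b> world", str.upper))
```

Segment-wise editing of this kind stops being sufficient once the edit reorders or merges text across markup boundaries, as translation between languages typically does.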
However, various rules are necessary for determining the correspondence between the edited language data and the input document data. Whether such rules are truly effective cannot be judged immediately, but only by trial and error. Moreover, an effective correspondence rule does not necessarily exist for every case. Applying such a rule mistakenly could even degrade the quality of the translated document eventually obtained.