1. Field of the Invention
The present invention relates to a machine translation system, method and program.
2. Description of the Related Art
In a machine translation system, in which a computer translates sentences written in a first language (source language) into sentences written in a second language (target language), an input sentence is first divided into predetermined translation units (such as words and phrases) by morpheme analysis or sentence structure analysis. Subsequently, a translation dictionary is searched for each unit of processing to determine the translation rule to be applied, thereby determining the corresponding translation words (phrases). The determined words (phrases) are then connected in accordance with a predetermined translation rule, thereby acquiring a translation corresponding to the input sentence.
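The three steps described above (division into units, dictionary search, connection) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the romanized toy dictionary, the whitespace split standing in for morpheme analysis, and the trivial rule of connecting words in source order are not taken from any particular system.

```python
# Hypothetical toy dictionary: translation unit -> candidate translations,
# with the standard (default) candidate listed first.
TOY_DICTIONARY = {
    "kuroi": ["black"],
    "neko": ["cat"],
}


def split_into_units(sentence):
    # Stand-in for morpheme or sentence structure analysis:
    # here the units are simply whitespace-separated tokens.
    return sentence.split()


def look_up(unit):
    # Search the dictionary for the unit and take the default candidate.
    # Units missing from the dictionary are passed through unchanged.
    candidates = TOY_DICTIONARY.get(unit)
    return candidates[0] if candidates else unit


def translate(sentence):
    # Connect the selected words; the "translation rule" assumed here
    # just keeps the source order and joins with spaces.
    units = split_into_units(sentence)
    return " ".join(look_up(unit) for unit in units)
```

With these assumptions, `translate("kuroi neko")` yields `"black cat"`. Note that the lookup sees one unit at a time, so whatever candidate is listed first always wins.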
When a document contains a structure in which words or sentences are arranged in order, such as a table structure or an enumerated structure, the structure is translated by extracting the word or sentence in each cell of the table, or each enumerated word or sentence, and inputting it to a machine translation system such as the one described above.
Sentences that are regularly arranged as structural elements, such as sentences in the cells of a table or enumerated sentences, often have no grammatically correct structure or are very short. Accordingly, the above-mentioned method, in which sentences are extracted one by one from a table structure or enumerated structure and subjected to machine translation, provides little information that can serve as a key to analysis for translation or to selection among possible translations. As a result, translation accuracy is degraded.
When translating standard sentences, co-occurrence information in a sentence or in context is utilized (see, for example, Jpn. Pat. Appln. KOKAI Publication No. 3-175573). However, if this method is directly applied to translation of data of a table structure or enumerated structure, it is difficult to output stable translation results since the manner of co-occurrence may vary depending upon the arrangement of sentences in the structure.
Specifically, consider, for example, an enumerated structure in which three Japanese characters label the items 月, 火 and 水. The label characters belong to the category of “order”. Therefore, it is desirable that they be translated into the numerals (1), (2) and (3), or the letters (a), (b) and (c), respectively. Here, 月, 火 and 水 are Japanese words: 月 means Monday (Getsu) or moon (Tsuki), 火 means Tuesday (Ka) or fire (Hi), and 水 means Wednesday (Sui) or water (Mizu). A similar enumerated structure may use a different series of label characters. These characters also belong to the category of “order”, and it is likewise desirable that they be translated into the numerals (1), (2) and (3), or the letters (a), (b) and (c), respectively. However, one and the same Japanese character appears in both series: in the former case it should be translated into (1) or (a), whereas in the latter case it should be translated into (2) or (b). Moreover, enumerated structures may be nested, in which case two or more such ambiguous characters may well appear.
Conventional translation using co-occurrence information cannot take into account the rule of an enumerated structure, under which the characters are arranged regularly. Accordingly, an ambiguous character such as the one mentioned above may be translated incorrectly, which degrades the quality of translation.
Further, the set of Japanese words 月 (Getsu), 火 (Ka), 水 (Sui), . . . included in the enumerated structure indicates that these words belong to the category of “day of the week”. Accordingly, these words should be translated into “Monday”, “Tuesday”, “Wednesday”, . . . , respectively. However, the same words also mean “moon”, “fire”, “water”, . . . , respectively.
The prior art translation technique cannot take into account the rule of such an enumerated structure, under which the words are arranged regularly. Consequently, the words 月, 火, 水, . . . may well be translated into standard (default) translation words such as “moon”, “fire”, “water”, . . . , respectively.
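The two ambiguities discussed for enumerated structures can be made concrete with a small sketch. The label series used here (the iroha ordering and the gojuon ordering, in which the character イ is first and second respectively) are chosen purely for illustration and are not necessarily the characters of the example above; the candidate dictionary and both helper functions are likewise hypothetical.

```python
# Illustrative label series (assumption): the same character "イ" is
# first in iroha order but second in gojuon order.
IROHA_ORDER = ["イ", "ロ", "ハ"]
GOJUON_ORDER = ["ア", "イ", "ウ"]

# Hypothetical dictionary: standard (default) translation listed first,
# day-of-the-week reading second.
CANDIDATES = {
    "月": ["moon", "Monday"],
    "火": ["fire", "Tuesday"],
    "水": ["water", "Wednesday"],
}


def ordinal_in(series, label):
    # 1-based position of a label within a known label series.
    return series.index(label) + 1


def translate_in_isolation(word):
    # Item-by-item translation: with no knowledge of the sibling items,
    # the default candidate is always selected.
    return CANDIDATES[word][0]
```

Under these assumptions, `ordinal_in(IROHA_ORDER, "イ")` is 1 while `ordinal_in(GOJUON_ORDER, "イ")` is 2, so the label cannot be translated correctly without knowing which series it belongs to. Similarly, translating 月, 火 and 水 one at a time with `translate_in_isolation` gives "moon", "fire" and "water", even though the three words together suggest days of the week.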
The same can be said of a table structure. Specifically, when cells storing the Japanese words 月, 火, 水, . . . appear in the index row (column) heading each column (row) of the table, these words should be translated into “Monday”, “Tuesday”, “Wednesday”, . . . , since the set of words indicates that they belong to the category of “day of the week”. In the prior art technique, however, the words 月, 火, 水, . . . may well be translated into standard (default) translation words such as “moon”, “fire”, “water”, . . . , respectively, as in the case of the enumerated structure, because translation cannot take into account the rule of the row (column) direction of the table structure, under which the words are arranged regularly. Moreover, if the translation method utilizing co-occurrence information is applied to a table, the translation of a given cell may be influenced by other cells that have only a weak relationship to it (e.g., cells located obliquely above or below it). Thus, stable translation is still difficult.
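The point about weakly related cells can be illustrated by separating a cell's context into cells in the same row or column and cells located obliquely. The sketch below is only an illustration under assumed data: the sample timetable and all three function names are hypothetical, and it does not implement any particular co-occurrence method.

```python
# Hypothetical sample table: the first row is an index row of day words.
TABLE = [
    ["time", "月", "火"],
    ["AM", "math", "art"],
    ["PM", "gym", "music"],
]


def all_other_cells(table, row, col):
    # Context available to a method that ignores table direction:
    # every cell except the one being translated.
    return [cell
            for r, line in enumerate(table)
            for c, cell in enumerate(line)
            if (r, c) != (row, col)]


def same_row_or_column(table, row, col):
    # Cells aligned with the target cell in the row or column direction.
    return [cell
            for r, line in enumerate(table)
            for c, cell in enumerate(line)
            if (r, c) != (row, col) and (r == row or c == col)]


def oblique_cells(table, row, col):
    # Cells sharing neither the row nor the column of the target cell.
    return [cell
            for r, line in enumerate(table)
            for c, cell in enumerate(line)
            if r != row and c != col]
```

For the cell holding 月 at row 0, column 1, `same_row_or_column` returns the cells "time", 火, "math" and "gym", whereas `oblique_cells` returns "AM", "art", "PM" and "music". A method that draws on `all_other_cells` mixes both groups, which is the kind of influence from weakly related cells described above.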
As described above, the prior art techniques cannot accurately translate a document having a table structure or enumerated structure in which words or sentences are regularly arranged.