1. Field of the Invention
This invention relates to a translating apparatus and a translating method, and particularly to a translating apparatus and a translating method for translating a first language sentence expressed in a first language into a second language sentence expressed in a second language using examples.
2. Description of the Related Art
Translating apparatuses for translating a first language sentence expressed in a first language into a second language sentence expressed in a second language can generally be divided into three classes: rule-driven translation apparatuses, example-using translation apparatuses, and example-driven translation apparatuses.
FIG. 1 shows the construction of an example of a rule-driven translation apparatus. In FIG. 1, an inputting part 1 consists of, for example, a keyboard, a voice recognition device or a character recognition device; a first language sentence is inputted into it, converted into a form such as text data and outputted. That is, when the inputting part 1 consists of a keyboard, a first language sentence is inputted by the keyboard being operated, and text data corresponding to that operation is outputted. When the inputting part 1 is a voice recognition device, a first language sentence is inputted as voice and voice-recognized. Text data corresponding to the results of the voice recognition is then outputted. When the inputting part 1 is a character recognition device, a first language sentence written for example on paper is inputted (read) and this first language sentence is character-recognized. Text data corresponding to the results of this character recognition is then outputted.
The text data corresponding to the first language sentence outputted from the inputting part 1 is supplied to an analyzing part 31. In the analyzing part 31, the first language sentence from the inputting part 1 is language-processed (analyzed) on the basis of knowledge and rules relating to the first language, and the results of this analysis are supplied to a converting part 32. The converting part 32 converts the analysis results from the analyzing part 31 into an intermediate language sentence of a prescribed intermediate language on the basis of knowledge and rules relating to the first language and a second language, and outputs this to a generating part 33. The generating part 33 generates from the intermediate language sentence supplied from the converting part 32 a corresponding second language sentence, that is, a translation (second language sentence) consisting of the first language sentence translated into the second language, on the basis of knowledge and rules relating to the second language.
An outputting part 16 is made up of for example a display or a voice synthesizing device and a speaker or the like, and displays or outputs in a synthesized voice the second language sentence supplied from the generating part 33.
FIG. 2 shows the construction of an example of an example-using translating apparatus. Parts in the figure corresponding to parts in FIG. 1 have been given the same reference numerals and will not be described in the following. Apart from being provided with a collating part 41 and a replacing part 42, this example-using translating apparatus is of the same construction as the rule-driven translation apparatus of FIG. 1.
In this example-using translating apparatus, examples in sentence units expressed in a first language and corresponding translations consisting of the examples translated into a second language (hereinafter for convenience these examples and their corresponding translations will be called translation example data) are stored for example in the form of parameters. In the collating part 41, the first language sentence outputted by the inputting part 1 is collated with the examples, and any example matching the first language sentence is thereby detected. When there is an example which matches the first language sentence, the collating part 41 controls the replacing part 42 to replace the first language sentence with the translation corresponding to the example matching it. Accordingly, in the replacing part 42, the first language sentence is replaced with the translation corresponding to the matching example, and this translation is supplied to the outputting part 16.
When on the other hand there is no example which matches the first language sentence, the collating part 41 outputs the first language sentence to the analyzing part 31. Thereafter, in the analyzing part 31, the converting part 32 and the generating part 33, the same processing as in the case shown in FIG. 1 is carried out, and the second language sentence obtained as a result is supplied to the outputting part 16.
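The flow of FIG. 2 can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the dictionary used to hold the translation example data, and the placeholder standing in for the rule-driven path are all hypothetical and are not taken from the cited publication.

```python
from typing import Callable, Dict


def translate_example_using(
    sentence: str,
    examples: Dict[str, str],
    rule_driven: Callable[[str], str],
) -> str:
    """Return the stored translation on an exact match, else fall back."""
    # Collating part 41: look for an example identical to the input.
    translation = examples.get(sentence)
    if translation is not None:
        # Replacing part 42: the input is replaced with the stored translation.
        return translation
    # No match: analyzing part 31 -> converting part 32 -> generating part 33.
    return rule_driven(sentence)


# Hypothetical translation example data (example -> translation).
EXAMPLES = {"WATASHI WA MIKAN O TABETAI": "I want to eat an orange"}


def toy_rule_driven(sentence: str) -> str:
    # Placeholder only; a real apparatus would analyze, convert and generate.
    return f"<rule-driven translation of: {sentence}>"


print(translate_example_using("WATASHI WA MIKAN O TABETAI", EXAMPLES, toy_rule_driven))
print(translate_example_using("RINGO O WATASHI WA TABETAI", EXAMPLES, toy_rule_driven))
```

Only a character-for-character match takes the example path in this sketch; every other input goes through the rule-driven fallback.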
Details of this kind of example-using translating apparatus are disclosed for example in Japanese Unexamined Patent Publication No. H.6-68134.
Next, FIG. 3 shows the construction of an example of an example-driven translating apparatus. Parts in FIG. 3 the same as parts in FIG. 1 have been given the same reference numerals and will not be described in the following.
In this example-driven translating apparatus, a first language sentence outputted by the inputting part 1 is supplied to a converting part 51, and when the converting part 51 receives the first language sentence from the inputting part 1 it controls a searching part 52 to search for the example most similar to that first language sentence.
That is, translation example data is stored in sentence units in a translation example memory 53, and the searching part 52 first refers to the translation example memory 53 and searches for an example which matches the first language sentence. When it finds an example which matches the first language sentence, the searching part 52 outputs this example and its corresponding translation to the converting part 51. In this case, the converting part 51 supplies the translation from the searching part 52 unchanged to the outputting part 16 as a second language sentence.
When on the other hand it cannot find an example which matches the first language sentence, the searching part 52 successively reads the examples stored in the translation example memory 53 and supplies them to a similarity degree calculating part 54. The searching part 52 makes the similarity degree calculating part 54 calculate a similarity degree expressing the conceptual similarity (the similarity in meaning) between each of the examples and the first language sentence using external knowledge such as for example a thesaurus.
That is, a thesaurus wherein words are classified on the basis of their concepts in a tree structure is stored in a thesaurus memory part 55. In the thesaurus, nodes of the tree structure are equivalent to meaning concepts and so-called leaf parts are equivalent to words. Referring to this thesaurus, the similarity degree calculating part 54 calculates a degree of similarity between the first language sentence and the examples on the basis of the classes to which concepts common to words constituting the first language sentence and words constituting the examples belong. The searching part 52 then finds in the translation example memory 53 the example of which the similarity degree calculated by the similarity degree calculating part 54 is the highest and supplies the translation corresponding to that example to the converting part 51.
When the converting part 51 receives the translation from the searching part 52, it replaces those words of the translation which do not match (correspond with) words of the first language sentence with translations of those words and outputs this to the outputting part 16 as a second language sentence.
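The processing of FIG. 3 can be sketched roughly as follows. This is an illustration under assumptions rather than the method of the cited publications: the `Example` record, its word alignment table, the bilingual `dictionary` and the stand-in `similarity` function (the share of positions holding the same word, used here in place of a thesaurus-based measure) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Example:
    source: List[str]      # words of the example (first language)
    target: List[str]      # words of its translation (second language)
    align: Dict[int, int]  # source position -> target position (hypothetical)


def similarity(words: List[str], example: Example) -> float:
    # Stand-in for the similarity degree calculating part 54.
    if not words or len(words) != len(example.source):
        return 0.0
    return sum(a == b for a, b in zip(words, example.source)) / len(words)


def translate_example_driven(
    words: List[str], memory: List[Example], dictionary: Dict[str, str]
) -> str:
    # Searching part 52: look for an exactly matching example first.
    for ex in memory:
        if ex.source == words:
            return " ".join(ex.target)
    # Otherwise take the example with the highest similarity degree.
    best = max(memory, key=lambda ex: similarity(words, ex))
    # Converting part 51: replace the translation words whose source words differ.
    out = list(best.target)
    for pos, (w, e) in enumerate(zip(words, best.source)):
        if w != e and pos in best.align:
            out[best.align[pos]] = dictionary.get(w, w)
    return " ".join(out)


MEMORY = [
    Example(
        source="WATASHI WA MIKAN O TABETAI".split(),
        target="I want to eat an orange".split(),
        align={2: 5},  # MIKAN -> orange
    )
]
DICTIONARY = {"RINGO": "apple"}  # hypothetical word dictionary

print(translate_example_driven("WATASHI WA RINGO O TABETAI".split(), MEMORY, DICTIONARY))
```

The sketch assumes a non-empty example memory and leaves unaligned words untouched; both are simplifications made for brevity.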
Details of this kind of example-driven translating apparatus are disclosed for example in Japanese Unexamined Patent Publication No. H.3-276367. Details of methods of calculating the degree of similarity between a first language sentence and an example are also disclosed in, for example, Japanese Unexamined Patent Publication No. H.4-188276, Japanese Unexamined Patent Publication No. H.6-274546 and Japanese Unexamined Patent Publication No. H.6-274548, as well as the aforementioned Japanese Unexamined Patent Publication No. H.3-276367.
However, in the kinds of translating apparatus described above, there have been the following problems.
That is, in rule-driven translation apparatuses there has been the problem that information required for translation is sometimes lost in the process of analysis of the first language sentence, and consequently results of translation of greetings and other set phrases have become word-for-word in tone. Also, when the first language sentence includes for example proverbs or other idiomatic expressions (idioms), translating these correctly has been difficult. Also, building and maintaining as a data base the knowledge and rules relating to the first language and the second language used in rule-driven translation apparatuses has not been easy.
With example-using translating apparatuses, on the other hand, because the first language sentence is translated using a translation corresponding to an example matching it, if greetings and other set phrases and idioms and so on are stored as examples, it is possible to obtain a second language sentence which is a natural expression. However, in an example-using translating apparatus, when no example matching the first language sentence is stored, because processing similar to that of a rule-driven translation apparatus is carried out after all, the kinds of problem mentioned above arise, and furthermore storing examples completely matching the character strings of all first language sentences that might be inputted has been problematic.
In an example-driven translating apparatus, because translation is carried out using examples whatever the nature of the first language sentence, the problems of the rule-driven translation apparatus do not arise, and also it is not necessary to store examples completely matching all possible first language sentences.
However, in an example-driven translating apparatus, when there is no example matching the first language sentence, the example most similar to it is found and those of the words constituting the corresponding translation which do not match words constituting the first language sentence are simply replaced with translations of those words and this becomes the translation result, i.e. the second language sentence. Therefore, in an example-driven translating apparatus, linguistic knowledge and rules are not reflected in a generalized form and consequently the quality of the translation (the accuracy of the translation and the naturalness of the translation result) can only be increased by increasing the number of examples stored in the translation example memory 53 (whereas the quality of the translations of a rule-driven translation apparatus can also be improved through the way in which knowledge and rules relating to the languages are described). Increasing the number of examples stored in the translation example memory 53 increases the time required for the processing of searching for an example matching or similar to the first language sentence.
The calculation of the similarity degree D(I,E) between a first language sentence I and an example phrase E in an example-driven translating apparatus, when the words constituting the first language sentence I are expressed i.sub.1, i.sub.2, . . . , i.sub.t and the words constituting the example phrase E are expressed e.sub.1, e.sub.2, . . . , e.sub.t (where t represents the number of words constituting the first language sentence and the example respectively), has been carried out as shown in Exp. (1) by finding the distance in meaning (the conceptual distance) word-distance(i.sub.k,e.sub.k) between each word i.sub.k constituting the first language sentence I and the corresponding word e.sub.k in the example phrase E, assigning to it a weight.sub.k corresponding to the importance of the word e.sub.k, and obtaining the sum total of these weighted distances.
$$D(I,E) = \sum_{k=1}^{t} \mathrm{weight}_k \times \text{word-distance}(i_k, e_k) \tag{1}$$
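The weighted sum described for Exp. (1) can be written out as a short sketch. The function name, the weights and the table of word-distance values below are hypothetical and chosen only to make the arithmetic visible; in particular, the value assigned to identical words is an assumption of this illustration.

```python
from fractions import Fraction
from typing import Callable, Sequence

# Hypothetical word-distance values, for illustration only.
TOY_TABLE = {frozenset(("RINGO", "MIKAN")): Fraction(2, 3)}


def toy_word_distance(a: str, b: str) -> Fraction:
    if a == b:
        return Fraction(1)  # assumed value for identical words
    return TOY_TABLE.get(frozenset((a, b)), Fraction(0))


def similarity_degree(
    input_words: Sequence[str],
    example_words: Sequence[str],
    weights: Sequence[int],
    word_distance: Callable[[str, str], Fraction],
) -> Fraction:
    """Sum of weight_k * word-distance(i_k, e_k) over k = 1..t."""
    t = len(input_words)
    # The expression presupposes the same number of words t on both sides.
    if len(example_words) != t or len(weights) != t:
        raise ValueError("input, example and weights must all have length t")
    return sum(
        (w * word_distance(i, e)
         for i, e, w in zip(input_words, example_words, weights)),
        Fraction(0),
    )


I = "WATASHI WA RINGO O TABETAI".split()
E = "WATASHI WA MIKAN O TABETAI".split()
WEIGHTS = [1, 1, 2, 1, 1]  # hypothetical importance of each example word

print(similarity_degree(I, E, WEIGHTS, toy_word_distance))  # 16/3
```

Here the result is 1 + 1 + 2 × 2/3 + 1 + 1 = 16/3. Exact fractions are used so that the thirds do not pick up floating-point rounding.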
With this kind of similarity degree calculating method there have been the following kinds of problem.
That is, when a speaker expresses the meaning content of a certain example, the example itself is sometimes spoken verbatim, but in most cases a word such as the subject of the example is omitted or extra words are inserted. With the calculation method described above, because it is presupposed that the first language sentence and the example are made up of the same number of words, when the numbers of words in the two are different, it has been unclear whether or not it is possible to obtain a similarity degree correctly reflecting the similarity of the example to the first language sentence.
Conversely, even if the first language sentence and the example are made up of the same number of words, when their word orders are different, again it has been unclear whether or not it is possible to obtain a similarity degree correctly reflecting the similarity of the example to the first language sentence. That is, if the first language is a language wherein words can be lined up in a relatively free order, like for example Japanese, when for example `RINGO O WATASHI WA TABETAI` (`an apple--I want to eat`) is inputted as the first language sentence, even if for example `WATASHI WA MIKAN O TABETAI` (`I want to eat an orange`) is stored as an example, because the `RINGO` (`apple`) and the `WATASHI` (`I`) of the first language sentence respectively correspond to the `WATASHI` (`I`) and the `MIKAN` (`orange`) of the example, recognizing that the two sentences are similar has been problematic.
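The effect of position-wise pairing can be seen in a small sketch. The helper below is hypothetical and simply counts the positions at which the k-th input word and the k-th example word are identical; the sentence `WATASHI WA RINGO O TABETAI` is an added variant used only for comparison.

```python
from typing import List


def positionwise_matches(input_words: List[str], example_words: List[str]) -> int:
    """Count positions k where the k-th words of input and example coincide."""
    if len(input_words) != len(example_words):
        # Pairing the k-th words presupposes equal word counts.
        raise ValueError("position-wise pairing needs equal word counts")
    return sum(i == e for i, e in zip(input_words, example_words))


EXAMPLE = "WATASHI WA MIKAN O TABETAI".split()
same_order = "WATASHI WA RINGO O TABETAI".split()  # hypothetical variant
free_order = "RINGO O WATASHI WA TABETAI".split()  # input from the text

print(positionwise_matches(same_order, EXAMPLE))  # 4
print(positionwise_matches(free_order, EXAMPLE))  # 1
```

Both inputs contain the same words, yet the reordered one pairs `RINGO` with `WATASHI` and `WATASHI` with `MIKAN`, so only the final `TABETAI` lines up.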
Also, when for example the inputting part 1 consists of a voice recognition device, it sometimes happens that the speaker does not speak clearly and correct recognition is not achieved in the voice recognition device and text data wherein particles and the like are missing is outputted. That is, it sometimes happens for example that although the first language sentence intended by the speaker was `WATASHI WA, ANATA NO MIKAN GA TABETAI` (`I want to eat your orange`) the output of the inputting part 1 becomes `WATASHI, ANATA, MIKAN, TABETAI` (`I, you, orange, want to eat`). In this kind of case, when for example `WATASHI WA MIKAN O TABETAI` (`I want to eat an orange`) is stored as an example, the similarity degree should be calculated with the compound noun `ANATA NO MIKAN` (`your orange`), consisting of the `ANATA` and the `MIKAN` in the output of the inputting part 1 joined together, being made to correspond with the `MIKAN` (`orange`) in the example, but with a method using Exp. (1) this has been difficult.
To handle an input whose meaning content matches that of an example but which differs from the example as it were superficially, as described above, it is necessary to add examples corresponding to these superficial variations to the translation example memory 53, and as mentioned above this increases the time required for the searching process.
Also, the distance word-distance(i.sub.k,e.sub.k) between the word i.sub.k and the word e.sub.k has been determined on the basis of the class to which a concept common to the word i.sub.k and the word e.sub.k (the concept which belongs to the lowest class among the concepts including the word i.sub.k and the word e.sub.k) belongs in a thesaurus stored in the thesaurus memory part 55. That is, the thesaurus is constructed by, for example, making the class to which the largest concepts belong the 0th class, classifying smaller concepts into 1st through 3rd progressively lower classes, and classifying words by assigning them to the concepts in the 3rd class to which they correspond. With reference to this kind of thesaurus, it is detected to which of the 0th through 3rd classes the lowest concept common to the word i.sub.k and the word e.sub.k belongs (hereinafter for convenience the C in Cth class will be referred to as the level). Then, according to whether that concept belongs to the 0th, the 1st, the 2nd or the 3rd class, the distance word-distance(i.sub.k,e.sub.k) is determined to be 0, 1/3, 2/3 or 1 respectively.
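The level-based determination described above can be sketched as follows. The thesaurus contents and concept names are hypothetical; each word is stored with its chain of concepts from the 0th class down to the 3rd class, and the level-to-value table is taken as stated in the text.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Hypothetical regular thesaurus: word -> concepts of the 0th..3rd classes.
THESAURUS: Dict[str, Tuple[str, str, str, str]] = {
    "MIKAN": ("thing", "food", "fruit", "citrus"),
    "RINGO": ("thing", "food", "fruit", "pome"),
    "WATASHI": ("thing", "person", "speaker", "first-person"),
}

# Level of the lowest common concept -> value, as stated in the text.
LEVEL_TO_DISTANCE = {
    0: Fraction(0),
    1: Fraction(1, 3),
    2: Fraction(2, 3),
    3: Fraction(1),
}


def common_level(a: str, b: str, thesaurus: Dict[str, Tuple[str, ...]]) -> int:
    """Level of the lowest-class concept that includes both words."""
    level = -1
    for depth, (ca, cb) in enumerate(zip(thesaurus[a], thesaurus[b])):
        if ca != cb:
            break
        level = depth
    if level < 0:
        raise ValueError("words share no concept, not even in the 0th class")
    return level


def word_distance(a: str, b: str) -> Fraction:
    return LEVEL_TO_DISTANCE[common_level(a, b, THESAURUS)]


print(word_distance("MIKAN", "RINGO"))    # 2/3 (lowest common concept: 2nd class)
print(word_distance("MIKAN", "WATASHI"))  # 0   (lowest common concept: 0th class)
```

Because the value depends only on the level, the sketch requires every word to carry exactly one concept per class, which is the regularity requirement discussed in the following paragraphs.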
In this case, it is necessary that when the higher classes are seen from the words there are the same number of classes (in the case described above, the four classes of 0th through 3rd) above each word, and therefore the thesaurus as a whole has had to have an as it were regular structure.
Also, because the distance word-distance(i.sub.k,e.sub.k) between two words is determined on the basis of the level of the class to which the common concept of the two words belongs (that is, on the basis of which of the 0th through 3rd classes that concept belongs to), the thesaurus must be so constructed that the distance between two words having a concept belonging to a certain class as their common concept and the distance between two words having a different concept belonging to the same class as their common concept are the same.
FIG. 16 shows an example of a conventional thesaurus wherein concepts of words are classified into three classes, 0th through 2nd. In the figure, the rectangles represent concepts and the circles represent words.
In this kind of thesaurus, the concept C1 common to the words Wa and Wb is different from the concept C2 common to the words Wc and Wd, but since the concepts C1 and C2 both belong to the 1st class, according to the method of calculating the distance word-distance(i.sub.k,e.sub.k) of the related art described above, the distance between the words Wa and Wb and the distance between the words Wc and Wd are the same. This means that the distance in meaning between concepts connected by lines is the same between any two concepts, and therefore it has been necessary for the thesaurus to be so constructed that this is the case.
However, constructing a thesaurus regularly and also so that the distance between concepts is the same, as described above, although it may be possible with a small-scale thesaurus, is very difficult in a thesaurus of any substantial size. That is, if the thesaurus is constructed regularly it is difficult to make the distance between concepts fixed, and if the thesaurus is constructed so that the distance between concepts is fixed it becomes difficult to maintain its regularity.
Therefore, a way of making possible high-quality translation using a non-regular thesaurus or a thesaurus wherein the distance between concepts is not fixed has been sought.