1. Field of the Invention
This invention relates to an improvement of case-based machine translation (hereafter abbreviated to CBMT), a novel knowledge base for translation that is not restricted to CBMT, and a translation process using the translation knowledge contained in said knowledge base.
2. Description of the Prior Art
Various machine translation systems have appeared as commercial products. Most are based on translation rules given by human beings (that is, they operate according to the principle of rule-based machine translation, hereafter abbreviated to RBMT). In order to make such a system, complicated cases must be collected, and the collection of such cases and the development of rules or dictionaries require a great deal of human labor. Because the rules are written by many persons, it is difficult to predict the side effects among them, and the cost of maintenance is enormous. Even so, it is difficult to fully cover all exceptional cases with rules. In addition, it is not certain that rules directed to general cases are always adequate.
In an attempt to overcome the drawbacks of rule-based systems, translation systems based on actual translation examples (translation cases) in lieu of rules have been proposed in many papers, including "A framework for mechanical translation between Japanese and English by the analogy principle" by M. Nagao, Artificial and Human Intelligence, ed. A. Elithorn and R. Banerji, pp. 173-180, North-Holland, 1984.
FIG. 19 shows an arrangement of a conventional CBMT system. As shown in the figure, in translation by conventional CBMT, a translation case base containing a huge number of translation cases is prepared and accessed together with a thesaurus. As a simple example, the handling of "N ni V" in Japanese-to-English machine translation is discussed below. Assume that the case base has a case C in which the preposition "in" was selected as a translation of the word "ni" in a Japanese sentence containing "ichigatsu ni (in January)" and "kuru (come)," and that a Japanese sentence Q that contains "shigatsu ni (in April)" and "kuru (come)" but that does not exist in the case base is input to the CBMT system. The system searches the translation case base for cases similar to Q in order to translate "N ni V". Using the thesaurus, the system computes the distance from Q to every case in the translation case base. Since "ichigatsu (January)" and "shigatsu (April)" are included in the same conceptual category in the thesaurus, the system finds C to be the case most similar to the input Q, and translates the input "ni" as "in". In this manner, CBMT simulates the process of translation by a human being whereby a sentence that does not exist in the case base is translated by analogy with a known translation of a similar sentence. This is a remarkable system that ensures reliable translation and overcomes the limitations of RBMT without being given any rules, as long as reliable translation cases are collected.
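The retrieval step described above can be illustrated with a minimal sketch. It is not the arrangement of FIG. 19 itself; the toy thesaurus, the distance formula, the second case ("tokyo ni iku" translated with "to"), and all function names are assumptions introduced only for illustration.

```python
# Hypothetical toy thesaurus: each word maps to its path of conceptual
# categories, from the most general to the most specific.
THESAURUS = {
    "ichigatsu": ("time", "month"),
    "shigatsu": ("time", "month"),
    "tokyo": ("place", "city"),
    "kuru": ("action", "motion"),
    "iku": ("action", "motion"),
}

# Hypothetical case base for "N ni V": (noun, verb, translation of "ni").
CASE_BASE = [
    ("ichigatsu", "kuru", "in"),  # case C from the text
    ("tokyo", "iku", "to"),       # assumed extra case for contrast
]


def word_distance(a, b):
    """Assumed distance: 0 for identical words, smaller the more
    thesaurus categories the two words share."""
    if a == b:
        return 0.0
    path_a, path_b = THESAURUS[a], THESAURUS[b]
    shared = 0
    for cat_a, cat_b in zip(path_a, path_b):
        if cat_a != cat_b:
            break
        shared += 1
    depth = max(len(path_a), len(path_b))
    return 1.0 - shared / (depth + 1)


def translate_ni(noun, verb):
    """Return the translation of "ni" taken from the nearest case."""
    best = min(
        CASE_BASE,
        key=lambda case: word_distance(noun, case[0])
        + word_distance(verb, case[1]),
    )
    return best[2]


if __name__ == "__main__":
    # Input Q: "shigatsu ni kuru", which is not in the case base.
    print(translate_ni("shigatsu", "kuru"))
```

In this sketch, "shigatsu" and "ichigatsu" share the same category path, so case C has the smallest total distance to Q and its translation "in" is reused.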
3. Statement of Problems with the Prior Art
Conventional CBMT, however, involves the following problems. First, cases to be stored in the translation case base are collected arbitrarily. CBMT's advantage of being able to handle exceptions derives from this. As a result, however, the number of accumulated cases becomes enormous and many of the cases become redundant. For example, for "shigatsu ni" (in April) to be translated adequately, it is sufficient that one example of "ichigatsu ni (in January)" exists; nevertheless, many cases of "X gatsu ni" usually exist. Hence the system accesses the translation case base and tries to find the most similar case in a vast search space every time a sentence is to be translated. Moreover, to this end, the system sequentially calculates the distances from a word in the input sentence to the words in the cases, and obtains the case with the minimum distance (the best match). As a result, a large number of cases must be checked. Therefore, the system is not efficient and takes a long time to translate a passage.
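The cost of redundant cases can be made concrete with a small sketch. It assumes a simple sequential best-match search and a flat category table; the case lists, the distance values, and the function names are hypothetical and serve only to count distance computations.

```python
# Hypothetical flat category table (an assumption for illustration).
MONTHS = ["ichigatsu", "nigatsu", "sangatsu", "gogatsu", "rokugatsu"]
CATEGORY = {month: "month" for month in MONTHS}
CATEGORY["shigatsu"] = "month"
CATEGORY["tokyo"] = "city"


def distance(a, b):
    """Assumed distance: 0 for the same word, 0.5 for the same
    category, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if CATEGORY[a] == CATEGORY[b] else 1.0


def best_match(noun, case_base):
    """Sequentially scan every case; return the translation of the
    nearest case and the number of distance computations performed."""
    comparisons = 0
    best_translation, best_distance = None, float("inf")
    for case_noun, translation in case_base:
        comparisons += 1
        d = distance(noun, case_noun)
        if d < best_distance:
            best_translation, best_distance = translation, d
    return best_translation, comparisons


# Arbitrarily collected cases: many redundant "X gatsu ni" entries.
redundant_cases = [(month, "in") for month in MONTHS] + [("tokyo", "to")]
# One representative month case is sufficient for the same result.
compact_cases = [("ichigatsu", "in"), ("tokyo", "to")]

if __name__ == "__main__":
    print(best_match("shigatsu", redundant_cases))
    print(best_match("shigatsu", compact_cases))
```

Both case bases yield "in" for "shigatsu", but the redundant one requires six distance computations against two for the compact one; with a realistic case base the gap grows with every redundant case.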
One solution would be to categorize words in finer detail so that distances can be computed efficiently; however, if the thesaurus is updated only to cope with specific translation patterns, it will be usable only for machine translation. Maintenance of a thesaurus specialized for machine translation relies on the users of the translation system, and requires a great deal of labor.
Since, of course, a huge translation case base must be maintained for translation, the problem of restricted storage resources of a translation system cannot be disregarded.