1. Field of the Invention
The present invention relates generally to machine translation, and more specifically to a method and apparatus for generating new information to be added to a dictionary used in a machine translation system: for example, to a user-maintained translation pattern dictionary in a machine translation system that employs translation patterns.
2. Description of the Related Art
As the translation quality of machine translation systems improves, improving and enhancing the translation dictionaries used by these systems has become essential. The spread of machine translation systems has therefore been accompanied by a proliferation of specialized dictionaries and the like for translating documents in specific fields. Adding specialized dictionaries does not suffice to give a user the translation result he or she wants, however, when the document to be translated includes its own special expressions that the machine translation system cannot analyze, or when the translations of words need to be adjusted at the individual user level.
Users of machine translation systems therefore conventionally pre-edit the documents input to a machine translation system so that the system can analyze them, and post-edit the translation result. If these pre-editing and post-editing tasks are independent of the machine translation system, however, they have no effect on the machine translation process. If the same or similar source text appears repeatedly when a document is translated, the user must repeat the necessary editing tasks each time, making the editing work extremely tedious and troublesome.
Japanese Unexamined Patent Application Publication No. H6-119378 addresses this problem by proposing that the results of pre-editing and post-editing be incorporated into a dictionary. Specifically, it proposes a means of adjusting the translation algorithm of the machine translation system by using a source text and a model translation thereof, a pre-edited text and the machine translation result, or a source text and the post-edited machine translation result. In the last of these three cases, if the machine translation result and the post-edited result differ, a dictionary entry or a syntax rule is derived from the post-edited result and added to the existing word dictionary or syntax-rule dictionary used by the machine translation system. A syntax rule in this context is a pattern in which a notation indicating a text category such as ‘sentence’ or ‘phrase’ appears on the left, and a string of words constituting an object in the indicated category appears on the right.
In the basic scheme, the exact result of post-editing becomes the added dictionary or syntax-rule entry. For example, if a user post-edits a Japanese machine translation result to obtain sentence A below, the added pattern B will consist of the source sentence and sentence A.
Source Sentence:
The class has a black board.
Machine Translation:
Sono kyoshitsu wa, kuroi ita wo motteiru.
Post-Edited Sentence (A):
Sono kyoshitsu wa, kokuban wo motteiru.
Pattern (B):
[Sentence: The class has a black board.]
[Sentence: Sono kyoshitsu wa, kokuban wo motteiru.]
Because pattern B matches only the entire source sentence, a sentence such as “The class has two black boards” does not fit it, so the desired translation of “black board” (‘kokuban’) cannot be obtained.
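The limitation of whole-sentence patterns can be illustrated with a minimal sketch. It assumes, purely for illustration, that the pattern dictionary is a plain mapping from source sentence to translated sentence; the function names `derive_sentence_pattern` and `lookup` are hypothetical and do not come from the cited publication.

```python
# Illustrative sketch only: the dictionary layout and function names are assumptions.

def derive_sentence_pattern(source, machine_output, post_edited):
    """Return a (source, target) whole-sentence pattern if post-editing
    changed the machine translation result, otherwise None."""
    if machine_output == post_edited:
        return None
    return (source, post_edited)

def lookup(patterns, sentence):
    """Return the stored translation only when the whole sentence matches."""
    return patterns.get(sentence)

# Pattern B from the example above.
pattern_b = derive_sentence_pattern(
    "The class has a black board.",
    "Sono kyoshitsu wa, kuroi ita wo motteiru.",
    "Sono kyoshitsu wa, kokuban wo motteiru.",
)
patterns = {pattern_b[0]: pattern_b[1]}

same_sentence = lookup(patterns, "The class has a black board.")
new_sentence = lookup(patterns, "The class has two black boards.")  # no match
```

Under these assumptions, the stored pattern is reused only when the identical source sentence recurs; any variation in wording falls through to ordinary machine translation.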
The above patent application also proposes a way to enhance the versatility of the added syntax rule. If there are several combinations of source sentences and model translated sentences, a pattern can be obtained from them by partial abstraction, on the basis of similarities between the source sentences and the model translated sentences. An example of such a partially abstracted rule is:
[Sentence: The class has $1 black board.]
[Sentence: Sono kyoshitsu wa, $1 kokuban wo motteiru.]
If this pattern is added, a correct translation of “The class has three black boards” can be obtained. “There is a black board in my class”, however, differs from “The class has three black boards” in the text both preceding and following “black board”, so when “There is a black board in my class” is translated, the correct translation of “black board” is still not obtained.
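The behavior of the partially abstracted pattern can be sketched as follows. This is a hedged illustration, not the method of the cited publication: the `$1` variable is compiled into a regular-expression group, and a small hypothetical table `BASE_FORMS` stands in for the morphological analysis a real system would use to match “boards” against “board”. Translating the text bound to `$1` is left out of the sketch.

```python
import re

# Hypothetical base-form table; a real system would use morphological analysis.
BASE_FORMS = {"boards": "board"}

SOURCE_PATTERN = "The class has $1 black board."
TARGET_PATTERN = "Sono kyoshitsu wa, $1 kokuban wo motteiru."

def normalize(sentence):
    """Replace each word listed in BASE_FORMS with its base form."""
    return re.sub(r"[A-Za-z]+",
                  lambda m: BASE_FORMS.get(m.group(0), m.group(0)),
                  sentence)

def compile_pattern(source_pattern):
    """Turn literal text into an escaped regex and each $n into a named group."""
    parts = re.split(r"(\$\d+)", source_pattern)
    regex = "".join(
        f"(?P<v{p[1:]}>.+?)" if re.fullmatch(r"\$\d+", p) else re.escape(p)
        for p in parts
    )
    return re.compile(regex)

def match_pattern(source_pattern, target_pattern, sentence):
    """Return (target_pattern, bindings) if the whole sentence fits, else None."""
    m = compile_pattern(source_pattern).fullmatch(normalize(sentence))
    if m is None:
        return None
    bindings = {"$" + name[1:]: text for name, text in m.groupdict().items()}
    return target_pattern, bindings

fits = match_pattern(SOURCE_PATTERN, TARGET_PATTERN,
                     "The class has three black boards.")
misses = match_pattern(SOURCE_PATTERN, TARGET_PATTERN,
                       "There is a black board in my class.")
```

In this sketch the first sentence fits with `$1` bound to “three”, while the second fits nowhere, because the literal text on both sides of “black board” must still match the pattern.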
Adding dictionary entries and syntax rules in this way, on the basis of entire source sentences, is inefficient: the sentence entries use up large amounts of memory, yet the rate of reuse of the added entries is low. The above method of making abstracted patterns, moreover, requires similar source sentences and similar model translated sentences. When there are only a few model translated sentences, the probability that any of them will be similar is low, and abstraction is unlikely to be possible.
Therefore, there is a need for a still more efficient and versatile method and apparatus for deriving new information to be added to a dictionary used for machine translation, especially when relatively few model translated sentences are available.