1. Technical Field
This invention relates to automated machine translation, and more particularly to a method and system for machine translating text between source and destination languages.
2. Background Information
Throughout this application, various publications, patents and published patent applications are referred to by an identifying citation. The disclosures of the publications, patents and published patent applications referenced in this application are hereby incorporated by reference into the present disclosure.
Transfer-based translation systems have existed for more than 25 years, as exemplified in Bennett and Slocum, 1985 (“The LRC Machine Translation System,” Computational Linguistics, Volume 11, Numbers 2-3, April-September). Such systems perform translation in three steps. First, the source language text is parsed to determine its syntactic structure. Second, the parse tree is rearranged based on a set of syntactic rules to match the natural order of the destination language. Third, the individual source language words are translated into destination language words. The resultant word sequence is a destination language translation of the source language sentence. A limitation of such systems is that all of the knowledge is hand-coded in a complex set of rules and dictionaries, requiring considerable time and effort by computational linguists for each language pair. Furthermore, the rules may interact in unpredictable ways, sometimes preventing any translation from being produced and sometimes producing incorrect translations. It is difficult to control the interactions between rules so that every sentence receives a translation and only correct translations are produced. These difficulties are commonly referred to as the problems of “coverage” and “overgeneration,” respectively.
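By way of illustration only, the three steps described above may be sketched as follows. This is a minimal sketch, not the system of Bennett and Slocum: the three-word "parser", the single reordering rule, the lexicon, and the placeholder destination tokens (d_mary, d_sees, d_john) are all hypothetical and chosen only to show the flow from parsing to reordering to word translation.

```python
# Toy transfer-based pipeline: parse, reorder, then translate words.
# The rule table, lexicon, and destination tokens are hypothetical.

# Hand-written reordering rule: subject-verb-object -> verb-subject-object.
REORDER_RULES = {("NP-SUBJ", "V", "NP-OBJ"): (1, 0, 2)}
# Hand-written bilingual dictionary with placeholder destination words.
LEXICON = {"mary": "d_mary", "sees": "d_sees", "john": "d_john"}

def parse(sentence):
    """Step 1: stand-in parser that only handles three-word SVO sentences."""
    subj, verb, obj = sentence.lower().split()
    return ("S", [("NP-SUBJ", subj), ("V", verb), ("NP-OBJ", obj)])

def transfer(tree):
    """Step 2: rearrange the children of the root using a syntactic rule."""
    label, children = tree
    labels = tuple(child[0] for child in children)
    order = REORDER_RULES.get(labels)
    if order is None:
        # No rule applies, so no translation is produced (a coverage failure).
        raise ValueError(f"no rule covers {labels}")
    return (label, [children[i] for i in order])

def translate_words(tree):
    """Step 3: replace each source word with a destination word."""
    _, children = tree
    return [LEXICON[word] for _, word in children]

def translate(sentence):
    return " ".join(translate_words(transfer(parse(sentence))))

print(translate("Mary sees John"))
```

Even in this toy form, every rule and dictionary entry is written by hand, and an input that matches no rule yields no translation at all, which is the coverage problem noted above.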
A statistical approach to translation, referred to as a source-channel model, is described in Brown et al., 1995 (U.S. Pat. No. 5,477,451). (In standard descriptions of source-channel models, the terms source and destination are reversed from their usage in most of this document; “destination language” in the source-channel context refers to the language that is input to the translation device and “source language” is the language produced as output. In keeping with standard terminology, these terms are used in this reversed sense in discussing source-channel models, but this usage is restricted to source-channel models herein.) In this approach, knowledge is acquired automatically from examples of translated sentences, eliminating the need for hand-crafted rules and dictionaries. Furthermore, each possible translation is assigned a probability value. Therefore, there is no need to arrange rules so that only the correct translation is produced; multiple translations may be produced and the translation with, e.g., the highest probability score is selected. Nevertheless, such systems have limitations. One limitation is that the channel model, which contains distortion parameters describing how word order differs between source and destination languages, does not capture basic grammatical regularities. These regularities may be highly informative when transforming grammatical structures between languages (e.g., the transformation of an SVO (subject-verb-object order) language to a VSO (verb-subject-object order) language). A consequence of this limitation is that a large and computationally expensive search is necessary to determine the ordering of words in the translated sentence. Furthermore, there is little assurance that the selected order is truly grammatical. A second limitation of these approaches involves the source model, which is typically an n-gram language model. Such models account only for local agreement between nearby words and provide no way to determine if entire phrases are grammatical. For those phrases that are indeed grammatical, there is no way to determine if the relationships expressed are plausible. Even if these approaches were combined, the limitations of the source and channel models yield a substantial likelihood of ungrammatical translations or of grammatical translations that are nonsensical. Such behavior may be particularly problematic when downstream automated systems attempt further processing of the translated texts (e.g., to extract a database of facts), since such systems often rely on being able to syntactically parse and analyze their inputs.
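By way of illustration only, the source-channel selection rule may be sketched as choosing the output sentence e that maximizes P(e) x P(f | e), where f is the input sentence, P(e) is the source (language) model, and P(f | e) is the channel model. The sketch below is a simplified assumption, not the model of Brown et al.: the bigram and channel tables hold invented probabilities, each input word is mapped to exactly one output word, and distortion is treated as uniform so that word order is decided by the bigram model alone. The names "input" and "output" are used in place of the reversed source-channel terms.

```python
import math
from itertools import permutations

# Hypothetical bigram source model P(word | previous word); values are invented.
BIGRAM = {
    ("<s>", "the"): 0.5, ("the", "dog"): 0.4, ("dog", "barks"): 0.3,
    ("<s>", "dog"): 0.1, ("dog", "the"): 0.01, ("the", "barks"): 0.02,
}
# Hypothetical channel table P(input word | output word); values are invented.
CHANNEL = {("le", "the"): 0.7, ("chien", "dog"): 0.8, ("aboie", "barks"): 0.6}
FLOOR = 1e-4  # probability assigned to bigrams absent from the table

def source_logprob(words):
    """Log P(e) under the bigram model: only adjacent word pairs are scored."""
    prev, total = "<s>", 0.0
    for word in words:
        total += math.log(BIGRAM.get((prev, word), FLOOR))
        prev = word
    return total

def decode(input_words):
    """Return the output word order maximizing log P(e) + log P(f | e)."""
    chosen, channel_lp = [], 0.0
    for f in input_words:
        # Most probable output word for this input word.
        e, p = max(
            ((e, p) for (f2, e), p in CHANNEL.items() if f2 == f),
            key=lambda pair: pair[1],
        )
        chosen.append(e)
        channel_lp += math.log(p)
    # Exhaustive search over orderings: the cost grows factorially with length.
    # With uniform distortion, channel_lp is the same for every ordering.
    best = max(permutations(chosen),
               key=lambda order: source_logprob(order) + channel_lp)
    return list(best)

print(decode(["le", "chien", "aboie"]))
```

The sketch reflects both limitations discussed above: the ordering is found only by enumerating permutations, and the score of an ordering depends solely on adjacent word pairs, so nothing in the model checks whether the sentence as a whole is grammatical.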
Yamada and Knight, 2003 (U.S. Patent Application 20030023423) describe a technique that attempts to overcome the aforementioned limitations by introducing an alternative channel model while retaining a source-channel formulation. Like previous work, this formulation includes a probability table that describes how words are translated from the source language to the destination language. In contrast to previous methods, this formulation assumes that the source is a syntactic parse tree rather than a simple sequence of words. A probability table is then introduced to model the possible permutations of tree nodes to account for word-order differences between languages. An additional probability table is introduced to model the insertion of destination language words that have no source language counterpart. A decoder based on this model is described in Yamada and Knight, 2002 (“A decoder for syntax-based MT,” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics). This decoder retains an n-gram language model that accounts only for local agreement between nearby words.
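By way of illustration only, the three probability tables described above (node permutation, word insertion, and word translation) may be combined as in the following sketch. It is a loose simplification under stated assumptions and not the model as published by Yamada and Knight: the tree has a single level, insertion is attached to leaf labels, the most probable translation is always taken, and every table entry and placeholder token (d_mary, d_subjmark, and so on) is hypothetical.

```python
# Hypothetical tables for a tree-based channel model; all values are invented.
# Permutation table: child label sequence -> {child order: probability}.
REORDER = {("NP-SUBJ", "V", "NP-OBJ"): {(0, 1, 2): 0.2, (0, 2, 1): 0.8}}
# Insertion table: label -> {inserted word or None: probability}.
INSERT = {
    "NP-SUBJ": {None: 0.3, "d_subjmark": 0.7},
    "NP-OBJ": {None: 0.4, "d_objmark": 0.6},
    "V": {None: 1.0},
}
# Translation table: source word -> {destination word: probability}.
TRANSLATE = {
    "mary": {"d_mary": 0.9, "d_maria": 0.1},
    "sees": {"d_sees": 0.5, "d_looks": 0.5},
    "john": {"d_john": 0.9, "d_jon": 0.1},
}

def apply_channel(tree, order, inserted):
    """Apply one reordering and one set of insertions to a one-level tree.

    Returns the destination word list and the probability of this derivation,
    i.e. the product of the permutation, insertion, and translation entries.
    """
    _, children = tree
    labels = tuple(child[0] for child in children)
    prob = REORDER[labels][order]
    words = []
    for i in order:
        label, word = children[i]
        # Take the most probable translation (first entry wins on ties).
        dst, t_prob = max(TRANSLATE[word].items(), key=lambda kv: kv[1])
        prob *= t_prob
        words.append(dst)
        extra = inserted.get(label)  # None means nothing is inserted here
        prob *= INSERT[label][extra]
        if extra is not None:
            words.append(extra)
    return words, prob

tree = ("S", [("NP-SUBJ", "mary"), ("V", "sees"), ("NP-OBJ", "john")])
words, prob = apply_channel(
    tree, (0, 2, 1), {"NP-SUBJ": "d_subjmark", "NP-OBJ": "d_objmark"}
)
print(" ".join(words), prob)
```

Because reordering operates on labeled tree nodes rather than on individual word positions, a single table entry captures a regularity such as moving the verb after the object, which word-level distortion parameters would have to learn position by position.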
Other techniques depart entirely from the source-and-channel formulation of the above methods, instead taking the view that “translation is parsing” and defining a single-stage parsing model that describes the entire translation process. Examples of this approach include Wu, 1997 (“Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora,” Computational Linguistics, 23(3):377-403), Alshawi, 2001 (U.S. Pat. No. 6,233,544), and Melamed, 2004 (“Statistical Machine Translation by Parsing,” Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics).
A common characteristic of these approaches is the use of specialized grammar formalisms that synchronously express sentences in the source and destination languages: in the case of Wu, 1997, “Inversion Transduction Grammars”; in the case of Alshawi, 2001, “Collections of Head Transducers”; and in the case of Melamed, 2004, “Multitext Grammars”.
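By way of illustration only, the idea of a grammar that synchronously expresses a sentence in both languages may be sketched with the two combination modes of an inversion transduction grammar: a "straight" node keeps its two children in the same order in both languages, while an "inverted" node reverses them on the destination side only. The tuple-based node encoding and the placeholder destination tokens below are hypothetical, and the sketch omits the probabilities and the parsing algorithm of Wu, 1997.

```python
# A synchronous derivation tree. Nodes are either
#   ("leaf", source_word, destination_word)   - a translated word pair, or
#   ("straight" | "inverted", left, right)    - a binary combination.
# This encoding is a hypothetical convenience, not a published format.

def yields(node):
    """Return (source words, destination words) generated by one derivation."""
    if node[0] == "leaf":
        _, src, dst = node
        return [src], [dst]
    kind, left, right = node
    left_src, left_dst = yields(left)
    right_src, right_dst = yields(right)
    if kind == "straight":
        # Same child order in both languages.
        return left_src + right_src, left_dst + right_dst
    if kind == "inverted":
        # Child order is reversed in the destination language only.
        return left_src + right_src, right_dst + left_dst
    raise ValueError(f"unknown node kind: {kind}")

derivation = (
    "straight",
    ("leaf", "mary", "d_mary"),
    ("inverted", ("leaf", "sees", "d_sees"), ("leaf", "john", "d_john")),
)
src, dst = yields(derivation)
print(" ".join(src), "|", " ".join(dst))
```

A single derivation thus yields both sentences at once, so that finding the best derivation for the source sentence simultaneously determines the destination sentence, consistent with the view that translation is parsing.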
A need exists for a machine translation system and method that addresses the drawbacks of the prior art.