Machine translation (MT) is automatic translation, for example, using a computer system, from a first language (e.g., French) into another language (e.g., English). Systems that perform MT techniques are said to “decode” the source language into the target language. From the end-user's perspective, the MT process is relatively straightforward. As shown in FIG. 1A, the MT 102 receives as input a source sentence 100, for example, in French (e.g., “ce ne est pas juste”), and after processing the input sentence, outputs the equivalent decoded sentence in the target language, in this example English (“it is not fair”).
One type of conventional MT decoder is the “stack decoder” such as described in U.S. Pat. No. 5,477,451 (Brown et al.), entitled “Method and System for Natural Language Translation.” In a stack decoder, the universe of possible translations is organized into a graph structure and then exhaustively searched until an optimal solution (translation) is found. Although stack decoders tend to produce good results, they do so at a significant cost: maintaining and searching a large space of potential solutions is expensive, both computationally and in terms of space (e.g., computer memory). Accordingly, the present inventor recognized that an iterative, incremental decoding technique could produce optimal, or near optimal, results while considerably reducing the computational and space requirements. This decoder is referred to herein as a “greedy” decoder or, equivalently, as a “fast decoder.” The term “greedy” refers to techniques that produce solutions based on myopic optimization; that is, given a partial solution, they produce as a next estimate a new solution that improves the objective the most. Put another way, a greedy algorithm typically starts out with an approximate solution and then tries to improve it incrementally until a satisfactory solution is reached.
Implementations of the greedy decoder may include various combinations of the following features.
In one aspect, machine translation (MT) decoding involves receiving as input a text segment (e.g., a clause, a sentence, a paragraph or a treatise) in a source language to be translated into a target language, generating an initial translation (e.g., either a word-for-word or phrase-for-phrase gloss) as a current target language translation, applying one or more modification operators to the current target language translation to generate one or more modified target language translations, determining whether one or more of the modified target language translations represents an improved translation in comparison with the current target language translation, setting a modified target language translation as the current target language translation, and repeating these steps until occurrence of a termination condition.
Applying one or more modification operators may involve changing the translation of one or two words in the current target language translation. Alternatively, or in addition, applying one or more modification operators may include (i) changing a translation of a word in the current target language translation and concurrently (ii) inserting another word at a position that yields an alignment of highest probability between the source language text segment and the current target language translation. The inserted other word may have a high probability of having a zero-value fertility.
Applying one or more modification operators may include deleting from the current target language translation a word having a zero-value fertility; and/or modifying an alignment between the source language text segment and the current target language translation by swapping non-overlapping target language word segments in the current target language translation; and/or modifying an alignment between the source language text segment and the current target language translation by (i) eliminating a target language word from the current target language translation and (ii) linking words in the source language text segment.
In various embodiments, applying modification operators may include applying two or more of the following: (i) changing the translation of one or two words in the current target language translation; (ii) changing a translation of a word in the current target language translation and concurrently inserting another word at a position that yields an alignment of highest probability between the source language text segment and the current target language translation, the inserted other word having a high probability of having a zero-value fertility; (iii) deleting from the current target language translation a word having a zero-value fertility; (iv) modifying an alignment between the source language text segment and the current target language translation by swapping non-overlapping target language word segments in the current target language translation; and/or (v) modifying an alignment between the source language text segment and the current target language translation by eliminating a target language word from the current target language translation and linking words in the source language text segment.
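As an illustration only, two of the operators listed above (segment swapping and deletion of a zero-fertility word) might be sketched as follows. The function names, the representation of an alignment as a list giving, for each source word, the index of the target word it is linked to, and the example word “indeed” are all assumptions made for this sketch and are not taken from the description.

```python
from typing import Iterator, List, Tuple


def swap_segments(words: List[str]) -> Iterator[List[str]]:
    """Yield each word order obtained by swapping two non-overlapping segments."""
    n = len(words)
    for i1 in range(n):
        for i2 in range(i1, n):              # first segment is words[i1..i2]
            for j1 in range(i2 + 1, n):
                for j2 in range(j1, n):      # second segment is words[j1..j2]
                    yield (words[:i1] + words[j1:j2 + 1] + words[i2 + 1:j1]
                           + words[i1:i2 + 1] + words[j2 + 1:])


def delete_zero_fertility(
    words: List[str], alignment: List[int]
) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield candidates with one zero-fertility target word removed.

    Assumption: alignment[j] is the index of the target word linked to
    source word j, so a target word has zero fertility when no source
    word points at it.
    """
    for i in range(len(words)):
        if i not in alignment:
            new_words = words[:i] + words[i + 1:]
            # Shift links that pointed past the deleted position.
            new_alignment = [a - 1 if a > i else a for a in alignment]
            yield new_words, new_alignment


# Hypothetical example: "indeed" is linked to no source word.
words = ["it", "is", "not", "fair", "indeed"]
alignment = [0, 2, 1, 2, 3]  # ce->it, ne->not, est->is, pas->not, juste->fair
swapped = list(swap_segments(["a", "b", "c"]))
pruned = list(delete_zero_fertility(words, alignment))
```

Each operator returns candidates rather than modifying the current translation in place, so that a caller can score every candidate before deciding which one, if any, to adopt.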
Determining whether one or more of the modified target language translations represents an improved translation in comparison with the current target language translation may include calculating a probability of correctness for each of the modified target language translations.
The termination condition may include a determination that a probability of correctness of a modified target language translation is no greater than a probability of correctness of the current target language translation. The termination condition may be the occurrence of a completion of a predetermined number of iterations and/or the lapse of a predetermined amount of time.
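The iterative procedure and its termination conditions may be easier to follow in code. The following is a minimal sketch, not the decoder itself: it assumes that a translation is a list of words, that each operator yields candidate translations, and that the highest-scoring candidate is adopted at each step. The names `greedy_decode`, `change_one_word`, and `toy_score`, as well as the toy vocabulary and scorer, are hypothetical.

```python
import time
from typing import Callable, Iterable, List, Optional

Translation = List[str]
Operator = Callable[[Translation], Iterable[Translation]]


def greedy_decode(
    initial: Translation,
    operators: List[Operator],
    score: Callable[[Translation], float],
    max_iterations: int = 100,
    time_limit_s: Optional[float] = None,
) -> Translation:
    """Improve an initial gloss step by step until a termination condition."""
    current, current_score = list(initial), score(initial)
    start = time.monotonic()
    for _ in range(max_iterations):          # predetermined iteration limit
        if time_limit_s is not None and time.monotonic() - start > time_limit_s:
            break                            # predetermined time limit
        best, best_score = current, current_score
        for op in operators:
            for candidate in op(current):
                s = score(candidate)
                if s > best_score:
                    best, best_score = candidate, s
        if best_score <= current_score:
            break                            # no candidate is an improvement
        current, current_score = best, best_score
    return current


def change_one_word(t: Translation) -> Iterable[Translation]:
    """Toy operator: replace one word with another from a tiny vocabulary."""
    vocab = ["it", "is", "not", "fair", "just"]
    for i in range(len(t)):
        for w in vocab:
            if w != t[i]:
                yield t[:i] + [w] + t[i + 1:]


def toy_score(t: Translation) -> float:
    """Toy stand-in for a probability of correctness (assumption)."""
    reference = ["it", "is", "not", "fair"]
    return sum(a == b for a, b in zip(t, reference))


result = greedy_decode(["this", "not", "is", "just"], [change_one_word], toy_score)
one_step = greedy_decode(["this", "not", "is", "just"], [change_one_word],
                         toy_score, max_iterations=1)
```

In this toy run the loop changes one word per iteration and stops once no candidate scores higher than the current translation. A real scorer would replace `toy_score` with a probability estimate.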
In another aspect, a computer-implemented machine translation decoding method may, for example, implement a greedy decoding algorithm that iteratively modifies a target language translation of a source language text segment (e.g., a clause, a sentence, a paragraph, or a treatise) until an occurrence of a termination condition (e.g., completion of a predetermined number of iterations, lapse of a predetermined period of time, and/or a determination that a probability of correctness of a modified translation is no greater than a probability of correctness of a previous translation).
The MT decoding method may start with an approximate target language translation and iteratively improve the translation with each successive iteration. The approximate target language translation may be, for example, a word-for-word or phrase-for-phrase gloss, or the approximate target language translation may be a predetermined translation selected from among a plurality of predetermined translations.
Iteratively modifying the translation may include incrementally improving the translation with each iteration, for example, by applying one or more modification operations on the translation.
The one or more modification operations may comprise one or more of the following operations: (i) changing one or two words in the translation; (ii) changing a translation of a word and concurrently inserting another word at a position that yields an alignment of highest probability between the source language text segment and the translation, the inserted other word having a high probability of having a zero-value fertility; (iii) deleting from the translation a word having a zero-value fertility; (iv) modifying an alignment between the source language text segment and the translation by swapping non-overlapping target language word segments in the translation; and (v) modifying an alignment between the source language text segment and the translation by eliminating a target language word from the translation and linking words in the source language text segment.
In another aspect, a machine translation decoder may include a decoding engine comprising one or more modification operators to be applied to a current target language translation to generate one or more modified target language translations; and a process loop to iteratively modify the current target language translation using the one or more modification operators. The process loop may terminate upon occurrence of a termination condition. The process loop may control the decoding engine to incrementally improve the current target language translation with each iteration.
The MT decoder may further include a module (including, for example, a language model and a translation model) for determining a probability of correctness for a translation. The process loop may terminate upon a determination that a probability of correctness of a modified translation is no greater than a probability of correctness of a previous translation, and/or upon completion of a predetermined number of iterations; and/or after lapse of a predetermined period of time.
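One way such a module could combine a language model and a translation model is sketched below. This is an assumption-laden illustration: the description states only that the module may include the two models, whereas the sketch assumes a bigram language model, a word-for-word translation table, and a score formed by adding their log probabilities. The function name, the tables, and all probability values are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def log_prob_of_correctness(
    source: List[str],
    target: List[str],
    alignment: List[int],
    bigram: Dict[Tuple[str, str], float],
    ttable: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> float:
    """Return log P_LM(target) + log P_TM(source | target, alignment).

    alignment[j] is the index of the target word linked to source word j
    (assumption). Unseen events receive the small probability `floor`.
    """
    lm = 0.0
    prev = "<s>"                              # sentence-start symbol
    for w in target:
        lm += math.log(bigram.get((prev, w), floor))
        prev = w
    tm = 0.0
    for j, f in enumerate(source):
        e = target[alignment[j]]
        tm += math.log(ttable.get((f, e), floor))
    return lm + tm


# Invented toy tables; the numbers carry no empirical meaning.
bigram = {("<s>", "it"): 0.5, ("it", "is"): 0.5,
          ("is", "not"): 0.4, ("not", "fair"): 0.3}
ttable = {("ce", "it"): 0.6, ("est", "is"): 0.7, ("ne", "not"): 0.5,
          ("pas", "not"): 0.5, ("juste", "fair"): 0.4, ("juste", "just"): 0.5}
source = ["ce", "ne", "est", "pas", "juste"]
alignment = [0, 2, 1, 2, 3]

score_fair = log_prob_of_correctness(
    source, ["it", "is", "not", "fair"], alignment, bigram, ttable)
score_just = log_prob_of_correctness(
    source, ["it", "is", "not", "just"], alignment, bigram, ttable)
```

With these toy tables, “it is not fair” scores higher than “it is not just” because the bigram table has no entry for “not just”. The process loop would compare scores of this kind when deciding whether a modified translation is an improvement.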
One or more of the following advantages may be provided by the greedy decoder as described herein. The techniques and methods described here may result in an MT decoder that performs with high accuracy, high speed and relatively low computational and space costs. The greedy decoder can be modified as desired to perform a full set of sentence modification operations or any subset thereof. This gives a system designer and/or end-user considerable flexibility to tailor the decoder's speed, accuracy and/or other performance characteristics to match desired objectives or constraints. The use of a set of basic modification operations, each able to be used as a standalone operator or in conjunction with the others, further enhances this flexibility. Moreover, the use of independent standalone operators as constituents of the decoding engine makes the decoder extensible and scalable. That is, different or additional modification operators can be used to suit the objectives or constraints of the system designer and/or end-user.
In conjunction with MT research and related areas in computational linguistics, researchers have developed and frequently use various types of tree structures to graphically represent the structure of a text segment (e.g., clause, sentence, paragraph or entire treatise). Two basic tree types include (1) the syntactic tree, which can be used to graphically represent the syntactic relations among components of a text segment, and (2) the rhetorical tree (equivalently, the rhetorical structure tree (RST) or the discourse tree), which can be used to graphically represent the rhetorical relationships among components of a text segment. Rhetorical structure trees are discussed in detail in William C. Mann and Sandra A. Thompson, “Rhetorical structure theory: Toward a functional theory of text organization,” Text, 8(3):243-281 (1988).
The example shown in FIG. 6 illustrates the types of structures that may be present in a rhetorical structure tree for a text fragment. The leaves of the tree correspond to elementary discourse units (“edus”) and the internal nodes correspond to contiguous text spans. Each node in a rhetorical structure tree is characterized by a “status” (i.e., either “nucleus” or “satellite”) and a “rhetorical relation,” which is a relation that holds between two non-overlapping text spans. In FIG. 6, nuclei are represented by straight lines while satellites are represented by arcs.
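The node structure just described can be pictured with a small data type. The sketch below is illustrative only: the class name `RSTNode`, its field names, the example sentence, and the relation labels chosen for the example are assumptions and do not reproduce FIG. 6.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    status: str                     # "nucleus" or "satellite"
    relation: str                   # rhetorical relation label
    text: Optional[str] = None      # set only on leaves (edus)
    children: List["RSTNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def edus(self) -> List[str]:
        """Return the elementary discourse units under this node, left to right."""
        if self.is_leaf():
            return [self.text or ""]
        units: List[str] = []
        for child in self.children:
            units.extend(child.edus())
        return units


# Hypothetical two-edu fragment: a nucleus elaborated by a satellite.
leaf1 = RSTNode("nucleus", "span", text="The decoder starts from a gloss,")
leaf2 = RSTNode("satellite", "elaboration",
                text="which it then improves step by step.")
root = RSTNode("nucleus", "span", children=[leaf1, leaf2])
span_text = " ".join(root.edus())
```

Because internal nodes correspond to contiguous text spans, joining the edus under a node in left-to-right order recovers the span that the node covers.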
The present inventor recognized that significant differences exist between the rhetorical structures of translations of a text in different languages (e.g., Japanese and English). Accordingly, to improve MT quality, and as a component of a larger MT system, the present inventor developed techniques for automatically rewriting (e.g., using a computer system) rhetorical structures from one language into another, for example, rewriting a rhetorical tree for a text segment in Japanese into a rhetorical tree for a counterpart text segment in English.
Implementations of the disclosed tree rewriting techniques may include various combinations of the following features.
In one aspect, automatically generating a tree (e.g., either a syntactic tree or a discourse tree) involves receiving as input a tree corresponding to a source language text segment, and applying one or more decision rules to the received input to generate a tree corresponding to a target language text segment.
In another aspect, a computer-implemented tree generation method may include receiving as input a tree corresponding to a source language text segment (e.g., clause, sentence, paragraph, or treatise), and applying one or more decision rules (e.g., a sequence of decision rules that collectively represent a transfer function) to the received input to generate a tree corresponding to a target language text segment, which potentially may be a different type of text segment.
The tree generation method further may include automatically determining the one or more decision rules based on a training set, for example, a plurality of input-output tree pairs and a mapping between each of the input-output tree pairs. The mapping between each of the input-output tree pairs may be a mapping between leaves of the input tree and leaves of the paired output tree. Mappings between leaves of input-output tree pairs can be one-to-one, one-to-many, many-to-one, or many-to-many.
Automatically determining the one or more decision rules may include determining a sequence of operations that generates an output tree when applied to the paired input tree. Determining a sequence of operations may include using a plurality of predefined operations that collectively are sufficient to render any input tree into the input tree's paired output tree. The plurality of predefined operations may comprise one or more of the following: a shift operation that transfers an elementary discourse tree (edt) from an input list into a stack; a reduce operation that pops two edts from the top of the stack, combines the two popped edts into a new tree, and pushes the new tree on the top of the stack; a break operation that breaks an edt into a predetermined number of units; a create-next operation that creates a target language discourse constituent that has no correspondent in the source language tree; a fuse operation that fuses an edt at the top of the stack into the preceding edt; a swap operation that swaps positions of edts in the input list; and an assignType operation that assigns one or more of the following types to edts: Unit, MultiUnit, Sentence, Paragraph, MultiParagraph, and Text.
The plurality of predefined operations may represent a closed set that includes the shift operation, the reduce operation, the break operation, the create-next operation, the fuse operation, the swap operation and the assignType operation.
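Three of these operations (shift, reduce, and swap) can be sketched on a toy input list and stack. This is a simplified illustration under stated assumptions: edts are represented as plain strings or nested tuples, the reduce operation takes a relation label as a parameter, and the swap operation takes two list positions. The class name `Rewriter`, these parameterizations, and the labels used in the example are hypothetical.

```python
from typing import List, Tuple, Union

# An edt is either a leaf label or a (relation, left, right) tuple (assumption).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


class Rewriter:
    def __init__(self, input_list: List[Tree]) -> None:
        self.input: List[Tree] = list(input_list)
        self.stack: List[Tree] = []

    def shift(self) -> None:
        """Transfer the first edt from the input list onto the stack."""
        self.stack.append(self.input.pop(0))

    def reduce(self, relation: str) -> None:
        """Pop two edts, combine them into a new tree, and push the result."""
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append((relation, left, right))

    def swap(self, i: int, j: int) -> None:
        """Swap the positions of two edts in the input list."""
        self.input[i], self.input[j] = self.input[j], self.input[i]


# Hypothetical operation sequence for three source-language units.
rw = Rewriter(["e1", "e2", "e3"])
rw.swap(0, 1)                 # reorder before shifting
rw.shift()
rw.shift()
rw.reduce("ELABORATION")
rw.shift()
rw.reduce("CONTRAST")
final_tree = rw.stack[0]
```

A sequence of operations of this kind, recorded for each input-output tree pair, is the sort of data from which learning cases could be derived.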
Determining a sequence of operations may result in a plurality of learning cases, one learning case for each input-output tree pair. In that case, the tree generation method may further include associating one or more features with each of the plurality of learning cases based on context. The associated features may include one or more of the following: operational and discourse features, correspondence-based features, and lexical features.
The tree generation method may further include applying a learning program (e.g., C4.5) to the plurality of learning cases to generate the one or more decision rules.
In another aspect, a computer-implemented tree generation module may include a predetermined set of decision rules that, when applied to a tree (e.g., syntactic or discourse) corresponding to a source language text segment, generate a tree corresponding to a target language text segment. The predetermined set of decision rules may define a transfer function between source language trees and target language trees.
In another aspect, determining a transfer function between trees (e.g., syntactic or discourse) of different types may include generating a training set comprising a plurality of tree pairs and a mapping between each tree pair, each tree pair comprising a source tree and a corresponding target tree; generating a plurality of learning cases by determining, for each tree pair, a sequence of operations that results in the target tree when applied to the source tree; and generating a plurality of decision rules by applying a learning algorithm to the plurality of learning cases.
Determining a transfer function between trees of different types further may include, prior to generating the plurality of decision rules, associating one or more features with each of the learning cases based on context.
In another aspect, a computer-implemented discourse-based machine translation system may include a discourse parser that parses the discourse structure of a source language text segment and generates a source language discourse tree for the text segment; a discourse-structure transfer module that accepts the source language discourse tree as input and generates as output a target language discourse tree; and a mapping module that maps the target language discourse tree into a target text segment. The discourse-structure transfer module may include a plurality of decision rules generated from a training set of source language-target language tree pairs.
One or more of the following advantages may be provided by tree rewriting as described herein. The techniques and methods described here may result in a tree rewriting capability that allows users (e.g., human end-users such as linguistic researchers or computer processes such as MT systems) to automatically have a tree for a text segment in a source language rewritten, or translated, into a tree for the text segment translated into a target language. This functionality is useful both in its standalone form and as a component of a larger system, such as in a discourse-based machine translation system. Moreover, because the tree rewriter described here automatically learns how to rewrite trees from one language into another, the system is easy and convenient to use.
The mapping scheme used in training the tree rewriter also provides several advantages. For example, by allowing any arbitrary groupings (e.g., one-to-one, one-to-many, many-to-one, many-to-many) between leaves in the source and target trees, the flexibility, richness and robustness of the resulting mappings are enhanced.
The enhanced shift-reduce operations used in training the tree rewriter also provide several advantages. For example, the set of basic operations that collectively are sufficient to render any input tree into its paired output tree provides a powerful yet compact tool for rewriting tree structures.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.