Perfect, fully automatic translation by computer between two natural languages, i.e. from a source natural language into a target natural language, is highly desirable in today's global community and is the goal of many computational systems. Here, a natural language is any language that is written (textual) or spoken by humans.
One of the main methods for producing automatic translation is the transfer-based method of machine translation (MT). A transfer-based MT system typically takes a source text (the text in the original natural language, e.g. English), divides it into natural language segments such as sentences or phrases (hereafter "segments"), and performs source analysis, transfer, and target generation on each segment to arrive at the target text (the translated text).
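The overall flow can be sketched as a small driver that segments the source text and passes each segment through the three stages. This is a minimal sketch, assuming caller-supplied stage functions; the names `segment` and `translate_text` are hypothetical, and the regular-expression segmenter is a naive stand-in that ignores abbreviations, quotations, and similar cases.

```python
import re
from typing import Callable, List


def segment(text: str) -> List[str]:
    """Naive segmenter (an assumption): split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_text(text: str,
                   analyze: Callable[[str], object],
                   transfer: Callable[[object], object],
                   generate: Callable[[object], str]) -> str:
    """Run source analysis, transfer, and target generation on every segment."""
    targets = []
    for seg in segment(text):
        parse = analyze(seg)              # source analysis -> parse structure
        target = transfer(parse)          # transfer -> target structure
        targets.append(generate(target))  # target generation -> target segment
    return " ".join(targets)
```

With trivial stand-in stages, `translate_text("Hello there. How are you?", str.split, lambda ws: [w.upper() for w in ws], " ".join)` returns "HELLO THERE. HOW ARE YOU?"; a real system would plug in a parser, a transfer component, and a generator instead.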
Source analysis can be performed in any of several well-known ways. Typically, source analysis depends on a syntactic theory of the structure of natural language. In a rule-based grammar, for example, rules describe the structure of the natural language, and source analysis uses them to parse the input text into one or more parse structures. In the rule-based grammar system Slot Grammar, for instance, there are rules for filling and ordering so-called slots, which are grammatical relations such as subject, direct object, and indirect object. A further explanation of source analysis is given in M. C. McCord, "Slot Grammars," Computational Linguistics, vol. 6, pp. 31-43, 1980, and M. C. McCord, "Slot Grammar: A System for Simpler Construction of Practical Natural Language Grammars," in R. Studer (Ed.), Natural Language and Logic: International Scientific Symposium, Lecture Notes in Computer Science, Springer Verlag, Berlin, pp. 118-145, 1990, which are herein incorporated by reference in their entirety.
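As a toy illustration of slot filling (not Slot Grammar itself), the following sketch fills the subject, direct object, and indirect object slots of a verb from a lemmatized, part-of-speech-tagged sentence. The lexicon `SLOT_FRAMES`, the function `fill_slots`, and the three positional rules are hypothetical simplifications; an actual Slot Grammar uses much richer, lexically driven filling and ordering rules.

```python
from typing import Dict, List, Tuple

# Hypothetical slot frames: the slots each verb sense makes available.
SLOT_FRAMES: Dict[str, List[str]] = {"give": ["subj", "obj", "iobj"]}


def fill_slots(tagged: List[Tuple[str, str]]) -> Tuple[str, Dict[str, str]]:
    """Fill the verb's slots from (lemma, part-of-speech) pairs.

    Toy ordering rules (assumptions, not Slot Grammar's actual rules):
      - a noun before the verb fills subj
      - a noun introduced by the preposition "to" fills iobj
      - any other noun after the verb fills obj
    """
    # Locate the head verb (raises StopIteration if the input has none).
    verb_i = next(i for i, (_, pos) in enumerate(tagged) if pos == "verb")
    verb = tagged[verb_i][0]
    frame = SLOT_FRAMES.get(verb, [])
    filled: Dict[str, str] = {}
    after_to = False
    for i, (word, pos) in enumerate(tagged):
        if pos == "prep":
            after_to = word == "to"
            continue
        if pos != "noun":
            continue  # determiners and the verb itself fill no slot here
        if i < verb_i:
            slot = "subj"
        elif after_to:
            slot = "iobj"
        else:
            slot = "obj"
        after_to = False
        # Each slot is filled at most once, and only if the verb's frame offers it.
        if slot in frame and slot not in filled:
            filled[slot] = word
    return verb, filled


tagged = [("the", "det"), ("woman", "noun"), ("give", "verb"), ("a", "det"),
          ("book", "noun"), ("to", "prep"), ("the", "det"), ("man", "noun")]
print(fill_slots(tagged))
# ('give', {'subj': 'woman', 'obj': 'book', 'iobj': 'man'})
```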
The source analysis produces a parse structure, a formal representation of one source segment. The parse structure includes elements such as word senses (e.g. the choice between homonyms), morphological features (e.g. part of speech), surface syntactic structure, and deep syntactic structure, and it relates these elements to one another (e.g. through syntactic and semantic relationships) according to the rules of the grammar used to parse the input. Parse structures such as those of Slot Grammar may also include information on punctuation (e.g. occurrences of commas and periods) and formatting tags (e.g. SGML tags).
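One possible in-memory form for such a parse structure is a tree whose nodes carry a word sense, a part of speech, morphological features, a surface position, any following punctuation, and slot links to their fillers. The `Node` class, its fields, sense names such as `dog1`, and `surface_words` are hypothetical; the sketch only shows how slot relations (the tree) and surface order (the positions) can coexist in one structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    sense: str                     # word sense, e.g. "bank1" vs "bank2" for homonyms
    pos: str                       # part of speech
    position: int                  # index of the word in the surface string
    features: Dict[str, str] = field(default_factory=dict)  # morphological features
    slots: Dict[str, "Node"] = field(default_factory=dict)  # slot name -> filler node
    punct_after: Optional[str] = None  # punctuation following this word, if any


def surface_words(root: Node) -> List[str]:
    """Recover surface order: collect every node in the tree, then sort by position."""
    nodes: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.slots.values())
    return [n.sense + (n.punct_after or "") for n in sorted(nodes, key=lambda n: n.position)]


# "Dogs bark." : the verb's subj slot is filled by the noun (hypothetical senses).
dog = Node("dog1", "noun", 0, {"number": "pl"})
root = Node("bark1", "verb", 1, {"tense": "pres", "number": "pl"},
            {"subj": dog}, punct_after=".")
print(surface_words(root))  # ['dog1', 'bark1.']
```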
The transfer step typically maps the source elements in the source natural language onto target elements in the target natural language, producing an initial transfer structure. It then iteratively applies structural transformations, starting with the initial transfer structure, until the desired syntactic structure for the target language is obtained; the result is the target structure. A further explanation of transfer is given in M. C. McCord, "Design of LMT: A Prolog-based Machine Translation System," Computational Linguistics, vol. 15, pp. 33-52, which is herein incorporated by reference in its entirety.
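The two phases of transfer can be sketched as a lexical substitution followed by a loop that applies transformation rules until none of them changes the structure. This is a minimal sketch under assumptions: the flat tuple representation, the names `lexical_transfer`, `transform`, and `adjective_after_noun`, and the English-to-Spanish adjective rule are illustrative only and are not LMT's actual transfer rules.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A flattened structure: (role, word) pairs in the current word order.
Structure = Tuple[Tuple[str, str], ...]
# A rule returns a transformed structure, or None when it does not apply.
Rule = Callable[[Structure], Optional[Structure]]


def lexical_transfer(source: Structure, lexicon: Dict[str, str]) -> Structure:
    """Replace each source word by its target equivalent, keeping the source order."""
    return tuple((role, lexicon.get(word, word)) for role, word in source)


def transform(struct: Structure, rules: List[Rule], max_steps: int = 100) -> Structure:
    """Apply the first rule that changes the structure, repeatedly, until none does."""
    for _ in range(max_steps):
        for rule in rules:
            new = rule(struct)
            if new is not None and new != struct:
                struct = new
                break
        else:
            return struct  # no rule changed anything: the target structure is reached
    raise RuntimeError("transformations did not converge")


def adjective_after_noun(s: Structure) -> Optional[Structure]:
    """Toy Spanish-style rule: swap an adjacent (adj, noun) pair."""
    for i in range(len(s) - 1):
        if s[i][0] == "adj" and s[i + 1][0] == "noun":
            return s[:i] + (s[i + 1], s[i]) + s[i + 2:]
    return None


source = (("det", "the"), ("adj", "red"), ("noun", "car"))
initial = lexical_transfer(source, {"the": "el", "red": "rojo", "car": "coche"})
print(transform(initial, [adjective_after_noun]))
# (('det', 'el'), ('noun', 'coche'), ('adj', 'rojo'))
```

Lexical transfer alone leaves el, rojo, coche in English order; the transformation loop then produces the Spanish order. The `max_steps` guard turns a rule set that would cycle forever into an explicit error.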
The target generation step typically inflects each word sense in the target structure, taking into account the inflectional features marked on each word, and then outputs the resulting structure as a natural language sentence in the target language. A further explanation of target generation is given in M. C. McCord and S. Wolff, "The Lexicon and Morphology for LMT, a Prolog-based MT system," IBM Research Report RC 13403, 1988, and G. Arrarte, I. Zapata, and M. C. McCord, "Spanish Generation Morphology for an English-Spanish Machine Translation System," IBM Research Report RC 17058, 1991, which are herein incorporated by reference in their entirety.
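Target generation can be sketched as a per-word inflection function driven by the features on each word, followed by assembly of the output sentence. The sketch assumes a toy Spanish fragment; the `IRREGULAR` table, the names `inflect` and `generate`, and the number-only feature set are hypothetical, and it ignores gender agreement and spelling changes such as written accents.

```python
from typing import Dict, List, Tuple

# Hypothetical irregular forms, keyed by (lemma, number).
IRREGULAR: Dict[Tuple[str, str], str] = {("el", "pl"): "los", ("la", "pl"): "las"}


def inflect(lemma: str, features: Dict[str, str]) -> str:
    """Inflect one word from its features (toy Spanish number inflection only)."""
    number = features.get("number", "sg")
    if (lemma, number) in IRREGULAR:
        return IRREGULAR[(lemma, number)]
    if number == "pl":
        # Regular plural: -s after a vowel, -es after a consonant.
        return lemma + ("s" if lemma[-1] in "aeiou" else "es")
    return lemma


def generate(words: List[Tuple[str, Dict[str, str]]], end: str = ".") -> str:
    """Inflect every word, then output a capitalized, punctuated sentence."""
    text = " ".join(inflect(lemma, feats) for lemma, feats in words)
    return text[0].upper() + text[1:] + end


print(generate([("el", {"number": "pl"}), ("coche", {"number": "pl"}),
                ("rojo", {"number": "pl"})]))  # Los coches rojos.
```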
LMT is an example of a transfer-based machine translation system, and it uses steps like those outlined above to translate a natural language text. The McCord reference "Design of LMT" gives an overview of these steps for translating a sentence from English to German.
In that reference, the example sentence is "The woman gives a book to the man." The source parse structure shows how the parts of the sentence fit together. The head of the sentence is the verb gives, which has the morphological features third person, singular, present, and indicative. The verb gives has three slots: the subject, filled by the word sense woman; the object, filled by the word sense book; and the prepositional object, filled by the word sense man.
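The source parse structure just described can be written down as a small tree and printed with one slot per line. The `Node` class, the `show` function, the slot labels `subj`, `obj`, and `pobj`, the use of the lemma give for gives, and the definiteness features on the nouns are assumptions for illustration, not LMT's actual notation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    sense: str                                             # word sense (lemma here)
    features: List[str] = field(default_factory=list)      # morphological features
    slots: Dict[str, "Node"] = field(default_factory=dict)  # slot name -> filler


def show(node: Node, slot: str = "top", depth: int = 0) -> List[str]:
    """Render the parse as indented 'slot: sense (features)' lines, head first."""
    feats = f" ({', '.join(node.features)})" if node.features else ""
    lines = ["  " * depth + f"{slot}: {node.sense}{feats}"]
    for name, filler in node.slots.items():
        lines.extend(show(filler, name, depth + 1))
    return lines


# Source parse of "The woman gives a book to the man."
parse = Node("give", ["3rd person", "singular", "present", "indicative"], {
    "subj": Node("woman", ["singular", "definite"]),
    "obj": Node("book", ["singular", "indefinite"]),
    "pobj": Node("man", ["singular", "definite"]),
})
print("\n".join(show(parse)))
# top: give (3rd person, singular, present, indicative)
#   subj: woman (singular, definite)
#   obj: book (singular, indefinite)
#   pobj: man (singular, definite)
```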
Next, the initial transfer structure shows the structure immediately after lexical transfer. Each word sense in the source parse structure has been transferred to the corresponding German word sense, e.g. English woman to German frau. In addition, the appropriate transfer features have been marked on each word, e.g. the subject is marked nominative and the object accusative. The word order in the initial transfer structure is the same as in the source parse structure.
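Lexical transfer for this example can be sketched as a dictionary lookup that also marks case according to each word's slot, leaving the word order untouched. The `LEXICON` entries, the gender features, the `CASE_BY_SLOT` table (including dative for the prepositional object), and the name `lexical_transfer` are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str, Dict[str, str]]  # (slot, sense, features)

# Hypothetical English-to-German lexicon: target sense plus lexical features.
LEXICON: Dict[str, Tuple[str, Dict[str, str]]] = {
    "give": ("geben", {}),
    "woman": ("frau", {"gender": "fem"}),
    "book": ("buch", {"gender": "neut"}),
    "man": ("mann", {"gender": "masc"}),
}
# Toy rule table: the case marked on a noun phrase, chosen by its slot.
CASE_BY_SLOT = {"subj": "nom", "obj": "acc", "pobj": "dat"}


def lexical_transfer(source: List[Word]) -> List[Word]:
    """Map each English sense to German and mark transfer features; order is kept."""
    result = []
    for slot, sense, feats in source:
        target_sense, lex_feats = LEXICON[sense]
        new_feats = {**feats, **lex_feats}  # source features plus target gender
        if slot in CASE_BY_SLOT:
            new_feats["case"] = CASE_BY_SLOT[slot]
        result.append((slot, target_sense, new_feats))
    return result


source = [("subj", "woman", {"det": "def"}),
          ("verb", "give", {"person": "3", "number": "sg", "tense": "pres"}),
          ("obj", "book", {"det": "indef"}),
          ("pobj", "man", {"det": "def"})]
for word in lexical_transfer(source):
    print(word)
```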
Then a transformation is applied to the initial transfer structure to produce a target structure with the correct German word order. The transformation moves the indirect object noun phrase the man from its position after the direct object a book to a position before it, producing a target structure whose word order corresponds to "The woman gives the man a book."
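The reordering transformation can be sketched as a rule that returns a new word list when it applies and `None` otherwise, so that it can run inside an iterate-until-no-change transfer loop. The slot labels, the function name `dative_before_accusative`, and the flat list representation are assumptions; LMT's transformations operate on full trees.

```python
from typing import List, Optional, Tuple

Word = Tuple[str, str]  # (slot, sense)


def dative_before_accusative(words: List[Word]) -> Optional[List[Word]]:
    """Toy rule: if the direct object ("obj") precedes the indirect object
    ("pobj", the English prepositional object expressed as a German dative),
    move the indirect object to just before the direct object.
    Returns None when the rule does not apply."""
    slots = [slot for slot, _ in words]
    if "obj" not in slots or "pobj" not in slots:
        return None
    i, j = slots.index("obj"), slots.index("pobj")
    if i > j:
        return None  # already in target order
    moved = words[:j] + words[j + 1:]  # a new list without the indirect object
    moved.insert(i, words[j])          # reinsert it before the direct object
    return moved


initial = [("subj", "frau"), ("verb", "geben"), ("obj", "buch"), ("pobj", "mann")]
target = dative_before_accusative(initial)
print(target)
# [('subj', 'frau'), ('verb', 'geben'), ('pobj', 'mann'), ('obj', 'buch')]
```

Applying the rule a second time returns `None`, which is how a transfer loop can tell that the target structure has been reached; the input list is not modified.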
Finally, each word sense in the tree is inflected as required by its features, and the result of the translation is output as a string with appropriate capitalization and punctuation: "Die Frau gibt dem Mann ein Buch."
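The final generation step for this example can be sketched with article tables indexed by case and gender, a one-entry verb table, and German noun capitalization. The tables cover only the singular forms needed here; the names `inflect`, `generate`, `DEFINITE`, `INDEFINITE`, and `VERB_FORMS` and the feature encoding are hypothetical.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, str, Dict[str, str]]  # (slot, sense, features), in target order

# Singular German article forms indexed by (case, gender).
DEFINITE = {("nom", "masc"): "der", ("nom", "fem"): "die", ("nom", "neut"): "das",
            ("acc", "masc"): "den", ("acc", "fem"): "die", ("acc", "neut"): "das",
            ("dat", "masc"): "dem", ("dat", "fem"): "der", ("dat", "neut"): "dem"}
INDEFINITE = {("nom", "masc"): "ein", ("nom", "fem"): "eine", ("nom", "neut"): "ein",
              ("acc", "masc"): "einen", ("acc", "fem"): "eine", ("acc", "neut"): "ein",
              ("dat", "masc"): "einem", ("dat", "fem"): "einer", ("dat", "neut"): "einem"}
# Hypothetical verb table holding only the form this example needs.
VERB_FORMS = {("geben", "3", "sg", "pres"): "gibt"}


def inflect(sense: str, feats: Dict[str, str]) -> List[str]:
    """Return the surface tokens for one word of the target structure."""
    if "case" in feats:  # a noun phrase: inflected article + noun
        table = DEFINITE if feats.get("det") == "def" else INDEFINITE
        # German capitalizes all nouns.
        return [table[(feats["case"], feats["gender"])], sense.capitalize()]
    key = (sense, feats["person"], feats["number"], feats["tense"])
    return [VERB_FORMS[key]]


def generate(words: List[Word]) -> str:
    """Inflect every word, capitalize the first letter, and add the final period."""
    tokens = [tok for _, sense, feats in words for tok in inflect(sense, feats)]
    sentence = " ".join(tokens)
    return sentence[0].upper() + sentence[1:] + "."


target = [("subj", "frau", {"det": "def", "gender": "fem", "case": "nom"}),
          ("verb", "geben", {"person": "3", "number": "sg", "tense": "pres"}),
          ("pobj", "mann", {"det": "def", "gender": "masc", "case": "dat"}),
          ("obj", "buch", {"det": "indef", "gender": "neut", "case": "acc"})]
print(generate(target))  # Die Frau gibt dem Mann ein Buch.
```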
A further explanation of LMT is given in M. C. McCord, "LMT," Proceedings of MT Summit II, pp. 94-99, Deutsche Gesellschaft für Dokumentation, Frankfurt, and in H. Lehmann, "Machine Translation for Home and Business Users," Proceedings of MT Summit V, Luxembourg, July 10-13, 1995, which are herein incorporated by reference in their entirety.