After several decades of development, automatic (machine) translation of text from a source language to a target language with a minimum of human intervention has reached only a rudimentary level, at which machine translation systems with limited vocabularies or limited language environments can produce a basic level of acceptably translated text. Some current systems can produce translations of unconstrained input in a selected language pair, i.e., from a chosen source language to a chosen target language, that are perhaps 50% acceptable to a native writer of the target language (using an arbitrary scale measure). When the translation system is constrained to the particular vocabulary or syntax style of a limited area of application (referred to as a "sublanguage"), the results that can now be achieved may approach a level 90% acceptable to a native writer. The wide difference in results is attributable to the difficulty of producing accurate translation when the system must encompass wide variability in vocabulary use, syntax, and expression, as compared to the limited vocabularies and translation equivalents of a chosen sublanguage.
One example of a machine translation system limited to a specific sublanguage application is the TAUM-METEO system developed by the University of Montreal for translating weather reports issued by the Canadian Environment Department from English into French. TAUM-METEO uses the transfer method of translation, which consists basically of three steps: (1) analyzing the sequence and morphological forms of input words of the source language and determining their phrase and sentence structure; (2) transferring (directly translating) the input text into sentences of equivalent words of the target language using dictionary look-up and a developed set of transfer rules for word and/or phrase selections; and (3) synthesizing an acceptable output text in the target language using developed rules for target language syntax and grammar. TAUM-METEO was designed to operate for English-to-French translation in the narrow sublanguage of meteorology (1,500 dictionary entries, with several hundred place names; text having no tensed verbs). It can obtain high translation accuracy, in the range of 80% to 90%, by avoiding the need for any significant level of morphological analysis of input words, by analyzing input texts for domain-specific word markers which narrow the range of choices for output word selection and syntax structure, and by using ad hoc transfer rules for output word and phrase selections.
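The three steps of the transfer method can be illustrated with a minimal sketch in Python. It is not the TAUM-METEO implementation: the five-word lexicon, the function names (`analyze`, `transfer`, `synthesize`, `translate`), and the single adjective-noun reordering rule are all assumptions invented for illustration. It does reflect how a narrow sublanguage keeps the dictionary and rule set small.

```python
# Toy transfer pipeline: analysis -> transfer -> synthesis.
# The lexicon entries and the single reordering rule are hypothetical;
# they are not taken from TAUM-METEO.

# English word -> (part of speech, French entry).
# Nouns carry (form, gender, number); adjectives carry their inflected forms.
LEXICON = {
    "winds":   ("NOUN", ("vents", "m", "pl")),
    "periods": ("NOUN", ("périodes", "f", "pl")),
    "strong":  ("ADJ", {("m", "pl"): "forts", ("f", "pl"): "fortes"}),
    "cloudy":  ("ADJ", {("m", "pl"): "nuageux", ("f", "pl"): "nuageuses"}),
    "today":   ("ADV", "aujourd'hui"),
}


def analyze(text):
    """Step 1: tokenize the source text and tag each word with its category."""
    tagged = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"'{word}' is outside the sublanguage vocabulary")
        tagged.append((word, LEXICON[word][0]))
    return tagged


def transfer(tagged):
    """Step 2: look up target entries, then reorder ADJ NOUN as NOUN ADJ."""
    units = [(pos, LEXICON[word][1]) for word, pos in tagged]
    i = 0
    while i < len(units) - 1:
        if units[i][0] == "ADJ" and units[i + 1][0] == "NOUN":
            units[i], units[i + 1] = units[i + 1], units[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return units


def synthesize(units):
    """Step 3: make adjectives agree with the preceding noun and join words."""
    words = []
    agreement = None  # (gender, number) of the most recent noun
    for pos, entry in units:
        if pos == "NOUN":
            form, gender, number = entry
            agreement = (gender, number)
            words.append(form)
        elif pos == "ADJ":
            # This toy assumes every adjective follows a noun after transfer.
            words.append(entry[agreement])
        else:
            words.append(entry)
    return " ".join(words)


def translate(text):
    return synthesize(transfer(analyze(text)))
```

In this sketch, `translate("strong winds today")` passes through all three stages, and a word missing from the lexicon is rejected during analysis rather than guessed at, which is the trade-off a sublanguage system makes.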
Another example of a sublanguage translation system is the METAL system developed by the Linguistics Research Center at the University of Texas at Austin for large-volume translations from German into English of texts in the field of telecommunications. The METAL system also uses the transfer method, but adds a fourth step called "integration" between the analysis and transfer steps. The integration step attempts to reduce the variability of output word selection and syntax by performing tests on the constituent words of the input text strings and constraining their application based upon developed grammar and phrase structure rules. Transfer dictionaries typically consist of roughly 10,000 word pairs. In terms of translation quality, the METAL system is reported to have achieved between 45% and 85% correct translations.
A strategy competing with the transfer approach is the "interlingua" approach, which attempts to decompile input texts of a source language into an intermediate language which represents their "meaning" or semantic content, and then to convert the semantic structures into equivalent output sentences of a target language by using a knowledge base of contextual, lexical, and syntactic rules. Historically, transfer systems lacking a comprehensive knowledge base and limited to translation of sentences in isolation have had the central problem of obtaining accurate word and phrase selections in the face of ambiguities presented by homonyms, polysemic phrases, and anaphoric references. The interlingua approach is favored because its representation of text meaning within a context larger than single sentences can, in theory, greatly reduce ambiguity in the analysis of input texts. Also, once the input text has been decompiled into a semantic structure, it can theoretically be translated into multiple target languages using the linguistic and semantic rules developed for each target language. In practice, however, the interlingua approach has proven difficult to implement because it requires the development of a universal symbolic language for representing "meaning" and comprehensive knowledge bases for making the conversions from source to intermediate and then to target languages. Examples of interlingua systems include the Distributed Language Translation (DLT) project undertaken in Utrecht, the Netherlands, and the Knowledge-Based Machine Translation (KBMT) system of the Center for Machine Translation at Carnegie-Mellon University.
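The structural difference from the transfer method can be shown with a minimal sketch in Python, assuming a drastically simplified "meaning" representation. The `Frame` class, the concept names such as `PRECIP_RAIN`, and the word tables are hypothetical and are not drawn from DLT or KBMT. The point illustrated is only that the source text is analyzed once into a language-neutral structure, from which any number of target languages can be generated.

```python
# Toy interlingua pipeline: source text -> language-neutral frame -> target text.
# The frame layout, concept names, and word tables are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Language-neutral representation of a one-line forecast."""
    event: str
    time: str


# Source-side knowledge: English words -> concepts.
EN_EVENTS = {"rain": "PRECIP_RAIN", "snow": "PRECIP_SNOW"}
EN_TIMES = {"today": "DAY_0", "tomorrow": "DAY_PLUS_1"}

# Target-side knowledge: concepts -> words, one table per target language.
GENERATORS = {
    "fr": {
        "events": {"PRECIP_RAIN": "pluie", "PRECIP_SNOW": "neige"},
        "times": {"DAY_0": "aujourd'hui", "DAY_PLUS_1": "demain"},
    },
    "de": {
        "events": {"PRECIP_RAIN": "Regen", "PRECIP_SNOW": "Schnee"},
        "times": {"DAY_0": "heute", "DAY_PLUS_1": "morgen"},
    },
}


def analyze_english(text):
    """Decompile English input into a Frame; word order does not matter here."""
    event = time = None
    for word in text.lower().split():
        event = EN_EVENTS.get(word, event)
        time = EN_TIMES.get(word, time)
    if event is None or time is None:
        raise ValueError("could not build a complete frame from the input")
    return Frame(event, time)


def generate(frame, lang):
    """Produce a telegraphic forecast in the requested target language."""
    table = GENERATORS[lang]
    return f"{table['events'][frame.event]} {table['times'][frame.time]}"
```

Adding a further target language in this sketch means adding one entry to `GENERATORS` without touching the analysis side. The difficulty noted above lies in what the sketch omits: a real system needs a representation rich enough for unrestricted meaning, not a two-slot frame.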
Other machine translation systems have been developed or are under development using modifications or hybrids of the transfer and interlingua approaches. For example, some systems use human pre-editing and/or post-editing to reduce text ambiguity and improve the correctness of word and phrase selections. Other systems attempt to combine a basic transfer approach with knowledge bases and artificial intelligence techniques for machine editing and enhancement. Another approach is to combine decompilation to a syntactically-based intermediate structure with transfer to equivalent output phrases and sentences. For a further discussion of current developments in machine translation, reference is made to "Machine Translation: Theoretical and Methodological Issues", edited by Sergei Nirenburg, published by Cambridge University Press, 1987, and "Proceedings of The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language", published by Linguistics Research Center, University of Texas at Austin, June 1990.
It is expected that machine translation (MT) systems will develop in time to provide higher levels of translation accuracy and utility. Even now, current MT techniques using a basic transfer approach can produce acceptable translation accuracy in a selected sublanguage, yet they are not in widespread use. One reason for the limited use of MT systems is that most current systems are designed for a single, specific application, environment, and language-pair context. The requirements of that context motivate the design and development of the grammar, dictionary structure, and parsing algorithms, so the utility of the system becomes confined to that particular context. This approach greatly limits the range of applications and the audience of users which can be productively served by such application- and language-specific MT systems.