After several decades of research, automatic (machine) translation of text from a source language to a target language with a minimum of human intervention has reached a rudimentary level at which systems with limited vocabularies or limited language environments can produce a basic level of acceptably translated text. Some current systems can translate unconstrained input in a selected language pair, i.e., from a chosen source language to a chosen target language, with output that is perhaps 50% acceptable to a native writer of the target language (on an arbitrary scale). When the system is constrained to the vocabulary and syntax style of a limited area of application (referred to as a "sublanguage"), results may approach 90% acceptability to a native speaker. This wide difference arises because a system handling unconstrained input must encompass a vast body of translation equivalents to cope with wide variability in vocabulary, syntax, and expression, whereas a chosen sublanguage has a limited vocabulary and a correspondingly limited set of translation equivalents.
One example of a machine translation system limited to a specific sublanguage is the TAUM-METEO system, developed by the University of Montreal to translate weather reports issued by the Canadian Environment Department from English into French. TAUM-METEO uses the transfer method of translation, which consists essentially of three steps: (1) analyzing the sequence and morphological forms of the input words of the source language and determining their phrase and sentence structure; (2) transferring (directly translating) the input text into sentences of equivalent words in the target language, using dictionary look-up and a developed set of transfer rules for word and/or phrase selection; and (3) synthesizing acceptable output text in the target language using developed rules for target-language syntax and grammar. TAUM-METEO was designed to operate for the English-French language pair in the narrow sublanguage of meteorology (1,500 dictionary entries, including several hundred place names; input texts containing no tensed verbs). It can therefore achieve high translation accuracy, of 80% to 90%, by avoiding any significant morphological analysis of input words, by analyzing input texts for domain-specific word markers that narrow the range of choices for output word selection and syntax structure, and by using ad hoc transfer rules for output word and phrase selection.
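The three transfer steps can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not a description of TAUM-METEO's actual implementation: the eight-word English-French weather lexicon, the single adjective-reordering transfer rule, the single gender-agreement synthesis rule, and the names LEXICON, Token, analyze, transfer, synthesize, and translate are all hypothetical. The analysis step only tags parts of speech; like the sublanguage system described above, it performs no real morphological analysis.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicon for a weather sublanguage:
# English word -> (part of speech, French lemma, gender of the French noun).
LEXICON = {
    "light": ("ADJ", "faible", None),
    "heavy": ("ADJ", "fort", None),
    "strong": ("ADJ", "fort", None),
    "rain": ("N", "pluie", "f"),
    "snow": ("N", "neige", "f"),
    "wind": ("N", "vent", "m"),
    "today": ("ADV", "aujourd'hui", None),
    "tomorrow": ("ADV", "demain", None),
}


@dataclass
class Token:
    word: str                     # surface form (source or target language)
    pos: str                      # part-of-speech tag
    gender: Optional[str] = None  # grammatical gender, set only for target nouns


def analyze(text: str) -> list[Token]:
    """Step 1: split the source text into words and tag each one from the lexicon.
    Unknown words raise KeyError; this toy has no fallback."""
    return [Token(w, LEXICON[w][0]) for w in text.lower().split()]


def transfer(tokens: list[Token]) -> list[Token]:
    """Step 2: dictionary look-up plus one structural transfer rule."""
    out = [Token(LEXICON[t.word][1], t.pos, LEXICON[t.word][2]) for t in tokens]
    # Hypothetical transfer rule: English ADJ + N becomes French N + ADJ.
    i = 0
    while i < len(out) - 1:
        if out[i].pos == "ADJ" and out[i + 1].pos == "N":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def synthesize(tokens: list[Token]) -> str:
    """Step 3: apply target-language agreement and orthography rules."""
    words = []
    for i, t in enumerate(tokens):
        word = t.word
        # Agreement: an adjective following a feminine noun takes a final -e.
        if t.pos == "ADJ" and i > 0 and tokens[i - 1].gender == "f" and not word.endswith("e"):
            word += "e"
        words.append(word)
    sentence = " ".join(words)
    # Orthography: capitalize the first letter and close with a period.
    return sentence[0].upper() + sentence[1:] + "."


def translate(text: str) -> str:
    return synthesize(transfer(analyze(text)))
```

For example, translate("heavy rain tomorrow") returns "Pluie forte demain.": the look-up yields "fort pluie demain", the transfer rule reorders it to "pluie fort demain", and synthesis makes the adjective agree with the feminine noun.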
Another example of a sublanguage translation system is the METAL system, developed by the Linguistics Research Center at the University of Texas at Austin for large-volume translation of telecommunications texts from German into English. The METAL system also uses the transfer method, but adds a fourth step, called "integration," between the analysis and transfer steps. The integration step attempts to reduce the variability of output word selection and syntax by performing tests on the constituent words of the input text strings and constraining their application according to developed grammar and phrase rules. The transfer dictionaries typically contain on the order of 10,000 word pairs. In terms of translation quality, the METAL system is reported to have achieved between 45% and 85% correct translations.
A competing strategy to the transfer approach is the "interlingua" approach, which attempts to decompile input texts of a source language into an intermediate language that represents their "meaning," or symbolic content, and then to convert the symbolically represented structures into equivalent output sentences of a target language using a knowledge base of contextual, lexical, syntactic, and grammatical rules. Historically, the central problem for transfer-based systems has been selecting words and phrases accurately despite the ambiguities presented by homonyms, polysemous phrases, and anaphoric references. The interlingua approach is attractive because its representation of text meaning can, in theory, greatly reduce ambiguity in the analysis of input texts. Also, once the input text has been decompiled into a symbolically represented structure, it can theoretically be translated into multiple target languages using the linguistic and semantic rules developed for each target language. In practice, however, the interlingua approach has proven difficult to implement because it requires the development of a universal symbolic language for representing "meaning" and of comprehensive knowledge bases for converting from the source language to the intermediate language and then to the target languages. Examples of interlingua systems include the Distributed Language Translation (DLT) project undertaken in Utrecht, the Netherlands, and the Knowledge-Based Machine Translation (KBMT) system of the Center for Machine Translation at Carnegie-Mellon University.
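The claimed advantage of the interlingua approach, one analysis serving several target languages, can be shown with a deliberately tiny sketch. It is not a model of DLT or KBMT: the six concept symbols, the role-to-concept "frame" used as the intermediate representation, the French and German generation tables, and the names EN_CONCEPTS, GENERATORS, ORDER, decompile, and generate are hypothetical assumptions. The toy sidesteps exactly the hard part noted above, since its "universal" representation is hard-coded rather than designed to cover real meaning.

```python
# Hypothetical source analysis table: English word -> (semantic role, concept symbol).
EN_CONCEPTS = {
    "rain": ("phenomenon", "RAIN"),
    "snow": ("phenomenon", "SNOW"),
    "heavy": ("intensity", "HIGH"),
    "light": ("intensity", "LOW"),
    "today": ("time", "DAY0"),
    "tomorrow": ("time", "DAY1"),
}

# Hypothetical per-target generation lexicons: concept symbol -> target word.
GENERATORS = {
    "fr": {"RAIN": "pluie", "SNOW": "neige", "HIGH": "forte", "LOW": "faible",
           "DAY0": "aujourd'hui", "DAY1": "demain"},
    "de": {"RAIN": "Regen", "SNOW": "Schnee", "HIGH": "starker", "LOW": "leichter",
           "DAY0": "heute", "DAY1": "morgen"},
}

# Hypothetical per-target word-order rules over semantic roles.
ORDER = {
    "fr": ("phenomenon", "intensity", "time"),
    "de": ("intensity", "phenomenon", "time"),
}


def decompile(text: str) -> dict[str, str]:
    """Map source words to a language-neutral frame of role -> concept."""
    frame = {}
    for word in text.lower().split():
        role, concept = EN_CONCEPTS[word]
        frame[role] = concept
    return frame


def generate(frame: dict[str, str], lang: str) -> str:
    """Render the same frame in any target language that has a generator."""
    words = [GENERATORS[lang][frame[role]] for role in ORDER[lang] if role in frame]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."
```

Decompiling "heavy rain tomorrow" once gives a frame that generate renders as "Pluie forte demain." for "fr" and "Starker Regen morgen." for "de". Adding a further target language here needs only a new generation table and order rule, not a new analyzer, which is the economy the interlingua approach aims for.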
Other machine translation systems have been developed, or are under development, using modifications or hybrids of the transfer and interlingua approaches. For example, some systems use human pre-editing and/or post-editing to reduce text ambiguity and improve the correctness of word and phrase selections. Other systems attempt to combine a core transfer approach with knowledge bases and artificial intelligence techniques for machine editing and enhancement. Still another approach is to decompile the input into a syntactically based intermediate structure and then transfer that structure to equivalent output phrases and sentences. For a more complete discussion of current developments in machine translation, reference is made to Machine Translation: Theoretical and Methodological Issues, edited by Sergei Nirenberg, published by Cambridge University Press, 1987, and to the Proceedings of "The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language," published by the Linguistics Research Center, University of Texas at Austin, June 1990.
It is expected that machine translation (MT) systems will in time provide higher levels of translation accuracy and utility. Current MT techniques using a basic transfer approach can already produce acceptable translation accuracy in a selected sublanguage, yet such systems are not in widespread use. One reason for their limited practical use is that most current systems are designed as standalone systems, which are fed source-language input and provide target-language output to a single user whose application bridges the source-target language pair. When the standalone user selects a specific sublanguage or use environment, the MT system becomes confined to that sublanguage or use. This standalone approach greatly limits the range of applications and the audience of users that MT systems can productively serve.