As an alternative to human translation of expressions from one language into another, machine translation has been applied to increase translation throughput. In this regard, computing power has advanced exponentially over the past three decades to the point where intelligence analysts and linguists can now use powerful tools to assist them in processing large volumes of disparate data from multiple sources in real time. Machine translation (MT) is a software-based technology that can aid linguists and analysts in processing the volumes of incoming information, whether from print, electronic, audio, or video sources. There are two distinct methods for machine translation: rule-based and statistical. Each has its advantages, but no product exists that combines the best of both in a hybrid machine translation solution.
In this application, we describe a hybrid machine translation system based on a statistical transfer approach using statistical and linguistic features, and highlight the system's capabilities in applying machine translation to different tasks:
(a) Translation of one language into another for very large vocabulary broadcast, newswire and web texts. The input is either captured from the Internet or is recorded from a satellite feed and recognized using a speech recognition system; and
(b) Translation of one language into another for medium to large vocabulary speech-to-speech translation. The input is recorded through a telephone channel and recognized using an automatic speech recognition system.
The recognized utterances and the text captured from the Internet are normalized using statistical machine translation that is based on finite state automata. The output of this interlingua is then translated by a hybrid machine translation system that combines statistical and rule-based features. This application also introduces a hybrid interlingua approach that gives better results for dialect speech input than either a direct machine translation system based on a statistical approach or a direct machine translation system based on a rule-based approach.
Applying Machine Translation
The current process for handling information is largely a manual one. Intelligence operations are highly reliant on the skills of the people performing foreign language translations and on those analyzing and interpreting the data, while the volume of data grows exponentially and the pool of qualified people continues to shrink. Machine translation tools exist to assist linguists and analysts in doing their jobs.
What is Machine Translation?
Machine translation (MT) involves the use of computer software to translate one natural human language into another. MT takes into account the grammatical structure of each language, and uses contextual rules to select among multiple meanings, in order to transfer sentences from the source language (to be translated) into the target language (translated).
MT refers to the process of translating a variety of media (speech, text, audio/video, web pages, etc.) from one language to another using computers and software. MT is designed to support and assist intelligence analysts and linguists with their human translation tasks.
Translation, in its simplest definition, involves: decoding the meaning of the source text; and re-encoding this meaning in the target language. A translator decodes the meaning of the source text in its entirety. This means that the translator must interpret and analyze all the features of the text by applying in-depth knowledge of the grammar, semantics, syntax, idioms, and the like of the source language, as well as the culture of its speakers. At the same time, the translator needs an equivalent in-depth knowledge to re-encode the meaning in the target language.
Foreign language translation can be difficult even for a skilled linguist. Performing the same translations using machine translation increases the accuracy and speed of translating text and identifying key points of interest. The question is: How do you program a computer to “understand” a text just as a person does, and also to “create” a new text in the target language that “sounds” as if it has been written by a person? Machine translation software is designed to address this problem through two main approaches: a rule-based approach and a statistical approach.
Rule-Based Machine Translation
A rule-based approach is a method based on linguistic rules, meaning that words are translated in a linguistic way; that is, the most suitable (orally speaking) words of the target language replace the ones in the source language.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
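The parse, intermediate representation, and generation steps described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-word English-to-French lexicon, the function names, and the single reordering rule (adjective before noun becomes noun before adjective). A practical rule-based system would need the extensive lexicons and large rule sets noted above, including morphology and agreement, which this sketch omits.

```python
# Toy rule-based transfer: analyze -> transfer -> generate.
# The lexicon and the single reordering rule are illustrative assumptions,
# not taken from any particular system.

# source word -> (part of speech, target word)
LEXICON = {
    "the": ("DET", "le"),
    "black": ("ADJ", "noir"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}


def analyze(sentence):
    """Build the symbolic representation: a list of (word, part-of-speech) pairs."""
    return [(w, LEXICON[w][0]) for w in sentence.lower().split()]


def transfer(nodes):
    """Apply one structural rule: ADJ NOUN in the source becomes NOUN ADJ."""
    out = list(nodes)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(nodes):
    """Emit target-language words from the transferred representation."""
    return " ".join(LEXICON[w][1] for w, _ in nodes)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("The black cat sleeps"))  # le chat noir dort
```

A word missing from the lexicon raises a KeyError here, which reflects in miniature why rule-based methods depend on lexicon coverage.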
With sufficient data, MT programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by a native speaker of the other language. The difficulty is getting enough data of the right kind to support the particular method.
Rule-based translation approaches have the advantage of a high level of abstraction and produce understandable translations over a broad coverage, i.e., the “informativeness” (the accurate translation of information) of the translation is higher over a broader coverage of domains and types of texts.
A prime motivation for creating a hybrid machine translation system is to take advantage of the strengths of both rule-based and statistical approaches, while mitigating their weaknesses. Thus, for example, a rule that covers a rare word combination or construction should take precedence over statistics that were derived from sparse data (and therefore are not very reliable). Additionally, rules covering long-distance dependencies and embedded structures should be weighted favorably, since these constructions are more difficult to process in statistical machine translation.
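The precedence idea above, in which a rule wins when the statistics rest on sparse data, might be sketched as follows. This is a simplified illustration only: the `Candidate` type, the `choose` function, the `corpus_count` input, and the `min_count` threshold of 5 are all hypothetical. It uses a hard switch for clarity, whereas the text speaks of weighting, so an actual hybrid system would more plausibly combine the two sources through weighted scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    text: str
    source: str  # "rule" or "stat"


def choose(rule_output: Optional[str],
           stat_output: Optional[str],
           corpus_count: int,
           min_count: int = 5) -> Optional[Candidate]:
    """Pick between a rule-based and a statistical translation candidate.

    corpus_count is assumed to be how often the source phrase was seen in
    the training corpus; min_count is an arbitrary sparsity threshold.
    """
    # A matching rule takes precedence when there is no statistical output
    # or when the statistics were estimated from sparse data.
    if rule_output is not None and (stat_output is None or corpus_count < min_count):
        return Candidate(rule_output, "rule")
    # Otherwise fall back to the statistical candidate, if one exists.
    if stat_output is not None:
        return Candidate(stat_output, "stat")
    return None


# Rare phrase: the rule wins. Frequent phrase: the statistics win.
print(choose("rule translation", "stat translation", corpus_count=2).source)   # rule
print(choose("rule translation", "stat translation", corpus_count=50).source)  # stat
```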
Statistical Machine Translation
Statistical machine translation tries to generate translations using statistical methods based on a large body of bilingual text. One example of such a corpus is the Canadian Hansard corpus, the English-French record of the Canadian parliament. Ambiguity of some words can change the meaning and subsequent translation. Today, both “shallow” and “deep” approaches are used to overcome this problem. Shallow approaches assume no knowledge of the text; they simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. Thus, a statistical approach should take precedence in situations where large numbers of relevant dependencies are available, novel input is encountered, or high-frequency word combinations occur.
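The shallow approach described above, which looks only at the words surrounding an ambiguous word, can be sketched as a co-occurrence score. The example below is an assumption-laden illustration: the English word "bank", its two French renderings, the `disambiguate` function, the window size, and all counts in `CONTEXT_COUNTS` are invented for demonstration and do not come from any real corpus.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each context word appeared
# near "bank" when it was translated a given way. All values are made up.
CONTEXT_COUNTS = {
    "banque": Counter({"money": 8, "loan": 6, "account": 9}),
    "rive": Counter({"river": 7, "water": 5, "fishing": 3}),
}


def disambiguate(tokens, target_index, window=3):
    """Choose a translation for tokens[target_index] from nearby words only."""
    lo = max(0, target_index - window)
    # Words within `window` positions on either side of the ambiguous word.
    context = (tokens[lo:target_index]
               + tokens[target_index + 1:target_index + 1 + window])
    # Counter returns 0 for unseen words, so unknown context adds nothing.
    scores = {
        translation: sum(counts[w] for w in context)
        for translation, counts in CONTEXT_COUNTS.items()
    }
    return max(scores, key=scores.get)


sentence = "she sat on the bank of the river".split()
print(disambiguate(sentence, sentence.index("bank")))  # rive
```

When no context word is informative, every score is zero and the first entry in the table is returned, which shows how such a method degrades when the relevant data is sparse.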
Today, no single system provides a “fully-automatic, high-quality machine translation.” Rule-based machine translation applications have come the closest so far; however, there are some advantages of statistical machine translation that are not fully realized in pure rule-based machine translation systems.
Accordingly, there remains a need for an optimal machine translation system. There is a further need for systems to perform translation using the best available models for translation of given source and target languages.