1. Field of the Invention
The present invention is generally directed to the field of machine translation.
2. Background
Translation of text from one human language into another is important in many commercial and governmental activities, and also has personal applications. Translation by human translators is time-consuming and expensive, so there is a substantial need for automated means of carrying out the translation function. Numerous approaches to automated machine translation have been implemented in software. However, as will be described in more detail below, the output quality of contemporary machine translation systems generally falls well short of desired performance.
Machine translation software converts text from one human language (the source language) into another (the target language). Despite 50 years of development, the capabilities of automated machine translation systems are still discouragingly limited, as discussed in Machine Translation: an Introductory Guide, NCC Blackwell, London, 1994, ISBN 1-85554-217-X. The major approaches applied in machine translation are: (i) rule-based systems; (ii) example-based systems; and (iii) statistical machine translation.
Even for the simplest language pairs (for example, English and Spanish), complex sentences and idiomatic expressions are often poorly handled. For more difficult language pairs (for example, English and Arabic), the meaning of sentences is often garbled. With the present state of the art, the applicability of machine translation is therefore limited.
A key problem in machine translation is the lack of fidelity with which translated text reflects the meaning and tone of the source text. Machine translation systems have problems in several areas, including:
1. Word sense disambiguation. In human languages, many words have multiple meanings. For example, the English word “strike” has dozens of common meanings. Examples of poor machine translation typically involve an incorrect choice of word sense.
2. Idiomatic expressions. Machine translation systems have difficulties with idiomatic expressions, such as “kicked the bucket” or “good as gold,” whose meaning cannot be derived from the literal meanings of their individual words.
3. Anaphora resolution. Machine translation systems have difficulties resolving ambiguous references, such as determining which earlier noun a pronoun refers to.
4. Logical decomposition. Machine translation systems have difficulties decomposing long sentences into coherent textual elements, particularly for languages such as Arabic.
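By way of illustration only, the word sense problem described in item 1 above can be sketched in a few lines of Python. The sketch below picks a sense for the word “strike” by counting the words a sentence shares with a short description of each sense, a simplified gloss-overlap heuristic. The sense inventory, the stopword list, and the names SENSES, STOPWORDS, tokens, and pick_sense are hypothetical and chosen only for this example; they do not represent the method of any particular machine translation system.

```python
import re

# Hypothetical toy sense inventory for "strike" (assumption for illustration);
# a practical system would draw on a full lexicon with dozens of senses.
SENSES = {
    "hit": "deliver a blow to a person or object with force",
    "work_stoppage": "workers union refuse to work labor protest over pay",
    "baseball": "baseball pitch that the batter swings at and misses",
}

# Small hand-picked stopword list (assumption), so that function words
# do not count as evidence for any sense.
STOPWORDS = {"a", "an", "the", "to", "of", "at", "or", "and",
             "with", "that", "over", "on", "in", "for"}


def tokens(text):
    """Return the set of lowercase content words in text."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def pick_sense(sentence, senses=SENSES):
    """Score each sense by word overlap between sentence and sense gloss.

    Returns (best_sense, scores). Ties, including the case where no sense
    has any overlap, fall to the first sense listed in the inventory.
    """
    context = tokens(sentence)
    scores = {name: len(context & tokens(gloss))
              for name, gloss in senses.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for s in ("The union voted to strike over pay",
              "The batter swings at the third strike"):
        print(s, "->", pick_sense(s)[0])
```

When a sentence shares no words with any description, the sketch simply falls back to the first listed sense, which is one way an incorrect choice of word sense of the kind described above can arise.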
Therefore, what is needed is a system and method for improving the performance of machine translation. In particular, such a system and method should deal more effectively with word sense ambiguity, idiomatic expressions, anaphora resolution, and logical decomposition.