There has long been a desire to have machines capable of translating text from one language into text in another language. Such machines would make it much easier for humans who speak different languages to communicate with one another. The present invention uses statistical techniques to attack the problem of machine translation. Related statistical techniques have long been used in the field of automatic speech recognition. The background for the invention is thus in two different fields: automatic speech recognition and machine translation.
The central problem in speech recognition is to recover from an acoustic signal the word sequence that gave rise to it. Prior to 1970, most speech recognition systems were built around a set of hand-written rules of syntax, semantics and acoustic-phonetics. To construct such a system it is necessary, first, to discover a set of linguistic rules that can account for the vast complexity of language and, second, to construct a coherent framework in which these rules can be assembled to recognize speech. Both of these problems proved insurmountable. It proved too difficult to write down by hand a set of rules that adequately covered the vast scope of natural language, and to construct by hand a set of weights, priorities and if-then statements that could regulate the interactions among its many facets.
This impasse was overcome in the early 1970s with the introduction of statistical techniques to speech recognition. In the statistical approach, linguistic rules are extracted automatically from large databases of speech and text using statistical techniques. Different types of linguistic information are combined via the formal laws of probability theory. Today, almost all speech recognition systems are based on statistical techniques.
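As an illustration of how probability theory combines different sources of information, the standard formulation of statistical speech recognition (not spelled out in the text above, and shown here only as a sketch) chooses, for an acoustic signal A, the word sequence W with the highest posterior probability. Bayes' rule splits that posterior into a language model P(W) and an acoustic model P(A | W); the denominator P(A) does not depend on W and can be dropped from the maximization.

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(W)\,P(A \mid W)}{P(A)}
        = \arg\max_{W} P(W)\,P(A \mid W)
```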
Speech recognition has benefited from statistical language models, which exploit the fact that not all word sequences occur naturally with equal probability. One simple model is the trigram model of English, in which it is assumed that the probability that a word will be spoken depends only on the previous two words that have been spoken. Although trigram models are simple-minded, they have proven extremely powerful, both in their ability to predict words as they occur in natural language and in their ability to improve the performance of natural-language speech recognition. In recent years, more sophisticated language models based on probabilistic decision trees, stochastic context-free grammars and automatically discovered classes of words have also been used.
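The trigram assumption can be sketched in a few lines. This is a minimal illustration, assuming unsmoothed maximum-likelihood estimates: P(w3 | w1, w2) is the count of the trigram (w1, w2, w3) divided by the count of the two-word history (w1, w2), and a sentence's probability is the product of its trigram probabilities. The class name `TrigramModel` and the boundary markers `<s>` and `</s>` are hypothetical choices for this sketch.

```python
from collections import Counter
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


class TrigramModel:
    """Unsmoothed maximum-likelihood trigram model (illustrative sketch only)."""

    def __init__(self, sentences: Iterable[List[str]]) -> None:
        self.trigrams: Counter = Counter()   # counts of (w1, w2, w3)
        self.histories: Counter = Counter()  # counts of two-word histories (w1, w2)
        for words in sentences:
            padded = [BOS, BOS] + list(words) + [EOS]
            for i in range(2, len(padded)):
                history = (padded[i - 2], padded[i - 1])
                self.trigrams[history + (padded[i],)] += 1
                self.histories[history] += 1

    def prob(self, w1: str, w2: str, w3: str) -> float:
        """P(w3 | w1, w2): depends only on the previous two words."""
        history_count = self.histories[(w1, w2)]
        if history_count == 0:
            return 0.0  # unseen history; a real system would smooth instead
        return self.trigrams[(w1, w2, w3)] / history_count

    def sentence_prob(self, words: List[str]) -> float:
        """Probability of a whole sentence as a product of trigram probabilities."""
        padded = [BOS, BOS] + list(words) + [EOS]
        p = 1.0
        for i in range(2, len(padded)):
            p *= self.prob(padded[i - 2], padded[i - 1], padded[i])
        return p


corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
model = TrigramModel(corpus)
print(model.prob("the", "cat", "sat"))  # 0.5
```

Because the estimates are unsmoothed, any sentence containing an unseen trigram gets probability zero; practical language models avoid this by smoothing, for example by interpolating with bigram and unigram estimates.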
In the early days of speech recognition, acoustic models were created by linguistic experts, who expressed their knowledge of acoustic-phonetic rules in programs that analyzed an input speech signal and produced as output a sequence of phonemes. It was thought to be a simple matter to decode a word sequence from a sequence of phonemes. It turned out, however, to be very difficult to determine an accurate phoneme sequence from a speech signal. Although human experts certainly do exist, it has proven extremely difficult to formalize their knowledge. In the alternative statistical approach, acoustic-phonetic knowledge is instead learned from samples of speech by statistical models, most prominently hidden Markov models.
In their reliance on hand-written rules, present approaches to machine translation resemble the approaches to speech recognition of twenty years ago. Roughly speaking, present approaches to machine translation fall into three categories: direct, transfer, and interlingual. In the direct approach, a series of deterministic linguistic transformations is performed. These transformations directly convert a passage of source text into a passage of target text. In the transfer approach, translation is performed in three stages: analysis, transfer, and synthesis. The analysis stage produces a structural representation that captures various relationships between syntactic and semantic aspects of the source text. In the transfer stage, this structural representation of the source text is transferred by a series of deterministic rules into a structural representation of the target text. Finally, in the synthesis stage, target text is synthesized from the structural representation of the target text. The interlingual approach is similar to the transfer approach, except that it uses an internal structural representation that is language independent. Translation in the interlingual approach takes place in two stages: the first analyzes the source text into this language-independent interlingual representation, and the second synthesizes the target text from that representation. All of these approaches use hand-written deterministic rules.
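The three stages of the transfer approach can be sketched as a toy pipeline. This covers only the transfer approach, and everything in it is a hypothetical illustration: the tiny English and French lexicons, the part-of-speech tags, and the single reordering rule are invented for this sketch, and the "structural representation" is merely a list of (word, tag) pairs. The point is that every resource and rule is written by hand and applied deterministically.

```python
from typing import List, Tuple

# Hypothetical hand-written resources for a toy English-to-French system.
SOURCE_LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN", "stops": "VERB"}
BILINGUAL_DICT = {"the": "la", "red": "rouge", "car": "voiture", "stops": "s'arrête"}

Node = Tuple[str, str]  # (word, part-of-speech tag)


def analyze(text: str) -> List[Node]:
    """Analysis: map source words to a (very shallow) structural representation."""
    return [(w, SOURCE_LEXICON[w]) for w in text.lower().split()]


def transfer(source: List[Node]) -> List[Node]:
    """Transfer: deterministic rules map the source structure to a target structure."""
    target = [(BILINGUAL_DICT[w], tag) for w, tag in source]
    # Hand-written rule: English ADJ NOUN becomes French NOUN ADJ.
    i = 0
    while i < len(target) - 1:
        if target[i][1] == "ADJ" and target[i + 1][1] == "NOUN":
            target[i], target[i + 1] = target[i + 1], target[i]
            i += 2
        else:
            i += 1
    return target


def synthesize(target: List[Node]) -> str:
    """Synthesis: produce target text from the target structure."""
    return " ".join(w for w, _ in target)


print(synthesize(transfer(analyze("The red car stops"))))  # la voiture rouge s'arrête
```

Any word missing from the hand-written lexicons raises a `KeyError`, which mirrors, in miniature, the brittleness of rule-based systems that must anticipate every word and construction in advance.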
Statistical techniques in speech recognition provide two advantages over the rule-based approach. First, they provide a means of automatically extracting information from large bodies of acoustic and textual data; second, they provide, via the formal rules of probability theory, a systematic way of combining information acquired from different sources. The problem of machine translation between natural languages is entirely different from that of speech recognition. In particular, acoustic modeling, the main area of research in speech recognition, has no place in machine translation. Machine translation does, however, face the difficult problem of coping with the complexities of natural language. It is natural to wonder whether this problem will also yield to an attack by statistical methods, much as the problem of coping with the complexities of natural speech has been yielding to such an attack. Although the statistical models needed would be of a very different nature, the principles of acquiring rules automatically and combining them in a mathematically principled fashion might apply as well to machine translation as they have to speech recognition.