Machine translation systems require large amounts of parallel training data to achieve high accuracy. Such data is generally harder to obtain for languages spoken by fewer people than for widely spoken languages. For example, a majority of the text found on the internet is in English, whereas less text is available in languages such as Japanese or Korean. This makes obtaining parallel data for less widely spoken languages challenging.
Traditional machine translation systems work around this problem by bridging translations between less widely spoken languages through a third language: a portion of text in a first language is translated into the third language, and the result is then translated from the third language into a second language. Such a bridging process suffers from several problems, including propagation of errors, increased latency, and increased system complexity.
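The bridging process described above can be illustrated with a minimal sketch. It is not taken from any particular system: the helper names `make_bridge` and `lookup`, the toy word tables, and the `<unk>` placeholder are all hypothetical, and word-by-word lookup stands in for a real translation model. Japanese, English, and Korean play the roles of the first, third, and second languages.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]

def lookup(table: Dict[str, str]) -> Translator:
    """Build a toy word-by-word translator (hypothetical stand-in for a model)."""
    def translate(text: str) -> str:
        # Words missing from the table become the placeholder "<unk>".
        return " ".join(table.get(token, "<unk>") for token in text.split())
    return translate

def make_bridge(first_to_third: Translator, third_to_second: Translator) -> Translator:
    """Compose two translators so that text passes through a third language."""
    def translate(text: str) -> str:
        third_text = first_to_third(text)   # first language -> third language
        return third_to_second(third_text)  # third language -> second language
    return translate

# Toy tables, assumed purely for illustration.
JA_TO_EN = {"猫": "cat", "犬": "dog"}
EN_TO_KO = {"cat": "고양이", "dog": "개"}

ja_to_ko = make_bridge(lookup(JA_TO_EN), lookup(EN_TO_KO))

print(ja_to_ko("猫 犬"))  # both words survive both stages
print(ja_to_ko("猫 鳥"))  # "鳥" is lost in stage one and stays lost in stage two
```

In this sketch every request runs two translators in sequence, which reflects the added latency and system complexity, and a word the first stage fails on cannot be recovered by the second stage, which is a simple case of error propagation.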