Many machine translation (MT) systems are currently available. Some of them are online MT systems, such as Google MT, Baidu MT, Youdao MT, and Systran. Although the translation quality of these MT systems is not as good as expected, they are helpful for common translation requirements.
The inventors of the present invention have found that, when an MT system is used to translate specialized documents, it is hard to obtain good translation results. The reason is that existing statistical MT systems are all built from a training corpus, and it is impossible to collect a training corpus large enough to cover all domains or all possible sentences of human expression. Existing statistical MT systems therefore translate in-domain text well and out-of-domain text poorly. For an in-domain test set, some fragments in the training corpus more or less match fragments in the test set, or even match a whole sentence. For an out-of-domain test set, almost no fragment of the training corpus matches a fragment of the test set. This produces a large number of out-of-vocabulary (OOV) words in the decoding process. As a result, the translation of out-of-domain text is very poor. Specialized documents are generally out-of-domain.
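The mismatch between a training corpus and a test set can be illustrated with a minimal sketch. The sketch is not part of any MT system described here. The function names `oov_rate` and `fragment_hit_rate`, the whitespace tokenization, the use of word bigrams as "fragments", and the toy sentences are all assumptions made for illustration only.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token fragments of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens that never occur in the training corpus."""
    # Assumption: tokens are separated by whitespace.
    vocab = {tok for s in train_sentences for tok in s.split()}
    test_tokens = [tok for s in test_sentences for tok in s.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def fragment_hit_rate(train_sentences, test_sentences, n=2):
    """Fraction of test n-grams that also occur in the training corpus."""
    train_frags = {g for s in train_sentences for g in ngrams(s.split(), n)}
    test_frags = [g for s in test_sentences for g in ngrams(s.split(), n)]
    if not test_frags:
        return 0.0
    hits = sum(1 for g in test_frags if g in train_frags)
    return hits / len(test_frags)


# Hypothetical toy data, chosen only to make the contrast visible.
train = ["the cat sat on the mat", "the dog sat on the rug"]
in_domain = ["the cat sat on the rug"]
out_of_domain = ["myocardial infarction requires urgent angioplasty"]

for name, test in [("in-domain", in_domain), ("out-of-domain", out_of_domain)]:
    print(name, oov_rate(train, test), fragment_hit_rate(train, test))
```

With this toy data, every word and bigram of the in-domain sentence is found in the training corpus, whereas none of the words or bigrams of the out-of-domain sentence is found. Real corpora would show a less extreme but similar gap.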