Some machine translation systems utilize in-domain language and translation models and out-of-domain language and translation models. In order to build these models, it is typically necessary to have in-domain training data and out-of-domain training data. The determination as to whether available training data is in-domain or out-of-domain is commonly made manually (e.g., by obtaining information regarding the provenance of the training data) or by experimentation.
It is not unusual, however, for the provenance of the training data to be unknown or unreliable. It is also not unusual for training data identified as in-domain to include out-of-domain training data. As a result, it is not always possible to make an accurate division between in-domain training data and out-of-domain training data. Consequently, the quality of machine translations performed by machine translation systems trained with such data can be negatively impacted.
It has also been common practice to train a machine translation system to translate input segments in a particular domain. As a result, in order to translate input segments in many different domains, it has been necessary to build independent machine translation systems, one for each domain to be translated from a source language to a target language. This results in significantly inefficient use of the computing resources utilized to implement such machine translation systems.
The disclosure made herein is presented with respect to these and other considerations.